In this section, you will walk through the process of inserting PL profile monitors to identify the specific PL kernels that cause a potential drop in performance.
This is a three-step process:
Add the PL profile monitors in the V++ link command, and generate the SD card image.
Create the xrt.ini file, and run the design on hardware.
Observe the output in the Vitis Analyzer, and analyze the performance.
In the Makefile, locate VPP_LINK_FLAGS, and add --profile.data all:all:all as follows:
VPP_LINK_FLAGS := -l -t $(TARGET) --platform $(BASE_PLATFORM) $(KERNEL_XO) $(GRAPH_O) --profile.data all:all:all --save-temps -g --config $(CONFIG_FILE) -o $(PFM).xsa
The --profile.data:<arg> option enables the monitoring of data ports through the monitor IP that is added into the design. In this example, <arg> is set to all:all:all, i.e., assign the data profile to all CUs; you can find names from mm2s and interfaces of all kernels, s2mm and mm2s.
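Before starting a long hardware build, it can help to confirm that the flag actually made it into the link flags. The following is a minimal shell sketch, not part of the tutorial's Makefile; the helper name has_profile_flag is hypothetical, and it assumes VPP_LINK_FLAGS is defined on a single line of the Makefile.

```shell
# Hypothetical helper: succeed if the Makefile's V++ link flags enable
# data profiling on all CUs and interfaces.
has_profile_flag() {
    # $1: path to the Makefile (assumed to define VPP_LINK_FLAGS on one line)
    grep -E '^VPP_LINK_FLAGS[[:space:]]*:?=' "$1" \
        | grep -q -- '--profile\.data[ :=]*all:all:all'
}

# Report the result for a Makefile in the current directory, if one exists.
if [ -f Makefile ]; then
    if has_profile_flag Makefile; then
        echo "PL data profiling is enabled in VPP_LINK_FLAGS"
    else
        echo "--profile.data all:all:all is missing from VPP_LINK_FLAGS"
    fi
fi
```

The check only inspects the text of the Makefile; it does not verify that the monitors were inserted into the linked design.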
Run make all TARGET=hw to generate the hardware image, sd_card.img.
Flash the sd_card.img file to the SD card. You can follow step 3 in the Running the Design on Hardware section.
Create the xrt.ini file with the following content:
[Debug]
device_trace = fine
[profile]
data=all:all:all
The [Debug] switch key option is used to enable profiling of the application at runtime.
The [profile] section head contains the data=all:all:all setting to monitor data on all kernels and CUs.
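As an illustration, the file can be written from the shell with a here-document. This is a sketch that assumes it is run in the directory that holds host.exe, and it uses only the two settings described above.

```shell
# Write xrt.ini into the current directory (assumed to be the one holding host.exe).
# The quoted 'EOF' keeps the shell from expanding anything inside the file body.
cat > xrt.ini <<'EOF'
[Debug]
device_trace = fine

[profile]
data=all:all:all
EOF

# Print the file back as a quick visual check.
cat xrt.ini
```

XRT reads xrt.ini at application start, so the file needs to be in place before host.exe is launched.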
In the console, run the application as follows:
cd /run/media/mmcblk0p1
./host.exe a.xclbin
Observe the files, device_trace_*.csv. Copy the files back to the local workspace, and open the xrt.run_summary file in the Vitis Analyzer using the following command:
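A minimal sketch of this step, assuming the Vitis tools are on PATH on the local machine and the run files were copied into the current directory. The wrapper open_run_summary is a hypothetical helper; the underlying call is vitis_analyzer followed by the summary file, and the exact invocation may differ between Vitis releases.

```shell
# Hypothetical wrapper: open a run summary in Vitis Analyzer, with basic checks.
open_run_summary() {
    # $1: path to the run summary (defaults to xrt.run_summary in the current directory)
    ors_summary="${1:-xrt.run_summary}"
    if [ ! -f "$ors_summary" ]; then
        echo "missing $ors_summary; copy it back from the SD card first" >&2
        return 1
    fi
    if ! command -v vitis_analyzer >/dev/null 2>&1; then
        echo "vitis_analyzer not found in PATH; source the Vitis settings script" >&2
        return 2
    fi
    vitis_analyzer "$ors_summary"
}
```

From the directory that holds the copied files, the call is open_run_summary xrt.run_summary, which is equivalent to running vitis_analyzer xrt.run_summary directly once both checks pass.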
Once the Vitis Analyzer opens, click Profile Summary in the left-side pane, and navigate to Compute Unit Utilization. Observe the compute units and kernels. Also note the time and clock frequency as follows.
You can get the data transfer for each compute unit and the total read/write size in megabytes by navigating to Kernel Data Transfers -> Top Kernel Transfer as follows:
From the Kernel Data Transfers -> Kernel Transfer tab, you can get the transfer rate, throughput utilization (%), and latency details.