In this section, you will walk through the process of inserting PL profile monitors to identify the specific PL kernels that cause the potential drop in performance.
This is a three-step process:

1. Add the PL profile monitors in the V++ link command, and generate the SD card image.
2. Prepare the `xrt.ini` file, and run the design on hardware.
3. Observe the output in the Vitis Analyzer, and analyze the performance.
Open the `Makefile` from the `cmd_src/` directory. Locate `VPP_LINK_FLAGS`, and add `--profile.data all:all:all` as follows:

```
VPP_LINK_FLAGS := -l -t $(TARGET) --platform $(BASE_PLATFORM) $(KERNEL_XO) $(GRAPH_O) --profile.data all:all:all --save-temps -g --config $(CONFIG_FILE) -o $(PFM).xsa
```
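The same option can also target a narrower scope. Its argument follows the `<kernel_name>:<cu_name>:<interface_name>` form, so when full instrumentation costs too much area or routing congestion, monitoring can be limited to a single compute unit. A sketch, reusing one of this design's CU names:

```
# monitor only the interfaces of the s2mm_1 compute unit
VPP_LINK_FLAGS := -l -t $(TARGET) --platform $(BASE_PLATFORM) $(KERNEL_XO) $(GRAPH_O) --profile.data s2mm:s2mm_1:all --save-temps -g --config $(CONFIG_FILE) -o $(PFM).xsa
```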
The `--profile.data <arg>` option enables the monitoring of data ports through monitor IPs that are added into the design. In this example, `<arg>` is set to `all:all:all`, i.e., the data profile is assigned to all CUs (you can find their names, `s2mm_1`, `s2mm_2`, and `mm2s`, in the `system.cfg` file) and to all interfaces of the `s2mm` and `mm2s` kernels.
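Those CU names are declared in the connectivity section of `system.cfg`. The relevant portion likely resembles the following sketch, inferred from the names above rather than copied from the tutorial sources:

```
[connectivity]
# two compute units of the s2mm kernel, one of mm2s
nk=s2mm:2:s2mm_1.s2mm_2
nk=mm2s:1:mm2s
```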
Run `make all TARGET=hw`; a hardware image, `sd_card.img`, is generated inside the `sw/` directory. Flash the `sd_card.img` file to the SD card. You can follow step 3 in the Running the Design on Hardware section.
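Any raw-image writer works for this step. A minimal sketch on a Linux host, assuming the card enumerates as `/dev/sdX` (a hypothetical device node; verify with `lsblk` first, since `dd` overwrites the target wholesale):

```
# WARNING: double-check /dev/sdX before writing
sudo dd if=sw/sd_card.img of=/dev/sdX bs=4M status=progress conv=fsync
```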
Create an `xrt.ini` file with the following content:

```
[Debug]
device_trace = fine

[profile]
data=all:all:all
```
Here:

- The switch under the `[Debug]` section key enables profiling of the application during runtime.
- The `[profile]` section contains `data=all:all:all` to monitor data on all kernels and CUs.
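At runtime, XRT picks up `xrt.ini` from the working directory, so the file has to end up next to `host.exe` on the board. A minimal sketch, assuming the boot partition mount point used in the next step:

```
# place xrt.ini alongside host.exe so XRT finds it
cp xrt.ini /run/media/mmcblk0p1/
```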
In the console, run the application:

```
cd /run/media/mmcblk0p1
./host.exe a.xclbin
```
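As a side note, if the fine-grained device trace proves too heavy at runtime or produces unwieldy trace files, XRT also accepts a coarser granularity; this is a generic XRT setting, shown here only as an alternative:

```
[Debug]
device_trace = coarse
```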
Observe the `TEST PASSED` output. Also observe the generated files: `xrt.run_summary`, `summary.csv`, and `device_trace_*.csv`. Copy the files back to the local workspace, and open the `xrt.run_summary` file in the Vitis Analyzer using the following command:

```
vitis_analyzer xrt.run_summary
```
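For the copy-back step, one option is to pull the files over SSH. A sketch with a placeholder board address (replace `192.168.0.10` with your board's actual IP):

```
# placeholder IP address; adjust to your setup
scp root@192.168.0.10:/run/media/mmcblk0p1/xrt.run_summary .
scp root@192.168.0.10:/run/media/mmcblk0p1/*.csv .
```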
Once the Vitis Analyzer opens, click Profile Summary in the left-side pane, and navigate to the Compute Unit Utilization section. Observe the compute units and kernels, and note the time and clock frequency. You can get the data transfer for each compute unit, and the total read/write in megabytes, by navigating to Kernel Data Transfers -> Top Kernel Transfer.
From the Kernel Data Transfers -> Kernel Transfer tab, you can get the transfer rate, throughput utilization (%), and latency details.
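As a rough way to read the utilization figure (an assumption based on the usual definition of utilization against theoretical port bandwidth, not stated in this tutorial): a 64-bit AXI4-Stream interface clocked at 300 MHz can theoretically move 8 bytes × 300 MHz = 2400 MB/s, so an observed transfer rate of 1200 MB/s would appear as roughly 50% throughput utilization.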