To profile your application and capture event trace data during execution, you must instrument the application for the task. Instrumentation enables additional logic and consumes additional device resources to track the host and kernel execution steps and to capture event data. The process involves optionally modifying your host application to capture custom data, modifying your kernel XO during compilation and the xclbin during linking to capture different types of profile data from device-side activity, and configuring the Xilinx runtime (XRT), as described in the xrt.ini File, to capture data during application runtime.
There are many different types of profiling available, depending on which elements your system includes and what type of data you want to capture. The following table shows some of the levels of profiling that can be enabled, and discusses which are complementary and which are not.
| Profiling type | How to enable | Output |
|---|---|---|
| Host Application OpenCL API and some limited device side (kernel) profiling. | Specified by the use of the | Generates the opencl_trace.csv file and the xrt.run_summary for viewing in Vitis analyzer. |
| Host Application XRT Native API | Specified by the use of the | Generates profile summary and trace events for the XRT API as described in Host Programming. |
| Host Application User-Event Profiling | Requires additional code in the host application as described in Custom Profiling of the Host Application. | Generates user range data and user events for the host application. Tip: Can be used to capture event data for user-managed kernels as described in Setting Up User-Managed Kernels and Argument Buffers. |
| Low Overhead Profiling | Specified by the use of the | Generates the lop_trace.csv file as described in Enabling Low Overhead Profiling. |
| Device Side Profiling | Enabled by the use of | Enables capturing data traffic between the host and kernel, kernel stalls, the execution times of kernels and compute units (CUs), as well as monitoring activity in Versal AI Engines. |
| AI Engine Graph and Kernels | Specified by the use of the | Generates the aie_profile_<device>.csv and aie_trace_##_<stream id>.txt reports. |
| Power Profile | Specified by the use of the Tip: This feature is not supported on certain platforms including AWS. | |
| Vitis AI Profiling | Specified by the use of the | Enables counter profiling of DPUs to generate the opencl_summary.csv file and the xrt.run_summary for viewing in Vitis analyzer. |
The device binary (xclbin) file is configured for capturing limited device-side profiling data by default. However, using the --profile option during the Vitis compiler linking process instruments the device binary by adding Acceleration Monitors, AXI Performance Monitors, and Memory Monitors to the system. This option has multiple instrumentation sub-options, such as --profile.data and --profile.exec, as described in the --profile Options.
For example, on the v++ linking command line:
v++ -g -l --profile.data all:all:all ...
Also use the v++ -g option when compiling your kernel code for debugging with software or hardware emulation.
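The same instrumentation can also be kept in a v++ configuration file and passed with --config, instead of spelling out each option on the command line. The fragment below is a minimal sketch under stated assumptions: the file name profile.cfg is hypothetical, and the stall and exec entries are included only to illustrate other --profile sub-options, not because the command above requires them.

```ini
# profile.cfg (hypothetical file name)
# Assumed usage: v++ -g -l --config profile.cfg ...
[profile]
# Monitor data traffic on all kernels, all CUs, and all interfaces
data=all:all:all
# Assumed addition: monitor stalls on all kernels and all CUs
stall=all:all
# Assumed addition: record execution of all kernels and all CUs
exec=all:all
```

Each key under [profile] corresponds to the matching --profile.<key> command-line option, so data=all:all:all here is equivalent to --profile.data all:all:all. Every monitor consumes device resources, so in practice you may prefer to name specific kernels or CUs rather than all.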
After your application is enabled for profiling during the v++ compile and link process, data gathering during application runtime must also be enabled in XRT by editing the xrt.ini file as discussed above. For example, the following xrt.ini file enables OpenCL profiling, power profiling, and event and stall trace capture when the application is run:
[Debug]
opencl_trace=true
power_profile=true
device_trace=fine
stall_trace=all
To enable the profiling of Kernel Internals data, you must also add the debug_mode tag in the [Emulation] section of the xrt.ini file.
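A minimal sketch of such an [Emulation] entry follows. The value batch is an assumption chosen for illustration (it runs the simulator without opening its GUI); check the xrt.ini documentation for the values supported by your release.

```ini
[Emulation]
# Assumed value: capture kernel internals without launching the simulator GUI
debug_mode=batch
```

This section sits alongside the [Debug] section in the same xrt.ini file and only affects emulation runs.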
If you are collecting a large amount of trace data, you can increase the amount of available memory for capturing data by specifying the --profile.trace_memory option during v++ linking, and adding the trace_buffer_size keyword in the xrt.ini file.
- --profile.trace_memory: Indicates what type of memory to use for capturing trace data.
- trace_buffer_size: Specifies the amount of memory to use for capturing the trace data during the application runtime.
If --profile.trace_memory is not specified but device_trace is enabled in the xrt.ini File, the profile data is captured to the default platform memory with 1 MB allocated for the trace buffer size.
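As a sketch of how the two settings pair up, the fragments below select a trace memory at link time and enlarge the runtime buffer. The memory name DDR[1] and the 100M size are assumptions for illustration only; the memory resources available depend on your platform.

```ini
# v++ link configuration, equivalent to: --profile.trace_memory DDR[1]
[profile]
# Assumed memory resource; use one that exists on your platform
trace_memory=DDR[1]
```

```ini
# xrt.ini, read by XRT when the application runs
[Debug]
device_trace=fine
# Assumed size; increase from the 1 MB default to hold more trace data
trace_buffer_size=100M
```

The first setting is fixed into the xclbin during linking, while the second can be changed between runs without rebuilding.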
Finally, as discussed in Continuous Trace Capture, you can enable continuous trace capture to offload device trace data while the application is running, so that in the event of an application or system crash, some trace data is available to help debug the application.
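A minimal xrt.ini sketch for continuous offload might look like the following. The continuous_trace key is stated here from memory of the xrt.ini options rather than from this section, so treat it as an assumption and confirm it against Continuous Trace Capture for your release.

```ini
[Debug]
device_trace=fine
# Assumed key: offload device trace while the application is still running
continuous_trace=true
```

Continuous offload trades some runtime overhead for the ability to recover partial trace data after a crash.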