Enabling Profiling in Your Application - 2020.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2021-03-22
Version
2020.2 English

To enable profiling and capture event trace data during the execution of your application, you must instrument the application for this task. Instrumentation enables additional logic and consumes additional device resources to track the host and kernel execution steps and to capture event data. The process involves three parts: optionally modifying your host application to capture custom data; modifying your kernel XO during compilation and the XCLBIN during linking to capture different types of profile data from device-side activity; and configuring the Xilinx runtime (XRT), as described in the xrt.ini File, to capture data during the application runtime.

Tip: While capturing profile data is a critical part of the profiling and optimization process for building your accelerated application, it consumes additional resources and impacts performance. Be sure to remove this instrumentation from your final production build.

There are many different types of profiling for your applications, depending on which elements your system includes and what type of data you want to capture. The following table shows some of the levels of profiling that can be enabled, and notes which are complementary and which are not.

Table 1. Profiling Host and Kernels

| Profile/Trace | Description | Comments |
|---|---|---|
| Host Application OpenCL API | Includes some limited device-side (kernel) profiling. Specified by the use of the profile and timeline_trace options in the xrt.ini file. | Generates the profile_summary.csv and timeline_trace.csv files. |
| Host Application XRT Native API | Specified by the use of the xrt_profile option in the xrt.ini file. | Generates trace events for the XRT API. |
| Host Application User-Event Profiling | Requires additional code in the host application as described in Custom Profiling of the Host Application. | Generates user range data and user events for the host application. |
| Low Overhead Profiling | Specified by the use of the lop_trace option in the xrt.ini file. | Generates the lop_trace.csv file as described in Enabling Low Overhead Profiling. Is disabled by profile=true in the xrt.ini file. |
| Device Side Profiling | Enabled by the use of --profile options during v++ compilation and linking, as described in --profile Options. | Enables capturing data traffic between the host and kernel, kernel stalls, and the execution times of kernels and compute units (CUs), as well as monitoring activity in Versal AI Engines. |
| AI Engine Graph and Kernels | Specified by the use of the aie_profile and aie_trace options in the xrt.ini file. These options can be specified together or separately. | Generates the aie_profile_<device>.csv and aie_trace_##_<stream id>.txt reports. Cannot be used with profile=true in the xrt.ini file. Is also disabled by the presence of user event profiling in the host application. |
| Power Profile | Specified by the use of the power_profile option in the xrt.ini file. | Generates the power_profile_<device>.csv report. |
| Vitis AI Profiling | Specified by the use of the vitis_ai_profile option in the xrt.ini file. | Enables counter profiling of DPUs to generate the profile_summary.csv file. Is disabled by profile=true in the xrt.ini file. |
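The exclusions noted in the table can be checked before a run. The following Python sketch is illustrative only and is not part of XRT: the function name find_profile_conflicts and the tuple EXCLUDED_BY_PROFILE are hypothetical, the rule set simply restates the table comments about profile=true, and it assumes the options are written as true/false values.

```python
# Hypothetical helper: restates the "Comments" column of Table 1.
# These are the options the table says are disabled by, or cannot be
# combined with, profile=true in the xrt.ini file.
EXCLUDED_BY_PROFILE = ("lop_trace", "aie_profile", "aie_trace", "vitis_ai_profile")


def find_profile_conflicts(debug_options):
    """Return the enabled options that conflict with profile=true.

    debug_options maps xrt.ini keys to their string values, for example
    {"profile": "true", "lop_trace": "true"}.
    """
    def enabled(key):
        # Assumption: an option counts as enabled when its value is "true".
        return debug_options.get(key, "false").strip().lower() == "true"

    if not enabled("profile"):
        return []
    return [key for key in EXCLUDED_BY_PROFILE if enabled(key)]


if __name__ == "__main__":
    settings = {"profile": "true", "timeline_trace": "true", "lop_trace": "true"}
    for key in find_profile_conflicts(settings):
        print(f"{key} conflicts with profile=true")
```

A check like this only catches the combinations listed in the table; the rule about user event profiling in the host application disabling AI Engine profiling is not visible in the xrt.ini file and is not covered.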

The device binary (xclbin) file is configured for capturing limited device-side profiling data by default. However, using the --profile option during the Vitis compiler linking process instruments the device binary by adding Acceleration Monitors and AXI Performance Monitors to the system. The --profile option has several instrumentation sub-options, including --profile.data, --profile.stall, and --profile.exec, as described in --profile Options.

As an example, add --profile.data to the v++ linking command line:
v++ -g -l --profile.data all:all:all ...
Tip: Be sure to also use the v++ -g option when compiling your kernel code for debugging with software or hardware emulation.
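When a build script assembles the link command, the three instrumentation options can be appended programmatically. The Python sketch below is a minimal illustration: the helper add_profile_flags and the PROFILE_FLAGS table are hypothetical names, and the all:all argument format shown for --profile.stall and --profile.exec is an assumption that should be confirmed against --profile Options for your release. Only the --profile.data all:all:all form is taken from the example above.

```python
import shlex

# Argument strings per option. Only the "data" entry comes from the text;
# the "stall" and "exec" formats are assumptions to verify in --profile Options.
PROFILE_FLAGS = {
    "data": "all:all:all",
    "stall": "all:all",  # assumption
    "exec": "all:all",   # assumption
}


def add_profile_flags(link_args, kinds=("data",)):
    """Return a copy of link_args with one --profile.<kind> flag per kind."""
    extended = list(link_args)
    for kind in kinds:
        extended += [f"--profile.{kind}", PROFILE_FLAGS[kind]]
    return extended


if __name__ == "__main__":
    # The remaining link arguments are elided, as in the example above.
    base = ["v++", "-g", "-l"]
    print(shlex.join(add_profile_flags(base, kinds=("data", "stall", "exec"))))
```

Keeping the command as a list of arguments until it is printed or executed avoids shell quoting problems when kernel or interface names are substituted for all.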

After your application is enabled for profiling during the v++ compile and link process, data gathering during application runtime must also be enabled in XRT by editing the xrt.ini file as discussed above. For example, the following xrt.ini file will enable OpenCL profiling, power profiling, and event and stall trace capture when the application is run:

[Debug]
profile=true
power_profile=true
timeline_trace=true
data_transfer_trace=coarse
stall_trace=all

To enable the profiling of Kernel Internals data, you must also add the debug_mode tag in the [Emulation] section of the xrt.ini:

[Emulation]
debug_mode=batch
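If the xrt.ini file is produced by a test or build script rather than edited by hand, the two sections above can be written with Python's standard configparser module. This is a sketch under stated assumptions: write_xrt_ini is a hypothetical helper, the output location is up to the caller, and the keys and values are the ones from the examples above.

```python
import configparser
import os
import tempfile


def write_xrt_ini(path, debug, emulation=None):
    """Write an xrt.ini with a [Debug] section and an optional [Emulation] section."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as given
    config["Debug"] = debug
    if emulation:
        config["Emulation"] = emulation
    with open(path, "w") as handle:
        # Emit key=value without surrounding spaces, matching the examples above.
        config.write(handle, space_around_delimiters=False)


if __name__ == "__main__":
    # A temporary directory is used here so the demo does not overwrite a real xrt.ini.
    target = os.path.join(tempfile.mkdtemp(), "xrt.ini")
    write_xrt_ini(
        target,
        debug={
            "profile": "true",
            "power_profile": "true",
            "timeline_trace": "true",
            "data_transfer_trace": "coarse",
            "stall_trace": "all",
        },
        emulation={"debug_mode": "batch"},
    )
    with open(target) as handle:
        print(handle.read())
```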

If you are collecting a large amount of trace data, you can increase the amount of memory available for capturing it by specifying the --trace_memory option during v++ linking and adding the trace_buffer_size keyword to the xrt.ini file.

--trace_memory
Indicates what type of memory to use for capturing trace data, as described in Vitis Compiler General Options.
trace_buffer_size
Specifies the amount of memory to use for capturing the trace data during the application runtime.

Finally, you can enable continuous trace capture to offload device trace data continuously while the application is running, so that in the event of an application or system crash, some trace data is available to help debug the application. To enable it, add the continuous_trace keyword to the xrt.ini file.
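As an illustration of the two runtime keywords discussed above, the following Python sketch adds trace_buffer_size and continuous_trace to existing xrt.ini content. Several details are assumptions rather than statements from this page: the helper name enable_large_continuous_trace is hypothetical, both keywords are placed in the [Debug] section, continuous_trace is written as true, and the 8M size value is only a placeholder whose format should be checked in the xrt.ini File description.

```python
import configparser
import io


def enable_large_continuous_trace(ini_text, buffer_size="8M"):
    """Return xrt.ini text with trace_buffer_size and continuous_trace set.

    Assumptions: both keywords belong in [Debug], and buffer_size uses a
    format accepted by XRT (the default "8M" is a placeholder).
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as given
    config.read_string(ini_text)
    if not config.has_section("Debug"):
        config.add_section("Debug")
    config["Debug"]["trace_buffer_size"] = buffer_size
    config["Debug"]["continuous_trace"] = "true"
    out = io.StringIO()
    config.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    existing = "[Debug]\nprofile=true\ntimeline_trace=true\n"
    print(enable_large_continuous_trace(existing))
```

The link-time --trace_memory option is separate from these runtime settings and still has to be supplied on the v++ command line.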