The Vitis tool supports recording continuous trace data while the application is running. A long-running application can generate a large amount of trace data, which can cause issues such as incomplete trace data, especially when the memory resource used for trace data is not large enough. With continuous trace, the trace can be analyzed while the application is still running, or after the application has crashed before completion.
With the ability to continuously capture trace data, the Timeline Trace reports can be dynamically updated in Vitis Analyzer while your application is running. Once these reports are loaded in Vitis Analyzer, a hyperlink indicates that the current report is being modified on disk. To load new data, use the Reload or Auto-Reload options on the banner to view the updated report as your application runs and trace data is generated.
Continuous trace is not enabled by default. Additionally, the memory resources of an FPGA are limited, so if the application generates a large amount of trace data, a circular buffer can be used to store it. The circular buffer can be written, offloaded to the host, and then reused. Enabling a circular buffer with continuous trace reduces the memory needed for trace even further, saving resources on the device. However, an application run with continuous trace and a circular buffer may produce multiple device trace files.
Here are some scenarios where it is recommended to use the memory resource as a circular buffer.
The circular buffer implementation is automatically turned on when continuous trace is enabled in the xrt.ini file. The flow requires the following settings to enable continuous trace:
- In the xrt.ini file, continuous_trace is set to TRUE.
- The v++ linking option --profile.trace_memory is set to DDR or HBM.
You can optionally set:
- The size of the trace buffer, using trace_buffer_size in the xrt.ini file. This defaults to 1 MB.
- The interval at which the trace buffer is offloaded from the device, using trace_buffer_offload_interval_ms in the xrt.ini file. The default is 10 ms.
- The interval at which files are dumped, using trace_file_dump_interval_s. The default is 3 seconds.
trace_buffer_offload_interval_ms to 0 ms.
With trace_buffer_size set to 8k and the default trace_buffer_offload_interval_ms of 10 ms, the trace data rate is 819200 bytes/s, which is less than the default minimum of 100 MB/s. In this scenario, the circular buffer is NOT enabled and XRT reports the following warning:
[XRT] WARNING: Unable to use circular buffer for continuous trace offload. Please increase trace buffer size and/or reduce continuous trace interval. Minimum required offload rate (bytes per second) : 104857600 Requested offload rate : 819200
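The arithmetic behind this warning can be sketched in a few lines of Python. This is an illustrative check only, not part of XRT: the helper names (parse_size, offload_rate, circular_buffer_usable) are hypothetical, the k and M suffixes are assumed to be binary multiples (consistent with 8k giving 819200 bytes/s above), and the comparison against the minimum is assumed to be "greater than or equal".

```python
# Minimum required offload rate quoted in the XRT warning (100 MB/s).
MIN_OFFLOAD_RATE = 100 * 1024 * 1024  # 104857600 bytes per second

# Assumption: size suffixes are binary multiples (8k == 8192 bytes).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2}


def parse_size(text):
    """Convert a size string such as '8k' or '20M' to a byte count."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def offload_rate(buffer_size, interval_ms):
    """Requested offload rate in bytes per second."""
    if interval_ms <= 0:
        # An interval of 0 ms is not modeled by this sketch.
        raise ValueError("interval_ms must be positive")
    return parse_size(buffer_size) * 1000 // interval_ms


def circular_buffer_usable(buffer_size, interval_ms):
    """True if the requested rate meets the minimum (assumed >= comparison)."""
    return offload_rate(buffer_size, interval_ms) >= MIN_OFFLOAD_RATE


print(offload_rate("8k", 10))            # 819200
print(circular_buffer_usable("8k", 10))  # False
print(circular_buffer_usable("1M", 10))  # True
```

Under these assumptions, the default 1 MB buffer with the default 10 ms interval lands exactly on the 104857600 bytes/s minimum, while an 8k buffer at the same interval falls well short of it.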
[Debug]
opencl_summary=true
opencl_trace=true
data_transfer_trace=coarse
stall_trace=all
continuous_trace=true
// The following are optional and needed only in rare circumstances
trace_buffer_size=20M
trace_buffer_offload_interval_ms=10
trace_file_dump_interval_s=2
The following are the results of these settings:
opencl_summary: Enables the generation of the host-related OpenCL API profile summary report. The opencl_summary.csv file is created.
opencl_trace: Enables the generation of the host-related OpenCL API trace. The opencl_trace.csv file is created.
data_transfer_trace: Enables the collection of kernel activity to be added to the profile summary and trace. device_trace_0.csv files are created, where 0 is the device number.
stall_trace: Enables the hardware generation of stalls into compute units.
continuous_trace: Enables the continuous dumping of trace files and the continuous reading of device data into the host.
trace_buffer_offload_interval_ms: Controls the interval, in milliseconds, at which device data is read from the device to the host.
trace_file_dump_interval_s: Controls the time, in seconds, between dumps of trace files.
As a result, there are several CSV files generated in addition to the
xrt.run_summary as part of the application run using the
above xrt.ini file. Vitis Analyzer only needs the generated
run_summary file and will use the relevant CSV files to display the
profile summary and timeline trace.
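When the same settings are needed across several runs, the [Debug] section can be generated rather than edited by hand. The following is a minimal sketch using Python's standard configparser module. The function name build_xrt_ini and its parameters are hypothetical, and the key names and values are simply those shown in the example above. Where and how the resulting text is saved as xrt.ini is left to the caller.

```python
import configparser
import io


def build_xrt_ini(buffer_size=None, offload_interval_ms=None, dump_interval_s=None):
    """Return xrt.ini text with continuous trace enabled.

    The three arguments map to the optional settings and are omitted
    from the output when left as None, so XRT defaults apply.
    """
    cfg = configparser.ConfigParser()
    cfg["Debug"] = {
        "opencl_summary": "true",
        "opencl_trace": "true",
        "data_transfer_trace": "coarse",
        "stall_trace": "all",
        "continuous_trace": "true",
    }
    if buffer_size is not None:
        cfg["Debug"]["trace_buffer_size"] = str(buffer_size)
    if offload_interval_ms is not None:
        cfg["Debug"]["trace_buffer_offload_interval_ms"] = str(offload_interval_ms)
    if dump_interval_s is not None:
        cfg["Debug"]["trace_file_dump_interval_s"] = str(dump_interval_s)

    out = io.StringIO()
    # Write key=value without spaces, matching the example above.
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


ini_text = build_xrt_ini(buffer_size="20M", offload_interval_ms=10, dump_interval_s=2)
print(ini_text)
```

Calling build_xrt_ini() with no arguments produces only the five required and recommended keys, which matches the note that the remaining settings are needed only in rare circumstances.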
Here are the recommendations on setting up an application for trace data dumping:
- By default, an 8k FIFO is used for saving trace data. The FIFO size can be increased, but sizes above 64k are not preferred, and the FIFO must be preallocated as part of the v++ linking step. It is also preferred to use device memory for saving trace data. If you specify a memory bank for trace, you can use the trace_buffer_size option in xrt.ini to control the amount of trace generated at runtime. With device memory, the default size is 1M and the maximum size is 4095M.
- If still unable to dump maximum trace, disable stall trace by setting
- If the application requires a larger trace buffer, enable the circular buffer by setting continuous_trace=true with the default setting of trace_file_dump_interval_s=5. Ideally, the continuous trace feature should be used for the following cases:
- Long-running design with minimal trace generated
- Debugging application crashes where some .csv files might still be available for debugging
- If the application run is still unable to dump the maximum trace, the trace_buffer_size can be increased further.
- If the application still creates more trace data than the host can keep up with, use a smaller trace_file_dump_interval_s, which creates multiple files, one for each interval provided.
- Lastly, continuous trace can generate several trace files as part of the application run, in addition to the xrt.run_summary file. Vitis Analyzer only needs the generated run_summary file and picks up the relevant CSV files to display the profile summary and timeline trace.