Continuous Trace Capture - 2021.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2021-12-15
Version
2021.2 English

The Vitis tool supports recording continuous trace data while the application is running. A long-running application can generate a large amount of trace data, which can result in incomplete trace data, especially when the memory resource used for trace is not large enough. With continuous trace, the trace can be analyzed while the application is still running, or after the application has crashed before completion.

Because trace data is captured continuously, the Timeline Trace reports can be dynamically updated in Vitis Analyzer while your application is running. Once these reports are loaded in Vitis Analyzer, a hyperlink indicates that the current report is being modified on disk. To load new data, use the Reload or Auto-Reload options on the banner, which let you view the updated report as your application runs and trace data is generated.

Continuous trace is not enabled by default. Because the memory resources of an FPGA are limited, an application that generates a large amount of trace data can use a circular buffer to store it. The circular buffer is written, offloaded to the host, and then reused. Enabling a circular buffer with continuous trace reduces the memory needed for trace, saving resources on the device. However, an application run with continuous trace and a circular buffer can produce multiple device trace files.

Tip: For hardware emulation, only host-side continuous trace is available. For hardware runs, both host-side and device-side continuous trace are available.

Here are some scenarios where it is recommended to use the memory resource as a circular buffer.

The circular buffer implementation is automatically turned on when continuous trace is enabled in the xrt.ini. The flow requires the following settings for enabling continuous trace.

  • In the xrt.ini file, continuous_trace is set to TRUE
  • v++ linking option --profile.trace_memory is set to DDR or HBM

You can optionally set:

  • The size of the trace buffer using trace_buffer_size in the xrt.ini file. This defaults to 1 MB.
  • The interval at which the trace buffer is offloaded from the device using trace_buffer_offload_interval_ms in the xrt.ini file. The default is 10 ms.
  • The interval at which files are dumped using trace_file_dump_interval_s in the xrt.ini file. The default is 5 seconds.
Important: Circular Buffer can be force enabled by setting trace_buffer_offload_interval_ms to 0 ms.
As an example, if you enable continuous_trace with trace_buffer_size set to 8k and the default trace_buffer_offload_interval_ms of 10 ms, the requested offload rate is 8192 bytes / 0.01 s = 819200 bytes/s, which is less than the minimum required rate of 100 MB/s (104857600 bytes/s). In this scenario, the circular buffer is not enabled, and XRT reports the following warning:
[XRT] WARNING: Unable to use circular buffer for continuous trace offload. Please increase trace buffer size and/or reduce continuous
trace interval. Minimum required offload rate (bytes per second) : 104857600 Requested offload rate : 819200
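The arithmetic behind this warning can be reproduced with a short Python sketch. It is only an illustration, not XRT code: the helper names are hypothetical, the size suffixes are assumed to be binary multiples (8k = 8192 bytes, which matches the 819200 bytes/s figure), and the comparison is assumed to be "greater than or equal to" the minimum rate quoted in the warning. An interval of 0 is treated as force-enabling the circular buffer, as stated in the note on trace_buffer_offload_interval_ms.

```python
# Minimum offload rate in bytes/s, copied from the XRT warning text.
MIN_OFFLOAD_RATE = 104857600

# Assumption: size suffixes are binary multiples (k = 1024, M = 1024**2).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '8k' or '20M' to a byte count."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def requested_offload_rate(buffer_size, interval_ms):
    """Bytes per second if one full buffer is offloaded every interval_ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive to compute a rate")
    return parse_size(buffer_size) * 1000 // interval_ms


def circular_buffer_expected(buffer_size, interval_ms):
    """Guess whether the circular buffer would be used (hypothetical rule)."""
    if interval_ms == 0:
        # Per the documentation, an interval of 0 ms force-enables it.
        return True
    return requested_offload_rate(buffer_size, interval_ms) >= MIN_OFFLOAD_RATE


if __name__ == "__main__":
    for size in ("8k", "1M", "20M"):
        rate = requested_offload_rate(size, 10)
        print(size, rate, circular_buffer_expected(size, 10))
```

Under these assumptions, an 8k buffer at 10 ms falls well short of the minimum rate, while the default 1M buffer at 10 ms lands exactly on it.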
Here is an example of xrt.ini settings:
[Debug]
opencl_summary=true
opencl_trace=true
data_transfer_trace=coarse
stall_trace=all
continuous_trace=true
; The following are optional and needed only in rare circumstances

trace_buffer_size=20M
trace_buffer_offload_interval_ms=10
trace_file_dump_interval_s=2

The following are the results of these settings:

  • opencl_summary: Enables generation of the host-related OpenCL API profile summary report. The opencl_summary.csv file is created.
  • opencl_trace: Enables generation of the host-related OpenCL API trace. The opencl_trace.csv file is created.
  • data_transfer_trace: Enables the collection of kernel activity to be added to the profile summary and trace. device_trace_0.csv files are created, where 0 is the device number.
  • stall_trace: Enables the hardware generation of stalls into compute units.
  • continuous_trace: Enables the continuous dumping of files for trace and the continuous reading of device data into the host.
  • trace_buffer_offload_interval_ms: Controls the reading of device data from the device to the host in milliseconds.
  • trace_file_dump_interval_s: Controls the time between dumping of trace files in seconds.
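When several runs need different trace settings, it can be convenient to generate the [Debug] section from a script rather than edit it by hand. The following Python sketch uses only the standard configparser module. The function name build_xrt_ini and its parameters are hypothetical, and the option names and values are taken from the example settings above.

```python
import configparser
import io


def build_xrt_ini(trace_buffer_size="20M", offload_interval_ms=10,
                  dump_interval_s=2):
    """Return xrt.ini text with continuous trace enabled (illustrative)."""
    config = configparser.ConfigParser()
    config["Debug"] = {
        "opencl_summary": "true",
        "opencl_trace": "true",
        "data_transfer_trace": "coarse",
        "stall_trace": "all",
        "continuous_trace": "true",
        # Optional settings, mirroring the example above.
        "trace_buffer_size": trace_buffer_size,
        "trace_buffer_offload_interval_ms": str(offload_interval_ms),
        "trace_file_dump_interval_s": str(dump_interval_s),
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the style of the example.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_xrt_ini())
```

The returned text can then be written to the xrt.ini file used by the application run.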

As a result, there are several CSV files generated in addition to the xrt.run_summary as part of the application run using the above xrt.ini file. Vitis Analyzer only needs the generated run_summary file and will use the relevant CSV files to display the profile summary and timeline trace.

Here are the recommendations on setting up an application for trace data dumping:

  1. By default, an 8k FIFO is used for saving trace data. The FIFO size can be increased, but sizes above 64k are not recommended, and the FIFO must be preallocated as part of the v++ linking step. Using device memory for saving trace data is preferred. If you specify a memory bank for trace, you can use the trace_buffer_size option in xrt.ini to control the amount of trace generated at run time. With device memory, the default size is 1M and the maximum size is 4095M.
  2. If the run is still unable to dump the complete trace, disable stall trace by setting stall_trace=off, or use stall_trace=on with data_transfer_trace=coarse.
  3. If the application requires a larger trace buffer, enable the circular buffer by setting continuous_trace=true with the default settings of trace_buffer_offload_interval_ms=10 and trace_file_dump_interval_s=5. Ideally, the continuous trace feature should be used in the following cases:
    • Long-running design with minimal trace generated
    • Debugging application crashes where some .csv files might still be available for debugging
  4. If the application run is still unable to dump the complete trace, trace_buffer_size can be increased further.
  5. If the application still generates more trace data than the host can keep up with, use a smaller trace_file_dump_interval_s value, which creates multiple files, each covering the interval provided.
  6. Lastly, continuous trace can generate several trace files during the application run in addition to the xrt.run_summary file. Vitis Analyzer needs only the generated run_summary file and picks up the relevant CSV files to display the profile summary and timeline trace.