xrt.ini File - 2023.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-07-17
Version: 2023.1 English

The Xilinx Runtime (XRT) library uses various control parameters to specify debugging, profiling, and message logging during host application and kernel execution. These control parameters are specified in a runtime initialization file, xrt.ini, and are used to configure features of XRT at start-up.

If you are a command line user, the xrt.ini file needs to be created manually and saved to the same directory as the host executable.

The runtime library checks if xrt.ini exists in the same directory as the host executable and automatically reads the file to configure the runtime. You can also specify the location of an xrt.ini file at runtime by setting the XRT_INI_PATH environment variable to point to the file, for example:

export XRT_INI_PATH=/path/to/xrt.ini
Tip: The AMD Vitis™ IDE creates an xrt.ini file automatically based on your run configuration and saves it in the run configuration folder.

Runtime Initialization File Format

The xrt.ini file is a simple text file with groups of keys and their values. Any line beginning with a semicolon (;) or a hash (#) is a comment. The group names, keys, and key values are all case sensitive.

The following is an example xrt.ini file that enables the timeline trace feature, and directs the runtime log messages to the Console view.

#Start of Debug group 
[Debug] 
native_xrt_trace = true
device_trace = fine

#Start of Runtime group 
[Runtime] 
runtime_log = console

There are three groups of initialization keys:

  • Runtime
  • Debug
    • AIE_profile_settings
    • AIE_trace_settings
  • Emulation

The following tables list all supported keys for each group, the supported values for each key, and a short description of the purpose of the key.

Runtime Group

The Runtime group of switches lets you configure elements of the runtime operation as described below.

Table 1. Runtime Group Keys and Values
Key Valid Values Description
api_checks [true|false] Enables or disables OpenCL API checks.
  • true: Enable. This is the default value.
  • false: Disable.
cpu_affinity {N,N,...} Pins all runtime threads to specified CPUs. Example:
cpu_affinity = {4,5,6}
exclusive_cu_context [true|false] This allows the host application to direct OpenCL to acquire exclusive CU access, so that low-level AXI read/write (xclRegRead and xclRegWrite) can be used for regular kernels.
runtime_log [null | console | syslog | <filename>] Specifies where the runtime logs are printed.
  • null: Do not print any logs. This is the default value.
  • console: Print logs to stdout.
  • syslog: Print logs to Linux syslog.
  • <filename>: Print logs to the specified file. For example, runtime_log=my_run.log.
verbosity [0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 ] Verbosity of the log messages. The default value is 4.
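
As an illustration of the Runtime keys above, the following is a minimal sketch of a [Runtime] group that sends log messages to a file. The log file name and CPU numbers are illustrative values taken from the examples in Table 1, not recommendations.

```ini
[Runtime]
; Write runtime messages to a file instead of the default (null).
runtime_log = my_run.log
; 7 is the highest value listed in Table 1; the default is 4.
verbosity = 7
; Pin runtime threads to CPUs 4, 5, and 6 (illustrative CPU numbers).
cpu_affinity = {4,5,6}
```

Comments are placed on their own lines because only lines beginning with a semicolon or hash are treated as comments.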

Debug Group

The Debug group of switches defines key options for enabling profiling of the application at runtime, and for tracing data transfers and execution. These switches apply to both AI Engine and PL kernels in the Vitis acceleration flow, and let you configure aspects of the runtime to control the frequency of data capture, the events to capture, and the amount of memory to reserve or use for recording trace and profile data.

Table 2. Debug Options
Key Valid Values Description
aie_profile [true|false] Enables the runtime configuration and polling of AI Engine hardware performance counters. Available on VCK190 hardware and hardware emulation runs.
  • true: Enable.
  • false: Disable. This is the default value.
aie_trace [true|false] Enables the runtime configuration and collection of AI Engine event trace. Available on VCK190 hardware runs only.
  • true: Enable.
  • false: Disable. This is the default value.
aie_status [true|false] Enables the polling of AI Engine status information. Available on VCK190 hardware and hardware emulation runs.
aie_status_interval_us integer (default=1000us) Controls the interval at which AI Engine status information is captured. Specified in microseconds.
app_debug [true|false] Enables the OpenCL application debug for the host code when debugging with GDB.
  • true: Enable.
  • false: Disable. This is the default value.
continuous_trace [true|false] Enables the continuous dumping of files for trace and the continuous reading of device data into the host.
  • true: Enable.
  • false: Disable. This is the default value.
Note: This switch only has an effect if device_trace is enabled.
device_counters [true|false] Enables device counter offload only, without enabling trace functionality.
device_trace [off|fine|coarse|accel] Enables the collection of data from monitors inserted on the PL to add to summary and trace.
  • accel: Traces compute unit starts/stops.
  • coarse: Lumps all reads/writes together under each execution of a compute unit.
  • fine: Tracks everything as it happens.
  • off: Turns off reading and reporting of device-level trace during runtime. This is the default value.
host_trace [true|false] Enables trace of host code based on the first protocol encountered.
Tip: If your host application uses both the OpenCL and XRT native APIs, manually specify both opencl_trace and native_xrt_trace to capture all events.
lop_trace [true|false] Enables generation of lower overhead OpenCL API host trace. Should not be used with other OpenCL options.
  • true: Enable.
  • false: Disable. This is the default value.
native_xrt_trace [true|false] Enables generation of the Native C/C++ API trace. This also generates the tables for "Host Data Transfer from/to Global memory" in the Profile Summary.
  • true: Enable.
  • false: Disable. This is the default value.
opencl_trace [true|false] Enables generation of OpenCL API host trace.
  • true: Enable.
  • false: Disable. This is the default value.
pl_deadlock_detection [true|false] Enables deadlock detection for PL kernels.
power_profile [true|false] Enables the polling of power data during the execution of the application.
  • true: Enable.
  • false: Disable. This is the default value.
Note: This feature is not supported on embedded platforms or AWS.
power_profile_interval_ms <int>(default=20) Controls the interval of reading the power counters in milliseconds. The default interval is 20 ms.
Note: This switch only has an effect if power_profile = true.
profile_api [true|false] Enables access to HAL API directly from the host application to read counters on device profiling monitors during execution.
  • true: Enable.
  • false: Disable. This is the default value.
stall_trace [off|all|dataflow|memory|pipe] Specifies the type of device-side stalls to capture and report in the timeline trace. The default is off.
  • off: Turn off stall trace information.
  • all: Record all stall trace information.
  • dataflow: Intra-kernel streams (for example, writing to full FIFO between dataflow blocks).
  • memory: External memory stalls (for example, AXI4 read from the DDR memory).
  • pipe: Inter-kernel pipe for OpenCL kernels (for example, writing to full pipe between kernels).
Note: This switch only has an effect if device_trace is enabled.
trace_buffer_offload_interval_ms <int> Controls the interval, in milliseconds (ms), between reads of device data from the device to the host. The default is 10 ms.
Note: This switch only has an effect if device_trace is enabled.
trace_buffer_size <string> If the .xclbin was created with memory offload of trace specified, as described in --profile Options, this switch determines the size of the buffer to allocate in memory to capture trace data. The default is 1M.
Note: This switch only has an effect if device_trace is enabled.
trace_file_dump_interval_s <int> Controls the time between dumping of trace files in seconds (s). The default is 5s.
Note: This switch only has an effect if device_trace is enabled.
vitis_ai_profile [true|false] Enables profiling in which the profile summary and other files come from the Vitis AI application layer.
  • true: Enable.
  • false: Disable. This is the default value.
xocl_debug [true|false] Generates the xocl.log file when enabled. When any trace options are also enabled, the debug log is added to the xrt.run_summary to view in Vitis Analyzer.

xrt_trace [true|false] Enables generation of low-level HW shim function trace during HW runs. This will be disabled when used with native_xrt_trace.
  • true: Enable.
  • false: Disable. This is the default value.
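
The dependencies between the Debug keys in Table 2 can be sketched with a short example. It assumes an OpenCL host application profiled on hardware; the chosen values are illustrative, and the keys marked as dependent are the ones whose notes say they only have an effect when device_trace is enabled.

```ini
[Debug]
; Host-side trace of OpenCL API calls (assumed OpenCL host application).
opencl_trace = true
; Device-side trace; any value other than off enables the dependent keys.
device_trace = coarse
; Dependent keys: these only have an effect because device_trace is enabled.
stall_trace = memory
continuous_trace = true
trace_file_dump_interval_s = 5
trace_buffer_offload_interval_ms = 10
```

The two interval values shown are the documented defaults, written out explicitly so they are easy to adjust.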

AIE_profile_settings Group

The options specified in this group are applied only if aie_profile=true under the [Debug] group.

Table 3. AI Engine Profile Options
Key Valid Values Description
graph_based_aie_metrics <graph name|all>:<kernel name|all>:<off|heat_map|stalls|execution|floating_point|write_bandwidths|read_bandwidths|aie_trace>

Specify the metric sets reported by the AI Engine module of AI Engine tiles on a graph-by-graph basis.

Important: Currently, only all is supported for kernel specification.

Controls the configuration of the statistics read from the AIE core performance counters for the entire AI Engine graph application.

  • heat_map: Profile active/stall cycles and vector instruction usage.
  • stalls: Profile the different types of stalls (that is, memory, stream, lock, and cascade).
  • execution: Profile the AI Engine instructions.
  • floating_point: Profile floating point exceptions.
  • write_bandwidths: Profile the write bandwidth of streams and cascades.
  • read_bandwidths: Profile the read bandwidths of streams and cascades.
  • aie_trace: Profile amount and stalls of event trace from core and memory modules.

graph_based_aie_memory_metrics <graph name|all>:<kernel name|all>:<off|conflicts|dma_locks|dma_stalls_s2mm|dma_stalls_mm2s|write_bandwidths|read_bandwidths>

Specify the metric sets reported by the memory module of AI Engine tiles on a graph-by-graph basis.

Important: Currently, only all is supported for kernel specification.

Controls the configuration of statistics read from the AI Engine memory performance counters for the entire AI Engine graph application.

  • conflicts: Profile the DMA memory conflicts.
  • dma_locks: Profile DMA locks and stalls on lock acquire.
  • dma_stalls_s2mm: Profile stalls on DMA S2MM channels.
  • dma_stalls_mm2s: Profile stalls on DMA MM2S channels.
  • write_bandwidths: Profile bandwidths of DMA S2MM channels.
  • read_bandwidths: Profile bandwidths of DMA MM2S channels.

tile_based_aie_metrics

<{<column>,<row>}|all>:<off|heat_map|stalls|execution|floating_point|write_bandwidths|read_bandwidths|aie_trace>

;

{<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|heat_map|stalls|execution|floating_point|write_bandwidths|read_bandwidths|aie_trace>

Specify the metric sets reported by the AI Engine module of AI Engine tiles on a tile-by-tile basis. This can be used in conjunction with graph-by-graph selection and takes priority on the specified tiles.

Refer to the metric set descriptions under graph_based_aie_metrics.

tile_based_aie_memory_metrics

<{<column>,<row>}|all>:<off|conflicts|dma_locks|dma_stalls_s2mm|dma_stalls_mm2s|write_bandwidths|read_bandwidths>

;

{<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|conflicts|dma_locks|dma_stalls_s2mm|dma_stalls_mm2s|write_bandwidths|read_bandwidths>

Specify the metric sets reported by the memory module of AI Engine tiles on a tile-by-tile basis. This can be used in conjunction with graph-by-graph selection and takes priority on the specified tiles.

Refer to the metric set descriptions under graph_based_aie_memory_metrics.

tile_based_interface_tile_metrics

<column|all>:<off|input_bandwidths|output_bandwidths|packets>[:<channel>]

;

<mincolumn>:<maxcolumn>:<off|input_bandwidths|output_bandwidths|packets>[:<channel>]

Specify the metric sets reported by the AI Engine interface tiles on a tile-by-tile basis.

Note: Interface tiles are separate from the AI Engine tiles and have different metric sets.
interval_us <int> Controls the interval of reading the AI Engine counter values in microseconds (µs). The default interval is 1000 µs.
Note: This switch only has an effect if aie_profile = true.
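
To show how the value syntax in Table 3 is written in practice, here is a minimal sketch that combines graph-based and tile-based selection. The tile location {2,3} and the column range 0 to 3 are hypothetical; substitute locations that exist in your design.

```ini
[Debug]
; The AIE_profile_settings group is applied only when this is true.
aie_profile = true

[AIE_profile_settings]
; Read the counters every 1000 microseconds (the documented default).
interval_us = 1000
; heat_map metric set for the AI Engine modules of all graphs and kernels.
graph_based_aie_metrics = all:all:heat_map
; Tile-based selection takes priority on the tile it names
; (column 2, row 3 is a hypothetical location).
tile_based_aie_metrics = {2,3}:stalls
; Interface tiles in a hypothetical column range of 0 through 3.
tile_based_interface_tile_metrics = 0:3:input_bandwidths
```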

AIE_trace_settings Group

The options specified in this group are applied only if aie_trace=true under the [Debug] group.

Table 4. AI Engine Trace Options
Key Valid Values Description
buffer_size <string> (default=8M) Controls the total size of the buffers allocated for AI Engine event trace. This size is partitioned evenly into the number of different trace streams coming out of the AI Engine. The default is 8M.
Note: This switch only has an effect if aie_trace = true.
buffer_offload_interval_us integer (default=10ms) Interval, in milliseconds, between reading of PLIO mode AI Engine trace from device to Host memory.
periodic_offload true/false (default=true) Enables continuous offload of PLIO mode AI Engine trace. New trace data is appended to the generated AI Engine trace output files (one per stream).
file_dump_interval_s integer (default=5s) Interval, in seconds, between writing (appending) of raw AI Engine trace data to output files.
graph_based_aie_tile_metrics

string("")

<graph name|all>:<kernel name|all>:<off|functions|functions_partial_stalls|functions_all_stalls>

Specify the metric sets reported by the AI Engine module of AI Engine tiles on a graph-by-graph basis.

Important: Currently, only all is supported for kernel specification.
tile_based_aie_tile_metrics

string("")

<{<column>,<row>}|all>:<off|functions|functions_partial_stalls|functions_all_stalls>[:<memory_stalls|stream_stalls|cascade_stalls|lock_stalls>]

{<mincolumn>,<minrow>}:{<maxcolumn>,<maxrow>}:<off|functions|functions_partial_stalls|functions_all_stalls>

Specify the metric sets reported by the AI Engine module of AI Engine tiles on a tile-by-tile basis.

Important: Currently, only all is supported for kernel specification.
reuse_buffer true/false (false)  
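
The following sketch shows one way the AI Engine trace keys from Table 4 might be combined. It uses the documented defaults for the buffer size and file dump interval and selects the functions metric set for all graphs; treat it as a starting point under those assumptions rather than a recommended configuration.

```ini
[Debug]
; The AIE_trace_settings group is applied only when this is true.
aie_trace = true

[AIE_trace_settings]
; Total trace buffer size, split evenly across the trace streams.
buffer_size = 8M
; Keep appending new trace data to the per-stream output files.
periodic_offload = true
file_dump_interval_s = 5
; Only "all" is currently supported for the kernel field.
graph_based_aie_tile_metrics = all:all:functions
```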

Emulation Group

The Emulation group of switches applies to the emulation environments and the AMD Vivado™ simulator.

Table 5. Emulation Group Keys and Values
Key Valid Values Description
aliveness_message_interval Any integer Specifies the interval in seconds that aliveness messages need to be printed. The default is 300.
debug_mode [off|batch|gui] Specifies how the waveform is saved and displayed during emulation.
  • off: Do not launch simulator waveform GUI, and do not save wdb file. This is the default value.
  • batch: Do not launch simulator waveform GUI, but save wdb file
  • gui: Launch simulator waveform GUI, and save wdb file
Note: The kernel needs to be compiled with debug enabled (v++ -g) for the waveform to be saved and displayed in the simulator GUI.
kernel-dbg [true|false] Enables kernel debug functionality during software emulation as described in Command Line Debug Flow.
  • true: Enable.
  • false: Disable. This is the default value.
print_infos_in_console [true|false] Controls the printing of emulation info messages to the user's console. Emulation info messages are always logged into the emulation_debug.log file.
  • true: Print in the user's console. This is the default value.
  • false: Do not print in the user's console.
print_warnings_in_console [true|false] Controls the printing of emulation warning messages to the user's console. Emulation warning messages are always logged into the emulation_debug.log file.
  • true: Print in the user's console. This is the default value.
  • false: Do not print in the user's console.
print_errors_in_console [true|false] Controls the printing of emulation error messages to the user's console. Emulation error messages are always logged into the emulation_debug.log file.
  • true: Print in the user's console. This is the default value.
  • false: Do not print in the user's console.
user_pre_sim_script Path to Tcl file For the first run, run the simulation in GUI mode and add the signals that you want to observe. Copy the resulting commands from the Tcl console and save them in a Tcl script.

For the next run, pass the Tcl script with this switch in batch mode.

user_post_sim_script Path to Tcl file Specifies a Tcl script containing any post-simulation operations. All the commands provided in the Tcl script are executed after simulation is completed.
xtlm_aximm_log [true|false] Enables the XTLM AXI4 Memory Map transaction logging at runtime. All the transactions are recorded in the xsc_report.log file.
xtlm_axis_log [true|false] Enables the XTLM AXI4-Stream transaction logging at runtime. All the transactions are recorded in the xsc_report.log file.
timeout_scale na/ms/sec/min Enables timeout support for the clPollStream API in emulation by providing a scale for the timeout specified in the clPollStream call. The timeout in the code is specified in ms, which might not work for emulation. Use timeout_scale to map ms to another scale if needed for emulation.
Important: Timeout is not enabled in emulation by default. Use this option to enable clPollStream timeout.
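
As a sketch of how the Emulation keys in Table 5 fit together, the example below saves the waveform database without opening the simulator GUI and reduces console output. The Tcl script path is a hypothetical placeholder, and the waveform is only saved if the kernel was compiled with debug enabled (v++ -g), as noted above.

```ini
[Emulation]
; Save the wdb file without launching the simulator waveform GUI.
debug_mode = batch
; Hypothetical path to a Tcl script saved from an earlier GUI run.
user_pre_sim_script = /path/to/add_signals.tcl
; Info messages are still written to emulation_debug.log.
print_infos_in_console = false
; Print an aliveness message every 300 seconds (the documented default).
aliveness_message_interval = 300
```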