Using Multiple Event Trace Streams - 2020.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2020-11-24
Version: 2020.2 English

As AI Engine designs grow larger, tracking the events produced while the design runs helps you identify performance bottlenecks and understand how the AI Engine as a whole is operating for the design. Larger designs, however, produce more events, and the trace IP that records them can itself become a bottleneck. To capture all of this data effectively and quickly, consider instantiating multiple event trace streams. These streams spread out the event data coming from the AI Engine so that it is stored correctly and in a timely manner.

To increase the number of trace streams in a design, use the aiecompiler --num-trace-streams option, which accepts a value from 1 to 16. The following table provides guidance on the number of trace streams to use, depending on the size of the design.

Table 1. Number of Event Trace Streams Methodology
Number of AI Engines    Recommended Number of Streams
--------------------    -----------------------------
Less than 10            1
Between 10 and 20       2
Between 20 and 40       4
Between 40 and 80       8
Larger than 80          16
  1. It is recommended to only use up to 16 trace streams due to the resource utilization impact on the PL and DMA channel resources.
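
The guidance in Table 1 can be sketched as a small lookup helper. This is an illustrative sketch only, not part of the tools: the function name recommended_trace_streams is hypothetical, and because the table lists 20, 40, and 80 in two rows each, the sketch assumes that a shared boundary value belongs to the lower tier.

```python
def recommended_trace_streams(num_ai_engines: int) -> int:
    """Return the stream count suggested by Table 1 for a given design size.

    Assumption (not stated in the table): a boundary value that appears in
    two rows (20, 40, 80) is assigned to the lower tier.
    """
    if num_ai_engines < 1:
        raise ValueError("a design uses at least one AI Engine")

    # (inclusive upper bound on the AI Engine count, recommended streams)
    tiers = [(9, 1), (20, 2), (40, 4), (80, 8)]
    for upper_bound, streams in tiers:
        if num_ai_engines <= upper_bound:
            return streams
    return 16  # larger than 80 AI Engines


if __name__ == "__main__":
    for count in (4, 10, 32, 80, 120):
        print(count, recommended_trace_streams(count))
```

The returned value is the number you would pass to the aiecompiler --num-trace-streams option.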

After changing the AI Engine compiler option, recompile the graph to regenerate libadf.a, and then relink the XCLBIN file using the Vitis compiler with a config file, as described in Linking the System.

v++ -l --config system.cfg ...

The config file can include advanced options such as the following example:

[advanced]
param=compiler.aieTraceClockSelect=fastest

Here, compiler.aieTraceClockSelect sets the trace clock. Valid values are default (150 MHz) and fastest (300 MHz).
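
If the config file is generated by a build script, the [advanced] fragment can be produced with a short helper that rejects values other than the two documented ones. This is a minimal sketch under stated assumptions: the names TRACE_CLOCK_MHZ and advanced_trace_clock_section are hypothetical and are not part of the Vitis tools. Only the parameter name, its two values, and the clock frequencies come from the text above.

```python
# Documented values of compiler.aieTraceClockSelect and their clock rates.
TRACE_CLOCK_MHZ = {"default": 150, "fastest": 300}


def advanced_trace_clock_section(select: str = "default") -> str:
    """Render the [advanced] config fragment for the chosen trace clock.

    Hypothetical helper for illustration; it only formats text and does
    not invoke v++ or validate anything else in the config file.
    """
    if select not in TRACE_CLOCK_MHZ:
        allowed = ", ".join(sorted(TRACE_CLOCK_MHZ))
        raise ValueError(f"unknown trace clock setting {select!r}; use one of: {allowed}")
    return f"[advanced]\nparam=compiler.aieTraceClockSelect={select}\n"


if __name__ == "__main__":
    # Print the fragment to append to a config file such as system.cfg.
    print(advanced_trace_clock_section("fastest"), end="")
```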

Recompiling the graph and relinking the XCLBIN file lets the tools instantiate additional trace IP in the design to accommodate the additional trace events being captured.

Note: Using multiple trace streams consumes more of the programmable logic resources on the device. The amount depends on the number of streams, the kinds of events being captured, and the number of tiles being analyzed.