Versal Adaptive SoC AI Engine Architecture Manual (AM009)

1.3 English

Each AI Engine tile produces two trace streams: one from the AI Engine and one from the memory module. Both streams are connected to the tile stream switch. Each AI Engine module and memory module in an AI Engine tile has a trace unit, as does the AI Engine programmable logic (PL) module in an AI Engine PL interface tile (see types of array interface tiles). The units can operate in the following modes:

  • AI Engine modes
    • Event-time
    • Event-PC
    • Execution-trace
  • AI Engine memory module mode
    • Event-time
  • AI Engine PL module mode
    • Event-time

The trace is output from the unit over AXI4-Stream as an AI Engine packet-switched stream packet. Each packet is eight 32-bit words: one header word and seven data words. The array AXI4-Stream switches use the information in the packet header to route the packet to any reachable AI Engine destination, including AI Engine local data memory through the AI Engine tile DMA, external DDR memory through the AI Engine array interface DMA, and block RAM or URAM through the AI Engine to PL AXI4-Stream.
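As an illustration of the packet layout described above, the following minimal sketch splits a captured sequence of 32-bit words into packets of one header word and seven data words. It assumes the header is the first word of each packet and that the capture starts on a packet boundary; the function name `split_trace_packets` is hypothetical and is not part of any AMD tool. The header is returned undecoded because its bit fields are not described here.

```python
from typing import List, Tuple

WORDS_PER_PACKET = 8  # one header word plus seven data words (from the text)
WORD_MASK = 0xFFFFFFFF


def split_trace_packets(words: List[int]) -> List[Tuple[int, List[int]]]:
    """Split a flat list of 32-bit words into (header, data_words) pairs.

    Assumption: the header is the first word of every 8-word packet.
    """
    if len(words) % WORDS_PER_PACKET != 0:
        raise ValueError("word count is not a multiple of the packet size")
    packets = []
    for start in range(0, len(words), WORDS_PER_PACKET):
        chunk = words[start:start + WORDS_PER_PACKET]
        for word in chunk:
            if not 0 <= word <= WORD_MASK:
                raise ValueError("value does not fit in a 32-bit word")
        # First word is treated as the header, the remaining seven as data.
        packets.append((chunk[0], chunk[1:]))
    return packets
```

A capture of 16 words therefore yields two packets, each carrying seven data words of trace frames.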

The event-time mode tracks up to eight independent numbered events on a per-cycle basis. A trace frame is created to record state changes in the tracked events. The frames are collected in an output buffer into an AI Engine packet-switched stream packet. Multiple frames can be packed into one 32-bit stream word, but a frame cannot cross a 32-bit boundary; filler frames are used for 32-bit alignment.
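The behavior described above can be modeled in software as two steps: detect the cycles in which the tracked event state changes, then pack the resulting frames into 32-bit words without letting a frame straddle a word boundary. The sketch below is a simplified model, not the hardware encoding: the function names, the frame widths, the MSB-first packing order, and the use of zero bits in place of real filler frames are all assumptions made for illustration.

```python
from typing import List, Tuple

WORD_BITS = 32


def event_time_changes(samples: List[int]) -> List[Tuple[int, int]]:
    """Return (cycle, event_mask) for every cycle where the 8-bit mask changed.

    Assumption: all eight events are deasserted before cycle 0.
    """
    changes = []
    previous = 0
    for cycle, mask in enumerate(samples):
        if not 0 <= mask <= 0xFF:
            raise ValueError("event mask must fit in eight bits")
        if mask != previous:
            changes.append((cycle, mask))
            previous = mask
    return changes


def pack_frames(frames: List[Tuple[int, int]]) -> List[int]:
    """Pack (value, bit_width) frames MSB-first into 32-bit words.

    A frame that does not fit in the remaining bits of the current word
    starts a new word. Unused low bits are zero-filled, which stands in
    for the filler frames used by the hardware (hypothetical encoding).
    """
    words = []
    current = 0
    used = 0
    for value, width in frames:
        if not 1 <= width <= WORD_BITS:
            raise ValueError("frame width must be between 1 and 32 bits")
        if not 0 <= value < (1 << width):
            raise ValueError("frame value does not fit in its width")
        if used + width > WORD_BITS:
            # Pad the current word and start a new one.
            words.append(current << (WORD_BITS - used))
            current, used = 0, 0
        current = (current << width) | value
        used += width
    if used:
        words.append(current << (WORD_BITS - used))
    return words
```

For example, frames of 8, 16, and 12 bits occupy two words in this model: the first two frames share a word, and the third starts a new word because it would otherwise cross the 32-bit boundary.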

In the event-PC mode, a trace frame is created in each cycle in which one or more of the eight watched events are asserted. The trace frame records the current program counter (PC) value of the AI Engine together with the current value of the eight watched events. The frames are collected in an output buffer into an AI Engine packet-switched stream packet.
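The difference from event-time mode is that a frame is produced for every cycle with an asserted event, not only when the event state changes. A minimal software model, assuming each cycle is represented as a hypothetical `(pc, event_mask)` pair and ignoring the actual frame bit layout, could look like this:

```python
from typing import List, Tuple


def event_pc_frames(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return one (pc, event_mask) frame per cycle with any watched event set.

    Each input sample is the PC and the 8-bit watched-event mask for one
    cycle (a modeling assumption, not the hardware frame format).
    """
    frames = []
    for pc, mask in samples:
        if not 0 <= mask <= 0xFF:
            raise ValueError("event mask must fit in eight bits")
        if mask != 0:
            # At least one of the eight watched events is asserted.
            frames.append((pc, mask))
    return frames
```

An event that stays asserted for several consecutive cycles produces one frame per cycle in this model, each with the PC value of that cycle.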

The trace unit in the AI Engine can also operate in execution-trace mode. In this mode, the unit sends, in real time over AXI4-Stream, the minimum set of information that an offline debugger needs to reconstruct the program execution flow. This assumes the offline debugger has access to the ELF file. The information includes:

  • Conditional and unconditional direct branches
  • All indirect branches
  • Zero-overhead-loop loop count (LC)

The AI Engine generates the packet-based execution trace, which can be sent over the 32-bit wide execution trace interface. The following figure shows the logical view of trace hardware in the AI Engine tile. The two trace streams out of the tile are connected internally to the event logic, configuration registers, broadcast events, and trace buffers.

Note: The different operating modes between the two modules are not shown.
Figure 1. Logical View of AI Engine Trace Hardware

To control the trace stream for an event trace, the 32-bit trace_control0/1 registers start and stop the trace. The trace_event0/1 registers program the internal event numbers to be added to the trace. See the Versal Adaptive SoC AI Engine Register Reference (AM015) for specific register information.