AI Engine Interfaces

Versal Adaptive SoC AI Engine Architecture Manual (AM009)

Document ID: AM009
Release Date: 2023-08-18
Revision: 1.3 (English)
The AI Engine has multiple interfaces. They are shown in the block diagram in Figure 1 and described in the following sections.
Data Memory Interface
The AI Engine can access the data memory modules in all four directions and sees them as one contiguous memory. It has two 256-bit wide load units and one 256-bit wide store unit. From the AI Engine's perspective, each of the two load units and the single store unit has a throughput of 256 bits per clock cycle.
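The following Python sketch is a quick check of the figures above. It converts the per-unit widths into bytes per cycle and, for an assumed clock frequency, into GB/s. The 1 GHz clock used below is an illustrative assumption, not a specification for any particular device or speed grade.

```python
# Per-cycle data-memory bandwidth of one AI Engine, using the figures above:
# two 256-bit load units and one 256-bit store unit.
LOAD_UNITS = 2
STORE_UNITS = 1
PORT_WIDTH_BITS = 256


def memory_bandwidth_bytes_per_cycle():
    """Return (load, store, total) bandwidth in bytes per clock cycle."""
    load = LOAD_UNITS * PORT_WIDTH_BITS // 8
    store = STORE_UNITS * PORT_WIDTH_BITS // 8
    return load, store, load + store


def memory_bandwidth_gb_per_s(clock_hz):
    """Scale the per-cycle figures by an assumed clock frequency in Hz."""
    load, store, total = memory_bandwidth_bytes_per_cycle()
    return tuple(b * clock_hz / 1e9 for b in (load, store, total))


if __name__ == "__main__":
    print(memory_bandwidth_bytes_per_cycle())  # (64, 32, 96)
    # 1 GHz is an illustrative clock, not a device specification.
    print(memory_bandwidth_gb_per_s(1e9))      # (64.0, 32.0, 96.0)
```

The loads together can supply 64 bytes per cycle and the store can absorb 32 bytes per cycle, for 96 bytes per cycle in total. Any GB/s figure scales linearly with whatever clock the design actually runs at.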
Program Memory Interface
This 128-bit wide interface is used by the AI Engine to access the program memory. A new instruction can be fetched every clock cycle.
Direct AXI4-Stream Interface
The AI Engine has two 32-bit input AXI4-Stream interfaces and two 32-bit output AXI4-Stream interfaces. Each stream is connected to a FIFO on both the input and the output side. This allows the AI Engine to access a stream either as one 4-word (128-bit) access every four cycles or as one 1-word (32-bit) access every cycle.
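Both access patterns deliver the same average bandwidth of 32 bits per cycle per stream. In the 128-bit case, the FIFO holds incoming words until a wide read drains them. The sketch below is a simplified cycle-level model of this behavior. The function name and the one-word-per-cycle producer are assumptions for illustration. The FIFO depth is not modeled as a limit because it is not stated here.

```python
from collections import deque


def simulate_stream(cycles, read_burst):
    """Cycle-level sketch of one 32-bit stream feeding a FIFO.

    A producer pushes one 32-bit word per cycle. The consumer reads
    `read_burst` words at once, every `read_burst` cycles (4 models a
    128-bit access per 4 cycles; 1 models a 32-bit access per cycle).
    Returns (words_consumed, max_fifo_occupancy).
    """
    fifo = deque()
    consumed = 0
    max_occupancy = 0
    for cycle in range(cycles):
        fifo.append(cycle)  # one 32-bit word arrives this cycle
        max_occupancy = max(max_occupancy, len(fifo))
        # The consumer fires on the last cycle of each burst window.
        if (cycle + 1) % read_burst == 0 and len(fifo) >= read_burst:
            for _ in range(read_burst):
                fifo.popleft()
            consumed += read_burst
    return consumed, max_occupancy


if __name__ == "__main__":
    print(simulate_stream(400, 4))  # (400, 4): 128-bit reads every 4 cycles
    print(simulate_stream(400, 1))  # (400, 1): 32-bit reads every cycle
```

Both runs move 400 words in 400 cycles. The wide-access pattern only differs in that it needs up to four words of buffering.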
Cascade Stream Interface
These cascade streams forward the 384-bit accumulator data from one AI Engine to the next, so that several AI Engines can form a chain. A small, two-deep, 384-bit wide FIFO on both the input and the output stream allows up to four values (two per FIFO) to be stored between AI Engines.
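One common use of a cascade chain is to split a long reduction, such as a dot product, across several AI Engines. This use is assumed here for illustration. Each engine adds its partial result to the accumulator it receives and forwards the sum to the next engine. The Python model below shows only that dataflow. `cascade_dot` is a hypothetical name, and accumulators are plain Python integers, so the 384-bit width, the lane structure, and the FIFO timing are all ignored.

```python
def cascade_dot(a, b, num_engines):
    """Sketch of a cascade chain splitting a dot product across engines.

    Engine i computes the partial sum over its slice of the vectors, adds
    the accumulator received on its cascade input, and forwards the result
    on its cascade output. The last engine in the chain holds the full result.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    if num_engines < 1:
        raise ValueError("need at least one engine")
    n = len(a)
    chunk = -(-n // num_engines)  # ceiling division
    acc = 0  # the first engine in the chain has no cascade input
    for engine in range(num_engines):
        lo, hi = engine * chunk, min((engine + 1) * chunk, n)
        partial = sum(x * y for x, y in zip(a[lo:hi], b[lo:hi]))
        acc = acc + partial  # value forwarded on the cascade stream
    return acc


if __name__ == "__main__":
    print(cascade_dot([1, 2, 3], [4, 5, 6], 2))  # 32
```

The result matches a single-engine dot product. Only the placement of the work changes: each engine handles one slice, and the running sum travels along the chain.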
Debug Interface
This interface is able to read or write all AI Engine registers over the memory-mapped AXI4 interface.
Hardware Synchronization (Locks) Interface
This interface allows synchronization between two AI Engines or between an AI Engine and a DMA. The AI Engine can access the lock modules in all four directions.
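The following conceptual model shows how a lock that carries a one-bit value can pass a shared buffer back and forth between a producer, such as a DMA, and a consumer, such as the AI Engine. It is not the hardware lock API. The class, the method names, and the 0 = empty / 1 = full convention are assumptions for illustration. In hardware, a requester stalls when its acquire cannot succeed. This single-threaded sketch instead retries on the next loop iteration.

```python
class LockModel:
    """Conceptual model of a hardware lock that carries a one-bit value.

    try_acquire(expected) succeeds only when the lock is free and holds
    `expected`. release(value) frees the lock and stores the new value.
    """

    def __init__(self, value=0):
        self.value = value
        self.held = False

    def try_acquire(self, expected):
        if not self.held and self.value == expected:
            self.held = True
            return True
        return False  # in hardware, the requester would stall here

    def release(self, value):
        assert self.held, "release without acquire"
        self.held = False
        self.value = value


# Hypothetical convention: 0 = buffer empty (the producer may fill it),
# 1 = buffer full (the consumer may drain it).
EMPTY, FULL = 0, 1


def producer_consumer(items):
    lock = LockModel(EMPTY)
    buffer = None
    received = []
    pending = list(items)
    while pending or lock.value == FULL:
        # Producer side (e.g., a DMA writing into shared data memory).
        if pending and lock.try_acquire(EMPTY):
            buffer = pending.pop(0)
            lock.release(FULL)
        # Consumer side (e.g., the AI Engine reading the same buffer).
        if lock.try_acquire(FULL):
            received.append(buffer)
            lock.release(EMPTY)
    return received


if __name__ == "__main__":
    print(producer_consumer([1, 2, 3]))  # [1, 2, 3]
```

The lock value, not the order of the code, decides who may touch the buffer. This ensures the consumer never reads a slot that the producer has not finished writing.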
Stall Handling
An AI Engine can be stalled for multiple reasons and by different sources. Examples include an external memory-mapped AXI4 master (for example, the PS), lock modules, empty or full AXI4-Stream interfaces, data memory collisions, and event actions from the event unit.
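Any one of these sources is enough to stop the core. A simple way to reason about stalls is therefore that the core makes progress only in cycles where no source is asserted. The sketch below encodes that idea. The `StallSource` names and the per-cycle stall trace are hypothetical and do not correspond to actual status register bits.

```python
from enum import Flag


class StallSource(Flag):
    """Stall sources listed in the text (names are illustrative only)."""
    NONE = 0
    AXI_MM_MASTER = 1      # external memory-mapped AXI4 master, e.g. the PS
    LOCK = 2               # waiting on a lock module
    STREAM_EMPTY = 4       # reading an empty AXI4-Stream input
    STREAM_FULL = 8        # writing to a full AXI4-Stream output
    MEMORY_COLLISION = 16  # data memory access collision
    EVENT_ACTION = 32      # stall requested by an event-unit action


def run(cycles_needed, stall_trace):
    """Return the total cycles taken to complete `cycles_needed` useful cycles.

    stall_trace[i] holds the stall sources asserted in cycle i; cycles past
    the end of the trace are stall-free. The core advances only in cycles
    where no source is asserted.
    """
    done = 0
    cycle = 0
    while done < cycles_needed:
        if cycle < len(stall_trace):
            stalls = stall_trace[cycle]
        else:
            stalls = StallSource.NONE
        if not stalls:  # an empty Flag is falsy
            done += 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    trace = [StallSource.NONE, StallSource.LOCK,
             StallSource.LOCK | StallSource.STREAM_EMPTY, StallSource.NONE]
    print(run(3, trace))  # 5: two of the first five cycles are stalled
```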
AI Engine Event Interface
This 16-bit wide EVENT interface can be used to set different events.
Tile Timer
This input interface is used to read the 64-bit timer value inside the tile.
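A typical use of the tile timer is to measure how long a section of code takes by reading the timer before and after it. The sketch below assumes that the timer counts clock cycles and that the two readings come from some timer-read routine, which is not shown. It computes the difference modulo 2^64, so the result stays correct even if the counter wraps once between the two reads.

```python
TIMER_BITS = 64
TIMER_MASK = (1 << TIMER_BITS) - 1


def elapsed_cycles(start, end):
    """Cycles between two reads of a free-running 64-bit timer.

    Subtracting modulo 2**64 gives the right answer even if the counter
    wrapped around once between the two reads.
    """
    return (end - start) & TIMER_MASK


if __name__ == "__main__":
    print(elapsed_cycles(100, 250))             # 150
    print(elapsed_cycles(TIMER_MASK - 9, 5))    # 15, across a wraparound
```

At an assumed count rate of 1 GHz, a 64-bit counter wraps only after roughly 585 years. The masking is therefore mainly a guard for code that also handles narrower counters.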
Execution Trace Interface
This 32-bit wide interface sends the packet-based execution trace generated by the AI Engine over the AXI4-Stream.
Figure 1. AI Engine Interfaces