Xilinx has developed several generations of AI Engines. This architecture manual describes the AI Engine-ML (hereafter referred to as AIE-ML).
Some Versal ACAPs include the AIE-ML, which consists of an array of AIE-ML tiles, AIE-ML memory tiles, and the AIE-ML array interface, which is made up of network on chip (NoC) and programmable logic (PL) interface tiles. The features of each are listed below. A pictorial view of the array organization is shown in Figure 1.
AIE-ML Tile Features
- A separate building block, integrated into the silicon, outside the programmable logic (PL).
- Each AIE-ML tile incorporates a high-performance very-long instruction word (VLIW), single-instruction multiple-data (SIMD) vector processor optimized for many applications, including machine learning.
- From a hardware perspective, the data memory is 64 KB, organized as eight banks of 8 KB. From a programmer's perspective, every two banks are interleaved to form one bank, giving a total of four banks of 16 KB each.
- Streaming interconnect for deterministic throughput, high-speed data flow between AIE-ML tiles and/or the programmable logic in the Versal device.
- Direct memory access (DMA) in the AIE-ML tile moves data from incoming stream(s) to local memory and from local memory to outgoing stream(s).
- Configuration interconnect (through the memory-mapped AXI4 interface) with a shared, transaction-based switched interconnect that gives external masters access to the internals of the AIE-ML tile.
- Hardware synchronization primitives provide synchronization of the AIE-ML, between the AIE-ML and the tile DMA, and between the AIE-ML and an external master (through the memory-mapped AXI4 interface).
- Debug, trace, and profile functionality.
- The AIE-ML tile offers finer-grained clock gating and reset. Clock gating and reset of the AIE-ML tile are controlled through memory-mapped AXI4 registers inside the tile. In the AIE-ML, the memory-mapped AXI4, clock-gating, and reset registers are moved into an always-on domain, giving modular control over the core, the stream switch, and the memory module in the tile. A similar arrangement applies to the memory tile, a functional unit of the AIE-ML that is introduced in the following section.
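The two views of the 64 KB data memory can be illustrated with a small address-to-bank mapping. This is a hypothetical sketch: this section does not state the interleave granularity or how the physical banks are numbered, so the code assumes that the two physical 8 KB banks behind each 16 KB logical bank alternate every 16 bytes (128 bits) and are numbered 2k and 2k+1. The name `bank_of` is illustrative only.

```python
# Hypothetical model of AIE-ML data memory banking.
# ASSUMPTIONS (not stated in this section): 16-byte interleave granularity,
# and logical bank k is built from physical banks 2k and 2k+1.
MEM_BYTES = 64 * 1024                      # total data memory per tile
PHYS_BANK_BYTES = 8 * 1024                 # hardware view: eight 8 KB banks
LOGICAL_BANK_BYTES = 2 * PHYS_BANK_BYTES   # programmer view: four 16 KB banks
INTERLEAVE_BYTES = 16                      # assumed interleave step


def bank_of(addr: int) -> tuple[int, int]:
    """Return (logical_bank, physical_bank) for a byte offset in data memory."""
    if not 0 <= addr < MEM_BYTES:
        raise ValueError("address outside the 64 KB data memory")
    logical = addr // LOGICAL_BANK_BYTES        # 0..3
    half = (addr // INTERLEAVE_BYTES) % 2       # which of the two paired banks
    physical = 2 * logical + half               # 0..7
    return logical, physical
```

Under these assumptions, offsets 0 and 16 fall in different physical banks of logical bank 0, while offset 16384 is the first byte of logical bank 1. Interleaving lets consecutive accesses within one logical bank alternate between two physical banks.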
AIE-ML Memory Tile Features
- A tile containing 512 KB of high-density, high-bandwidth memory to reduce the use of PL resources in machine learning (ML) applications.
- The memory tile DMA has the same channel features as the AIE-ML tile DMA, except that it also supports 4-D addressing modes. See AIE-ML Memory Tile Architecture for a more detailed description.
- The AXI4-Stream interconnect is the same as in the AIE-ML tile, except that the number of ports and the connectivity differ.
- The memory-mapped AXI4 configuration is the same as in the AIE-ML tile.
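To make "4-D addressing" concrete, the following Python sketch models a generic four-dimensional address generator. Each dimension has a size (step count) and a stride (in words), and the innermost dimension advances fastest. The function `addresses_4d` and its parameters are hypothetical and do not reproduce the memory tile DMA's actual descriptor fields; refer to AIE-ML Memory Tile Architecture for the real programming model.

```python
from itertools import product
from typing import Iterator, Sequence


def addresses_4d(base: int, sizes: Sequence[int], strides: Sequence[int]) -> Iterator[int]:
    """Yield word offsets for a 4-D access pattern.

    Hypothetical model: sizes and strides are ordered from the innermost
    dimension (d0) to the outermost (d3); d0 advances fastest.
    """
    if len(sizes) != 4 or len(strides) != 4:
        raise ValueError("expected four sizes and four strides")
    # product() advances its last iterable fastest, so pass the dimensions
    # outermost-first; reversing idx then lines it up with strides d0..d3.
    for idx in product(*(range(s) for s in reversed(sizes))):
        yield base + sum(i * st for i, st in zip(reversed(idx), strides))


# Usage: read a 4x3 row-major matrix column by column (an on-the-fly
# transpose). Only two dimensions are needed; d2 and d3 have size 1.
column_order = list(addresses_4d(0, sizes=[4, 3, 1, 1], strides=[3, 1, 0, 0]))
```

Here `column_order` visits offsets 0, 3, 6, 9 for the first column, then 1, 4, 7, 10, and so on. Additional dimensions let a single transfer describe patterns such as tiling or transposition without extra processor work.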
AIE-ML Array Interface to NoC and PL Resources
- Direct memory access (DMA) in the AIE-ML NoC interface tile manages memory-mapped and stream traffic into and out of the AIE-ML array. The interface tile is described in AIE-ML Array Interface Architecture.
- Configuration and control interconnect functionality (through the memory-mapped AXI4 interface).
- Streaming interconnect that leverages the AIE-ML tile streaming interconnect functionality.
- AIE-ML to programmable logic (PL) interface that provides asynchronous clock-domain crossing between the AIE-ML clock and the PL clock.
- AIE-ML-to-NoC interface logic that connects to the NoC master unit (NMU) and NoC slave unit (NSU) components.
- Hardware synchronization primitives leverage features from the AIE-ML tile locks module.
- Debug, trace, and profile functionality that leverages all the features of the AIE-ML tile.
- For a list of changes from the AI Engine array interface, see AIE-ML Array Interface Architecture.