AI Engine Memory Module

Versal Adaptive SoC AI Engine Architecture Manual (AM009)

Document ID
AM009
Release Date
2023-08-18
Revision
1.3 English

The AI Engine memory module (shown in the following figure) contains eight memory banks, two input streams to memory map (S2MM) DMA, two memory-map to output DMA streams (MM2S), and a hardware synchronization module (locks). For each of the four directions (south, west, north, and east), there are separate ports for even and odd ports, and three address generators, two loads, and one store.

Figure 1. AI Engine Memory Module

Memory Banks
The AI Engine memory modules consist of eight memory banks, where each memory bank is a 256 word x 128-bit single-port memory. Each memory bank has a write enable for each 32-bit word. Banks [0-1] have ECC protection and banks [2-7] have parity check. Bank [0] starts at address 0 of the memory module. ECC protection is a 1-bit error detector/corrector and 2-bit error detector per 32-bit word.
Memory Arbitration
Each memory bank has its own arbitrator to arbitrate between all requesters. The memory bank arbitration is round-robin to avoid starving any requester. It handles a new request every clock cycle. When there are multiple requests in the same cycle to the same memory bank, only one request per cycle will be allowed to access the memory. The other requesters are stalled for one cycle and the hardware will retry the memory request in the next cycle.
Tile DMA Controller
The tile DMA has two incoming and two outgoing streams to the stream switches in the AI Engine tile. The tile DMA controller is divided into two separate modules, S2MM to store stream data to memory (32-bit data) and MM2S to write the contents of the memory to a stream (32-bit data). Each DMA transfer is defined by a DMA buffer descriptor and the DMA controller has access to the 16 buffer descriptors. These buffer descriptors can also be accessed using a memory-mapped AXI4 interconnect for configuration. Each buffer descriptor contains all information needed for a DMA transfer and can point to the next DMA transfer for the DMA controller to continue with after the current DMA transfer is complete. The DMA controller also has access to the 16 locks that are the synchronization mechanism used between the AI Engine and DMA or any external memory-mapped AXI4 master (outside of the AI Engine array) and the DMA. Each buffer descriptor can be associated with locks. This is part of the configuration of any buffer descriptor using memory-mapped AXI4 interconnect.
Lock Module
The AI Engine memory module contains a lock module to achieve synchronization amongst the AI Engines, tile DMA, and an external memory-mapped AXI4 interface master (for example, the processor system (PS)). There are 16 hardware locks with a binary data value for each AI Engine memory module lock unit. Each lock has an arbitrator for simultaneous request management. The lock module handles lock requests from the AI Engines in all four directions, the local DMA controller, and memory-mapped AXI4.