AIE-ML Memory Module

Versal ACAP AIE-ML Architecture Manual (AM020)

Document ID
AM020
Release Date
2022-09-28
Revision
1.0 English

The AIE-ML memory module (shown in the following figure) contains eight memory banks, two input streams to memory map (S2MM) DMA, two memory-map to output DMA streams (MM2S), and a hardware synchronization module (locks). For each of the four directions (south, west, north, and east), there are separate ports for even and odd ports, and three address generators, two loads, and one store.

Figure 1. AIE-ML Memory Module

Memory Banks
The AIE-ML data memory is 64 KB, organized as eight memory banks, where each memory bank is a 512 word x 128-bit single-port memory. From a programmer's perspective, every two banks are interleaved to form one bank, that is, a total of four banks of 16 KB. Each memory bank has a write enable for each 32-bit word. Banks [0-1] have ECC protection and banks [2-7] have parity check. Bank [0] starts at address 0 of the memory module. ECC protection is a 1-bit error detector/corrector and 2-bit error detector per 32-bit word.
Memory Arbitration
Each memory bank has its own arbitrator to arbitrate between all requesters. The memory bank arbitration is round-robin to avoid starving any requester. It handles a new request every clock cycle. When there are multiple requests in the same cycle to the same memory bank, only one request per cycle will be allowed to access the memory. The other requesters are stalled for one cycle and the hardware will retry the memory request in the next cycle.
Tile DMA Controller
The tile DMA has two incoming and two outgoing streams to the stream switches in the AIE-ML tile. The tile DMA controller is divided into two separate modules, S2MM to store stream data to memory (32-bit data) and MM2S to write the contents of the memory to a stream (32-bit data). Each DMA transfer is defined by a DMA buffer descriptor and the DMA controller has access to the 16 buffer descriptors. These buffer descriptors can also be accessed using a memory-mapped AXI4 interconnect for configuration. Each buffer descriptor contains all information needed for a DMA transfer and can point to the next DMA transfer for the DMA controller to continue with after the current DMA transfer is complete. The DMA controller also has access to the 16 locks that are the synchronization mechanism used between the AIE-ML and DMA or any external memory-mapped AXI4 master (outside of the AIE-ML array) and the DMA. Each buffer descriptor can be associated with locks. This is part of the configuration of any buffer descriptor using memory-mapped AXI4 interconnect.
Compared to AI Engine, the AIE-ML tile DMA has the following changes:
  • Support 4D tensor address generation (including iteration-offset)
  • Supports task queues and task complete tokens
  • Supports S2MM finish on TLAST and out-of-order packets
  • Adds decompression to the two S2MM channels
  • Adds compression to the two MM2S channels
  • Supports task queue and task-complete tokens
Lock Module
The AIE-ML memory module contains a lock module to achieve synchronization amongst the AIE-MLs, tile DMA, and an external memory-mapped AXI4 interface master (for example, the processor system (PS)). The AIE-ML features 16 semaphore locks, which are an evolution of the locks found in AI Engine to enable credit-based synchronization. The semaphore lock has a larger state and no acquired bit; each lock state is 6-bit unsigned. The lock module handles lock requests from the AIE-MLs in all four directions, the local DMA controller, and memory-mapped AXI4.