Array Interface DMA Memory-Mapped AXI4 Master Interface

Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID
AM020
Release Date
2023-11-10
Revision
1.2 English

The AIE-ML array interface DMA provides direct access to external memory. The DMA is an AXI4 master that issues read and write requests to the NoC NMU interface, and hence to any AXI4 slave on the Versal device for which the NoC configuration provides a path. The DMA supports a 32-bit aligned start address. Each DMA channel generates addresses from the base address in the buffer descriptor (BD). The BD also stores the incremental address offset to apply between BD calls, which avoids the need to reconfigure a BD for subsequent buffer transfers.
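The effect of the incremental address offset can be sketched in a few lines of Python. This is a minimal software model, not the hardware register layout: the names `BufferDescriptor`, `base_address`, `iteration_step_bytes`, and `start_address` are hypothetical, and the model assumes the offset accumulates linearly with each BD call.

```python
from dataclasses import dataclass

ALIGNMENT_BYTES = 4  # 32-bit aligned start address, per the text above


@dataclass
class BufferDescriptor:
    """Hypothetical BD model; field names are illustrative, not register names."""

    base_address: int          # byte address in external memory
    iteration_step_bytes: int  # assumed offset added between successive BD calls
    calls: int = 0             # number of times this BD has been started

    def start_address(self) -> int:
        """Return the start address for the next call, then advance the offset."""
        address = self.base_address + self.calls * self.iteration_step_bytes
        if address % ALIGNMENT_BYTES != 0:
            raise ValueError(f"start address {address:#x} is not 32-bit aligned")
        self.calls += 1
        return address


# The same BD is reused three times without being reconfigured.
bd = BufferDescriptor(base_address=0x1000, iteration_step_bytes=0x100)
addresses = [bd.start_address() for _ in range(3)]
print([hex(a) for a in addresses])  # ['0x1000', '0x1100', '0x1200']
```

In this model, three consecutive transfers land in three adjacent 256-byte regions while the BD contents stay unchanged.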

The DMA comprises four independent channels: two MM2S (read from external memory) and two S2MM (write to external memory). Each channel can sustain a throughput of 4 bytes per cycle (4 GB/s at 1 GHz), giving a total of up to 8 GB/s read and 8 GB/s write in parallel per interface tile.
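The arithmetic behind these figures can be checked with a short calculation, expressed here in gigabytes per second. The function names are hypothetical, and the 1 GHz clock is the example frequency used in the text rather than a guaranteed operating point.

```python
BYTES_PER_CYCLE = 4         # sustained per channel, from the text above
CHANNELS_PER_DIRECTION = 2  # two MM2S channels and two S2MM channels


def channel_throughput_gbytes(clock_hz: float) -> float:
    """Peak throughput of one channel in GB/s (1 GB = 1e9 bytes)."""
    return BYTES_PER_CYCLE * clock_hz / 1e9


def direction_throughput_gbytes(clock_hz: float) -> float:
    """Peak throughput of all channels in one direction (read or write) in GB/s."""
    return CHANNELS_PER_DIRECTION * channel_throughput_gbytes(clock_hz)


clock_hz = 1e9  # 1 GHz example clock
print(channel_throughput_gbytes(clock_hz))    # 4.0
print(direction_throughput_gbytes(clock_hz))  # 8.0
```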

MM2S Channels (two in total):

  • 32-bit stream master interface per channel
  • 128-bit AXI4 master read interface, shared between two channels
  • 4D tensor address generation (including iteration-offset)
  • Access shared lock module (local to interface tile)
  • Support task queue and task-complete-tokens; queue depth is four tasks per channel (see Task-Completion-Tokens for more information)
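Multi-dimensional address generation of the kind listed above can be illustrated with a generic nested-loop model. This sketch assumes each dimension is described by a (size, step) pair in units of 32-bit words and that the outermost dimension plays the role of the iteration offset. The function name `tensor_addresses` and this parameterization are hypothetical and do not reflect the actual BD register fields.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def tensor_addresses(base: int, dims: Sequence[Tuple[int, int]]) -> Iterator[int]:
    """Yield word addresses for nested dimensions given as (size, step) pairs.

    dims is ordered outermost first; the innermost dimension varies fastest.
    """
    ranges = [range(size) for size, _ in dims]
    steps = [step for _, step in dims]
    for index in product(*ranges):
        yield base + sum(i * s for i, s in zip(index, steps))


# Four dimensions of size 2 with steps of 64, 16, 4, and 1 words.
dims = [(2, 64), (2, 16), (2, 4), (2, 1)]
sequence = list(tensor_addresses(0, dims))
print(sequence[:8])  # [0, 1, 4, 5, 16, 17, 20, 21]
print(sequence[8:])  # [64, 65, 68, 69, 80, 81, 84, 85]
```

The second half of the sequence repeats the access pattern of the first half, shifted by the outermost step of 64 words.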

S2MM Channels (two in total):

  • 32-bit stream slave interface per channel
  • 128-bit AXI4-MM master write interface, shared between two channels
  • 4D tensor address generation (including iteration-offset)
  • Access shared lock module (local to interface tile)
  • Support task queue and task-complete-tokens; queue depth is four tasks per channel (see Task-Completion-Tokens for more information)
  • Support out-of-order packet transfer and finish-on-TLAST, enabling compressed spill and restore of intermediate results to external memory
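The task queue behaviour listed for both channel types can be modelled as a bounded FIFO. The sketch below is a software illustration only. It uses the queue depth of four from the lists above and the 16 shared BDs stated in this section. The class `ChannelTaskQueue`, its methods, and the choice to use the BD ID as the token value are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

QUEUE_DEPTH = 4  # tasks per channel, per the lists above
NUM_BDS = 16     # shared BDs per interface tile, as stated in this section


class ChannelTaskQueue:
    """Hypothetical software model of one DMA channel's task queue."""

    def __init__(self) -> None:
        self._tasks: Deque[Tuple[int, bool]] = deque()

    def enqueue(self, bd_id: int, issue_token: bool = False) -> bool:
        """Queue a task starting at bd_id; return False if the queue is full."""
        if not 0 <= bd_id < NUM_BDS:
            raise ValueError(f"BD ID {bd_id} is outside 0..{NUM_BDS - 1}")
        if len(self._tasks) >= QUEUE_DEPTH:
            return False
        self._tasks.append((bd_id, issue_token))
        return True

    def complete_next(self) -> Optional[int]:
        """Finish the oldest task; return a token only if one was requested."""
        if not self._tasks:
            raise RuntimeError("no task pending")
        bd_id, issue_token = self._tasks.popleft()
        # Assumption: the token simply carries the BD ID of the finished task.
        return bd_id if issue_token else None


queue = ChannelTaskQueue()
# Try to queue five tasks; only the last accepted one (BD 3) requests a token.
accepted = [queue.enqueue(bd_id, issue_token=(bd_id == 3)) for bd_id in range(5)]
tokens = [queue.complete_next() for _ in range(4)]
print(accepted)  # [True, True, True, True, False]
print(tokens)    # [None, None, None, 3]
```

Requesting a token only on the last task of a batch lets a controller learn that the whole batch has finished from a single notification, which is one plausible way to use such tokens.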

Buffer descriptors (BD):

  • 16 shared BDs

The interface DMA, together with the tile and memory tile DMAs and the streaming interconnect, supports the following data flows (non-exhaustive list):

  • Buffer copy from external-memory to memory tile
  • Buffer copy from external-memory to AIE-ML tile data memory
  • Buffer copy from memory tile to external-memory
  • Buffer copy from AIE-ML tile data memory to external-memory