Array Interface DMA Memory-Mapped AXI4 Master Interface

Versal ACAP AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2022-09-28
Revision: 1.0 English

The AIE-ML array interface DMA provides direct access to external memory. The DMA is an AXI4 master that issues read and write requests to the NoC NMU interface, and can therefore reach any AXI4 slave on the Versal device for which the NoC configuration provides a path.

The DMA is composed of four independent channels: two MM2S (read from external memory) and two S2MM (write to external memory). Each channel can sustain a throughput of four bytes per cycle (4 GB/s at 1 GHz), giving a total of up to 8 GB/s read and 8 GB/s write in parallel per interface tile.
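The throughput figures above follow from simple arithmetic, which the short sketch below reproduces. The function and constant names are hypothetical and chosen only for illustration; the 1 GHz clock is the example frequency quoted in the text, and 1 GB is taken to mean 10^9 bytes.

```python
from typing import Tuple

# Values taken from the text: each channel moves four bytes per cycle,
# and there are two MM2S and two S2MM channels per interface tile.
BYTES_PER_CYCLE_PER_CHANNEL = 4
MM2S_CHANNELS = 2
S2MM_CHANNELS = 2


def channel_bandwidth_gb_per_s(clock_hz: float) -> float:
    """Peak throughput of one DMA channel in GB/s (1 GB = 1e9 bytes)."""
    return BYTES_PER_CYCLE_PER_CHANNEL * clock_hz / 1e9


def tile_bandwidth_gb_per_s(clock_hz: float) -> Tuple[float, float]:
    """Peak (read, write) throughput of one interface tile in GB/s."""
    per_channel = channel_bandwidth_gb_per_s(clock_hz)
    return MM2S_CHANNELS * per_channel, S2MM_CHANNELS * per_channel


read_gb, write_gb = tile_bandwidth_gb_per_s(1e9)  # 1 GHz example clock
print(read_gb, write_gb)  # 8.0 8.0
```

These are peak numbers; sustained throughput also depends on the NoC path and the external memory, which this sketch does not model.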

MM2S Channels (two in total):

  • 32-bit stream master interface per channel
  • 128-bit AXI4-MM master read interface, shared between the two channels
  • 4D tensor address generation (including iteration-offset)
  • Access shared lock module (local to interface tile)
  • Support task queue and task-complete tokens; queue depth is four tasks per channel
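The task queue listed above can be pictured as a small FIFO per channel. The following is a behavioral sketch only, not the hardware interface: the class name, the `start_bd` and `issue_token` parameters, and the token contents are hypothetical. The text does not say what happens when software enqueues into a full queue, so the model simply rejects the request.

```python
from collections import deque

QUEUE_DEPTH = 4  # from the text: four tasks per channel


class ChannelTaskQueue:
    """Toy model of one DMA channel's task queue (assumed FIFO order)."""

    def __init__(self):
        self._tasks = deque()
        self.tokens = []  # task-complete tokens emitted so far

    def enqueue(self, start_bd: int, issue_token: bool = False) -> bool:
        """Queue a task; return False if all four slots are occupied."""
        if len(self._tasks) >= QUEUE_DEPTH:
            return False
        self._tasks.append((start_bd, issue_token))
        return True

    def complete_next(self) -> int:
        """Retire the oldest task and emit a token if one was requested."""
        start_bd, issue_token = self._tasks.popleft()
        if issue_token:
            self.tokens.append(start_bd)
        return start_bd


queue = ChannelTaskQueue()
accepted = [queue.enqueue(bd, issue_token=True) for bd in range(5)]
print(accepted)  # [True, True, True, True, False]
```

Once a task completes, a slot frees up and a further task can be queued.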

S2MM Channels (two in total):

  • 32-bit stream slave interface per channel
  • 128-bit AXI4-MM master write interface, shared between the two channels
  • 4D tensor address generation (including iteration-offset)
  • Access shared lock module (local to interface tile)
  • Support task queue and task-complete tokens; queue depth is four tasks per channel
  • Support out-of-order packet transfer, finish-on-TLAST
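Both channel types list 4D tensor address generation. As a rough illustration of the idea, the sketch below walks a 4D index space with one stride per dimension. It is a generic strided address generator, not the documented register semantics: the `sizes` and `strides` parameters, the address units, and the choice to model the iteration-offset as the outermost dimension are all assumptions.

```python
from itertools import product


def tensor_addresses(base, sizes, strides):
    """Yield addresses of a 4D strided walk; dimension 0 varies fastest.

    Hypothetical model: sizes[d] is the element count in dimension d and
    strides[d] is the address step in that dimension.
    """
    assert len(sizes) == len(strides) == 4
    # product() varies its last argument fastest, so pass dimensions
    # from outermost (3) to innermost (0).
    for i3, i2, i1, i0 in product(*(range(n) for n in reversed(sizes))):
        yield (base
               + i0 * strides[0]
               + i1 * strides[1]
               + i2 * strides[2]
               + i3 * strides[3])


# Two elements per row, two rows, one plane, repeated twice with an
# outer offset of 64 (standing in for the iteration-offset).
addrs = list(tensor_addresses(0, sizes=(2, 2, 1, 2), strides=(1, 8, 0, 64)))
print(addrs)  # [0, 1, 8, 9, 64, 65, 72, 73]
```

A single descriptor of this form can describe a tiled or strided sub-block of a larger buffer without software issuing one transfer per row.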

Buffer descriptors (BDs):

  • 16 shared BDs

Lock module:

  • 16 semaphore locks; each lock state is a 6-bit unsigned value
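A 6-bit unsigned lock state can hold values from 0 to 63. The sketch below models the lock module as 16 counting semaphores to make that range concrete. The acquire and release rules shown here are ordinary counting-semaphore behavior and are an assumption; this section does not describe the actual hardware acquire and release conditions, and the class and method names are hypothetical.

```python
LOCK_COUNT = 16   # from the text
LOCK_MAX = 63     # largest 6-bit unsigned value (2**6 - 1)


class LockModule:
    """Toy model: 16 counting semaphores with a 0..63 state each."""

    def __init__(self):
        self.values = [0] * LOCK_COUNT

    def try_acquire(self, lock_id: int, amount: int = 1) -> bool:
        """Decrement the lock if enough units are available (assumed rule)."""
        if self.values[lock_id] >= amount:
            self.values[lock_id] -= amount
            return True
        return False

    def release(self, lock_id: int, amount: int = 1) -> None:
        """Increment the lock; overflow past 63 is treated as an error here."""
        new_value = self.values[lock_id] + amount
        if new_value > LOCK_MAX:
            raise OverflowError("lock state exceeds 6-bit range")
        self.values[lock_id] = new_value


locks = LockModule()
locks.release(3, 2)            # producer marks two buffers as ready
print(locks.try_acquire(3))    # True
print(locks.values[3])         # 1
```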

The interface DMA, together with the tile and memory tile DMAs and the streaming interconnect, supports the following data flows (non-exhaustive list):

  • Buffer copy from external memory to memory tile
  • Buffer copy from external memory to AIE-ML tile data memory
  • Buffer copy from memory tile to external memory
  • Buffer copy from AIE-ML tile data memory to external memory