Features of the AIE-ML Array Interface

Versal ACAP AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2022-09-28
Revision: 1.0 English
Memory Mapped AXI4 Interconnect
Transfers incoming memory-mapped AXI4 requests from the NoC into the AIE-ML array.
AXI4 Master: Interface-DMA
Provides memory-mapped access to the rest of the device through the NoC, including external memory.
AXI4-Stream Interconnect
Leverages the AIE-ML tile streaming interconnect functionality.
AIE-ML to PL Interface
The AIE-ML PL modules directly communicate with the PL. Asynchronous FIFOs are provided to handle clock domain crossing.
AIE-ML to NoC Interface
The AIE-ML to NoC module handles the conversion of 128-bit NoC streams into 32-bit AIE-ML streams (and vice versa). It provides the interface logic to the NoC components (NMU and NSU). Level shifting is performed because the NMU and NSU are in a different power domain from the AIE-ML.
Hardware Locks
Leverages the corresponding unit in the AIE-ML tile and is accessible from the AIE-ML array interface or an external memory-mapped AXI4 master. The module is used to synchronize the array interface DMA transfers to and from external memory.
Debug, Trace, and Profile
Leverages all the features from the AIE-ML tile for local event debugging, tracing, and profiling.
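
The width conversion performed by the AIE-ML to NoC interface can be pictured with a small software model. The sketch below is illustrative only: the function names are hypothetical, and the word ordering (least-significant 32-bit word first) is an assumption, since this section does not state the order in which the hardware emits the four 32-bit words.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # mask selecting one 32-bit stream word


def split_128_to_32(word128: int) -> List[int]:
    """Split one 128-bit NoC word into four 32-bit stream words.

    Assumption: the least-significant 32 bits are emitted first.
    """
    if not 0 <= word128 < (1 << 128):
        raise ValueError("word128 must fit in 128 bits")
    return [(word128 >> (32 * i)) & MASK32 for i in range(4)]


def pack_32_to_128(words32: List[int]) -> int:
    """Pack four 32-bit stream words back into one 128-bit NoC word.

    Uses the same assumed ordering as split_128_to_32.
    """
    if len(words32) != 4:
        raise ValueError("exactly four 32-bit words are required")
    value = 0
    for i, word in enumerate(words32):
        if not 0 <= word <= MASK32:
            raise ValueError("each word must fit in 32 bits")
        value |= word << (32 * i)
    return value
```

Under the assumed ordering, `split_128_to_32(0x00000004000000030000000200000001)` returns `[1, 2, 3, 4]`, and `pack_32_to_128` reverses the operation.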

The following list summarizes how the AIE-ML array interface features differ from those of the AI Engine:

  • AIE-ML array interface DMA
    • Supports 32-bit aligned start address
    • Supports 4D tensor address generation (including iteration-offset)
    • Each DMA channel generates addresses based on the base address in the buffer descriptor (BD). The BD also stores the incremental address offset between BD calls, which avoids the need to reconfigure a BD for subsequent buffer transfers
    • Task queue and task-complete-tokens
    • Support for S2MM out-of-order and finish-on-TLAST features (enabling compressed spill and restore of intermediate results to external memory)
  • Lock design with 16 semaphore locks and 6-bit unsigned lock state
  • One stream FIFO
  • Additional control and status registers
  • Memory-mapped AXI4 interface to support 1 MB address space per tile and write bandwidth improvement
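
The 4D tensor address generation and iteration offset listed above can be sketched as a simple software model. This is a minimal sketch under stated assumptions, not the hardware algorithm: the parameter names (`steps`, `wraps`, `iteration`, `iteration_step`) are hypothetical, dimension 0 is assumed to be the innermost (fastest-varying) dimension, and addresses are counted in abstract units rather than the actual BD field encodings.

```python
from itertools import product
from typing import Iterator, Sequence


def tensor_addresses(
    base: int,
    steps: Sequence[int],
    wraps: Sequence[int],
    iteration: int = 0,
    iteration_step: int = 0,
) -> Iterator[int]:
    """Yield the addresses of a 4D traversal.

    base           -- base address taken from the BD (hypothetical field)
    steps          -- address increment per index, for dimensions 0..3
    wraps          -- number of indices per dimension, for dimensions 0..3
    iteration      -- how many times this BD has already been called
    iteration_step -- offset added to the base on each BD call

    Assumption: dimension 0 is the innermost, fastest-varying dimension.
    """
    if len(steps) != 4 or len(wraps) != 4:
        raise ValueError("steps and wraps must each have four entries")

    # The iteration offset shifts the whole pattern between BD calls,
    # so the BD itself does not need to be rewritten.
    start = base + iteration * iteration_step

    # product() varies its last argument fastest, so dimension 0 goes last.
    for i3, i2, i1, i0 in product(
        range(wraps[3]), range(wraps[2]), range(wraps[1]), range(wraps[0])
    ):
        yield start + i0 * steps[0] + i1 * steps[1] + i2 * steps[2] + i3 * steps[3]
```

For example, `list(tensor_addresses(100, (1, 8, 64, 0), (2, 2, 1, 1)))` gives `[100, 101, 108, 109]`. Calling it again with `iteration=1, iteration_step=16` shifts the same pattern to `[116, 117, 124, 125]`, which mirrors how a stored offset lets one BD serve successive buffer transfers.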