AIE-ML Memory Tile DMA

Versal ACAP AIE-ML Architecture Manual (AM020)

1.0 English

The list of features in the AIE-ML memory tile DMA is covered in the AIE-ML Memory Tile Overview and Features section. The memory tile DMA is similar to the AI Engine tile DMA with a few enhancements:

  • Supports 5D tensor address generation (including iteration-offset)
  • Allows out-of-order buffer descriptor (BD) processing based on incoming packet header information
  • Supports compression and decompression

The memory tile DMA has 12 independent channels: six S2MM and six MM2S. Each channel has an input task queue and can load a BD, generate addresses, access memory over a shared interface, and read from or write to its stream port. Each channel can also trigger the issuing of a task-complete-token upon completing a task. The AIE-ML memory tile DMA supports address generation as described in the AIE-ML Data Movement Architecture section, with up to four dimensions (K=4). Together with the iteration-offset, this corresponds to the 5D tensor address generation listed above.
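As an illustration of the address generation a channel performs, the following Python sketch walks a tensor of up to four dimensions and applies an optional iteration-offset. It is a simplified model, not the hardware definition: the function name and the `steps`, `wraps`, `iter_index`, and `iter_step` parameters are assumptions made for this example, and the actual BD fields are those described in the AIE-ML Data Movement Architecture section.

```python
from itertools import product


def generate_addresses(base, steps, wraps, iter_index=0, iter_step=0):
    """Yield addresses for a tensor of up to four dimensions (K=4).

    steps[d] and wraps[d] are the assumed stride and extent of dimension d,
    with dimension 0 innermost (fastest varying). The term
    iter_index * iter_step models the iteration-offset as an extra term
    added to the base address (hypothetical parameter names).
    """
    if len(steps) != len(wraps) or not 1 <= len(steps) <= 4:
        raise ValueError("expected 1 to 4 matching step/wrap pairs")
    start = base + iter_index * iter_step
    # product() varies its last range fastest, so the ranges are reversed
    # to make dimension 0 the innermost loop.
    for idx in product(*(range(w) for w in reversed(wraps))):
        yield start + sum(i * s for i, s in zip(reversed(idx), steps))


# Two elements per row, three rows, rows spaced 8 apart.
print(list(generate_addresses(0, steps=[1, 8], wraps=[2, 3])))
# prints [0, 1, 8, 9, 16, 17]
```

Keeping the iteration-offset as a separate term from the per-dimension strides mirrors the "4D plus iteration-offset" wording used in this section.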

Of the 12 channels, S2MM channels 0–3 and MM2S channels 0–3 can access the memory banks in the tiles to the west and east, in addition to the local memory banks. These same channels can also access the lock modules in the tiles to the east and west. S2MM and MM2S channels 4–5 can access only local memory banks and local lock modules.

All 12 channels use the same address scheme and lock indexes, as shown in the following table.

Table 1. Address and Lock Ranges for Memory Tile DMAs
         Address Ranges         Lock Indexes   Description
West     0x0_0000–0x7_FFFF      0–63           Channels 0–3 only
Local    0x8_0000–0xF_FFFF      64–127
East     0x10_0000–0x17_FFFF    128–191        Channels 0–3 only

With this addressing scheme, it is possible to configure a DMA channel so that its address or lock requests are out of range for that channel. This condition can cause the DMA channel to stall, requiring a channel reset to proceed.
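Because an out-of-range request can stall a channel, it can be useful to validate a configuration in software before programming it. The following Python sketch encodes the ranges from Table 1 and the channel restrictions described above. The names `REGIONS`, `region_of`, and `channel_can_access` are hypothetical helpers for this example and are not part of any AMD driver or tool API.

```python
# Ranges copied from Table 1 (inclusive bounds). Helper names are hypothetical.
REGIONS = {
    "west": {"addr": (0x0_0000, 0x7_FFFF), "lock": (0, 63)},
    "local": {"addr": (0x8_0000, 0xF_FFFF), "lock": (64, 127)},
    "east": {"addr": (0x10_0000, 0x17_FFFF), "lock": (128, 191)},
}


def region_of(value, kind="addr"):
    """Return 'west', 'local', or 'east' for an address or lock index,
    or None if the value lies outside every range in Table 1."""
    for name, ranges in REGIONS.items():
        low, high = ranges[kind]
        if low <= value <= high:
            return name
    return None


def channel_can_access(channel, value, kind="addr"):
    """Check one request for an S2MM or MM2S channel numbered 0 to 5.

    Channels 0-3 may use all three regions; channels 4-5 are local only.
    """
    if not 0 <= channel <= 5:
        raise ValueError("channel must be in the range 0 to 5")
    region = region_of(value, kind)
    if region is None:
        return False
    return region == "local" or channel <= 3


print(channel_can_access(4, 0x8_0000))          # local address on channel 4
print(channel_can_access(5, 130, kind="lock"))  # east lock on channel 5
```

The first call returns `True` and the second returns `False`, since channel 5 can reach only local lock indexes 64–127.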

The memory tile MM2S channels support zero-padding insertion. This feature satisfies two application requirements:

  • Algorithmic padding: To recreate surrounding data on the edge of valid data.
  • Granularity padding: An optimized kernel can operate on 16 channels at a time while a layer can have 24 channels. In that case, the MM2S channel pads the channel dimension up to 32 channels, the next multiple of 16.
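The granularity-padding arithmetic can be sketched in a few lines of Python. The function name and the assumption that the kernel consumes whole groups of `granule` channels are illustrative only.

```python
def granularity_padding(channels, granule):
    """Return (padded_channels, zero_channels_to_add), rounding the
    channel count up to the next multiple of granule (assumed kernel
    granularity)."""
    if channels < 0 or granule <= 0:
        raise ValueError("channels must be >= 0 and granule must be > 0")
    padded = -(-channels // granule) * granule  # ceiling division
    return padded, padded - channels


print(granularity_padding(24, 16))  # prints (32, 8)
```

For the 24-channel layer in the example above, eight zero channels are appended to reach 32.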

Zero-padding insertion is linked to 4D address generation. For each of the lower three dimensions, there are fields for padding before and after that dimension. The padding on a dimension is added on top of the wrap for that dimension. The following figure illustrates zero-padding in three dimensions.

Figure 1. Zero Padding in 3D
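A software model can make the "added on top of the wrap" behavior concrete. The Python sketch below assumes that each of the three lower dimensions emits `pad_before + wrap + pad_after` positions and that every position outside the valid data is a zero. The function and parameter names are hypothetical, and the model ignores data width and the fourth dimension.

```python
def pad_stream(data, wraps, pad_before, pad_after):
    """Return the zero-padded output stream for a 3D block.

    data is a flat list with dimension 0 varying fastest; wraps gives the
    valid extent of each dimension. Padding positions are emitted as 0
    (assumed interpretation of the padding fields).
    """
    w0, w1, w2 = wraps
    if len(data) != w0 * w1 * w2:
        raise ValueError("data length does not match wraps")
    out = []
    for i2 in range(-pad_before[2], w2 + pad_after[2]):
        for i1 in range(-pad_before[1], w1 + pad_after[1]):
            for i0 in range(-pad_before[0], w0 + pad_after[0]):
                inside = 0 <= i0 < w0 and 0 <= i1 < w1 and 0 <= i2 < w2
                out.append(data[(i2 * w1 + i1) * w0 + i0] if inside else 0)
    return out


# 2x2 block, one zero before dimension 0 and one zero row after dimension 1.
print(pad_stream([1, 2, 3, 4], (2, 2, 1), (1, 0, 0), (0, 1, 0)))
# prints [0, 1, 2, 0, 3, 4, 0, 0, 0]
```

The output length is the product of the padded extents, here (1 + 2 + 0) × (0 + 2 + 1) × 1 = 9, while only the four valid elements are read from memory.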