AIE-ML Array Features

Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2023-11-10
Revision: 1.2 English

AMD has developed multiple iterations of the AI Engine. This architecture manual covers the AIE-ML.

Some Versal adaptive SoCs include the AIE-ML, which consists of an array of AIE-ML tiles, AIE-ML memory tiles, and the AIE-ML array interface consisting of network on chip (NoC) and programmable logic (PL) tiles. The features of each are listed below. Figure 1 shows the array organization.

AIE-ML Tile Features

  • A separate building block, integrated into the silicon, outside the programmable logic (PL).
  • Each AIE-ML incorporates a high-performance very-long instruction word (VLIW) single-instruction multiple-data (SIMD) vector processor optimized for many applications, including machine learning.
  • From a hardware perspective, data memory is 64 KB organized as eight banks of 8 KB. From a programmer's perspective, every two banks are interleaved to form one bank, that is, a total of four banks of 16 KB each.
  • Streaming interconnect for deterministic throughput, high-speed data flow between AIE-ML tiles and/or the programmable logic in the Versal device.
  • Direct memory access (DMA) in the AIE-ML tile moves data from incoming stream(s) to local memory and from local memory to outgoing stream(s).
  • Configuration interconnect (through the memory-mapped AXI4 interface) with a shared, transaction-based switched interconnect for access from external masters to the internal AIE-ML tile.
  • Hardware synchronization primitives provide synchronization of the AIE-ML, between the AIE-ML and the tile DMA, and between the AIE-ML and an external master (through the memory-mapped AXI4 interface).
  • Debug, trace, and profile functionality.
  • The AIE-ML tile has additional granularity on clock gating and reset. Clock gating and reset of the AIE-ML tile can be controlled through the memory-mapped AXI4 register inside the tile. In the AIE-ML, the memory-mapped AXI4, clock gating, and reset registers are moved into an always-on domain to give modular control over the core, stream switch, and memory module in the tile. A similar arrangement applies to the memory tile, a functional unit in the AIE-ML that is introduced in the following section.
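
The data memory organization described in the list above (eight hardware banks of 8 KB, seen by the programmer as four banks of 16 KB) can be illustrated with a minimal sketch. It assumes, for illustration only, that the four programmer-visible banks occupy consecutive 16 KB address ranges and that each pair of hardware banks alternates at a fixed granularity. The 16-byte granularity, the bank numbering, and the function name locate are hypothetical and are not stated in this section.

```python
KB = 1024

# Figures taken from the feature list above.
HW_BANKS = 8
HW_BANK_SIZE = 8 * KB
LOGICAL_BANKS = 4
LOGICAL_BANK_SIZE = 16 * KB
TOTAL_SIZE = HW_BANKS * HW_BANK_SIZE  # 64 KB

# Assumption: the interleave granularity is a placeholder value.
INTERLEAVE_BYTES = 16


def locate(addr):
    """Map a byte address to (logical_bank, hw_bank) under the assumed layout."""
    if not 0 <= addr < TOTAL_SIZE:
        raise ValueError("address outside the 64 KB data memory")
    logical = addr // LOGICAL_BANK_SIZE
    offset = addr % LOGICAL_BANK_SIZE
    # Assumption: the two hardware banks of a pair alternate every
    # INTERLEAVE_BYTES bytes within the logical bank.
    half = (offset // INTERLEAVE_BYTES) % 2
    return logical, 2 * logical + half


if __name__ == "__main__":
    for a in (0, 16, 32, 16 * KB, TOTAL_SIZE - 1):
        print(hex(a), locate(a))
```

Both views describe the same 64 KB: 8 x 8 KB in hardware and 4 x 16 KB for the programmer. Only the mapping between the two views is assumed here.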

AIE-ML Memory Tile Features

  • A tile containing 512 KB of high-density, high-bandwidth memory to reduce the use of PL resources in machine learning (ML) applications.
  • The memory tile DMA has the same channel features as the AIE-ML tile DMA, except that the memory tile DMA also supports 4-D addressing modes. See AIE-ML Memory Tile Architecture for a more detailed description.
  • The AXI4-Stream interconnect is the same as in the AIE-ML tile, except that the number of ports and the connectivity differ.
  • Memory-mapped AXI4 configuration is the same as the AIE-ML tile.
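
As a rough illustration of what a 4-D addressing mode means in general, the following sketch generates an address sequence from four nested dimensions, each with an iteration count and a step size. The function name addresses_4d, the (count, step) parameter layout, and the address units are hypothetical. They do not reflect the actual configuration fields of the memory tile DMA, which are described in AIE-ML Memory Tile Architecture.

```python
from itertools import product


def addresses_4d(base, dims):
    """Yield addresses for four nested dimensions.

    dims is a sequence of four (count, step) pairs, outermost dimension
    first. This is a generic model, not the memory tile DMA register layout.
    """
    if len(dims) != 4:
        raise ValueError("expected exactly four (count, step) pairs")
    counts = [range(count) for count, _ in dims]
    steps = [step for _, step in dims]
    # product() varies the last (innermost) index fastest.
    for index in product(*counts):
        yield base + sum(i * s for i, s in zip(index, steps))


if __name__ == "__main__":
    # Hypothetical access pattern: 2 x 2 x 2 x 2 elements.
    pattern = [(2, 64), (2, 16), (2, 4), (2, 1)]
    print(list(addresses_4d(0, pattern)))
```

With the hypothetical pattern shown, the sequence starts 0, 1, 4, 5, 16, 17 and ends at 85. A single descriptor of this kind can therefore cover a strided, tiled region without software intervention between elements.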

AIE-ML Array Interface to NoC and PL Resources

  • Direct memory access (DMA) in the AIE-ML NoC interface tile manages incoming and outgoing memory-mapped and stream traffic into and out of the AIE-ML array. The interface tile is described in AIE-ML Array Interface Architecture.
  • Configuration and control interconnect functionality (through the memory-mapped AXI4 interface).
  • Streaming interconnect that leverages the AIE-ML tile streaming interconnect functionality.
  • AIE-ML to programmable logic (PL) interface that provides asynchronous clock-domain crossing between the AIE-ML clock and the PL clock.
  • AIE-ML to NoC interface logic to the NoC master unit (NMU) and NoC slave unit (NSU) components.
  • Hardware synchronization primitives leverage features from the AIE-ML tile locks module.
  • Debug, trace, and profile functionality that leverages all the features from the AIE-ML tile.
  • For a list of changes from the AI Engine (AIE) array interface, see AIE-ML Array Interface Architecture.