AIE-ML Processor

Versal Adaptive SoC AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2023-11-10
Revision: 1.2 English

Similar to AIE, the AI Engine processor in AIE-ML consists of a scalar 32-bit data path, a SIMD vector data path, two load units, and a store unit, and is optimized for ML applications.

The AIE-ML processor provides the following features:

  • Instruction-based VLIW SIMD processor with new instructions
  • Same 16 KB program memory as in AIE
  • Vector unit supports 256 (8b x 8b) and 512 (4b x 8b) MAC operations
  • Vector unit supports 128 bfloat16 MAC operations with FP32 accumulation
  • Vector unit supports structured sparsity and FFT processing for ML inference applications, including cint32 x cint16 multiplication (cint32 data with a cint16 twiddle factor), control support for complex operations and conjugation, a new permute mode, and a shuffle mode. See Sparsity for more information.
  • A new processor bus that allows the processor to access memory mapped registers in the local AIE-ML tile
  • The complex circular addressing modes are dropped and replaced by a 3D addressing mode
  • On-the-fly decompression during loading of sparse weights. See Sparsity for more information.
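
The bfloat16 MAC feature listed above pairs narrow operands with a wider accumulator. The following is a minimal numerical sketch of that idea in plain Python, not AIE-ML code. The helper names (to_float32, to_bfloat16, bf16_dot_fp32) are hypothetical, and the rounding choices (round-to-nearest-even when narrowing, one FP32 rounding after every accumulation step) are assumptions rather than documented hardware behavior.

```python
import struct

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bfloat16(x):
    """Round to the nearest bfloat16 value (ties to even), returned as a float.

    bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of binary32. NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                       # lowest bit that survives
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000    # round, then clear low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_dot_fp32(a, b):
    """Dot product with bfloat16 operands and a binary32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # Product of two 8-bit significands is exact in binary64.
        prod = to_bfloat16(x) * to_bfloat16(y)
        # Assumption: the accumulator is rounded to FP32 after every add.
        acc = to_float32(acc + prod)
    return acc
```

For example, to_bfloat16(3.14159) returns 3.140625, which shows how much precision the operands lose, while the FP32 accumulator keeps the running sum from losing precision at the same rate.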

The AIE-ML processor removes or limits some advanced DSP functionality available in the AIE processor, including:

  • 32-bit floating-point vector data path. It is not directly supported but can be emulated by decomposition into multiple 16-bit x 16-bit multiplications.
  • Scalar non-linear functions, including sin/cos, sqrt, inverse sqrt and inverse
  • Scalar floating point/integer conversions
  • Complex circular addressing and FFT addressing modes. However, some level of FFT and complex support is provided; see the AIE-ML processor features.
  • 128-bit load/store (limited support only)
  • Non-aligned memory access
  • Support for some complex data types. Some level of complex support is provided; see the AIE-ML processor features.
  • Native support for 32 × 32 multiplication. It can be emulated using 16-bit integer operands.
  • Non-blocking 128-bit stream interfaces and stream FIFOs
  • Control streams and packet header generation
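
The list above notes that 32 × 32 multiplication can be emulated using 16-bit integer operands. The following is a minimal sketch of one standard way to do that, written in plain Python rather than AIE-ML code: each 32-bit operand is split into a signed high half and an unsigned low half, and four 16-bit partial products are recombined with shifts. The function names (split16, mul32_via_16) are hypothetical, and the exact decomposition used by AIE-ML tools is not stated in this section, so treat the signed/unsigned split as an assumption.

```python
MASK16 = 0xFFFF

def split16(x):
    """Split a signed 32-bit integer into (signed high half, unsigned low half).

    The halves satisfy x == hi * 2**16 + lo, with lo in [0, 65535]
    and hi in [-32768, 32767].
    """
    if not -(1 << 31) <= x < (1 << 31):
        raise ValueError("operand does not fit in a signed 32-bit integer")
    lo = x & MASK16   # low 16 bits, treated as unsigned
    hi = x >> 16      # arithmetic shift keeps the sign in the high half
    return hi, lo

def mul32_via_16(a, b):
    """Multiply two signed 32-bit integers using only 16-bit x 16-bit products."""
    a_hi, a_lo = split16(a)
    b_hi, b_lo = split16(b)
    high = a_hi * b_hi                   # signed x signed
    cross = a_hi * b_lo + a_lo * b_hi    # signed x unsigned cross terms
    low = a_lo * b_lo                    # unsigned x unsigned
    # Recombine the partial products at their bit positions.
    return (high << 32) + (cross << 16) + low
```

Each partial product multiplies two 16-bit values, so the work maps onto 16-bit multipliers at the cost of four multiplications and the shift-and-add recombination per 32-bit product.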