AI Engine Memory - 2021.2 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

Document ID: UG1079
Release Date: 2021-11-10
Version: 2021.2 English

Each AI Engine has 16 KB of program memory, enough to store 1024 instructions of 128 bits each. AI Engine instructions are at most 128 bits wide and support multiple instruction formats, including variable-length instructions that reduce program memory usage. Many instructions outside the optimized inner loop can use the shorter formats.

Each AI Engine tile has eight data memory banks. Each bank is a 256-word x 128-bit single-port memory, for a total of 32 KB per tile. Each AI Engine can access the data memory of its north, south, and east or west neighboring tiles, for a total of 128 KB including its own data memory. The stack is a subset of the data memory. The default stack size and heap size are 1 KB each. The compiler can automatically compute and adjust the heap size when the optimization level is greater than zero (xlopt>=1 for aiecompiler). Stack size and heap size can be changed using compiler options or constraints in the source code. Refer to the Versal ACAP AI Engine Programming Environment User Guide (UG1076) for more information about stack and heap size usage.
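The capacity figures quoted so far follow from simple arithmetic. The sketch below is plain host-side C++, not AI Engine kernel code, and only re-derives the numbers stated in the text. All constant names are hypothetical and chosen for illustration.

```cpp
#include <cstddef>

// Program memory: 16 KB holding instructions of at most 128 bits.
constexpr std::size_t kProgramMemoryBytes = 16 * 1024;
constexpr std::size_t kMaxInstructionBits = 128;
constexpr std::size_t kFullWidthInstructions =
    kProgramMemoryBytes / (kMaxInstructionBits / 8);

// Data memory: eight banks per tile, each 256 words x 128 bits.
constexpr std::size_t kBanksPerTile = 8;
constexpr std::size_t kWordsPerBank = 256;
constexpr std::size_t kBitsPerWord  = 128;
constexpr std::size_t kBankBytes     = kWordsPerBank * kBitsPerWord / 8;
constexpr std::size_t kTileDataBytes = kBanksPerTile * kBankBytes;

// An AI Engine reaches its own data memory plus that of three neighbors.
constexpr std::size_t kReachableTiles     = 4;
constexpr std::size_t kReachableDataBytes = kReachableTiles * kTileDataBytes;

static_assert(kFullWidthInstructions == 1024, "16 KB / 16 B = 1024 instructions");
static_assert(kBankBytes == 4 * 1024, "each bank holds 4 KB");
static_assert(kTileDataBytes == 32 * 1024, "each tile holds 32 KB");
static_assert(kReachableDataBytes == 128 * 1024, "four tiles give 128 KB");
```

The 1024 figure is the count of full-width instructions. Because shorter formats exist, a program made partly of shorter instructions may fit more than 1024 of them in the same 16 KB.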

In a logical representation, the 128 KB memory can be viewed as one contiguous 128 KB block or four 32 KB blocks, and each block can be divided into four odd and four even banks. One even bank and one odd bank are interleaved to comprise a double bank. AI Engines on the edges of the AI Engine array have fewer neighbors and correspondingly less memory available.

Each memory port operates in 256-bit/128-bit vector register mode or 32-bit/16-bit/8-bit scalar register mode. The 256-bit port is created by pairing an even and an odd memory bank. The 8-bit and 16-bit stores are implemented as read-modify-write instructions. All three ports (two load ports and one store port) can operate concurrently if each port accesses a different bank.
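The concurrency rule can be illustrated with a small host-side model. This is a sketch under stated assumptions, not a description of the hardware arbiter: the bank numbering (0 to 7), the pairing of banks 2k and 2k+1 for a 256-bit access, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bitmask of the banks one port touches in a cycle; bit i stands for bank i.
using BankMask = std::uint8_t;

// A 128-bit access touches a single bank (0..7).
inline BankMask access128(unsigned bank) {
    assert(bank < 8);
    return static_cast<BankMask>(1u << bank);
}

// A 256-bit access touches an even/odd bank pair.
// ASSUMPTION: pair k (0..3) consists of banks 2k and 2k+1.
inline BankMask access256(unsigned pair) {
    assert(pair < 4);
    return static_cast<BankMask>(0x3u << (2u * pair));
}

// The three ports can run in the same cycle only if no bank is shared.
inline bool ports_conflict_free(BankMask a, BankMask b, BankMask c) {
    return (a & b) == 0 && (a & c) == 0 && (b & c) == 0;
}
```

For example, `ports_conflict_free(access128(0), access128(1), access128(2))` is true, while two ports that both target bank 3 make it false. In this model a 256-bit access excludes both banks of its pair from the other two ports.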

Data stored in memory is in little endian format.
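As a reminder of what little endian means for a 32-bit word, the following host-side sketch writes and reads the bytes explicitly, so it behaves the same on any host. The helper names `store_le32` and `load_le32` are hypothetical and not part of any AI Engine API.

```cpp
#include <cassert>
#include <cstdint>

// Writes value to out[0..3] with the least significant byte at the lowest address.
inline void store_le32(std::uint32_t value, std::uint8_t* out) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(value);
    out[1] = static_cast<std::uint8_t>(value >> 8);
    out[2] = static_cast<std::uint8_t>(value >> 16);
    out[3] = static_cast<std::uint8_t>(value >> 24);
}

// Reassembles a 32-bit value from four little-endian bytes.
inline std::uint32_t load_le32(const std::uint8_t* in) {
    assert(in != nullptr);
    return static_cast<std::uint32_t>(in[0]) |
           (static_cast<std::uint32_t>(in[1]) << 8) |
           (static_cast<std::uint32_t>(in[2]) << 16) |
           (static_cast<std::uint32_t>(in[3]) << 24);
}
```

Storing `0x11223344` this way places `0x44` at the lowest address and `0x11` at the highest.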

Recommended: Access data memory on a 128-bit boundary with vector operations.
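One way to request and check a 128-bit (16-byte) boundary in standard C++ is shown below. This is a minimal host-side sketch: the buffer `coeffs` and the helpers `is_aligned_128` and `sum_block` are hypothetical, and real kernels would use the vector types and alignment facilities of the AI Engine toolchain instead of scalar loads.

```cpp
#include <cassert>
#include <cstdint>

// True when p sits on a 128-bit (16-byte) boundary.
inline bool is_aligned_128(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical data buffer; alignas(16) requests a 128-bit boundary.
alignas(16) static std::int32_t coeffs[8] = {1, 2, 3, 4, 5, 6, 7, 8};

// Sums one 128-bit block (four 32-bit words).
// The assert documents the alignment precondition in debug builds.
inline std::int32_t sum_block(const std::int32_t* p) {
    assert(is_aligned_128(p));
    return p[0] + p[1] + p[2] + p[3];
}
```

Here `coeffs` and `coeffs + 4` both start a 128-bit block, while `coeffs + 1` is offset by 4 bytes and fails the check.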

Each AI Engine has a DMA controller that is divided into two separate modules: S2MM, which stores stream data to memory (32-bit data), and MM2S, which writes the contents of memory to a stream (32-bit data). Both S2MM and MM2S have two independent data channels.