Conclusions - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID

XD100

Release Date

2024-03-05

Version

2023.2 English

The AI Engine API is intended to improve productivity by raising the level of abstraction relative to the low-level intrinsics (Fig. 6). We recommend using the AI Engine API, and resorting to low-level intrinsics only when more performance is needed to meet target specifications.

Throughput may be improved using the following techniques:

Reduce function-call overhead by processing as many samples as possible in each function invocation.
For floating-point accumulation with low-level intrinsics, use two accumulators.
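The two-accumulator technique is a general way of breaking a loop-carried dependency. The sketch below shows the idea in plain, portable C++ rather than with AI Engine intrinsics or vector types, and the function names `dot_one_acc` and `dot_two_acc` are hypothetical. It assumes the benefit comes from a pipelined floating-point adder whose latency exceeds one cycle, so two independent accumulation chains can be in flight at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: one accumulator. Each addition must wait for the
// previous one to complete, forming a single loop-carried dependency chain.
float dot_one_acc(const std::vector<float>& x, const std::vector<float>& h) {
    assert(x.size() == h.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i] * h[i];
    }
    return acc;
}

// Hypothetical two-accumulator version: even-indexed and odd-indexed products
// feed independent chains, so consecutive MACs do not depend on each other.
float dot_two_acc(const std::vector<float>& x, const std::vector<float>& h) {
    assert(x.size() == h.size());
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < x.size(); i += 2) {
        acc0 += x[i] * h[i];
        acc1 += x[i + 1] * h[i + 1];
    }
    if (i < x.size()) {
        acc0 += x[i] * h[i];  // leftover element when the length is odd
    }
    return acc0 + acc1;  // combine the partial sums once, at the end
}
```

For example, with `x = {1, 2, 3, 4, 5}` and `h = {0.5, 0.25, 0.125, 1, 2}` both functions return 15.375. Splitting the sum reorders the floating-point additions, so in general the two versions can differ in the last bits; here every intermediate value is exactly representable.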

Can the throughput be improved even further?

Floating-point arithmetic allows 8 MACs per cycle. Using 32-bit fixed-point coefficients with 16-bit data allows 16 MACs per cycle, potentially doubling the throughput. 16-bit fixed-point coefficients with 16-bit data allow 32 MACs per cycle, potentially quadrupling the throughput, and 16-bit fixed-point coefficients with 8-bit data allow 64 MACs per cycle, potentially improving the throughput by 8x.
If we stay with a floating-point implementation, doubling both the number of processed samples and the number of AI Engines (that is, two AI Engines, each processing eight samples from a 16-sample window) may double the throughput.
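The speedup figures above follow from simple ratios. The sketch below spells out that arithmetic under stated assumptions: the MACs-per-cycle values are the ones quoted in the text, the kernel is assumed to stay MAC-bound, and the parallel case assumes an equal split with no communication or synchronization cost. The constant and function names are hypothetical and not part of any AMD API.

```cpp
#include <cassert>

// Peak MACs per cycle for the data-type combinations listed in the text.
constexpr int kMacsFloat = 8;         // floating-point coefficients and data
constexpr int kMacsInt32xInt16 = 16;  // 32-bit coefficients, 16-bit data
constexpr int kMacsInt16xInt16 = 32;  // 16-bit coefficients, 16-bit data
constexpr int kMacsInt16xInt8 = 64;   // 16-bit coefficients, 8-bit data

// Upper bound on the speedup from a data-type change, relative to the
// floating-point baseline. Assumes the kernel remains MAC-bound; memory,
// stream, or control overhead can keep the real gain lower.
double peak_type_speedup(int macs_per_cycle) {
    assert(macs_per_cycle > 0);
    return static_cast<double>(macs_per_cycle) / kMacsFloat;
}

// Ideal speedup from splitting a window across parallel AI Engines, relative
// to one engine that processes samples_per_engine samples per invocation.
// Assumes an equal split and zero communication cost, so the result is
// simply the number of engines.
double ideal_parallel_speedup(int window_samples, int samples_per_engine) {
    assert(samples_per_engine > 0);
    assert(window_samples % samples_per_engine == 0);
    return static_cast<double>(window_samples / samples_per_engine);
}
```

With these definitions, `peak_type_speedup(kMacsInt16xInt8)` gives 8.0 and `ideal_parallel_speedup(16, 8)` gives 2.0, matching the "8x" and "may double" estimates. Both are ceilings rather than predictions.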