Throughput - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English

The burst_cnt variable determines the number of samples processed during each function call. The inner loop processes eight samples per iteration, so the total number of processed samples is burst_cnt * 8.

The throughput is obtained as follows (see api_thruput.xlsx):

  • Build and run the design.

  • Open aiesimulator_output/default.aierun_summary.

  • Get the Total Function + Descendants Time (cycles) for the main function (num_cycles).

  • Throughput = clk_freq * (burst_cnt * 8) / num_cycles.
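The formula in the last step can be sketched as a small helper. This is only an illustration and not part of the tutorial sources: the function name `throughput_msa_per_sec` and its parameter names are hypothetical, and the result is scaled to Msa/sec to match the units used in this section.

```python
def throughput_msa_per_sec(burst_cnt, num_cycles, clk_freq_hz=1e9, samples_per_iter=8):
    """Throughput in Msa/sec for one kernel call (hypothetical helper).

    burst_cnt        -- number of inner-loop iterations per function call
    num_cycles       -- Total Function + Descendants Time (cycles) for main
    clk_freq_hz      -- AI Engine clock frequency in Hz (1 GHz assumed here)
    samples_per_iter -- samples processed per inner-loop iteration (8 here)
    """
    num_samples = burst_cnt * samples_per_iter
    samples_per_sec = clk_freq_hz * num_samples / num_cycles
    return samples_per_sec / 1e6  # convert samples/sec to Msa/sec


# Example: burst_cnt = 1 with 187 measured cycles
print(round(throughput_msa_per_sec(1, 187), 2))  # 42.78
```

The num_cycles value must be read from the simulator summary for each run; the helper only performs the arithmetic.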

The throughput values with a 1 GHz clock for different values of burst_cnt are as follows:

| IIR Throughput (with API) | | | | | | | |
|---|---|---|---|---|---|---|---|
| burst_cnt | 1 | 8 | 16 | 32 | 64 | 128 | 256 |
| num_samples | 8 | 64 | 128 | 256 | 512 | 1024 | 2048 |
| num_cycles (API) | 187 | 492 | 940 | 1836 | 3628 | 7212 | 14379 |
| API Throughput (Msa/sec) | 42.78 | 130.08 | 136.17 | 139.43 | 141.12 | 141.99 | 142.43 |

*clk_freq: 1 GHz
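One way to read the table is to look at how many extra cycles each additional loop iteration costs. The sketch below is an interpretation of the measured numbers, not something stated in the tutorial; the helper name `marginal_cycles_per_iter` is hypothetical. It divides the difference in num_cycles between neighboring columns by the difference in burst_cnt.

```python
# Values copied from the table above.
burst_cnt = [1, 8, 16, 32, 64, 128, 256]
num_cycles = [187, 492, 940, 1836, 3628, 7212, 14379]


def marginal_cycles_per_iter(counts, cycles):
    """Extra cycles per extra loop iteration between neighboring columns."""
    return [
        (cycles[i] - cycles[i - 1]) / (counts[i] - counts[i - 1])
        for i in range(1, len(counts))
    ]


marginal = marginal_cycles_per_iter(burst_cnt, num_cycles)
for lo, hi, m in zip(burst_cnt, burst_cnt[1:], marginal):
    print(f"burst_cnt {lo:>3} -> {hi:>3}: {m:.2f} cycles per iteration")
```

From burst_cnt = 8 onward, each additional iteration of eight samples costs about 56 cycles. Under that assumption, the throughput approaches roughly 8 / 56 samples per cycle, or about 142.9 Msa/sec at 1 GHz, which is consistent with the values leveling off near 142 Msa/sec as the fixed per-call overhead is amortized over more samples.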

The AI Engine APIs are a header-only implementation that acts as a “buffer” between the user and the low-level intrinsics (LLI) to increase the level of abstraction.

We modify the kernel code to use low-level intrinsics (LLI).