The `burst_cnt` variable determines the number of samples processed during each function call. The inner loop processes eight samples per iteration, so the total number of processed samples is `burst_cnt * 8`.
The throughput is obtained as follows (see `api_thruput.xlsx`):

1. Build and run the design.
2. Open `aiesimulator_output/default.aierun_summary`.
3. Get the `Total Function + Descendants Time (cycles)` for the `main` function (`num_cycles`).
4. Compute the throughput: `Throughput = clk_freq * (burst_cnt * 8) / num_cycles`.

The throughput with a 1 GHz clock for different values of `burst_cnt` is as follows:

| IIR Throughput (with API) | | | | | | | |
|---|---|---|---|---|---|---|---|
| burst_cnt | 1 | 8 | 16 | 32 | 64 | 128 | 256 |
| num_samples | 8 | 64 | 128 | 256 | 512 | 1024 | 2048 |
| num_cycles (API) | 187 | 492 | 940 | 1836 | 3628 | 7212 | 14379 |
| API Throughput (Msa/sec) | 42.78 | 130.08 | 136.17 | 139.43 | 141.12 | 141.99 | 142.43 |

\*clk_freq: 1 GHz

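
The throughput figures above can be checked with a few lines of host-side C++. This is only a minimal sketch of the formula, not part of the tutorial's kernel or test bench; the names `throughput_msa_per_sec` and `kSamplesPerIteration` are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Samples processed per inner-loop iteration, as stated in the text.
constexpr int kSamplesPerIteration = 8;

// Hypothetical helper: evaluates
//   Throughput = clk_freq * (burst_cnt * 8) / num_cycles
// and scales the result from samples/sec to Msa/sec.
double throughput_msa_per_sec(double clk_freq_hz, int burst_cnt,
                              std::int64_t num_cycles) {
    assert(burst_cnt > 0 && num_cycles > 0);
    const double num_samples =
        static_cast<double>(burst_cnt) * kSamplesPerIteration;
    return clk_freq_hz * num_samples / static_cast<double>(num_cycles) / 1.0e6;
}
```

For example, `throughput_msa_per_sec(1.0e9, 64, 3628)` evaluates to roughly 141.12, which agrees with the `burst_cnt` = 64 entry for a 1 GHz clock.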
The AI Engine APIs are a header-only implementation that acts as a “buffer” between the user and the low-level intrinsics (LLI) to increase the level of abstraction.
We modify the kernel code to use low-level intrinsics (LLI).