We begin by setting the Active build configuration to Emulation-AIE and building the design with the default optimization level (xlopt = 1). After a successful build, right-click aie_iir_2a [aiengine] and select Run As -> Launch AIE Emulator.
After a successful simulation, we can enable profiling. Right-click aie_iir_2a [aiengine], select Run Configurations..., and click Generate Profile in the Run Configurations window.
Note: It may be necessary to increase the height of the Run Configurations window to see the Generate Profile section.
Click the Run button to re-run with profiling enabled.
After the simulation completes, the “goodness” of the result can be checked by running:
$ julia check.jl aie
The result is “good” when the maximum absolute error, maximum(abs.(err)), is less than eps(Float32).
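To make the acceptance test concrete, the criterion can be restated in Python with NumPy. This is a sketch, not a port of check.jl: it assumes that err is the element-wise difference between the AI Engine output and a single-precision reference, and the function name is_good and the sample data are hypothetical. np.finfo(np.float32).eps equals 2^-23, the same value Julia returns for eps(Float32).

```python
import numpy as np

# Single-precision machine epsilon (2**-23), same value as Julia's eps(Float32).
EPS_F32 = np.finfo(np.float32).eps


def is_good(aie_out, reference):
    """Hypothetical restatement of the check.jl criterion
    maximum(abs.(err)) < eps(Float32), assuming err = aie_out - reference."""
    aie_out = np.asarray(aie_out, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    err = aie_out - reference
    return bool(np.max(np.abs(err)) < EPS_F32)


if __name__ == "__main__":
    # Made-up data for illustration only; not output from the tutorial.
    ref = np.linspace(-1.0, 1.0, 256, dtype=np.float32)
    print(is_good(ref, ref))          # identical outputs pass
    print(is_good(ref + 1e-3, ref))   # an error of about 1e-3 fails
```

The threshold is an absolute one, so it is meaningful only when the outputs are of order one, as they are for a normalized filter response.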
To view the profiler results, expand Emulation-AIE and then aiesimulator_output in the Explorer pane.
Double-click default.aierun_summary to open the report in Vitis Analyzer.
In the Vitis Analyzer window, click Profile in the browser pane (leftmost pane), then click Total Function Time to show the number of cycles consumed by each function.
Note: The kernel function, SecondOrderSection<1>, was executed 32 times and ran for a total of 2,313 cycles, so each call consumed 2,313/32 ≈ 72.28 cycles on average. The minimum function time is 72 cycles and the maximum is 81 cycles. Because 81 + 31 * 72 = 2,313, exactly one call (presumably the first) consumed nine more cycles than the others.
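The deduction in the note above can be checked with a few lines of Python. The constants are the profiler figures quoted in the note; the helper name call_statistics is illustrative only. The reasoning: if every call had taken the minimum 72 cycles, the total would be 32 * 72 = 2,304, leaving 9 excess cycles. At least one call took the maximum 81 cycles, and that call alone accounts for all 9, so every other call must have run at the minimum.

```python
# Profiler figures for SecondOrderSection<1> quoted in the note above.
TOTAL_CYCLES = 2313
NUM_CALLS = 32
MIN_CYCLES = 72
MAX_CYCLES = 81


def call_statistics(total, calls, lo, hi):
    """Return (mean cycles per call, excess over the all-minimum baseline,
    whether one maximum-length call explains the whole excess)."""
    mean = total / calls
    excess = total - calls * lo          # cycles beyond calls * lo
    # At least one call ran for `hi` cycles, contributing (hi - lo) excess.
    # If that is all of it, every other call ran for exactly `lo` cycles.
    single_slow_call = excess == hi - lo
    return mean, excess, single_slow_call


mean, excess, single_slow_call = call_statistics(
    TOTAL_CYCLES, NUM_CALLS, MIN_CYCLES, MAX_CYCLES)
print(f"mean = {mean:.2f} cycles/call")        # mean = 72.28 cycles/call
print(f"excess = {excess}, single slow call: {single_slow_call}")
```

The totals alone do not identify which call was the slow one; attributing it to the first call is an interpretation rather than something the numbers prove.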
Another item of interest is the top-level main function, which calls my_graph.run(), which in turn calls SecondOrderSection<1>. The Total Function + Descendants Time (cycles) column shows the number of cycles consumed by a function, including all other routines called within it. For main, this includes setting up the heap and stack, initialization, the actual processing, and so on. For this implementation, 4,579 cycles were used to process 256 samples, or 4,579/256 ≈ 17.89 cycles/sample. Assuming that the AI Engine runs with a 1 GHz clock, the throughput is 1e9 cycles/sec / 17.89 cycles/sample ≈ 55.9 Msamples/sec.
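The throughput estimate can be reproduced as follows. The 1 GHz clock is the same assumption the text makes, and the function names are illustrative. Working from the unrounded 4,579/256 = 17.887 cycles/sample gives about 55.91 Msamples/s, while dividing by the rounded 17.89 gives about 55.90, so the difference is only rounding.

```python
CLOCK_HZ = 1e9        # assumed 1 GHz AI Engine clock, as stated in the text
MAIN_CYCLES = 4579    # main: Total Function + Descendants Time (cycles)
NUM_SAMPLES = 256


def cycles_per_sample(cycles, samples):
    """Average cycles spent per processed sample."""
    return cycles / samples


def throughput_msps(cycles, samples, clock_hz=CLOCK_HZ):
    """Sustained throughput in millions of samples per second."""
    return clock_hz / cycles_per_sample(cycles, samples) / 1e6


cps = cycles_per_sample(MAIN_CYCLES, NUM_SAMPLES)
print(f"{cps:.2f} cycles/sample")                                       # 17.89 cycles/sample
print(f"{throughput_msps(MAIN_CYCLES, NUM_SAMPLES):.1f} Msamples/s")   # 55.9 Msamples/s
```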
Note: The main processing occurs in SecondOrderSection<1>, which consumes 2,313 cycles. Thus, 4,579 - 2,313 = 2,266 cycles are unavoidable “overhead” that is not spent on sample processing.
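The overhead figure can also be expressed as a share of main's total and split per sample. These are numbers derived from the two profiler totals quoted above, not additional profiler output, and the variable names are illustrative.

```python
# Totals quoted in the text (cycles).
MAIN_CYCLES = 4579      # main, Total Function + Descendants Time
KERNEL_CYCLES = 2313    # SecondOrderSection<1>, summed over all calls
NUM_SAMPLES = 256

overhead = MAIN_CYCLES - KERNEL_CYCLES          # cycles not spent in the kernel
overhead_share = overhead / MAIN_CYCLES

# Split the overall cycles/sample figure into kernel and overhead parts.
kernel_cps = KERNEL_CYCLES / NUM_SAMPLES
overhead_cps = overhead / NUM_SAMPLES

print(f"overhead = {overhead} cycles ({overhead_share:.1%} of main)")
print(f"per sample: kernel {kernel_cps:.2f} + overhead {overhead_cps:.2f} "
      f"= {kernel_cps + overhead_cps:.2f} cycles")
```

Roughly half of the cycles in this run go to overhead, which is why the per-sample cost measured at main is about twice the kernel-only cost.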
Click Profile Details to view the generated assembly code.