Stage 3: Generate the Code and Perform Emulation-AI Engine - 2023.1 English

Vitis Tutorials: AI Engine Development

In this stage, you generate the graph code for the design and run bit-exact, cycle-approximate simulations with the AI Engine Simulator.

  1. Select the four AIE FIR Filters and the Frequency shifting block, and press Ctrl+G to group them in a subsystem. Name the subsystem FIRchain.

  2. Click the canvas and type model co, then select the Vitis Model Composer Hub block from the search results.

  3. Double-click the Vitis Model Composer Hub block, select the FIRchain subsystem, and set the following options on the AIE Settings tab:

    • Check Create testbench.

    • Check Run cycle approximate AIE Simulation after code generation.

    • Check Plot AIE Simulation Output and Estimate Throughput.

    • Check Collect Data for Vitis Analyzer.

  4. Click Apply.

  5. Click Generate.

The Simulink design is run to generate the testbench, then the graph code is generated and compiled. The source code can be viewed in ./code/src_aie/FIRchain.h:

#ifndef __XMC_FIRCHAIN_H__
#define __XMC_FIRCHAIN_H__

#include <adf.h>
#include "./FIR_Halfband_Decimator_b6bb9f39/FIR_Halfband_Decimator_b6bb9f39.h"
#include "./FIR_Halfband_Decimator_c797d059/FIR_Halfband_Decimator_c797d059.h"
#include "./FIR_Halfband_Decimator_714ce49a/FIR_Halfband_Decimator_714ce49a.h"
#include "./FIR_Symmetric_00c44acd/FIR_Symmetric_00c44acd.h"
#include "aiecode_src/FreqShift.h"

// Inner graph: the FIR filter subgraphs and the FreqShift kernel.
class FIRchain_base : public adf::graph {
public:
   FIR_Halfband_Decimator_b6bb9f39 FIR_Halfband_Decimator;
   FIR_Halfband_Decimator_c797d059 FIR_Halfband_Decimator1;
   FIR_Halfband_Decimator_714ce49a FIR_Halfband_Decimator2;
   FIR_Symmetric_00c44acd FIR_Symmetric;
   adf::kernel FreqShift_0;

   adf::input_port In1;
   adf::output_port Out1;

   FIRchain_base() {
      // create kernel FreqShift_0
      FreqShift_0 = adf::kernel::create(FreqShift<256>);
      adf::source(FreqShift_0) = "aiecode_src/FreqShift.cpp";

      // create kernel constraints FreqShift_0
      adf::runtime<ratio>( FreqShift_0 ) = 0.9;

      // create nets to specify connections
      // NOTE: the destination ports of net0 to net4 are truncated in this
      // listing; see the generated FIRchain.h for the complete statements.
      adf::connect<  > net0 (In1, /* destination truncated */);
      adf::connect<  > net1 (FIR_Halfband_Decimator.out, /* destination truncated */);
      adf::connect<  > net2 (FIR_Halfband_Decimator1.out, /* destination truncated */);
      adf::connect<  > net3 (FIR_Halfband_Decimator2.out, /* destination truncated */);
      adf::connect< adf::window<1024> > net4 (FIR_Symmetric.out, /* destination truncated */[0]);
      adf::connect< adf::window<1024> > net5 (FreqShift_0.out[0], Out1);
   }
};

// Top-level graph: wraps FIRchain_base and attaches the PLIO ports.
class FIRchain : public adf::graph {
public:
   FIRchain_base mygraph;

   adf::input_plio In1;
   adf::output_plio Out1;

   FIRchain() {
      // NOTE: the remaining create() arguments are truncated in this listing.
      In1 = adf::input_plio::create("In1", /* arguments truncated */);

      Out1 = adf::output_plio::create("Out1", /* arguments truncated */);

      adf::connect< > (In1.out[0], mygraph.In1);
      adf::connect< > (mygraph.Out1, /* destination truncated */[0]);
   }
};

#endif // __XMC_FIRCHAIN_H__

Finally, the bit-exact simulation (Emulation-AIE) is performed and its result is compared with the Simulink simulation.

Vitis Analyzer is then launched. From here you can see the Graph View, the Array View, the Timeline, and the Profile information.

The Simulation Data Inspector opens and shows the output of the AI Engine. The AI Engine's throughput is calculated by counting the number of output data points and dividing by the elapsed time. In this case, three frames are received, but only two inter-frame idle times are taken into account. To obtain a more accurate throughput estimate, you can use data cursors to select a specific time region over which to calculate throughput:

  1. Select the Out1 signal from the list on the left.

  2. Right-click on the plot, and select Data Cursors->Two.

  3. Position the cursors at the beginning of the first and third signal frames.
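The difference between the whole-trace estimate and the cursor-based estimate can be sketched in a few lines of C++. This is a minimal illustration, not the Simulation Data Inspector's actual implementation: the `Frame` structure, both function names, and the frame timings used with them are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One output frame on the time axis (hypothetical model of the plotted signal).
struct Frame {
    double start_ns;      // time of the first sample in the frame
    double end_ns;        // time of the last sample in the frame
    std::size_t samples;  // number of samples in the frame
};

// Whole-trace estimate: every sample divided by the total elapsed time.
// With N frames this window spans only N-1 idle gaps, so it tends to read high.
inline double whole_trace_msps(const std::vector<Frame>& frames) {
    assert(!frames.empty());
    std::size_t total = 0;
    for (const Frame& f : frames) total += f.samples;
    const double elapsed_ns = frames.back().end_ns - frames.front().start_ns;
    return 1000.0 * static_cast<double>(total) / elapsed_ns;  // samples/ns -> MSPS
}

// Cursor estimate: cursors sit at the start of the first and of the last frame,
// so the window holds N-1 complete frames together with their N-1 idle gaps.
inline double cursor_msps(const std::vector<Frame>& frames) {
    assert(frames.size() >= 2);
    std::size_t counted = 0;
    for (std::size_t i = 0; i + 1 < frames.size(); ++i) counted += frames[i].samples;
    const double elapsed_ns = frames.back().start_ns - frames.front().start_ns;
    return 1000.0 * static_cast<double>(counted) / elapsed_ns;  // samples/ns -> MSPS
}
```

For three hypothetical frames of 256 samples that start every 8192 ns and last 2048 ns each, `cursor_msps` returns 31.25 MSPS, while `whole_trace_msps` returns about 41.7 MSPS because its window leaves out one idle gap.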

Here the estimated throughput is 28 MSPS instead of the expected 125 MSPS. You can use Vitis Analyzer to track down the cause of this reduction. It shows that the input stream feeds data at 250 MSPS instead of the 1000 MSPS the design expects. The reason is that the input interface is 32 bits wide and runs at 250 MHz (the default value), as can be seen at the end of the FIRchain.h file.
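The rate mismatch can be checked with simple arithmetic. The sketch below rests on two assumptions inferred from the figures in the text rather than stated in it: each input sample occupies 32 bits (a 32-bit interface at 250 MHz delivers 250 MSPS), and the filter chain decimates by 8 overall (1000 MSPS in, 125 MSPS out). Both function names are hypothetical.

```cpp
#include <cassert>

// Sample rate (in MSPS) that an interface can carry.
// plio_bits:    interface width in bits
// pl_clock_mhz: interface clock frequency in MHz
// sample_bits:  bits per sample (assumed to be 32 in this design)
inline double plio_rate_msps(int plio_bits, double pl_clock_mhz, int sample_bits) {
    assert(plio_bits > 0 && sample_bits > 0 && pl_clock_mhz > 0.0);
    return pl_clock_mhz * plio_bits / sample_bits;
}

// Output rate (in MSPS) of a chain with the given overall decimation factor
// (assumed to be 8 in this design).
inline double decimated_rate_msps(double input_msps, int decimation) {
    assert(decimation > 0);
    return input_msps / decimation;
}
```

Under these assumptions, `plio_rate_msps(32, 250.0, 32)` gives 250 MSPS at the input and `decimated_rate_msps(250.0, 8)` gives 31.25 MSPS at the output, which is in the same range as the measured 28 MSPS. Reaching 1000 MSPS at the same clock would take four times the interface width, since `plio_rate_msps(128, 250.0, 32)` gives 1000 MSPS.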