Controlling the Application with the XRT C++ API - 2023.2 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2023-12-04
Version: 2023.2 English

XRT provides C and C++ APIs to control PL kernels and AI Engine graphs.

The execution model for the XRT API controlling PL kernels and AI Engine graphs is as follows:

  1. Open the device and load the XCLBIN. Get the UUID as needed.
  2. Allocate buffer objects and map them to host memory. Process the data and transfer it from the host memory to the device memory.
  3. Get PL kernel handles, set arguments for kernels, and launch kernels.
  4. Get AI Engine graphs, and run graphs.
  5. Wait for the completion of the graphs.
  6. Wait for the completion of the kernels.
  7. Transfer data from the global memory in the device back to the host memory.
  8. The host code continues processing using the new data in the host memory.
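
The ordering of these steps can be sketched without hardware. The following is a minimal illustration, not XRT code: `StubSession`, its methods, and `run_host_flow` are hypothetical stand-ins that only record which step was requested, so the sequence can be inspected on any host.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a device session. This is NOT an XRT class:
// each method only records the name of the step it represents.
class StubSession {
public:
    void load_xclbin()      { log_.push_back("load_xclbin"); }      // step 1
    void allocate_buffers() { log_.push_back("allocate_buffers"); } // step 2
    void launch_kernels()   { log_.push_back("launch_kernels"); }   // step 3
    void run_graph()        { log_.push_back("run_graph"); }        // step 4
    void wait_graph()       { log_.push_back("wait_graph"); }       // step 5
    void wait_kernels()     { log_.push_back("wait_kernels"); }     // step 6
    void sync_from_device() { log_.push_back("sync_from_device"); } // step 7
    void post_process()     { log_.push_back("post_process"); }     // step 8

    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};

// Requests the eight steps in the order the list gives them and
// returns the recorded sequence.
std::vector<std::string> run_host_flow() {
    StubSession session;
    session.load_xclbin();
    session.allocate_buffers();
    session.launch_kernels();
    session.run_graph();
    session.wait_graph();
    session.wait_kernels();
    session.sync_from_device();
    session.post_process();
    return session.log();
}
```

Calling `run_host_flow()` returns the eight step names in list order. In a real application, each stub call would be replaced by the corresponding XRT calls.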
Note: There are two ways to start the AI Engine graph. It can start automatically when the board boots and then run indefinitely, or it can be started from the host application. The package setting defer_aie_run selects between these behaviors. When PL kernels and AI Engine graphs are started from the host application, the application code decides whether to wait for a specific kernel or a specific graph to complete. This behavior can vary depending on the design.

XRT provides the graph class in the xrt namespace, with member functions to control the AI Engine graph.

Example code to control the AI Engine graph and PL kernels using the XRT C++ API is as follows:

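// The XRT header files below are required
#include "xrt/xrt_graph.h"
#include "xrt/xrt_kernel.h"
// Standard headers used by this example
#include <complex>   // std::complex, element type of the mapped output buffer
#include <iostream>  // std::cout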

size_t output_size_in_bytes = OUTPUT_SIZE * sizeof(int);

// Open the device
auto device = xrt::device(0); // device index = 0
// Load the xclbin, which may contain PL kernels and AI Engine graphs
auto uuid = device.load_xclbin(xclbinFilename);

// PL control
// Get handles to the s2mm and random_noise PL kernels
auto s2mm = xrt::kernel(device, uuid, "s2mm");
auto random_noise = xrt::kernel(device, uuid, "random_noise");
// Allocate output memory for data from the s2mm kernel
auto out_bo = xrt::bo(device, output_size_in_bytes, s2mm.group_id(0));
auto host_out = out_bo.map<std::complex<short>*>();
// Start the s2mm and random_noise PL kernels
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE); // start s2mm
auto random_noise_run = random_noise(nullptr, OUTPUT_SIZE); // start random_noise

// AI Engine graph control
// Initialize run-time parameter data
int coeffs_readback[12];
int narrow_filter[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
int wide_filter[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
//get the handle to the graph called "gr" 
auto ghdl=xrt::graph(device,uuid,"gr");
// update run time parameter in the graph
ghdl.update("gr.fir24.in[1]",narrow_filter);
//run the graph for 16 iterations
ghdl.run(16);
// wait for graph to complete running 16 iterations
ghdl.wait();
// Read the value of a run-time parameter.
// This read follows graph::wait, so the RTP update has taken effect.
ghdl.read("gr.fir24.inout[0]", coeffs_readback);
// update run time parameter in the graph
ghdl.update("gr.fir24.in[1]",wide_filter);
//run the graph for 16 iterations
ghdl.run(16);
// Asynchronous read: issued before waiting for the graph to complete
ghdl.read("gr.fir24.inout[0]", coeffs_readback);
ghdl.wait();

// Wait for the s2mm PL kernel to complete
auto state = s2mm_run.wait();
std::cout << "s2mm completed with status(" << state << ")\n";
// Copy the results from device memory back to the host buffer
out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);

//Post-processing...
...
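
Two host-side checks can be useful around a flow like this one. The sketch below is an illustration under stated assumptions, not part of the XRT API; `bytes_for` and `coeffs_match` are hypothetical helper names. The `static_assert` reflects that the listing sizes the buffer with `sizeof(int)` but maps it as `std::complex<short>`, which is consistent only when both types have the same size. Comparing a read-back array against the coefficients last written assumes that the kernel echoes them unchanged on its inout port, which the listing suggests but does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>

// The listing computes the buffer size with sizeof(int) and maps the buffer
// as std::complex<short>; compilation fails here if the two sizes differ.
static_assert(sizeof(std::complex<short>) == sizeof(int),
              "buffer size and mapped element type disagree");

// Hypothetical helper: number of bytes needed for `count` elements of type T.
template <typename T>
constexpr std::size_t bytes_for(std::size_t count) {
    return count * sizeof(T);
}

// Hypothetical helper: true when the first n read-back values equal the
// expected coefficients. Assumes both arrays hold at least n elements.
inline bool coeffs_match(const int* readback, const int* expected, std::size_t n) {
    assert(readback != nullptr && expected != nullptr);
    return std::equal(readback, readback + n, expected);
}
```

Under the assumption above, `coeffs_match(coeffs_readback, narrow_filter, 12)` could be called after the first `ghdl.read`, and `bytes_for<std::complex<short>>(OUTPUT_SIZE)` could replace the `sizeof(int)` expression so that the size follows the mapped element type.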