Controlling the PL Kernel with the XRT API - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2022-05-25
Version: 2022.1 English

Xilinx provides the open-source XRT API for controlling the execution of PL kernels from host code running on Linux.

The execution model for the XRT API controlling PL kernels is as follows:

  1. Get device handle and load the XCLBIN. Get the uuid as needed.
  2. Allocate buffer objects and map to host memory. Process and transfer data from host memory to device memory.
  3. Get kernel and run handles, set arguments for kernels, and launch kernels.
  4. Wait for kernel completion.
  5. Transfer data from global memory in the device back to host memory.
  6. Host code continues processing using the new data in the host memory.

When using the native XRT API, the host application looks like the following.

// 1. Open device, load xclbin, and get uuid
auto dhdl = xrtDeviceOpen(0); // device index = 0

xrtDeviceLoadXclbinFile(dhdl,xclbinFilename);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);

// 2. Allocate output buffer objects and map to host memory

xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, output_size_in_bytes, 0, /*BANK=*/0);
std::complex<short> *host_out = (std::complex<short>*)xrtBOMap(out_bohdl);

// 3. Get kernel and run handles, set arguments for kernel, and launch kernel.
xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, uuid, "s2mm"); // Open kernel handle using the uuid from step 1
xrtRunHandle s2mm_rhdl = xrtRunOpen(s2mm_khdl); 
xrtRunSetArg(s2mm_rhdl, 0, out_bohdl); // set kernel arg
xrtRunSetArg(s2mm_rhdl, 2, OUTPUT_SIZE); // set kernel arg
xrtRunStart(s2mm_rhdl); //launch s2mm kernel

// ADF API: run the graph, update graph parameters (RTP), and so on
……

// 4. Wait for kernel completion.
auto state = xrtRunWait(s2mm_rhdl);

// 5. Sync output device buffer objects to host memory.

xrtBOSync(out_bohdl, XCL_BO_SYNC_BO_FROM_DEVICE, output_size_in_bytes, /*OFFSET=*/0);

// 6. Post-processing on host memory - "host_out"
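
The listing above uses both output_size_in_bytes and OUTPUT_SIZE without showing how they relate, and it leaves step 6 open. The following is a minimal sketch, assuming that OUTPUT_SIZE counts std::complex<short> samples (an assumption; check the argument definition of the actual s2mm kernel). The helper names bytes_for_samples and total_energy are hypothetical and not part of XRT, and the energy sum is only one possible example of post-processing.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Bytes needed to hold n std::complex<short> samples.
// Assumption: the buffer size in bytes is the sample count times the sample size.
constexpr std::size_t bytes_for_samples(std::size_t n) {
    return n * sizeof(std::complex<short>);
}

// Example post-processing step (hypothetical): total energy, sum of re^2 + im^2.
// Accumulates in 64 bits so that 16-bit samples cannot overflow the sum.
std::int64_t total_energy(const std::complex<short>* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int64_t re = data[i].real();
        const std::int64_t im = data[i].imag();
        acc += re * re + im * im;
    }
    return acc;
}
```

Under the same assumption, the host code could call total_energy(host_out, OUTPUT_SIZE) once xrtBOSync has returned, because host_out then points at the synchronized data.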

After post-processing the data, release the allocated objects:

graph.end();
xrtRunClose(s2mm_rhdl);
xrtKernelClose(s2mm_khdl);

xrtBOFree(out_bohdl);
xrtDeviceClose(dhdl);

Important: graph.end() waits for the graph to terminate and is a blocking operation for the PS application. A graph is considered terminated when all its active processors exit their main thread and disable themselves. graph.end() also cleans up the state of the graph, for example by forcing the release of all locks and cleaning up the stream switch configurations used in the graph. After graph.end(), the AI Engine kernels cannot be run again. To run the graph multiple times, replace graph.end() with graph.wait().
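
The release sequence above closes the handles in the reverse of the order in which they were obtained (run, kernel, buffer object, device). One way to keep that order even on early returns is a small scope guard. The following is a minimal sketch and not part of XRT: ScopeExit and run_and_release are hypothetical names, and the lambdas only record the name of the call that real host code would make at that point.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs a stored callback when the guard goes out of scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { if (fn_) fn_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    std::function<void()> fn_;
};

// Stand-in for the host code: logs the order in which the release calls would run.
std::vector<std::string> run_and_release() {
    std::vector<std::string> log;
    {
        // Guards are declared in acquisition order; C++ destroys locals in reverse.
        ScopeExit close_device([&log] { log.push_back("xrtDeviceClose"); });
        ScopeExit free_bo([&log] { log.push_back("xrtBOFree"); });
        ScopeExit close_kernel([&log] { log.push_back("xrtKernelClose"); });
        ScopeExit close_run([&log] { log.push_back("xrtRunClose"); });
        // Launch, wait, sync, and post-process would happen here.
    }
    return log;
}
```

In real host code each lambda would call the corresponding XRT function on its handle, for example xrtRunClose(s2mm_rhdl), instead of logging a string.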