Controlling PL Kernels with the OpenCL API - 2020.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID
UG1076
Release Date
2020-11-24
Version
2020.2 English

The host code main() function includes OpenCL or Xilinx Runtime (XRT) APIs to control the execution of PL kernels, as well as ADF APIs to control the AI Engine graph (init(), update(), run(), wait()).

To load and control PL kernels from the host application, the execution model contains the following steps:

  1. Get the OpenCL platform and device, and prepare a context and command queue. Program the XCLBIN file and get kernel objects from the program. adf::registerXRT() is still needed for the ADF API, but the device handle can be converted from the XCL domain to the XRT domain.
  2. Prepare device buffers for the kernels. Transfer data from host memory to global memory in the device.
  3. Set up the kernel with its input parameters and trigger the execution of the kernel on the Versal™ device.
  4. Wait for kernel completion.
  5. Transfer data from global memory in the device back to host memory.
  6. Host code continues processing using the new data in the host memory.
Tip: Refer to Developing Applications in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416) for more information on coding the host application for controlling PL kernels.
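Step 1 requires the XCLBIN contents to be available in host memory before the device can be programmed. The following is a minimal sketch of reading a binary file into a byte vector using only the C++ standard library. The helper name read_binary_file is hypothetical and is not part of OpenCL or XRT. How the bytes are then handed to cl::Program::Binaries depends on the OpenCL C++ header in use.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file (for example, an XCLBIN)
// into a byte vector. Throws std::runtime_error on failure.
std::vector<unsigned char> read_binary_file(const std::string &path) {
    assert(!path.empty());

    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }

    const std::streamoff size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<unsigned char> bytes(static_cast<std::size_t>(size));
    if (size > 0 &&
        !file.read(reinterpret_cast<char *>(bytes.data()),
                   static_cast<std::streamsize>(size))) {
        throw std::runtime_error("cannot read " + path);
    }
    return bytes;
}
```

A host program might call read_binary_file("a.xclbin"), where the file name is only a placeholder, and report the exception if the file is missing rather than passing an empty binary to the program constructor.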

The following is a code snippet from an example host.cpp to illustrate the prior steps:

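//1. Get OpenCL platform and device, prepare context and command queue.
//Assumes the first platform and its first accelerator device are the target; check the names in real code.
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
std::vector<cl::Device> devices;
platforms[0].getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
cl::Device device = devices[0];
devices.resize(1); //program only the selected device
cl::Context context(device);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE);

//Program xclbin, and get kernel objects from the program.
cl::Program::Binaries bins; //must hold the XCLBIN contents before use (loading not shown)
cl::Program program(context, devices, bins);
cl::Kernel krnl_s2mm(program,"s2mm"); //get kernel object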

// Create XRT device handle for ADF API
void *dh;
device.getInfo(CL_DEVICE_HANDLE, &dh);
auto dhdl = xrtDeviceOpenFromXcl(dh);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);
adf::registerXRT(dhdl, uuid);

//2. Prepare device buffers for kernels. Transfer data from host memory to global memory in device. 
std::complex<short> *host_out; //host buffer
cl::Buffer buffer_out(context, CL_MEM_WRITE_ONLY, output_size_in_bytes);
host_out=(std::complex<short>*)q.enqueueMapBuffer(buffer_out,true,CL_MAP_READ,0,sizeof(std::complex<short>)*OUTPUT_SIZE,nullptr,nullptr,nullptr);

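//3. Set up kernel input parameters
//Only arguments 0 and 2 are set by the host; argument 1 is presumably a stream port connected in hardware.
krnl_s2mm.setArg(0,buffer_out);
krnl_s2mm.setArg(2,OUTPUT_SIZE);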

//Launch the Kernel
q.enqueueTask(krnl_s2mm);

// ADF API: Initialize, run and update graph parameters (RTP)
gr.run(4);
gr.update(gr.trigger,10);
gr.update(gr.trigger,10);
gr.update(gr.trigger,100);
gr.update(gr.trigger,100);
gr.wait();

//4. Wait for kernel completion. 
q.finish();//Wait for s2mm to complete    

//5. Transfer data from global memory back to host memory.
q.enqueueMigrateMemObjects({buffer_out},CL_MIGRATE_MEM_OBJECT_HOST);	
q.finish();//Wait for memory transfer to complete

//6. Continue processing on host memory
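Important: The q.finish() function blocks the host thread until all commands in the command queue have completed. Start dependent tasks, such as graph execution, before calling this function. Otherwise, a kernel that waits for data from the graph can never complete and the host can wait indefinitely.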
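The ordering requirement can be illustrated without any device. The following is a minimal sketch that uses standard C++ threads as an analogy, not the OpenCL or ADF API. The names Stream, fake_s2mm, fake_graph, and run_in_safe_order are hypothetical stand-ins for the stream connection, the s2mm kernel, the AI Engine graph, and the host sequence.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical stand-in for the stream between the graph and the s2mm kernel.
class Stream {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        ready_.notify_one();
    }

    // Blocks until a value is available.
    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        const int value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> queue_;
};

// Stand-in for the s2mm kernel: drains `count` samples from the stream.
std::vector<int> fake_s2mm(Stream &stream, int count) {
    std::vector<int> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(stream.pop());
    }
    return out;
}

// Stand-in for the graph: produces `count` samples (squares of 0..count-1).
void fake_graph(Stream &stream, int count) {
    for (int i = 0; i < count; ++i) {
        stream.push(i * i);
    }
}

std::vector<int> run_in_safe_order(int count) {
    assert(count >= 0);
    Stream stream;

    // Analogous to q.enqueueTask(): a non-blocking launch of the consumer.
    std::future<std::vector<int>> kernel_done =
        std::async(std::launch::async, fake_s2mm, std::ref(stream), count);

    // Analogous to gr.run() and gr.wait(): start the producer first.
    fake_graph(stream, count);

    // Analogous to q.finish(): block until the consumer has finished.
    return kernel_done.get();
}
```

If the blocking get() call were moved above fake_graph(), the consumer would wait for samples that are never produced and the host thread would never return, which mirrors calling q.finish() before starting the graph.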