Xilinx provides the open source Xilinx Runtime (XRT) API for controlling the execution of PL kernels when writing host code for Linux.
The execution model for the XRT API controlling PL kernels is as follows:
- Get a device handle, load the XCLBIN, and get the UUID of the loaded XCLBIN.
- Allocate buffer objects and map them to host memory. Process the data on the host and transfer it from host memory to device memory.
- Get kernel and run handles, set arguments for kernels, and launch kernels.
- Wait for kernel completion.
- Transfer data from global memory in the device back to host memory.
- Host code continues processing using the new data in the host memory.
When using the native XRT API, the host application looks like the following.
```cpp
// 1. Open device, load xclbin, and get uuid
auto dhdl = xrtDeviceOpen(0); // device index = 0
xrtDeviceLoadXclbinFile(dhdl, xclbinFilename);
xuid_t uuid;
xrtDeviceGetXclbinUUID(dhdl, uuid);

// 2. Allocate output buffer objects and map to host memory
xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, output_size_in_bytes, 0, /*BANK=*/0);
std::complex<short> *host_out = static_cast<std::complex<short>*>(xrtBOMap(out_bohdl));

// 3. Get kernel and run handles, set arguments for kernel, and launch kernel
xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, uuid, "s2mm"); // open kernel handle using the uuid from step 1
xrtRunHandle s2mm_rhdl = xrtRunOpen(s2mm_khdl);
xrtRunSetArg(s2mm_rhdl, 0, out_bohdl);   // set kernel arg
xrtRunSetArg(s2mm_rhdl, 2, OUTPUT_SIZE); // set kernel arg
xrtRunStart(s2mm_rhdl);                  // launch s2mm kernel

// ADF API: run, update graph parameters (RTP) and so on ……

// 4. Wait for kernel completion
auto state = xrtRunWait(s2mm_rhdl);

// 5. Sync output device buffer objects to host memory
xrtBOSync(out_bohdl, XCL_BO_SYNC_BO_FROM_DEVICE, output_size_in_bytes, /*OFFSET=*/0);

// 6. Post-processing on host memory - "host_out"
```
After post-processing the data, release the allocated objects:
```cpp
graph.end();
xrtRunClose(s2mm_rhdl);
xrtKernelClose(s2mm_khdl);
xrtBOFree(out_bohdl);
xrtDeviceClose(dhdl);
```
After graph.end() is called, the AI Engine kernels cannot be restarted. The graph.end() API waits for the termination of the graph. A graph is considered to be terminated when all its active processors exit their main thread and disable themselves. This is a blocking operation for the PS application.
graph.end() also cleans up the state of the graph, such as forcing the release of all locks and cleaning up the stream switch configurations used in the graph. To run the graph multiple times, replace