Kernel Handle and Compute Units - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-12-13
Version: 2023.2 English

The first time clSetKernelArg is called for a given kernel object, XRT identifies the group of symmetrical CUs for subsequent executions of the kernel. When clEnqueueTask is called for that kernel, any of the symmetrical CUs in that group can be used to process the task.

If all CUs for a given kernel are symmetrical, a single kernel object is sufficient to access any of them. However, if some CUs are asymmetrical, the host application must create a separate kernel object for each group of CUs that are symmetrical with one another. In this case, the call to clEnqueueTask must specify the kernel object to use for the task, and XRT can use any CU that matches that kernel object.
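
To make the grouping concrete, here is a minimal C++ sketch of how a host application might work out how many kernel objects it needs. It is not XRT code, and XRT performs its own check internally. The ComputeUnit struct, the group_symmetrical function, and the CU and bank names used with them are hypothetical. The sketch also assumes that CUs count as symmetrical when every kernel argument connects to the same global memory bank, which is how the Vitis documentation describes symmetry elsewhere.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one CU: its instance name and the memory
// bank each kernel argument is connected to, in argument order.
struct ComputeUnit {
    std::string name;
    std::vector<std::string> arg_banks;
};

// Group CUs whose argument-to-bank connections are identical.
// Each returned group would need its own kernel object in the host code.
// Groups come back ordered by their bank lists (std::map ordering).
std::vector<std::vector<std::string>> group_symmetrical(
        const std::vector<ComputeUnit>& cus) {
    std::map<std::vector<std::string>, std::vector<std::string>> by_banks;
    for (const auto& cu : cus) {
        by_banks[cu.arg_banks].push_back(cu.name);
    }

    std::vector<std::vector<std::string>> groups;
    for (const auto& entry : by_banks) {
        groups.push_back(entry.second);
    }
    return groups;
}
```

With three CUs where two share the same banks and the third uses a different bank, the function returns two groups, so the host application would create two kernel objects.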

Creating Kernel Objects for Specific Compute Units

To create a kernel object associated with specific compute units, name the CUs in the kernel name string passed to clCreateKernel when the host program creates the kernel object. The syntax is shown below:

// Create a kernel object for one specific compute unit
cl_kernel kernelA = clCreateKernel(program, "<kernel_name>:{compute_unit_name}", &err);
// Create a kernel object for two specific compute units
cl_kernel kernelB = clCreateKernel(program, "<kernel_name>:{CU1,CU2}", &err);

Important: As discussed in Creating Multiple Instances of a Kernel, the number of CUs is specified by the connectivity.nk option in a config file used by the v++ command during linking. Therefore, the kernel and CU names that the host program uses to create or enqueue kernel objects must match the options specified in the config file used during linking.
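
Because the CU names in the host program have to match the linked design exactly, it can help to build the kernel name string in one place. The following is a minimal sketch, not part of XRT or OpenCL: cu_kernel_name is a hypothetical helper, and the kernel name vadd with CUs vadd_1 and vadd_2 assumes a config file containing nk=vadd:2 with default CU naming.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Build the "<kernel_name>:{CU1,CU2,...}" string passed to clCreateKernel.
// With an empty CU list, return the plain kernel name, which creates a
// kernel object that is not tied to specific CUs.
std::string cu_kernel_name(const std::string& kernel,
                           const std::vector<std::string>& cus) {
    if (cus.empty()) {
        return kernel;
    }
    std::string name = kernel + ":{";
    for (std::size_t i = 0; i < cus.size(); ++i) {
        if (i > 0) {
            name += ",";  // CU names are comma-separated, without spaces
        }
        name += cus[i];
    }
    name += "}";
    return name;
}
```

In host code the result would be passed as the second argument, for example clCreateKernel(program, cu_kernel_name("vadd", one_cu).c_str(), &err), where one_cu is a vector holding "vadd_1".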

In this case, the Xilinx Runtime (XRT) associates each kernel handle (kernelA, kernelB) with a specific CU, or group of CUs, when the kernel object is created. This lets you control which kernel configuration or specific CU instance is used when calling clEnqueueTask from the host program. This can be useful when CUs are asymmetrical, or when you want to manage load and priority across CUs.
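
As one illustration of load management, the host program could hold one kernel handle per CU and rotate through them when enqueuing tasks. The sketch below is an assumption about how an application might do this, not a scheme prescribed by XRT. The RoundRobin class is hypothetical, and the handle type is a template parameter so that the sketch compiles without the OpenCL headers. In host code it would be cl_kernel.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Cycles through a fixed set of handles, one per CU or CU group.
template <typename Handle>
class RoundRobin {
public:
    explicit RoundRobin(std::vector<Handle> handles)
        : handles_(std::move(handles)) {
        if (handles_.empty()) {
            throw std::invalid_argument("at least one handle is required");
        }
    }

    // Return the handle to use for the next task, then advance,
    // wrapping back to the first handle after the last one.
    const Handle& next() {
        const Handle& handle = handles_[index_];
        index_ = (index_ + 1) % handles_.size();
        return handle;
    }

private:
    std::vector<Handle> handles_;
    std::size_t index_ = 0;
};
```

Each call to next() yields the handle to pass to clEnqueueTask for the next task. A priority scheme could follow the same pattern by reserving one handle for urgent work and rotating the rest.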