xcl_max_work_group_size - 2021.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)



Use this attribute instead of REQD_WORK_GROUP_SIZE when the kernel needs a work-group size larger than 4096 (4K).

This attribute extends the default maximum work-group size that the Vitis core development kit supports through the reqd_work_group_size attribute. With the XCL_MAX_WORK_GROUP_SIZE attribute, the Vitis core development kit supports work sizes larger than 4096.

Note: The actual work-group size limit depends on the Xilinx device selected for the platform.


Place this attribute before the kernel definition, or before the primary function specified for the kernel:

__attribute__((xcl_max_work_group_size(<X>, <Y>, <Z>)))


  • <X>, <Y>, <Z>: Specify the ND range of the kernel. Each value is one dimension of the three-dimensional work-group size for the kernel.


Below is the kernel source code for an unoptimized adder. No attributes are specified for this design other than a work size equal to the size of the matrices (for example, 64x64). That is, iterating over an entire work-group fully adds the input matrices, a and b, and writes the result to output. All three are global integer pointers, which means each value in the matrices is four bytes and is stored in off-chip DDR global memory.

#define RANK 64
__kernel __attribute__ ((reqd_work_group_size(RANK, RANK, 1)))
void madd(__global int* a, __global int* b, __global int* output) {
    // Flatten the 2D local ID into a row-major index into the matrices.
    int index = get_local_id(1)*get_local_size(0) + get_local_id(0);
    output[index] = a[index] + b[index];
}

This local work size of (64, 64, 1) is the same as the global work size. This setting creates a total work size of 4096.

Note: This is the largest work size that the Vitis core development kit supports with the standard OpenCL attribute REQD_WORK_GROUP_SIZE. Work sizes larger than 4096 are supported with the Xilinx attribute xcl_max_work_group_size.
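
The index arithmetic in the madd kernel and the 4096 total can be checked with a small host-side sketch. The plain C code below is only an emulation, not kernel code: the nested loops stand in for the work-items that the OpenCL runtime would launch, and flat_index, work_group_total, and madd_emulated are hypothetical helper names introduced here for illustration.

```c
#include <stddef.h>

#define RANK 64  /* same value as in the madd kernel */

/* Flattened index used by the madd kernel:
   get_local_id(1) * get_local_size(0) + get_local_id(0). */
size_t flat_index(size_t id0, size_t id1, size_t size0) {
    return id1 * size0 + id0;
}

/* Total number of work-items in a work-group of size (x, y, z). */
size_t work_group_total(size_t x, size_t y, size_t z) {
    return x * y * z;
}

/* Emulates one (RANK, RANK, 1) work-group on the host (assumption: this
   sequential loop replaces the parallel work-items). Each emulated
   work-item adds one element. Returns the number of work-items run. */
size_t madd_emulated(const int *a, const int *b, int *output) {
    size_t count = 0;
    for (size_t id1 = 0; id1 < RANK; id1++) {
        for (size_t id0 = 0; id0 < RANK; id0++) {
            size_t index = flat_index(id0, id1, RANK);
            output[index] = a[index] + b[index];
            count++;
        }
    }
    return count;
}
```

Here work_group_total(64, 64, 1) evaluates to 4096, and flat_index maps the work-item IDs (0, 0) through (63, 63) onto the indices 0 through 4095, so each matrix element is written exactly once.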

With REQD_WORK_GROUP_SIZE alone, any matrix larger than 64x64 must use only one dimension to define the work size. For example, a 128x128 matrix can be operated on by a kernel with a work size of (128, 1, 1), where each invocation operates on an entire row or column of data.
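
The row-per-invocation scheme for a 128x128 matrix can be sketched the same way. This is a plain C host-side emulation under stated assumptions, not kernel code: the row argument stands in for get_local_id(0) in a kernel with a work size of (128, 1, 1), and madd_row and madd_rows_emulated are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define N 128  /* matrix is N x N; emulated work-group size is (N, 1, 1) */

/* Body of one emulated work-item. `row` plays the role of get_local_id(0);
   the work-item adds every column of its own row. */
void madd_row(size_t row, const int *a, const int *b, int *output) {
    for (size_t col = 0; col < N; col++) {
        size_t index = row * N + col;  /* row-major element index */
        output[index] = a[index] + b[index];
    }
}

/* Emulates the whole (N, 1, 1) work-group by running each work-item in
   turn (assumption: sequential stand-in for parallel execution).
   Returns the number of work-items run. */
size_t madd_rows_emulated(const int *a, const int *b, int *output) {
    size_t items = 0;
    for (size_t row = 0; row < N; row++) {
        madd_row(row, a, b, output);
        items++;
    }
    return items;
}
```

Only 128 work-items run, well under the 4096 limit, yet together they cover all 128 x 128 = 16384 elements because each one loops over a full row.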