Compute Unit Calls - Maximum - 2023.2 English

Vitis Guidance Messaging (UG1315)

Document ID: UG1315
Release Date: 2023-10-18
Version: 2023.2 English

Description

This rule checks for compute units (CUs) that are called too often, which incurs avoidable call overhead.

Explanation

Work-groups provide a powerful mechanism to split operations across multiple compute units (CUs). However, creating a kernel with too small a local work size can incur overhead, because each CU call has a startup time and a smaller local work size means more CU calls for the same global work size. AMD recommends finding the right balance of global and local work sizes to optimize overall performance.

For example, with a global work size of (1024, 1024, 1), AMD does not recommend a small required work-group size such as (4, 4, 1):
__kernel __attribute__ ((reqd_work_group_size(4, 4, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
In contrast, AMD recommends:
__kernel __attribute__ ((reqd_work_group_size(256, 256, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
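
To make the difference between the two kernels above concrete, the following host-side C sketch counts the work-groups for a given global and local work size. It is an illustration only: the function names groups_per_dim and total_work_groups are hypothetical and not part of any Vitis or OpenCL API, and the sketch assumes that each work-group results in one CU call and that the global size is an exact multiple of the work-group size in every dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups along one dimension.
 * Assumes the global size is an exact multiple of the local size. */
size_t groups_per_dim(size_t global, size_t local) {
    assert(local > 0 && global % local == 0);
    return global / local;
}

/* Hypothetical helper: total work-groups for a 3-D NDRange.
 * Under the assumption stated above, this is also the number of CU calls. */
size_t total_work_groups(const size_t global[3], const size_t local[3]) {
    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        total *= groups_per_dim(global[d], local[d]);
    }
    return total;
}
```

For a global work size of (1024, 1024, 1), a work-group size of (4, 4, 1) yields 256 x 256 x 1 = 65,536 work-groups, whereas (256, 256, 1) yields 4 x 4 x 1 = 16. Under the one-call-per-work-group assumption, the startup time is paid 65,536 times in the first case and 16 times in the second.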