Compute Unit Calls - Maximum

Vitis Guidance Messaging (UG1315)

Document ID

UG1315

Release Date

2023-10-18

Version

2023.2 English

Description

This rule flags compute units that are called so often that the per-call startup overhead becomes significant.

Explanation

Work-groups provide a powerful mechanism to split operations across multiple compute units (CUs). However, a kernel with too small a work-group (local work) size can incur significant overhead, because the work is split into many CU calls and each call has a startup time. AMD recommends balancing the global and local work sizes to optimize overall performance.

Consider a global work size of (1024, 1024, 1). AMD does not recommend the following kernel, whose (4, 4, 1) work-group size splits the work into 65,536 work-groups:

__kernel __attribute__ ((reqd_work_group_size(4, 4, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}

In contrast, AMD recommends a larger work-group size, which splits the same global work size into only 16 work-groups:

__kernel __attribute__ ((reqd_work_group_size(256, 256, 1)))
void madd(__global int* a, __global int* b, __global int* c) {
  int i = get_global_id(1)*get_global_size(0) + get_global_id(0);
  c[i] = a[i] + b[i];
}
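
To make the difference between the two kernels concrete, the following is a minimal sketch in plain C that counts how many work-groups a given pair of global and local work sizes produces. The helper name `work_group_count` is hypothetical and is not part of the OpenCL or Vitis APIs. The sketch assumes that each global dimension is an exact multiple of the matching local dimension, and that each work-group corresponds to one CU call, as the explanation above implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL or Vitis API): returns the number of
 * work-groups in a 3-D NDRange.
 * Assumption: every global dimension is an exact multiple of the matching
 * local dimension; the assert below enforces this. */
size_t work_group_count(const size_t global[3], const size_t local[3])
{
    size_t count = 1;
    for (int d = 0; d < 3; ++d) {
        assert(local[d] > 0 && global[d] % local[d] == 0);
        count *= global[d] / local[d];  /* work-groups along dimension d */
    }
    return count;
}
```

For the global work size of (1024, 1024, 1) used above, a local size of (4, 4, 1) gives 256 x 256 x 1 = 65,536 work-groups, whereas (256, 256, 1) gives 4 x 4 x 1 = 16. Under the stated assumption, the per-call startup time is paid 65,536 times in the first case and 16 times in the second.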