ZenDNN in WeGO PyTorch - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

Enable ZenDNN in WeGO PyTorch

ZenDNN is disabled by default. To enable it, set zendnn_enable to True in the optimize options passed to the WeGO-Torch compile API:
wego_mod = wego_torch.compile(mod, wego_torch.CompileOptions(
    ...
    optimize_options=wego_torch.OptimizeOptions(zendnn_enable=True))
)

After ZenDNN is enabled, the CPU operators (the operators not supported by the DPU) in the compiled WeGO graph are replaced with ZenDNN operators and are executed with ZenDNN kernels for acceleration.

Environment Variables

ZenDNN provides some environment variables for performance tuning.
Table 1. Environment Variables

| Name | Description |
|------|-------------|
| OMP_DYNAMIC | Set it explicitly to FALSE when you enable ZenDNN. |
| ZENDNN_GEMM_ALGO | The default value is 3. Set it to 0, 1, 2, 3, or 4 to select a different GEMM algorithm path. |
| OMP_NUM_THREADS | The default value is the number of physical cores on the system. Tune it according to the number of inference threads to achieve better performance. For more details, see Tuning Guidelines. |
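The variables in Table 1 can also be set from Python rather than from the shell. The following is a minimal sketch, not part of the WeGO or ZenDNN API: the helper name zendnn_env is hypothetical, and the sketch assumes the variables must be in place before the OpenMP runtime initializes, which in practice means before the framework modules are imported.

```python
import os

# Allowed ZENDNN_GEMM_ALGO values, as listed in Table 1.
VALID_GEMM_ALGOS = {0, 1, 2, 3, 4}


def zendnn_env(omp_num_threads, gemm_algo=3):
    """Hypothetical helper: build the Table 1 settings as a dict of strings."""
    if gemm_algo not in VALID_GEMM_ALGOS:
        raise ValueError(f"ZENDNN_GEMM_ALGO must be one of {sorted(VALID_GEMM_ALGOS)}")
    if omp_num_threads < 1:
        raise ValueError("OMP_NUM_THREADS must be at least 1")
    return {
        "OMP_DYNAMIC": "FALSE",              # set explicitly when ZenDNN is enabled
        "ZENDNN_GEMM_ALGO": str(gemm_algo),  # documented default is 3
        "OMP_NUM_THREADS": str(omp_num_threads),
    }


# Assumption: apply the settings before importing the framework so that
# the OpenMP runtime reads them when it starts.
settings = zendnn_env(omp_num_threads=4)
os.environ.update(settings)
```

Exporting the same variables in the shell before launching the application has the same effect and avoids any dependence on import order.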

Tuning Guidelines

ZenDNN uses OpenMP as its underlying threading library. The OMP_NUM_THREADS environment variable controls intra-op parallelism, that is, multi-core parallelism within ZenDNN kernels. With OpenMP, different application threads (inter-op threads) can each use a separate OpenMP thread pool for intra-op tasks. A multi-threaded application can therefore create many OpenMP threads, which oversubscribes the CPU cores and reduces overall performance. To avoid over-subscription, choose OMP_NUM_THREADS based on the number of cores on the target CPU platform and the number of threads your application launches. For example, if your application launches 16 threads on a platform with 64 CPU cores, set OMP_NUM_THREADS <= 4 (64 / 16 = 4) to avoid contention for CPU cores.
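The sizing rule in the example above amounts to dividing the core count by the number of application threads. A minimal sketch follows; the function name recommended_omp_threads is hypothetical, and the fallback to os.cpu_count() is an assumption that reports logical CPUs, which can exceed the physical core count on systems with SMT.

```python
import os


def recommended_omp_threads(app_threads, cpu_cores=None):
    """Hypothetical helper: upper bound for OMP_NUM_THREADS that avoids
    over-subscription when every application thread runs its own OpenMP pool."""
    if app_threads < 1:
        raise ValueError("app_threads must be at least 1")
    if cpu_cores is None:
        # Assumption: logical CPU count; pass the physical core count if known.
        cpu_cores = os.cpu_count() or 1
    # Integer division keeps app_threads * result <= cpu_cores; never go below 1.
    return max(1, cpu_cores // app_threads)


# The case from the text: 16 application threads on 64 cores.
print(recommended_omp_threads(16, 64))  # prints 4
```

The result is an upper bound rather than a guaranteed optimum, so smaller values are also worth measuring for a given model and workload.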