ZenDNN in WeGO PyTorch - 3.0 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-02-24
Version: 3.0 English

Enable ZenDNN in WeGO PyTorch

ZenDNN is disabled by default. To enable it, pass an extra option to WeGO-Torch's compile API:
wego_mod = wego_torch.compile(mod, wego_torch.CompileOptions(
    ...
    optimize_options=wego_torch.OptimizeOptions(zendnn_enable=True))
)

After ZenDNN is enabled, the CPU operators in the compiled WeGO graph (the operators not supported by the DPU) are replaced with ZenDNN operators and executed using ZenDNN kernels for acceleration.

Environment Variables

ZenDNN provides several environment variables for performance tuning.
Table 1. Environment Variables
Name                Description
OMP_DYNAMIC         Set it explicitly to FALSE when you enable ZenDNN.
ZENDNN_GEMM_ALGO    Default is 3. Set it to 0, 1, 2, or 3 to select a different GEMM algorithm path.
OMP_NUM_THREADS     Default is the number of physical cores on the system. Tune it according to the number of inference threads to achieve better performance. See Tuning Guidelines for more details.
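
As an illustration, the variables in Table 1 can be set from Python before the framework is loaded. This is a minimal sketch, not an official WeGO or ZenDNN API: the helper name zendnn_env is hypothetical, and the sketch assumes, as is typical for OpenMP runtimes, that the variables are read when the runtime initializes, so they must be set before torch or wego_torch is imported.

```python
import os


def zendnn_env(omp_num_threads, gemm_algo=3):
    """Return the Table 1 settings as a dict (hypothetical helper, not part of WeGO)."""
    if gemm_algo not in (0, 1, 2, 3):
        raise ValueError("ZENDNN_GEMM_ALGO must be 0, 1, 2, or 3")
    if omp_num_threads < 1:
        raise ValueError("OMP_NUM_THREADS must be at least 1")
    return {
        "OMP_DYNAMIC": "FALSE",                   # required when ZenDNN is enabled
        "ZENDNN_GEMM_ALGO": str(gemm_algo),       # GEMM algorithm path, default 3
        "OMP_NUM_THREADS": str(omp_num_threads),  # intra-op threads per pool
    }


# Assumption: run this before importing torch / wego_torch so that the
# OpenMP runtime sees the values when it starts up.
os.environ.update(zendnn_env(omp_num_threads=4))
```

Exporting the same variables in the shell that launches the application has the same effect and avoids any dependence on import order.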

Tuning Guidelines

ZenDNN uses OpenMP as its underlying threading library. The environment variable OMP_NUM_THREADS controls intra-op parallelism, that is, multi-core parallelism inside ZenDNN kernels. With OpenMP, different application threads (or inter-op threads) may each use a separate OpenMP thread pool for intra-op tasks, so a multi-threaded application can create a large number of OpenMP threads. These threads compete for CPU cores and reduce overall performance. To avoid over-subscription, set OMP_NUM_THREADS according to both the number of cores on the target CPU platform and the number of threads your application launches. For example, if your application launches 16 threads on a platform with 64 CPU cores, set OMP_NUM_THREADS <= 4 (64 cores / 16 threads = 4) to avoid contention for CPU cores.
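
The rule of thumb above amounts to dividing the available cores by the number of application threads. The following sketch shows that arithmetic. The function name recommended_omp_threads is hypothetical, and the result is an upper bound to start tuning from, not a guaranteed optimum. Note that os.cpu_count() reports logical CPUs, which may exceed the physical core count when SMT is enabled, so pass the physical core count explicitly if you know it.

```python
import os


def recommended_omp_threads(app_threads, cpu_cores=None):
    """Upper bound for OMP_NUM_THREADS that avoids over-subscription (illustrative)."""
    if app_threads < 1:
        raise ValueError("app_threads must be at least 1")
    if cpu_cores is None:
        # Fallback: logical CPU count; an assumption, since the guideline
        # is stated in terms of CPU cores on the target platform.
        cpu_cores = os.cpu_count() or 1
    # Each application thread may own its own OpenMP pool, so split the
    # cores evenly and never go below one thread per pool.
    return max(1, cpu_cores // app_threads)


# The example from the text: 16 application threads on 64 cores.
print(recommended_omp_threads(app_threads=16, cpu_cores=64))  # prints 4
```

When the application launches more threads than there are cores, the function returns 1, which keeps each ZenDNN kernel single-threaded rather than adding further contention.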