Enable ZenDNN in WeGO PyTorch
```python
wego_mod = wego_torch.compile(mod, wego_torch.CompileOptions(
    ...
    optimize_options=wego_torch.OptimizeOptions(zendnn_enable=True)
))
```
After ZenDNN is enabled, the CPU operators (operators not supported by the DPU) in the compiled WeGO graph are replaced with ZenDNN operators and executed with ZenDNN kernels for acceleration.
Environment Variables
| Name | Description |
|---|---|
| OMP_DYNAMIC | Set it explicitly to FALSE when you want to enable ZenDNN. |
| ZENDNN_GEMM_ALGO | Default is 3. Set it to 0, 1, 2, or 3 to tune different GEMM algorithm paths. |
| OMP_NUM_THREADS | Default is the number of physical cores on the user's system. Tune it according to the number of inference threads to achieve better performance. See the tuning guidelines for more details. |
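As a sketch, the variables above can be set in the shell before launching the application; the values here are illustrative, not recommendations:

```shell
# Disable dynamic adjustment of the OpenMP thread count (required for ZenDNN)
export OMP_DYNAMIC=FALSE
# Keep the default GEMM algorithm path (valid values: 0-3)
export ZENDNN_GEMM_ALGO=3
# Limit intra-op OpenMP threads; tune per the guidelines below
export OMP_NUM_THREADS=4
```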
Tuning Guidelines
ZenDNN uses OpenMP as its underlying threading library. The environment variable OMP_NUM_THREADS controls intra-op parallelism, that is, multi-core parallelism inside ZenDNN kernels. Because different application (inter-op) threads may each use their own OpenMP thread pool for intra-op tasks, a multi-threaded application can end up spawning a large number of OpenMP threads, consuming many CPU cores and reducing overall performance. Therefore, set OMP_NUM_THREADS according to the number of cores on the target CPU platform and the number of threads your application uses, so as to avoid over-subscription. For example, if you launch 16 threads in an application on a platform with 64 CPU cores, set OMP_NUM_THREADS <= 4 to avoid contention for CPU cores.
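The rule of thumb above can be sketched as a small helper; the function name is hypothetical, not part of any WeGO or ZenDNN API:

```python
def recommended_omp_threads(total_cores: int, app_threads: int) -> int:
    """Suggest an OMP_NUM_THREADS upper bound to avoid over-subscription.

    Each application (inter-op) thread may drive its own OpenMP pool for
    intra-op work, so the pools must share the available cores.
    """
    # Integer division splits the cores evenly; never go below one thread.
    return max(1, total_cores // app_threads)

# The example from the text: 64 cores, 16 application threads -> at most 4
print(recommended_omp_threads(64, 16))  # 4
```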