Supported OPs and DPU Limitations

Vitis AI User Guide (UG1414), Version 1.3 English, Release Date 2021-02-03

Currently Supported Operators

Xilinx is continuously improving the DPU IP and the compiler to support more operators with better performance. The following table lists typical operations and the configurations, such as kernel size and stride, that the DPU can support. If an operation's configuration exceeds these limitations, the operator is assigned to the CPU. In addition, the operators that the DPU can support depend on the DPU type, ISA version, and configuration.

To make the DPU adaptable to a variety of FPGA devices, some DPU types are configurable. You can choose the necessary engines, adjust certain intrinsic parameters, and create your own DPU IP with the TRD projects, but this means the limitations can differ considerably between configurations. For more information about how those options affect the limitations, see PG338. Alternatively, try compiling your model against your own DPU configuration; the compiler reports which operators are assigned to the CPU and why. The table below shows a specific configuration of each DPU architecture.
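For example, after compilation you can inspect the resulting .xmodel with the xir Python bindings that ship with Vitis AI Runtime (VART) to see which subgraphs were mapped to the DPU and which fell back to the CPU. This is a minimal sketch; the file name resnet50.xmodel is only a placeholder for your own compiled model.

import xir

# Load a compiled model produced by vai_c_xir / vai_c_tensorflow, etc.
graph = xir.Graph.deserialize("resnet50.xmodel")  # placeholder file name

# The root subgraph is partitioned into child subgraphs, each tagged with
# the device ("DPU" or "CPU") that will execute it.
root = graph.get_root_subgraph()
for sub in root.toposort_child_subgraph():
    device = sub.get_attr("device") if sub.has_attr("device") else "unknown"
    print(f"{sub.get_name():40s} -> {device}")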

Table 1. Currently Supported Operators
Typical operation types in CNN are listed below with their parameter limits for each DPU architecture. The five DPU architectures are numbered as follows; a limit given without a column number applies to all five of them. (A short sketch showing how to evaluate these limits for a given layer follows the table.)

(1) DPUCZDX8G_ISA0_B4096_MAX_BG2 (ZCU102/104)
(2) DPUCAHX8L_ISA0 (U280)
(3) DPUCAHX8H_ISA2 (U50LV9E, U50LV10E, U280), DPUCAHX8H_ISA2_ELP2 (U50)
(4) DPUCVDX8G_ISA0_B8192C32B3 (VCK190)
(5) DPUCVDX8H_ISA0 (VCK5000)

Intrinsic Parameter
  (1) channel_parallel: 16, bank_depth: 2048
  (2) channel_parallel: 32, bank_depth: 4096
  (3) channel_parallel: 16, bank_depth: 2048
  (4) channel_parallel: 16, bank_depth: 16384
  (5) channel_parallel: 64, bank_depth: 256

conv2d
  Kernel size: (1) w, h: [1, 16]; (2) w, h: [1, 16]; (3) w, h: [1, 16]; (4) w, h: [1, 16], w * h <= 64; (5) w, h: [1, 16]
  Strides: (1) w, h: [1, 8]; (2) w, h: [1, 4]; (3) w, h: [1, 4]; (4) w, h: [1, 4]; (5) w, h: [1, 4]
  Dilation: dilation * input_channel <= 256 * channel_parallel
  Paddings: pad_left, pad_right: [0, (kernel_w - 1) * dilation_w + 1]; pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h + 1]
  In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth
  Out Size: output_channel <= 256 * channel_parallel
  Activation: (1) ReLU, LeakyReLU, ReLU6; (2) ReLU, ReLU6; (3) ReLU, LeakyReLU, ReLU6; (4) ReLU, LeakyReLU, ReLU6; (5) ReLU, LeakyReLU
  Group* (Caffe): group == 1

depthwise-conv2d
  Kernel size: (1) w, h: [1, 16]; (2) w, h: {3}; (3)-(5) Not supported
  Strides: (1) w, h: [1, 8]; (2) w, h: [1, 2]
  Dilation: dilation * input_channel <= 256 * channel_parallel
  Paddings: pad_left, pad_right: [0, (kernel_w - 1) * dilation_w + 1]; pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h + 1]
  In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth
  Out Size: output_channel <= 256 * channel_parallel
  Activation: (1) ReLU, ReLU6; (2) ReLU, ReLU6
  Group* (Caffe): group == input_channel

transposed-conv2d
  Kernel size: kernel_w/stride_w, kernel_h/stride_h: [1, 16]
  Strides: no separate range; constrained through the kernel_w/stride_w and kernel_h/stride_h limits above
  Paddings: pad_left, pad_right: [1, kernel_w-1]; pad_top, pad_bottom: [1, kernel_h-1]
  Out Size: output_channel <= 256 * channel_parallel
  Activation: (1) ReLU, LeakyReLU, ReLU6; (2) ReLU, ReLU6; (3) ReLU, LeakyReLU, ReLU6; (4) ReLU, LeakyReLU, ReLU6; (5) ReLU, LeakyReLU

depthwise-transposed-conv2d
  Kernel size: (1) kernel_w/stride_w, kernel_h/stride_h: [1, 16]; (2) kernel_w/stride_w, kernel_h/stride_h: {3}; (3)-(5) Not supported
  Strides: no separate range; constrained through the kernel_w/stride_w and kernel_h/stride_h limits above
  Paddings: pad_left, pad_right: [1, kernel_w-1]; pad_top, pad_bottom: [1, kernel_h-1]
  Out Size: output_channel <= 256 * channel_parallel
  Activation: (1) ReLU, ReLU6; (2) ReLU, ReLU6

max-pooling
  Kernel size: (1) w, h: [2, 8]; (2) w, h: {2, 3, 5, 7, 8}; (3) w, h: [1, 8]; (4) w, h: [2, 8]; (5) w, h: {1, 2, 3, 7}
  Strides: (1) w, h: [1, 8]; (2) w, h: [1, 8]; (3) w, h: [1, 8]; (4) w, h: [1, 4]; (5) w, h: [1, 8]
  Paddings: pad_left, pad_right: [1, kernel_w-1]; pad_top, pad_bottom: [1, kernel_h-1]
  Activation: (1) ReLU; (2) not supported; (3) ReLU; (4) ReLU; (5) not supported

average-pooling
  Kernel size: (1) w, h: [2, 8], w == h; (2) w, h: {2, 3, 5, 7, 8}, w == h; (3) w, h: [1, 8], w == h; (4) w, h: [2, 8], w == h; (5) w, h: {1, 2, 3, 7}, w == h
  Strides: (1) w, h: [1, 8]; (2) w, h: [1, 8]; (3) w, h: [1, 8]; (4) w, h: [1, 4]; (5) w, h: [1, 8]
  Paddings: pad_left, pad_right: [1, kernel_w-1]; pad_top, pad_bottom: [1, kernel_h-1]
  Activation: (1) ReLU; (2) not supported; (3) ReLU; (4) ReLU; (5) not supported

eltwise-sum
  Input Channel: input_channel <= 256 * channel_parallel
  Activation: ReLU

concat
  Network-specific limitation, which relates to the size of the feature maps, quantization results, and compiler optimizations.

reorg
  Strides: reverse == false: stride^2 * input_channel <= 256 * channel_parallel; reverse == true: input_channel <= 256 * channel_parallel

pad
  In Size: input_channel <= 256 * channel_parallel
  Mode: "SYMMETRIC" ("CONSTANT" padding is fused into adjacent operators during the compiler optimization process)

global pooling
  Global pooling is processed as general pooling with the kernel size equal to the input tensor size.

InnerProduct, Fully Connected, Matmul
  These operators are transformed into a conv2d operator with a kernel size of 1x1.
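As an illustration of the InnerProduct / Fully Connected / Matmul entry above, a fully connected layer is numerically identical to a 1x1 conv2d applied to a 1x1 feature map, which is why the compiler can map it onto the convolution engine. The NumPy sketch below only demonstrates this equivalence; it is not the compiler's actual transformation code.

import numpy as np

rng = np.random.default_rng(0)
in_c, out_c = 8, 4

W = rng.standard_normal((out_c, in_c))   # fully connected weight matrix
x = rng.standard_normal(in_c)            # input feature vector

fc_out = W @ x                           # fully connected / InnerProduct output

# The same computation expressed as a 1x1 conv2d over a 1x1 feature map:
# input tensor (in_c, 1, 1), kernel tensor (out_c, in_c, 1, 1).
fmap = x.reshape(in_c, 1, 1)
kernel = W.reshape(out_c, in_c, 1, 1)
conv_out = np.einsum("oihw,ihw->o", kernel, fmap)

assert np.allclose(fc_out, conv_out)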

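The limits in the table can be checked directly against a layer's shape. The sketch below is illustrative only: the helper name and the example layer are invented, it covers just the conv2d row for the DPUCZDX8G column (channel_parallel = 16, bank_depth = 2048), and the compiler remains the authoritative check.

import math

CHANNEL_PARALLEL = 16   # DPUCZDX8G_ISA0_B4096_MAX_BG2 intrinsic parameters
BANK_DEPTH = 2048       # (see the table above)

def conv2d_fits_dpuczdx8g(kernel_w, kernel_h, stride_w, stride_h,
                          dilation, input_channel, output_channel):
    """Evaluate the conv2d limits from the table; any False means the
    compiler is expected to assign the operator to the CPU."""
    return {
        "kernel size": 1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,
        "strides":     1 <= stride_w <= 8 and 1 <= stride_h <= 8,
        "dilation":    dilation * input_channel <= 256 * CHANNEL_PARALLEL,
        "in size":     kernel_w * kernel_h
                       * math.ceil(input_channel / CHANNEL_PARALLEL) <= BANK_DEPTH,
        "out size":    output_channel <= 256 * CHANNEL_PARALLEL,
    }

# A 3x3, stride-1 convolution with 512 input and 512 output channels:
# 3 * 3 * ceil(512 / 16) = 288 <= 2048 and 512 <= 4096, so every check passes.
print(conv2d_fits_dpuczdx8g(3, 3, 1, 1, dilation=1,
                            input_channel=512, output_channel=512))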
The following operators are defined as primitives in the different deep learning frameworks. The compiler can automatically parse these operators, transform them into the XIR format, and distribute them to the DPU or the CPU. These operators are partially supported by the tools and are listed here for your reference.