Currently Supported Operators - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English
Table 1. Currently Supported Operators
Column headings: Typical Operation Type in CNN; Parameters; then one column per DPU target. Per-DPU entries in the rows below follow this column order:
  DPUCZDX8G_ISA1_B4096 (ZCU102, ZCU104) (see note 3)
  DPUCAHX8L_ISA0 (U50, U50LV, U280)
  DPUCVDX8G_ISA3_C32B3 (VCK190) (see note 4)
  DPUCAHX8H_ISA2_DWC (U50, U55C, U50LV, U280) (see note 1)
  DPUCADF8H_ISA0 (U200, U250)
  DPUCVDX8H_ISA1_F2W4_4PE (VCK5000) (see note 2)
  DPUCV2DX8G_ISA1_C20B1 (VEK280/V70) (see note 5)
Intrinsic Parameter
  DPUCZDX8G: channel_parallel: 16; bank_depth: 2048; bank_num: 8
  DPUCAHX8L: channel_parallel: 32; bank_depth: 4096
  DPUCVDX8G: channel_parallel: 16; bank_depth: 8192; bank_num: 8
  DPUCAHX8H: channel_parallel: 16; bank_depth: 2048
  DPUCADF8H: channel_parallel: 16; bank_depth: 8192
  DPUCVDX8H: channel_parallel: 64; bank_depth: 2048
  DPUCV2DX8G: channel_parallel: 32; bank_depth: 65528; bank_num: 1

conv2d Kernel size
  DPUCZDX8G: w, h: [1, 16]
  DPUCAHX8L: w, h: [1, 16]
  DPUCVDX8G: w, h: [1, 16]; w * h * ceil(input_channel/2048) <= 64
  DPUCAHX8H: w, h: [1, 16]
  DPUCADF8H: w, h: [1, 16]
  DPUCVDX8H: w, h: [1, 16]
  DPUCV2DX8G: w, h: [1, 16]; 256 * h * w <= 13760
Strides w, h: [1, 8] w, h: [1, 4] w, h: [1, 8] w, h: [1, 4] w, h: [1, 8] w, h: [1, 4] w, h: [1, 8]
Dilation dilation * input_channel <= 256 * channel_parallel
Paddings pad_left, pad_right: [0, (kernel_w - 1) * dilation_w]
pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h]
In Size kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth kernel_w * kernel_h * ceil(input_channel / channel_parallel) * ceil(channel_parallel / 4) + 4 <= bank_depth
input_channel <= 256 * channel_parallel   input_channel <= 256 * channel_parallel       input_channel <= 256 * channel_parallel
Out Size output_channel <= 256 * channel_parallel
Activation ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, LeakyReLU, ReLU6 ReLU, LeakyReLU ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
Group* (Caffe) group==1  
depthwise-conv2d Kernel size
  DPUCZDX8G: w, h: [1, 256]
  DPUCAHX8L: w, h: [3]
  DPUCVDX8G: w, h: [1, 256]
  DPUCAHX8H: w, h: {1, 2, 3, 5, 7}
  DPUCADF8H: Not supported
  DPUCVDX8H: w, h: [1, 8]
  DPUCV2DX8G: w, h: [1, 256]; h * w <= 431
Strides w, h: [1, 256] w, h: [1, 2] w, h: [1, 256] w, h: [1, 4] w, h: [1, 4] w, h: [1, 256]
dilation dilation * input_channel <= 256 * channel_parallel dilation * input_channel <= 256 * channel_parallel
Paddings pad_left, pad_right: [0, min((kernel_w - 1), 15) * dilation_w] pad_left, pad_right: [0, (kernel_w - 1) * dilation_w] pad_left, pad_right: [0, min((kernel_w-1), 15) * dilation_w] pad_left, pad_right: [0, (kernel_w - 1) * dilation_w] pad_left, pad_right: [0, (kernel_w - 1) * dilation_w] pad_left, pad_right: [0, min((kernel_w-1), 15) * dilation_w]
pad_top, pad_bottom: [0, min((kernel_h - 1), 15) * dilation_h] pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h] pad_top, pad_bottom: [0, min((kernel_h-1), 15) * dilation_h] pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h] pad_top, pad_bottom: [0, (kernel_h - 1) * dilation_h] pad_top, pad_bottom: [0, min((kernel_h-1), 15) * dilation_h]
In Size kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth (6 * stride_w + kernel_w) * kernel_h + 4 <= 512
Out Size output_channel <= 256 * channel_parallel output_channel <= 256 * channel_parallel
Activation ReLU, ReLU6, LeakyReLU (see note 6), Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, ReLU6, LeakyReLU (see note 7), Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, ReLU6 ReLU, ReLU6, LeakyReLU, Hard-Swish, Hard-Sigmoid
Group* (Caffe) group==input_channel group==input_channel
transposed-conv2d Kernel size kernel_w/stride_w, kernel_h/stride_h: [1, 16]
Strides
Paddings pad_left, pad_right: [0, kernel_w-1]
pad_top, pad_bottom: [0, kernel_h-1]
Out Size output_channel <= 256 * channel_parallel
Activation ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, LeakyReLU, ReLU6 ReLU, LeakyReLU ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-transposed-conv2d Kernel size kernel_w/stride_w, kernel_h/stride_h: [1, 256] kernel_w/stride_w, kernel_h/stride_h: [3] kernel_w/stride_w, kernel_h/stride_h: [1, 256] kernel_w/stride_w, kernel_h/stride_h: {1,2, 3, 5, 7} Not supported kernel_w/stride_w, kernel_h/stride_h: [1, 8] kernel_w/stride_w, kernel_h/stride_h: [1, 256]
Strides
Paddings pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1),15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1),15)]
pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
Out Size output_channel <= 256 * channel_parallel output_channel <= 256 * channel_parallel
Activation ReLU, ReLU6, LeakyReLU (see note 6), Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, ReLU6, LeakyReLU (see note 7), Hard-Swish, Hard-Sigmoid ReLU, ReLU6 ReLU, ReLU6 ReLU, ReLU6, LeakyReLU, Hard-Swish, Hard-Sigmoid
max-pooling Kernel size
  DPUCZDX8G: w, h: [1, 256]; ceil(h/bank_num) * w <= bank_depth
  DPUCAHX8L: w, h: {2, 3, 5, 7, 8}
  DPUCVDX8G: w, h: [1, 256]; ceil(h/bank_num) * w <= bank_depth
  DPUCAHX8H: w, h: [1, 8]
  DPUCADF8H: w, h: [1, 16]
  DPUCVDX8H: w, h: [1, 128]
  DPUCV2DX8G: w, h: [1, 256]; h * w <= bank_depth
Strides w, h: [1, 256] w, h: [1, 8] w, h: [1, 256] w, h: [1, 8] w, h: [1, 8] w, h: [1, 128] w, h: [1, 256]
Paddings pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1), 15)]
pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
Activation ReLU, ReLU6 not supported ReLU, ReLU6 not supported ReLU not supported ReLU, ReLU6
average-pooling Kernel size
  DPUCZDX8G: w, h: [1, 256]; ceil(h/bank_num) * w <= bank_depth
  DPUCAHX8L: w, h: {2, 3, 5, 7, 8}; w == h
  DPUCVDX8G: w, h: [1, 256]; ceil(h/bank_num) * w <= bank_depth
  DPUCAHX8H: w, h: [1, 8]; w == h
  DPUCADF8H: w, h: [1, 16]
  DPUCVDX8H: w, h: [1, 128]; w == h
  DPUCV2DX8G: w, h: [1, 256]; h * w <= bank_depth
Strides w, h: [1, 256] w, h: [1, 8] w, h: [1, 256] w, h: [1, 8] w, h: [1, 8] w, h: [1, 128] w, h: [1, 256]
Paddings pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1), 15)] pad_left, pad_right: [1, kernel_w-1] pad_left, pad_right: [0, min((kernel_w-1), 15)]
pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)] pad_top, pad_bottom: [1, kernel_h-1] pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
Activation ReLU, ReLU6 not supported ReLU, ReLU6 not supported ReLU not supported ReLU, ReLU6
eltwise type sum, prod sum sum, prod sum sum sum, prod 2-input sum, prod
Input Channel input_channel <= 256 * channel_parallel
Activation ReLU ReLU ReLU ReLU ReLU ReLU, Hard-Sigmoid ReLU
concat Network-specific limitation that depends on the size of the feature maps, the quantization results, and compiler optimizations.
reorg Strides reverse==false: stride^2 * input_channel <= 256 * channel_parallel; reverse==true: input_channel <= 256 * channel_parallel
pad In Size input_channel <= 256 * channel_parallel
Mode "SYMMETRIC" ("CONSTANT" pad(value=0) would be fused into adjacent operators during compiler optimization process) "SYMMETRIC", "CONSTANT" (all padding value are identical) "SYMMETRIC" ("CONSTANT" pad(value=0) would be fused into adjacent operators during compiler optimization process)
global pooling Global pooling is processed as general pooling with a kernel size equal to the input tensor size.
InnerProduct, Fully Connected, Matmul These operations are transformed into a conv2d operation.
resize scale NEAREST: ceil(scale/bank_num) * scale * ceil(input_channel/channel_parallel) <= bank_depth
BILINEAR: only for 4-D feature maps; transformed into pad and depthwise-transposed-conv2d operations.
TRILINEAR: only for 5-D feature maps; transformed into pad and transposed-conv3d operations.
mode NEAREST, BILINEAR NEAREST, BILINEAR NEAREST, BILINEAR, TRILINEAR NEAREST, BILINEAR NEAREST, BILINEAR NEAREST, BILINEAR NEAREST, BILINEAR
conv3d kernel size
  DPUCVDX8G: w, h, d: [1, 16]; w * h * ceil(ceil(input_channel/16) * 16 * d / 2048) <= 64
  All other listed DPUs: Not supported
strides w, h, d: [1, 8]
paddings pad_left, pad_right: [0, kernel_w-1]
pad_top, pad_bottom: [0, kernel_h-1]
pad_front, pad_back: [0, kernel_d-1]
In size kernel_w * kernel_h * kernel_d * ceil(input_channel/channel_parallel) <= bank_depth, input_channel <= 256 * channel_parallel
Out size output_channel <= 256 * channel_parallel
Activation ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-conv3d kernel size
  DPUCVDX8G: w, h: [1, 256]; d: [1, 16]
  All other listed DPUs: Not supported
strides w, h: [1, 256]; d = 1
paddings pad_left, pad_right: [0, min((kernel_w-1), 15)]
pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
pad_front, pad_back: [0, min((kernel_d-1), 15)]
In size kernel_w * kernel_h * kernel_d * ceil(input_channel/channel_parallel) <= bank_depth
Out size output_channel <= 256 * channel_parallel
Activation ReLU, ReLU6
transposed-conv3d kernel size Not supported Not supported kernel_w/stride_w, kernel_h/stride_h, kernel_d/stride_d: [1, 16] Not supported Not supported Not supported Not supported
strides
paddings pad_left, pad_right: [0, kernel_w-1]
pad_top, pad_bottom: [0, kernel_h-1]
pad_front, pad_back: [0, kernel_d-1]
Out size output_channel <= 256 * channel_parallel
Activation ReLU, LeakyReLU, ReLU6, Hard-Swish, Hard-Sigmoid
depthwise-transposed-conv3d kernel size Not supported Not supported kernel_w/stride_w, kernel_h/stride_h, kernel_d/stride_d: [1, 16] Not supported Not supported Not supported Not supported
strides
paddings pad_left, pad_right: [0, min((kernel_w-1), 15)]
pad_top, pad_bottom: [0, min((kernel_h-1), 15)]
pad_front, pad_back: [0, min((kernel_d-1), 15)]
Out size output_channel <= 256 * channel_parallel
Activation ReLU, ReLU6
Strided_slice Stride Stride_batch = 1; Stride_channel = 1
correlation1d_elemwise input size
  DPUCZDX8G, DPUCVDX8G: input_channel <= 256 * channel_parallel
  All other listed DPUs: Not supported
correlation2d_elemwise input size
  DPUCZDX8G, DPUCVDX8G: input_channel <= 256 * channel_parallel
  All other listed DPUs: Not supported
argmax axis axis = input_channel Not supported axis = input_channel Not supported Not supported Not supported axis = input_channel
input size input_channel <= 128 input_channel <= 128 input_channel <= 128
reduction max axis axis = input_channel Not supported axis = input_channel Not supported Not supported Not supported axis = input_channel
input size input_channel < 2^12 input_channel < 2^12 input_channel < 2^12
cost_volume input size
  DPUCZDX8G, DPUCVDX8G: input_channel <= 256 * channel_parallel
  All other listed DPUs: Not supported
transpose                
  1. For DPUCAHX8H, only DPUCAHX8H_ISA2_DWC is listed here. For more IP configurations, see DPUCAHX8H for Convolutional Neural Networks Product Guide (PG367).
  2. For DPUCVDX8H, only DPUCVDX8H_ISA1_F2W4_4PE is listed here. For more IP configurations, see DPUCVDX8H for Convolutional Neural Networks LogiCORE IP (PG403).
  3. For DPUCZDX8G, only DPUCZDX8G_ISA1_B4096 is listed here. For more IP configurations, see DPUCZDX8G for Zynq UltraScale+ MPSoCs (PG338).
  4. For DPUCVDX8G, only DPUCVDX8G_ISA3_C32B3 is listed here. For more IP configurations, see DPUCVDX8G for Versal Adaptive SoCs Product Guide (PG389).
  5. For DPUCV2DX8G, only DPUCV2DX8G_ISA1_C20B1 is listed here. For more IP configurations, see DPUCV2DX8G for Versal Adaptive SoCs Product Guide (PG425).
  6. For DPUCZDX8G, the LeakyReLU activation for depthwise-conv-like operators is not enabled by default. For details on enabling this activation, see DPUCZDX8G for Zynq UltraScale+ MPSoCs (PG338).
  7. For DPUCVDX8G, the LeakyReLU activation for depthwise-conv-like operators is not enabled by default. For details on enabling this activation, see DPUCVDX8G for Versal Adaptive SoCs Product Guide (PG389).
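
The per-column limits in Table 1 are simple arithmetic over a layer's attributes, so they can be checked before compilation to anticipate whether an operator maps to the DPU or falls back to the CPU. The following Python sketch is only an illustration and is not part of the Vitis AI toolchain; the function name is hypothetical, and it hard-codes channel_parallel = 16 and bank_depth = 2048 from the DPUCZDX8G_ISA1_B4096 column to evaluate a few of the conv2d constraints listed above.

import math

# Intrinsic parameters of DPUCZDX8G_ISA1_B4096 taken from Table 1.
CHANNEL_PARALLEL = 16
BANK_DEPTH = 2048

def conv2d_fits_dpuczdx8g(kernel_w, kernel_h, stride_w, stride_h,
                          input_channel, output_channel,
                          dilation_w=1, dilation_h=1):
    """Hypothetical check of a conv2d layer against a few Table 1 limits."""
    checks = [
        # Kernel size w, h: [1, 16]
        1 <= kernel_w <= 16 and 1 <= kernel_h <= 16,
        # Strides w, h: [1, 8]
        1 <= stride_w <= 8 and 1 <= stride_h <= 8,
        # Dilation: dilation * input_channel <= 256 * channel_parallel
        # (the larger of the two dilations is used here as a conservative reading).
        max(dilation_w, dilation_h) * input_channel <= 256 * CHANNEL_PARALLEL,
        # In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= bank_depth
        kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL) <= BANK_DEPTH,
        # Out Size: output_channel <= 256 * channel_parallel
        output_channel <= 256 * CHANNEL_PARALLEL,
    ]
    return all(checks)

# Example: a 3x3, stride-1 convolution with 512 input and 1024 output channels
# satisfies every limit checked above, so it prints True.
print(conv2d_fits_dpuczdx8g(3, 3, 1, 1, input_channel=512, output_channel=1024))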