Feature Support

DPUCVDX8G for Versal ACAPs Product Guide (PG389)
Document ID: PG389
Release Date: 2023-01-23
Version: 1.3 English

The DPUCVDX8G provides user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for AI Engine, DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. There are also options for additional functions, such as channel augmentation, average pooling, and depthwise convolution. Furthermore, there is an option to configure the number of batch handlers that are instantiated in a single DPUCVDX8G IP instance. The deep neural network features and the associated parameters supported by the DPUCVDX8G are shown in the following table.

A configuration file named arch.json is generated during implementation when the DPUCVDX8G is integrated in the Vitis™ accelerated flow. The arch.json file is used by the Vitis AI Compiler for model compilation. Each time a DPU configuration change results in a change to arch.json, you must recompile the .xmodel file for all networks that are to be deployed on that specific DPU instance. For more information on the Vitis AI Compiler, refer to the Vitis AI User Guide (UG1414). In the Vitis accelerated flow, the arch.json file is located at $TRD_HOME/vitis_prj/package_out/sd_card/arch.json.
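Because every change to arch.json invalidates the previously compiled .xmodel files, it can be useful to detect configuration changes automatically. The sketch below is an illustrative helper, not part of Vitis AI: it computes a stable digest of an arch.json-style configuration so a build script can decide when recompilation is required. The configuration values shown are hypothetical placeholders.

```python
import hashlib
import json

def arch_digest(arch: dict) -> str:
    """Stable digest of an arch.json-style configuration.

    If the digest changes between builds, every .xmodel compiled
    against the previous configuration must be recompiled.
    """
    # Canonical serialization so key order does not affect the digest.
    canonical = json.dumps(arch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical configuration contents for illustration only; the real
# arch.json is generated by the Vitis flow at
# $TRD_HOME/vitis_prj/package_out/sd_card/arch.json.
old_arch = {"target": "DPUCVDX8G", "batch": 1}
new_arch = {"target": "DPUCVDX8G", "batch": 3}

if arch_digest(old_arch) != arch_digest(new_arch):
    print("arch.json changed: recompile all .xmodel files")
```

In practice, the digest of the deployed arch.json would be stored alongside the compiled models and compared on each build.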

Table 1. DPUCVDX8G Operation and Parameter Support
Features / Description / Range

Convolution (2D and 3D)
  Kernel Sizes: w, h, d: [1, 16]; additionally,
                w * h * ceil(ceil(input_channel / 16) * 16 * d / 2048) <= 64
  Strides: w, h, d: [1, 8]
  Padding: w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth
  Output Channel: 1~256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, PReLU, Hard Sigmoid, and Hard Swish
  Dilation: dilation * input_channel <= 256 * channel_parallel && stride_w == 1 && stride_h == 1

Depthwise Convolution (2D and 3D)
  Kernel Sizes: w, h: [1, 256]; d: [1, 16]
  Strides: w, h: [1, 256]; d = 1
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth
  Output Channel: 1~256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, Hard Sigmoid, and Hard Swish
  Dilation: dilation * input_channel <= 256 * channel_parallel && stride_w == 1 && stride_h == 1

Transposed Convolution (2D and 3D)
  Kernel Sizes and Strides: kernel_w/stride_w: [1, 16]; kernel_h/stride_h: [1, 16]; kernel_d/stride_d: [1, 16]
  Padding: w: [0, kernel_w-1]; h: [0, kernel_h-1]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth
  Output Channel: 1~256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, Hard Sigmoid, and Hard Swish

Depthwise Transposed Convolution (2D and 3D)
  Kernel Sizes and Strides: kernel_w/stride_w: [1, 256]; kernel_h/stride_h: [1, 256]; kernel_d/stride_d: [1, 256]
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]; d: [0, kernel_d-1]
  Input Size: Arbitrary
  Input Channel: kernel_w * kernel_h * kernel_d * ceil(input_channel / channel_parallel) <= bank_depth
  Output Channel: 1~256 * channel_parallel
  Activation: ReLU, ReLU6, LeakyReLU, Hard Sigmoid, and Hard Swish

Max Pooling
  Kernel Sizes: w, h: [1, 256]
  Strides: w, h: [1, 256]
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]

Average Pooling
  Kernel Sizes: w, h: [1, 256]
  Strides: w, h: [1, 256]
  Padding: w: [0, min(kernel_w-1, 15)]; h: [0, min(kernel_h-1, 15)]

Elementwise-Sum (2D and 3D)
  Input Channel: 1~256 * channel_parallel
  Input Size: Arbitrary
  Feature Map Number: 1~4

Elementwise-Multiply (2D and 3D)
  Input Channel: 1~256 * channel_parallel
  Input Size: Arbitrary
  Feature Map Number: 2

Concat
  Output Channel: 1~256 * channel_parallel

Reorg
  Strides: stride * stride * input_channel <= 256 * channel_parallel

Fully Connected (FC)
  Input Channel: input_channel <= 2048 * channel_parallel
  Output Channel: Arbitrary
  1. In the DPUCVDX8G, the channel_parallel parameter is fixed at 16.
  2. In some neural networks, the FC layer is preceded by a Flatten layer. The Vitis AI compiler automatically combines Flatten+FC into a global CONV2D layer whose kernel size equals the input feature map size of the Flatten layer. In this case, the input feature map size cannot exceed the CONV kernel-size limit; otherwise, an error is generated during compilation. If there is no Flatten layer, the FC layer is treated as a normal convolution layer. This limitation applies only to the Flatten+FC case.
  3. The bank_depth parameter indicates the on-chip weight buffer depth. In the DPUCVDX8G, the default bank_depth is 8192.
  4. If a Batch Normalization layer is quantized and can be equivalently transformed into a depthwise-conv2d, the compiler performs that transformation and maps the Batch Normalization onto DPU-friendly operations. Otherwise, batch_norm operators are executed on the CPU.
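The convolution limits in the table above can be checked programmatically before compilation. The sketch below is an illustrative helper, not part of Vitis AI: it applies the 2D/3D convolution kernel-size and weight-buffer constraints from Table 1, using channel_parallel = 16 (note 1) and the default bank_depth = 8192 (note 3). A 2D convolution corresponds to kernel_d = 1.

```python
import math

CHANNEL_PARALLEL = 16  # fixed for the DPUCVDX8G (note 1)
BANK_DEPTH = 8192      # default on-chip weight buffer depth (note 3)

def conv_supported(kernel_w: int, kernel_h: int, kernel_d: int,
                   input_channel: int) -> bool:
    """Check a convolution's kernel/channel parameters against Table 1.

    Returns True only when all three constraints hold:
      * kernel w, h, d each in [1, 16]
      * w * h * ceil(ceil(input_channel / 16) * 16 * d / 2048) <= 64
      * kernel_w * kernel_h * kernel_d
          * ceil(input_channel / channel_parallel) <= bank_depth
    """
    # Per-dimension kernel-size range.
    if not all(1 <= k <= 16 for k in (kernel_w, kernel_h, kernel_d)):
        return False

    # Combined kernel/channel limit from the Kernel Sizes row.
    packed = math.ceil(math.ceil(input_channel / 16) * 16 * kernel_d / 2048)
    if kernel_w * kernel_h * packed > 64:
        return False

    # Weight-buffer limit from the Input Channel row.
    weights = (kernel_w * kernel_h * kernel_d
               * math.ceil(input_channel / CHANNEL_PARALLEL))
    return weights <= BANK_DEPTH
```

For example, a 3x3x3 kernel with 256 input channels passes all three checks, while a 16x16 2D kernel with 4096 input channels violates the combined kernel/channel limit.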