DPU Configuration

DPUCAHX8H for Convolutional Neural Networks Product Guide (PG367)

Document ID: PG367
Release Date: 2024-03-20
Version: 1.2 English

The number of DPU processing engines (PEs) instantiated in a single DPU IP is configurable. The deep neural network features and the associated parameters supported by the DPU are shown in the following table; a sketch that checks a layer against these limits follows the table.

Table 1. Deep Neural Network Features and Parameters Supported by DPU
(All parameter limits below assume channel_parallel = 16.)

conv2d
  Kernel Sizes: kernel_w: [1, 16]; kernel_h: [1, 16]
  Strides: stride_w: [1, 4]; stride_h: [1, 4]
  Pad_left/Pad_right: [0, (kernel_w - 1) * dilation_w + 1]
  Pad_top/Pad_bottom: [0, (kernel_h - 1) * dilation_h + 1]
  In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
  Out Size: output_channel <= 256 * channel_parallel
  Activation: ReLU, LeakyReLU, ReLU6
  Dilation: dilation * input_channel <= 256 * channel_parallel

depthwise-conv2d
  Kernel Sizes: kernel_w: {1, 2, 3, 5, 7}; kernel_h: {1, 2, 3, 5, 7}
  Strides: stride_w: [1, 4]; stride_h: [1, 4]
  Pad_left/Pad_right: [1, kernel_w - 1]
  Pad_top/Pad_bottom: [1, kernel_h - 1]
  In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
  Out Size: output_channel <= 256 * channel_parallel
  Activation: ReLU, ReLU6

transposed-conv2d
  Kernel Sizes: kernel_w: [1, 16]; kernel_h: [1, 16]
  Strides: stride_w: [1, 16]; stride_h: [1, 16]
  Pad_left/Pad_right: [1, kernel_w - 1]
  Pad_top/Pad_bottom: [1, kernel_h - 1]
  Out Size: output_channel <= 256 * channel_parallel
  Activation: ReLU, LeakyReLU, ReLU6

depthwise-transposed-conv2d
  Kernel Sizes and Strides: kernel_w/stride_w, kernel_h/stride_h: {1, 2, 3, 5, 7}
  Pad_left/Pad_right: [1, kernel_w - 1]
  Pad_top/Pad_bottom: [1, kernel_h - 1]
  Out Size: output_channel <= 256 * channel_parallel
  Activation: ReLU, ReLU6

average-pooling
  Kernel Sizes: kernel_w: [1, 8]; kernel_h: [1, 8]; kernel_w == kernel_h
  Strides: stride_w: [1, 8]; stride_h: [1, 8]
  Pad_left/Pad_right: [1, kernel_w - 1]
  Pad_top/Pad_bottom: [1, kernel_h - 1]
elementwise-sum
  Input Channel: input_channel <= 256 * channel_parallel; input_channel: [1, 8912]
  Activation: ReLU

Concat
  Network-specific limitation related to the size of feature maps, quantization results, and compiler optimizations.

Fully Connected
  Input Channel: input_channel <= 16 * 16 * 16 (= 4096)
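
To make the conv2d row of Table 1 concrete, the following Python sketch checks a layer's hyperparameters against those limits. It is illustrative only and not part of the Vitis AI toolchain; the names check_conv2d and CHANNEL_PARALLEL are invented for this example, and the padding constraints are omitted for brevity.

```python
import math

# Assumed from the header of Table 1; other DPU configurations may differ.
CHANNEL_PARALLEL = 16

def check_conv2d(kernel_w, kernel_h, stride_w, stride_h,
                 input_channel, output_channel, dilation=1):
    """Check a conv2d layer against the limits in Table 1.

    Hypothetical helper for illustration. Returns a list of violated
    constraints; an empty list means the layer fits the DPU.
    """
    errors = []
    # Kernel Sizes: kernel_w, kernel_h in [1, 16]
    if not (1 <= kernel_w <= 16 and 1 <= kernel_h <= 16):
        errors.append("kernel_w/kernel_h must be in [1, 16]")
    # Strides: stride_w, stride_h in [1, 4]
    if not (1 <= stride_w <= 4 and 1 <= stride_h <= 4):
        errors.append("stride_w/stride_h must be in [1, 4]")
    # In Size: kernel_w * kernel_h * ceil(input_channel / channel_parallel) <= 2048
    in_size = kernel_w * kernel_h * math.ceil(input_channel / CHANNEL_PARALLEL)
    if in_size > 2048:
        errors.append(f"In Size {in_size} exceeds 2048")
    # Out Size: output_channel <= 256 * channel_parallel
    if output_channel > 256 * CHANNEL_PARALLEL:
        errors.append(f"output_channel {output_channel} exceeds {256 * CHANNEL_PARALLEL}")
    # Dilation: dilation * input_channel <= 256 * channel_parallel
    if dilation * input_channel > 256 * CHANNEL_PARALLEL:
        errors.append("dilation * input_channel exceeds 256 * channel_parallel")
    return errors

# Example: a 3x3, stride-1 conv with 512 input and 512 output channels.
# In Size = 3 * 3 * ceil(512 / 16) = 288 <= 2048, so the layer fits.
print(check_conv2d(3, 3, 1, 1, input_channel=512, output_channel=512))  # []
```

As the example shows, with channel_parallel = 16 the Out Size bound evaluates to 256 * 16 = 4096 output channels, and a 3x3 convolution over 512 input channels consumes 3 * 3 * ceil(512 / 16) = 288 of the 2048 In Size budget.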