DepthwiseConv (ALU) - 4.0 English

DPUCZDX8G for Zynq UltraScale+ MPSoCs Product Guide (PG338)


In conventional convolution, each input channel is convolved with its own specific kernel, and the results of all channels are then combined to produce the output.

In depthwise separable convolution, the operation is performed in two steps: depthwise convolution and pointwise convolution. Depthwise convolution is performed for each feature map separately as shown on the left side of the following figure. The next step is to perform pointwise convolution, which is the same as conventional convolution with kernel size 1x1. The parallelism of depthwise convolution is half that of the pixel parallelism.
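The two steps can be illustrated with a minimal pure-Python sketch. It is only a reference model of the arithmetic (stride 1, no padding, integer data) and does not describe how the DPUCZDX8G hardware schedules the work; the function names `depthwise_conv` and `pointwise_conv` are hypothetical and chosen for this illustration.

```python
def depthwise_conv(x, kernels):
    """x: [C][H][W] input; kernels: [C][K][K], one kernel per channel.
    Returns [C][H-K+1][W-K+1] (stride 1, no padding)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(kernels[0])
    out = []
    for c in range(C):  # each feature map is filtered independently
        fmap = []
        for i in range(H - K + 1):
            row = []
            for j in range(W - K + 1):
                acc = 0
                for di in range(K):
                    for dj in range(K):
                        acc += x[c][i + di][j + dj] * kernels[c][di][dj]
                row.append(acc)
            fmap.append(row)
        out.append(fmap)
    return out


def pointwise_conv(x, weights):
    """x: [C][H][W]; weights: [M][C], that is, M kernels of size 1x1 over C channels.
    Returns [M][H][W]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w[c] * x[c][i][j] for c in range(C)) for j in range(W)]
         for i in range(H)]
        for w in weights
    ]


# Two 3x3 input channels, one 2x2 kernel per channel,
# then a 1x1 convolution that mixes both channels into one output channel.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
kernels = [
    [[1, 0], [0, 1]],  # channel 0: top-left plus bottom-right of each window
    [[1, 1], [1, 1]],  # channel 1: sum of each 2x2 window
]
dw = depthwise_conv(x, kernels)    # [[[6, 8], [12, 14]], [[2, 2], [2, 2]]]
pw = pointwise_conv(dw, [[1, 2]])  # [[[10, 12], [16, 18]]]
```

The depthwise step never mixes channels, so each output feature map depends on exactly one input feature map. All cross-channel mixing happens in the 1x1 pointwise step.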

In DPUCZDX8G, depthwise convolution is performed by the ALU engine, which also performs pooling. The ALU parallelism (ALU Parallel) can be set from 1 to PP, the pixel parallelism; the recommended setting is PP/2.
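As a small sketch of the rule above, the helper below lists the candidate settings and the recommended value for a given pixel parallelism. The function name `alu_parallel_options` is hypothetical. Two points are assumptions rather than statements from this guide: that every integer from 1 to PP is accepted (the tool may restrict the choice, for example to powers of two), and that PP/2 is rounded down with a floor of 1 when PP is odd or equal to 1.

```python
def alu_parallel_options(pp):
    """Return (candidates, recommended) for pixel parallelism `pp`.

    Assumptions: all integers 1..pp are candidates, and the recommended
    value pp/2 is rounded down but never below 1.
    """
    if pp < 1:
        raise ValueError("pixel parallelism must be at least 1")
    candidates = list(range(1, pp + 1))
    recommended = max(1, pp // 2)
    return candidates, recommended


# Example with a pixel parallelism of 8 (value chosen for illustration).
candidates, recommended = alu_parallel_options(8)
```

With a pixel parallelism of 8, this yields candidates 1 through 8 and a recommended setting of 4.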

Figure 1. Depthwise Convolution and Pointwise Convolution

Table 1. Resources of DPUCZDX8G B4096 with Different ALU Parallel

ALU Parallel   LUTs    FF       Block RAMs   DSPs
1              44212   88250    255          662
2              46599   92380    255          678
4              51388   98525    255          710
8              60751   111329   255          774
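
The figures in Table 1 can be turned into an approximate cost per added unit of ALU parallelism. The sketch below is only arithmetic on the table values; the helper name `cost_per_unit` is hypothetical, and the third row is taken to be ALU Parallel = 4, an assumption that is consistent with the DSP count growing by exactly 16 per unit across all rows.

```python
# Rows of Table 1: (ALU Parallel, LUTs, FF, Block RAMs, DSPs).
# Assumption: the third row corresponds to ALU Parallel = 4.
TABLE = [
    (1, 44212, 88250, 255, 662),
    (2, 46599, 92380, 255, 678),
    (4, 51388, 98525, 255, 710),
    (8, 60751, 111329, 255, 774),
]


def cost_per_unit(table, column):
    """Extra resources per added unit of ALU parallelism, between adjacent rows."""
    return [
        (hi[column] - lo[column]) / (hi[0] - lo[0])
        for lo, hi in zip(table, table[1:])
    ]


luts = cost_per_unit(TABLE, 1)   # [2387.0, 2394.5, 2340.75]
ffs = cost_per_unit(TABLE, 2)    # [4130.0, 3072.5, 3201.0]
brams = cost_per_unit(TABLE, 3)  # [0.0, 0.0, 0.0]
dsps = cost_per_unit(TABLE, 4)   # [16.0, 16.0, 16.0]
```

Under that reading, each additional unit of ALU parallelism costs 16 DSPs and roughly 2,300 to 2,400 LUTs on B4096, while Block RAM usage stays at 255.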