Hardware Architecture

DPUCVDX8H for Convolutional Neural Networks LogiCORE IP Product Guide (PG403)

The detailed hardware architecture of the DPUCVDX8H is shown in the following figure. Each implementation has one DPU instance, and each DPU instance may have two, four, six, or eight processing-engine instances; the number of processing engines depends on the available FPGA resources.
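As an illustrative sketch only (the function name and structure are our assumptions, not part of this product guide), the configuration rule above can be expressed as:

```python
# Hedged sketch: encodes the configuration rule described in the text.
# Identifiers are illustrative, not from PG403.
VALID_PE_COUNTS = (2, 4, 6, 8)

def is_valid_dpu_config(num_pes: int) -> bool:
    """One DPU instance; its processing-engine count must be 2, 4, 6, or 8."""
    return num_pes in VALID_PE_COUNTS

assert is_valid_dpu_config(8)       # eight processing engines: supported
assert not is_valid_dpu_config(5)   # five is not a supported count
```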

The Conv computing unit is implemented on the AI Engine. The Conv control unit, Load unit, and Save unit are implemented in programmable logic (PL). The MISC unit (pooling and element-wise processing) is implemented either on the AI Engine or in programmable logic. All processing engines share the Weights unit and Scheduler unit, which are implemented in programmable logic. DRAM is used as system memory to store network instructions, input images, output results, and intermediate data. After bring-up, the DPU fetches instructions from system memory to control the operation of the computing engines.
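The unit-to-fabric mapping described above can be restated compactly; the following dictionary is an illustrative summary (the keys and shorthand values are ours, not identifiers from the guide):

```python
# Illustrative summary of which fabric implements each DPU unit,
# per the description above. Shorthand is ours, not from PG403.
UNIT_FABRIC = {
    "conv_compute": ("AI Engine",),
    "conv_control": ("PL",),
    "load":         ("PL",),
    "save":         ("PL",),
    "misc":         ("AI Engine", "PL"),  # pooling / element-wise: either option
    "weights":      ("PL",),              # shared by all processing engines
    "scheduler":    ("PL",),              # shared by all processing engines
}

# Per the text, the MISC unit is the only unit with two placement options:
flexible = [u for u, fabrics in UNIT_FABRIC.items() if len(fabrics) > 1]
assert flexible == ["misc"]
```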

On-chip memory is used to buffer weights, biases, and intermediate data to achieve high throughput. Feature-map banks are private to each processing engine, while all processing engines in the same DPU instance share the weights buffer. Data is reused as much as possible to reduce memory bandwidth. The Conv processing engines (PEs) take full advantage of the computing power of the AI Engine to deliver high performance.
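A minimal sketch of the buffering scheme described above, assuming a simple object model (class and attribute names are illustrative, not from the guide): each processing engine owns a private feature-map bank, while every PE in one DPU instance holds a reference to the same shared weights buffer.

```python
# Hedged sketch of the on-chip buffering described in the text.
# Names are illustrative; this is not the product's actual data model.
class ProcessingEngine:
    def __init__(self, shared_weights: dict):
        self.feature_map_bank = {}     # private to this processing engine
        self.weights = shared_weights  # shared across the DPU instance

class DpuInstance:
    def __init__(self, num_pes: int):
        self.weights_buffer = {}       # one shared buffer per DPU instance
        self.pes = [ProcessingEngine(self.weights_buffer)
                    for _ in range(num_pes)]

dpu = DpuInstance(num_pes=4)
dpu.weights_buffer["conv1_w"] = bytes(16)  # loaded once, visible to every PE
assert all(pe.weights is dpu.weights_buffer for pe in dpu.pes)
assert dpu.pes[0].feature_map_bank is not dpu.pes[1].feature_map_bank
```

Sharing one weights buffer across PEs is what lets the weight data be reused rather than refetched per engine, which is the bandwidth-reduction point made above.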
Figure 1. DPU Hardware Architecture (MISC unit on AI Engine)
Figure 2. DPU Hardware Architecture (MISC unit in PL)