Hardware Architecture - 1.2 English

DPUCAHX8H for Convolutional Neural Networks (PG367)

The detailed hardware architecture of the DPUCAHX8H is shown in the following figure.

Each implementation contains one to three DPU cores, and each core contains three to five processing engines (PEs). The number of cores and PEs per core is chosen to balance throughput requirements against FPGA resource usage. After initialization, the DPU fetches instructions from system memory to control the operation of its computing engines. The Vitis™ AI toolchain parses, quantizes, and compiles a trained model; the Vitis AI compiler extracts the operators in the graph and compiles them into a set of optimized micro-coded instructions that the DPU executes.
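The cores-versus-PEs trade-off can be sketched as a small search over the configuration space. All resource and throughput numbers below are illustrative assumptions, not DPUCAHX8H specifications; only the 1-3 core and 3-5 PE ranges come from the text.

```python
# Illustrative sketch: choosing a DPU configuration (cores x PEs per core)
# under a hypothetical FPGA resource budget. The cost and throughput
# constants are made-up placeholders, not DPUCAHX8H figures.
from itertools import product

PE_RESOURCE_UNITS = 100    # hypothetical resource cost per processing engine
CORE_OVERHEAD_UNITS = 50   # hypothetical fixed cost per DPU core
PE_THROUGHPUT = 1.0        # hypothetical relative throughput per PE

def config_cost(cores, pes_per_core):
    """Total resource units consumed by a given configuration."""
    return cores * (CORE_OVERHEAD_UNITS + pes_per_core * PE_RESOURCE_UNITS)

def config_throughput(cores, pes_per_core):
    """Relative throughput, assuming PEs scale linearly (idealized)."""
    return cores * pes_per_core * PE_THROUGHPUT

def best_config(resource_budget):
    """Highest-throughput (cores, PEs/core) that fits the budget.

    The search space mirrors the text: 1-3 cores, 3-5 PEs per core.
    Returns None if no configuration fits.
    """
    feasible = [
        (c, p) for c, p in product(range(1, 4), range(3, 6))
        if config_cost(c, p) <= resource_budget
    ]
    return max(feasible, key=lambda cp: config_throughput(*cp), default=None)

print(best_config(1200))  # -> (2, 5): two cores of five PEs beat three of three
```

In practice this selection is made at hardware build time; the point of the sketch is only that more PEs per core and more cores both raise throughput, at different resource costs.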

High Bandwidth Memory (HBM) is used to buffer weights, biases, intermediate feature maps, and output prediction metadata, enabling high throughput and efficiency.

Figure 1. DPU Hardware Architecture