The detailed hardware architecture of the DPUCAHX8H is shown in the following figure.
Each implementation has one to three DPU cores, and each core has three to five processing engines (PEs). The number of cores and the number of PEs per core are chosen by weighing throughput needs against FPGA resource usage. Following initialization, the DPU fetches instructions from system memory to control the operation of the computing engine. The Vitis™ AI toolchain is used to parse, quantize, and compile a trained model. The Vitis AI compiler extracts the operators in the graph and compiles them into a set of optimized micro-coded instructions, which the DPU executes.
High-bandwidth memory (HBM) is used to buffer weights, biases, intermediate feature maps, and output prediction metadata to achieve high throughput and efficiency.
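
As an illustration of the configuration space described above (one to three cores, three to five PEs per core), the following minimal Python sketch enumerates the valid combinations and the total PE count of each. The names `DpuConfig`, `CORE_RANGE`, `PE_RANGE`, and `all_configs` are hypothetical and are not part of the Vitis AI toolchain or of any DPU API; only the numeric ranges come from the text.

```python
from dataclasses import dataclass
from itertools import product

# Ranges taken from the description above; all names here are hypothetical.
CORE_RANGE = range(1, 4)  # one to three DPU cores per implementation
PE_RANGE = range(3, 6)    # three to five processing engines per core


@dataclass(frozen=True)
class DpuConfig:
    """One candidate core/PE arrangement (illustrative only)."""
    cores: int
    pes_per_core: int

    def __post_init__(self):
        # Reject values outside the ranges stated in the text.
        if self.cores not in CORE_RANGE:
            raise ValueError(f"cores must be 1 to 3, got {self.cores}")
        if self.pes_per_core not in PE_RANGE:
            raise ValueError(f"pes_per_core must be 3 to 5, got {self.pes_per_core}")

    @property
    def total_pes(self) -> int:
        # Total processing engines across all cores.
        return self.cores * self.pes_per_core


def all_configs():
    """Return every valid (cores, PEs per core) combination."""
    return [DpuConfig(c, p) for c, p in product(CORE_RANGE, PE_RANGE)]


if __name__ == "__main__":
    for cfg in all_configs():
        print(f"{cfg.cores} core(s) x {cfg.pes_per_core} PEs = {cfg.total_pes} PEs")
```

The total PE count is at best a rough proxy for the throughput side of the trade-off; the sketch does not model FPGA resource usage, which would have to come from the implementation reports for a specific device.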