DPU-V1 (previously known as xDNN) IP cores are high-performance, general-purpose CNN processing engines (PEs).
Figure 1. DPU-V1 Architecture
The key features of this engine are:
- 96×16 DSP Systolic Array operating at 700 MHz
- Instruction-based programming model, providing the simplicity and flexibility to represent a variety of custom neural network graphs
- 9 MB on-chip Tensor Memory composed of UltraRAM
- Distributed on-chip filter cache
- Utilizes external DDR memory for storing filter and tensor data
- Pipelined Scale, ReLU, and Pooling Blocks for maximum efficiency
- Standalone Pooling/Eltwise execution block for parallel processing with Convolution layers
- Hardware-assisted Tiling Engine that sub-divides tensors to fit in on-chip Tensor Memory, with pipelined instruction scheduling
- Standard AXI-MM and AXI4-Lite top-level interfaces for simplified system-level integration
- Optional pipelined RGB tensor Convolution engine for efficiency boost
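To make the tiling feature concrete, the sketch below shows one simple way a compiler-side pass might partition a feature map into row bands that each fit within the 9 MB on-chip Tensor Memory budget. This is an illustrative approximation only; the function name, the row-band strategy, and all parameters are assumptions, not the actual DPU-V1 tiling algorithm.

```python
def plan_tiles(height, width, channels, bytes_per_elem=1,
               budget_bytes=9 * 1024 * 1024):
    """Split a (height x width x channels) feature map into row bands
    that each fit within an on-chip memory budget.

    Hypothetical sketch: the real Tiling Engine is hardware-assisted
    and its policy is not described here.
    """
    row_bytes = width * channels * bytes_per_elem
    if row_bytes > budget_bytes:
        raise ValueError("a single row exceeds the on-chip budget")
    rows_per_tile = budget_bytes // row_bytes
    tiles = []
    start = 0
    while start < height:
        end = min(start + rows_per_tile, height)
        tiles.append((start, end))  # half-open row range [start, end)
        start = end
    return tiles


# Example: a 224x224x256 INT8 feature map at 512-wide rows needs
# several bands to stay under 9 MB.
tiles = plan_tiles(height=224, width=512, channels=256)
```

In practice a real tiler must also account for convolution halo rows (overlap between tiles so border outputs are computed correctly), which this sketch omits for brevity.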