The Xilinx® V3E Deep Learning Processor Unit (DPU) is a programmable engine optimized for convolutional neural networks, mainly for high-throughput applications. It includes a high-performance scheduler module, a hybrid computing array module, an instruction fetch unit module, and a global memory pool module. The DPU uses a specialized instruction set that allows efficient implementation of many convolutional neural networks. Examples of convolutional neural networks that are deployed include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, and FPN, among many others.
The DPU IP can be implemented in the programmable logic (PL) of the selected Virtex® UltraScale+™ devices with HBM, namely the VU37P or the VU35P. The DPU requires instructions to implement a neural network, and accessible memory locations for input images as well as for temporary and output data. A user-defined unit running in the PL is also required to perform the necessary configuration, inject instructions, service interrupts, and coordinate data transfers.
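The division of work between the user-defined unit and the DPU can be illustrated with a small software model. The sketch below is only a toy simulation under stated assumptions: `MockDpu`, `control_sequence`, the `batch` configuration field, and the instruction names are all hypothetical and do not correspond to the real DPU interface, register map, or instruction set. It shows only the order of the four duties listed above.

```python
from dataclasses import dataclass, field


@dataclass
class MockDpu:
    """Toy stand-in for the DPU (hypothetical; not the real interface)."""
    config: dict = field(default_factory=dict)
    instructions: list = field(default_factory=list)
    memory: dict = field(default_factory=dict)  # region name -> bytes
    irq_pending: bool = False

    def run(self) -> None:
        # A real DPU would execute its instruction stream here.
        # The mock simply copies the input region to the output region
        # and raises a completion interrupt.
        self.memory["output"] = bytes(self.memory.get("input", b""))
        self.irq_pending = True


def control_sequence(dpu: MockDpu, instructions: list, image: bytes):
    """Model the duties of the user-defined control unit, in order."""
    log = []
    dpu.config["batch"] = 1                # configuration (hypothetical field)
    log.append("configured")
    dpu.instructions = list(instructions)  # inject instructions
    log.append("instructions injected")
    dpu.memory["input"] = image            # data transfer: input in
    log.append("input transferred")
    dpu.run()
    if dpu.irq_pending:                    # service the completion interrupt
        dpu.irq_pending = False
        log.append("interrupt serviced")
    result = dpu.memory["output"]          # data transfer: output out
    log.append("output transferred")
    return result, log
```

In this model, calling `control_sequence(MockDpu(), ["LOAD", "CONV", "SAVE"], image_bytes)` returns the mock output together with a log of the steps in the order they were performed. In hardware these duties are carried out by logic in the PL rather than by software.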
The top-level block diagram of the DPU is shown in the following figure.