The AMD DPUCAHX8L is a programmable DPU core optimized for convolutional neural networks, mainly for low-latency applications. The engine includes a high-performance scheduler module, a hybrid computing array module, and an instruction fetch unit module. Its specialized instruction set allows many convolutional neural networks to be implemented efficiently. Examples of networks that can be deployed include VGG, ResNet, GoogLeNet, YOLO, SSD, and FPN, among many others.
The DPUCAHX8L is implemented in the programmable logic (PL) of the Alveo U280/U55C and U50/U50LV Data Center accelerator cards.
The following figure shows the top-level block diagram of the DPUCAHX8L: