Advanced Tab - 3.2 English

Zynq DPU Product Guide (PG338)

Document ID: PG338
Release Date: 2020-07-07
Version: 3.2 English

The following figure shows the Advanced tab of the DPU configuration.

Figure 1. DPU Configuration – Advanced Tab
S-AXI Clock Mode
s_axi_aclk is the S-AXI interface clock. When Common with M-AXI Clock is selected, s_axi_aclk shares the same clock as m_axi_aclk and the s_axi_aclk port is hidden. When Independent is selected, a clock different from m_axi_aclk must be provided.
dpu_2x Clock Gating
dpu_2x clock gating is an option for reducing the power consumption of the DPU. When the option is enabled, a port named dpu_2x_clk_ce appears for each DPU core. The dpu_2x_clk_ce port should be connected to the clk_dsp_ce port in the dpu_clk_wiz IP. The dpu_2x_clk_ce signal can shut down the dpu_2x_clk when the computing engine in the DPU is idle. To generate the clk_dsp_ce port in the dpu_clk_wiz IP, the clocking wizard IP should be configured with specific options. For more information, see the Reference Clock Generation section. Note that dpu_2x clock gating is not supported in Zynq®-7000 devices.
DSP Cascade
This option sets the maximum length of the DSP48E slice cascade chain. Longer cascade lengths typically use fewer logic resources but might have worse timing. Shorter cascade lengths might not be suitable for small devices because they require more hardware resources. Xilinx recommends selecting the mid-value, which is four, in the first iteration and adjusting the value if timing is not met.
DSP Usage
This option selects whether DSP48E slices are used for accumulation in the DPU convolution module. When low DSP usage is selected, the DPU IP uses DSP slices only for multiplication in the convolution. In high DSP usage mode, the DSP slices are used for both multiplication and accumulation. Thus, high DSP usage consumes more DSP slices and fewer LUTs. The logic utilization for high and low DSP usage is shown in the following table. The data is based on the DPU in the Xilinx ZCU102 platform without the Depthwise Convolution, Average Pooling, Channel Augmentation, and Leaky ReLU features.
Note: DSP Cascade is not supported in Zynq-7000 devices; the cascade length is locked to 1.
Table 1. Resources for Different DSP Usage
          High DSP Usage                     Low DSP Usage
Arch      LUT     Register  BRAM    DSP      LUT     Register  BRAM    DSP
B512      20055   28849     69.5    98       21171   33572     69.5    66
B800      21490   34561     87      142      22900   33752     87      102
B1024     24349   46241     101.5   194      26341   49823     101.5   130
B1152     23527   46906     117.5   194      25250   49588     117.5   146
B1600     26728   56267     123     282      29270   60739     123     202
B2304     39562   67481     161.5   386      32684   72850     161.5   290
B3136     32190   79867     203.5   506      35797   86132     203.5   394
B4096     37266   92630     249.5   642      41412   99791     249.5   514
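The trade-off in Table 1 can be made explicit by computing, for each architecture, how many additional DSP slices high DSP usage consumes and how the LUT count differs. The following Python sketch is illustrative only: the LUT and DSP values are copied from Table 1 as printed, and the function and variable names are hypothetical.

```python
# (arch, high_lut, high_dsp, low_lut, low_dsp), copied from Table 1 as printed.
TABLE_1 = [
    ("B512", 20055, 98, 21171, 66),
    ("B800", 21490, 142, 22900, 102),
    ("B1024", 24349, 194, 26341, 130),
    ("B1152", 23527, 194, 25250, 146),
    ("B1600", 26728, 282, 29270, 202),
    ("B2304", 39562, 386, 32684, 290),
    ("B3136", 32190, 506, 35797, 394),
    ("B4096", 37266, 642, 41412, 514),
]


def dsp_usage_deltas(rows=TABLE_1):
    """Return {arch: (extra_dsp_in_high_mode, lut_low_minus_lut_high)}."""
    return {
        arch: (high_dsp - low_dsp, low_lut - high_lut)
        for arch, high_lut, high_dsp, low_lut, low_dsp in rows
    }


if __name__ == "__main__":
    for arch, (extra_dsp, lut_delta) in dsp_usage_deltas().items():
        print(f"{arch}: +{extra_dsp} DSP in high mode, LUT (low - high) = {lut_delta}")
```

With the values as printed, every row uses more DSP slices in high DSP usage mode. B2304 is the only row in which the printed LUT count is higher for high DSP usage than for low DSP usage.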
UltraRAM
There are two kinds of on-chip memory resources in Zynq® UltraScale+™ devices: block RAM and UltraRAM. The available amount of each memory type is device-dependent. Each block RAM consists of two 18K block RAM slices and can be configured as 9b*4096, 18b*2048, or 36b*1024. UltraRAM has a fixed configuration of 72b*4096. A memory unit in the DPU has a width of ICP*8 bits and a depth of 2048. For the B1024 architecture, the ICP is 8, so the width of a memory unit is 8*8 = 64 bits, which fits within the 72-bit width of a single UltraRAM block. When the ICP is greater than 8, each memory unit in the DPU needs at least two UltraRAM blocks.
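The width arithmetic above can be sketched in Python. This is a minimal illustration that considers data width only, comparing a memory unit of ICP*8 bits against the 72-bit UltraRAM width. The function name and the ceiling-division rule are assumptions made for illustration, not the DPU IP's actual mapping logic, and the ICP values 12 and 16 are used only as examples of an ICP greater than 8.

```python
URAM_WIDTH_BITS = 72  # fixed UltraRAM data width stated in the text


def uram_blocks_per_memory_unit(icp: int) -> int:
    """Estimate the UltraRAM blocks needed for one DPU memory unit.

    Assumption: only the width (ICP*8 bits) matters; the 2048-deep
    memory unit fits within the 4096-deep UltraRAM.
    """
    if icp <= 0:
        raise ValueError("icp must be a positive integer")
    unit_width_bits = icp * 8
    # Ceiling division without importing math.
    return -(-unit_width_bits // URAM_WIDTH_BITS)


if __name__ == "__main__":
    for icp in (8, 12, 16):
        blocks = uram_blocks_per_memory_unit(icp)
        print(f"ICP={icp}: width={icp * 8} bits, UltraRAM blocks={blocks}")
```

Under this assumption, an ICP of 8 needs one UltraRAM block per memory unit, and the example ICP values of 12 and 16 each need two, which matches the "at least two" statement above.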

The DPU uses block RAM for its memory units by default. For a target device with both block RAM and UltraRAM, set the number of UltraRAMs to determine how many block RAMs are replaced by UltraRAM. The number should be a multiple of the number of UltraRAM blocks required for one memory unit in the DPU. An example of block RAM and UltraRAM utilization is shown in the Summary tab section.
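The multiple-of rule can be checked before the value is entered in the configuration. The sketch below is a hypothetical helper, not part of the DPU IP or the Vivado tools: it assumes the number of UltraRAM blocks per memory unit is already known (for example, 2 when the ICP is greater than 8) and rounds a requested count down to the nearest valid value.

```python
def is_valid_uram_count(requested: int, blocks_per_unit: int) -> bool:
    """True if the requested UltraRAM count is a multiple of blocks_per_unit."""
    if requested < 0 or blocks_per_unit <= 0:
        raise ValueError("requested must be >= 0 and blocks_per_unit must be > 0")
    return requested % blocks_per_unit == 0


def round_down_to_valid(requested: int, blocks_per_unit: int) -> int:
    """Largest valid UltraRAM count that does not exceed the request."""
    if requested < 0 or blocks_per_unit <= 0:
        raise ValueError("requested must be >= 0 and blocks_per_unit must be > 0")
    return (requested // blocks_per_unit) * blocks_per_unit


if __name__ == "__main__":
    # Hypothetical request of 7 UltraRAMs with 2 blocks per memory unit.
    print(is_valid_uram_count(7, 2))
    print(round_down_to_valid(7, 2))
```

Rounding down rather than up is a design choice in this sketch: it keeps the result within the number of UltraRAMs originally requested.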

Timestamp
When enabled, the DPU records the time that the DPU project was synthesized. When disabled, the timestamp keeps the value at the moment of the last IP update. The timestamp information can be obtained using the Vitis™ AI tools.
Note: Most of the DPU configuration settings can be accessed by the Vitis AI tools. The following figure shows the information read by the Vitis AI tools.
Figure 2. Timestamp Example