Resource Utilization Guideline

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID: AM004
Release Date: 2022-09-11
Revision: 1.2.1 English

When the sample rate is low and the number of coefficients is small, the single MACC FIR filter is well suited. When the sample rate is high and/or the number of coefficients is large, consider using a semi-parallel or parallel FIR filter. For the memory buffer, dual-port block RAM is the preferred choice if the number of coefficients is large and/or the coefficients are wide. A high-level implementation example of this design is provided in the following figure. If the number of coefficients and/or their width is small (see Note 1), distributed memory (LUTRAM) can be used as the coefficient buffer instead of block RAM. If the data width is small (see Note 1), an SRL16 can be used as the data buffer instead of block RAM.

Note:
  1. Based on the size, the synthesis tools in the Vivado Design Suite automatically map the buffer to block RAM or to SRL16/LUTRAM. To choose between SRL16/LUTRAM and block RAM, compare the timing and resource utilization of both implementations to find the optimal solution.
Figure 1. Single-Multiplier MACC FIR Filter
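
The choice between single MACC, semi-parallel, and parallel structures can be estimated from the number of taps, the sample rate, and the clock rate. The following Python sketch is an illustration only and is not taken from this manual: it assumes each MACC engine completes one tap per clock cycle and ignores optimizations such as coefficient symmetry. The function names `macc_engines_needed` and `suggest_architecture` are hypothetical.

```python
import math


def macc_engines_needed(num_taps: int, sample_rate_hz: float, clock_hz: float) -> int:
    """Estimate the number of MACC engines a FIR filter needs.

    Assumption: one engine performs one multiply-accumulate per clock cycle.
    """
    if num_taps <= 0 or sample_rate_hz <= 0 or clock_hz <= 0:
        raise ValueError("all arguments must be positive")
    # Required MACC operations per second divided by what one engine delivers.
    return math.ceil(num_taps * sample_rate_hz / clock_hz)


def suggest_architecture(num_taps: int, sample_rate_hz: float, clock_hz: float) -> str:
    """Map the engine estimate onto the three structures named in the text."""
    engines = macc_engines_needed(num_taps, sample_rate_hz, clock_hz)
    if engines == 1:
        return "single MACC"
    if engines < num_taps:
        return "semi-parallel"
    return "parallel"
```

For example, under these assumptions a 16-tap filter sampled at 1 MHz with a 500 MHz clock fits a single MACC engine, while a 64-tap filter sampled at 25 MHz with the same clock needs about four engines and therefore a semi-parallel structure.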

For a block RAM implementation of the data buffer, a cyclic RAM buffer is used. For small FIR filters (typically those under 32 taps), a block RAM is underutilized when it stores only the filter input samples and coefficients. Block RAMs are also less abundant than the smaller distributed RAMs available near a DSP58, which makes distributed RAM an excellent option for smaller FIR filters. The following figure illustrates a single-multiplier MACC FIR filter implementation that uses distributed RAM for the coefficient bank and an SRL16 for the data buffer.

Figure 2. 16-Tap Distributed RAM MACC FIR Filter
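
As a behavioral illustration of the cyclic data buffer combined with a single multiply-accumulate engine, the following Python sketch models the data flow in software. It is not RTL and is not the implementation shown in the figures. The class name `SingleMaccFir` and its method `push` are hypothetical, and the inner loop stands in for the clock cycles that one engine would spend on each output sample.

```python
class SingleMaccFir:
    """Software model of a single-MACC FIR filter with a cyclic data buffer."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)       # coefficient bank
        if not self.coeffs:
            raise ValueError("at least one coefficient is required")
        self.n = len(self.coeffs)
        self.data = [0] * self.n         # cyclic buffer of input samples
        self.wr = 0                      # write pointer into the cyclic buffer

    def push(self, sample):
        """Store one input sample and return the matching output sample."""
        # Overwrite the oldest sample with the newest one.
        self.data[self.wr] = sample
        acc = 0
        rd = self.wr
        # One multiply-accumulate per iteration, newest sample first.
        for k in range(self.n):
            acc += self.coeffs[k] * self.data[rd]
            rd = (rd - 1) % self.n       # step back to the next older sample
        # Advance the write pointer, wrapping at the end of the buffer.
        self.wr = (self.wr + 1) % self.n
        return acc
```

Feeding a unit impulse into the model returns the coefficients in order, which is a quick check that the pointer arithmetic is correct: with coefficients `[1, 2, 3, 4]` and input `[1, 0, 0, 0, 0]`, the outputs are `[1, 2, 3, 4, 0]`.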