Performance/Power Trade-Off for Block RAMs - 2022.1 English

UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs (UG949)

Document ID: UG949
Release Date: 2022-06-08
Version: 2022.1 English

There are multiple ways to break a memory down into block RAM primitives to meet a particular requirement. For a given design, that requirement can be performance, power, or a mixture of both.

The following example highlights the different structures that can be generated to meet your requirements. Using the CASCADE_HEIGHT attribute, you can limit how many block RAMs synthesis cascades, which sets the performance/power trade-off. The usage and arguments for the attribute are described in the Vivado Design Suite User Guide: Synthesis (UG901).

The following figure shows an example of a 4Kx32 memory configuration for higher performance (timing).

Note: This example applies to UltraScale and UltraScale+ devices only.
Figure 1. RTL Representation of 4Kx32 Using 4Kx8 and CASCADE_HEIGHT=1

In this implementation, all block RAMs are always enabled (for each read or write) and consume more power.

The following figure shows an example of cascading all the block RAMs for low power.

Figure 2. RTL Representation of 4Kx32 Using 1Kx32 and CASCADE_HEIGHT=4

In this implementation, because only one block RAM at a time is selected (from each unit), the dynamic power contribution is almost half that of the high-performance structure. Block RAMs have a dedicated cascade MUX and routing structure, so wide, deep memories that require more than one block RAM primitive can be built in a very power-efficient configuration.
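The one-at-a-time selection can be illustrated with a small behavioral model. The Python sketch below is not Vivado output and does not reproduce the primitive's actual cascade logic. It assumes that the upper address bits pick one of four 1Kx32 block RAMs and that the lower bits index a word within it. The function name `decode_cascade` and its parameters are hypothetical.

```python
def decode_cascade(addr: int, prim_depth: int = 1024, height: int = 4):
    """Return (enables, local_addr) for one access to a cascaded column.

    Assumption: the upper address bits select the primitive and the
    lower bits index a word inside it. Behavioral model only.
    """
    if not 0 <= addr < prim_depth * height:
        raise ValueError("address out of range")
    selected = addr // prim_depth      # which primitive in the cascade
    local_addr = addr % prim_depth     # word offset inside that primitive
    enables = [i == selected for i in range(height)]
    return enables, local_addr


enables, local_addr = decode_cascade(2500)
print(enables, local_addr)  # [False, False, True, False] 452
```

In this model exactly one enable is true for any valid address, which corresponds to a single block RAM being selected per access.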

The following figure shows an example of how to limit the cascading to gain on both power and performance at the same time.

Note: This example applies to UltraScale and UltraScale+ devices only.
Figure 3. RTL Representation of 4Kx32 Using 2Kx16 and CASCADE_HEIGHT=2

Because two block RAMs are selected at a time in this implementation, the dynamic power contribution is lower than for the high-performance structure but higher than for the low-power structure. The advantage over the low-power structure is that only two block RAMs are in the cascaded path, rather than the four in the critical path of the low-power structure, which makes a higher target frequency easier to reach.
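The three structures can be compared side by side with some simple arithmetic. The following Python sketch is an illustration only. It counts primitives, the number enabled per access, and the cascade height for a 4Kx32 memory, using the 4Kx8, 2Kx16, and 1Kx32 primitive shapes from the figures. The function name `summarize` and the dictionary keys are hypothetical, and the model does not estimate actual power or timing.

```python
from math import ceil


def summarize(depth: int, width: int, prim_depth: int, prim_width: int) -> dict:
    """Count block RAMs for a depth x width memory built from one primitive shape."""
    columns = ceil(width / prim_width)   # primitives side by side to build the width
    height = ceil(depth / prim_depth)    # primitives cascaded to build the depth
    return {
        "primitives": columns * height,
        "enabled_per_access": columns,   # one primitive per column is selected
        "cascade_height": height,
    }


# Primitive shapes (depth, data width) taken from the three figures.
STRUCTURES = {
    "high performance, CASCADE_HEIGHT=1": (4096, 8),
    "balanced, CASCADE_HEIGHT=2": (2048, 16),
    "low power, CASCADE_HEIGHT=4": (1024, 32),
}

for name, (prim_depth, prim_width) in STRUCTURES.items():
    print(name, summarize(4096, 32, prim_depth, prim_width))
```

All three structures use four primitives. What changes is how many are enabled per access (4, 2, and 1) and how many sit in the cascaded path (1, 2, and 4), which is the trade-off described above.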