Optimizing RAMB Input Logic to Allow Output Register Inference

Vivado Design Suite User Guide: Design Analysis and Closure Techniques (UG906)

Document ID: UG906
Release Date: 2021-06-30
Version: 2021.1 English

The following RTL code snippet produces a critical path that starts at a block RAM (used here as a ROM), passes through multiple logic levels, and ends at a flip-flop (FF). The RAMB cell has been inferred without its optional output registers (DOA-0), which adds a delay penalty of more than 1 ns to the RAMB output path.

Figure 1. Memory RTL Code Without Inferred RAMB Output Register

The following figure shows the critical path that the tool reports for the above RTL code.

Figure 2. Critical Path from RAMB Without Output Register Enabled

It is good practice to review the critical paths after synthesis and after each implementation step in order to identify which groups of logic need to be improved. For long paths or any paths that do not take advantage of the FPGA hardware features optimally, go back to the RTL description, try to understand why the synthesized logic is not optimal, and modify the code to help the synthesis tool improve the netlist.

Vivado provides an elaborated view of the design, which is a good starting point for debugging. The elaborated view helps you locate the problem without manually searching through the RTL code. The following figure shows the elaborated view for the above RTL code snippet.

Figure 3. Elaborated View of RTL Code Snippet

The elaborated view gives a good hint about the inefficient structure in this test case: the problem comes from the fanout of the address register (addr_reg3_reg), which drives both the memory address and some glue logic, highlighted in blue.

RAMB inference by the synthesis tool requires a dedicated address register in the RTL code, that is, a register that drives only the memory address. The fanout of the current address register is not compatible with this requirement. As a consequence, the synthesis tool retimes the output register to make the RAMB inference possible, instead of using it to enable the optional RAMB output register.
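As a rough illustration of the structure described above, the following Verilog sketch shows an address register that feeds both a ROM read and some glue logic. It is not the code from the guide: only the name addr_reg3 comes from the text, while the module name, widths, ROM contents, and the adder used as glue logic are hypothetical placeholders.

```verilog
// Hypothetical sketch: only the register name addr_reg3 comes from the text.
module rom_shared_addr #(
  parameter AW = 10,                  // assumed address width
  parameter DW = 16                   // assumed data width
) (
  input  wire          clk,
  input  wire [AW-1:0] addr,
  output reg  [DW-1:0] dout
);
  (* rom_style = "block" *) reg [DW-1:0] rom [0:(1<<AW)-1];
  reg [AW-1:0] addr_reg3;             // shared: memory address AND glue logic
  reg [DW-1:0] rom_q;                 // register after the ROM read
  integer i;

  // Placeholder ROM contents (values are arbitrary, truncated to DW bits).
  initial
    for (i = 0; i < (1<<AW); i = i + 1)
      rom[i] = i ^ (i << 3);

  always @(posedge clk) begin
    addr_reg3 <= addr;
    // Because addr_reg3 has other loads, it cannot serve as the dedicated
    // RAMB address register. Per the text, rom_q is then used to build the
    // synchronous read, leaving no register for the optional output stage.
    rom_q     <= rom[addr_reg3];
    // Placeholder glue logic: combines the ROM data with addr_reg3 itself.
    dout      <= rom_q + addr_reg3;
  end
endmodule
```

In a structure like this, the path from the RAMB data outputs through the glue logic to dout is the one that pays the extra RAMB clock-to-output delay.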

By replicating the address register in the RTL code so that the memory address and the FPGA logic are driven by separate registers, the RAMB is inferred with the output registers enabled.

The RTL code and elaborated view after manual replication are shown in the following figures:

Figure 4. RTL Code with the Replicated Address Register

Figure 5. Elaborated View of the Replicated Address Register
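A minimal sketch of the manual replication, under stated assumptions, might look like the following. The names addr_reg2 and addr_reg3 follow the text; the replica name addr_reg3_glue, the module name, the widths, the ROM contents, and the adder used as glue logic are hypothetical. The KEEP attribute on the replica is also an assumption: it is there to discourage synthesis from merging the two equivalent registers, and its effect should be checked in the synthesized netlist.

```verilog
// Hypothetical sketch of the replicated address register.
module rom_dedicated_addr #(
  parameter AW = 10,                  // assumed address width
  parameter DW = 16                   // assumed data width
) (
  input  wire          clk,
  input  wire [AW-1:0] addr,
  output reg  [DW-1:0] dout
);
  (* rom_style = "block" *) reg [DW-1:0] rom [0:(1<<AW)-1];
  reg [AW-1:0] addr_reg2;             // drives both copies below
  reg [AW-1:0] addr_reg3;             // dedicated to the memory address
  (* keep = "true" *)                 // assumption: prevents register merging
  reg [AW-1:0] addr_reg3_glue;        // replica that feeds only the glue logic
  reg [DW-1:0] rom_q;                 // candidate for the RAMB output register
  integer i;

  // Placeholder ROM contents (values are arbitrary, truncated to DW bits).
  initial
    for (i = 0; i < (1<<AW); i = i + 1)
      rom[i] = i ^ (i << 3);

  always @(posedge clk) begin
    addr_reg2      <= addr;
    addr_reg3      <= addr_reg2;      // single load: the ROM read address
    addr_reg3_glue <= addr_reg2;      // same value, separate register
    rom_q          <= rom[addr_reg3];
    // Placeholder glue logic, now fed by the replica instead of addr_reg3.
    dout           <= rom_q + addr_reg3_glue;
  end
endmodule
```

Both copies load the same value on every clock edge, so the functional behavior is unchanged. Whether the output register is actually enabled can be confirmed by inspecting the inferred RAMB cell after synthesis.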

The critical path for the modified RTL code can be seen in the following figure. Notice the following:

  • The addr_reg2_reg register is connected to the address pin of the block RAM.
  • The addr_reg3_reg register has been absorbed into the block RAM.
  • The RAMB output register is enabled, which significantly reduces the datapath delay on the RAMB outputs.

Figure 6. Critical Path for the Modified RTL Code