Replicate High Fanout Net Drivers - 2020.2 English

UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs

Register replication can increase the speed of critical paths by making copies of registers to reduce the fanout of a given signal. This gives the implementation tools more flexibility in placing and routing the different loads and associated logic. Synthesis tools use this technique extensively.

Most synthesis tools use a global fanout threshold to determine automatically whether to duplicate a register. Lowering this threshold enables automatic duplication of the drivers of high fanout nets. However, it does not allow control over which registers are duplicated or how their loads are grouped. In addition, the global replication mechanism does not assess timing slack accurately, which can lead to unnecessary replicated cells, increased logic utilization, and potentially higher power consumption.

Often, a better approach to reducing fanout is to use a balanced tree for the high fanout signals. Consider manually replicating registers based on the design hierarchy, because the cells included in a hierarchy are often placed together. For example, in the balanced reset tree shown in the following figure, the high fanout reset FF RST2 is replicated in RTL to balance the fanout across the different modules. If required, physical synthesis can perform further replication to improve WNS based on placement information.

Tip: To preserve the duplicate registers in synthesis, use a KEEP attribute instead of DONT_TOUCH. A DONT_TOUCH attribute prevents further optimization during physical optimization later in the implementation flow.
Note: If a LUT1 rather than a register is replicated, it indicates that an attribute or constraint is applied incorrectly.
Figure 1. High Fanout Reset Transformed to Balanced Reset Tree
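The balanced reset tree described above can be sketched in RTL as follows. This is a minimal illustration, not the design from the figure: the module names (reset_tree_top, counter), the signal names, and the two-branch structure are assumptions chosen for the example, and the reset input is assumed to be already synchronous to clk. The KEEP attribute is written in the Verilog attribute form accepted by Vivado synthesis.

```verilog
// Hypothetical top level: all module, port, and signal names are illustrative.
module reset_tree_top (
    input  wire       clk,
    input  wire       rst_in,   // assumed to be synchronous to clk already
    output wire [7:0] count_a,
    output wire [7:0] count_b
);
    // First level of the tree: one register that drives only the replicas.
    reg rst_root;
    always @(posedge clk) rst_root <= rst_in;

    // Second level: one manually replicated register per hierarchy.
    // KEEP stops synthesis from merging these equivalent registers without
    // blocking later physical optimization the way DONT_TOUCH would.
    (* keep = "true" *) reg rst_a;
    (* keep = "true" *) reg rst_b;
    always @(posedge clk) begin
        rst_a <= rst_root;
        rst_b <= rst_root;
    end

    // Each hierarchy is reset by its own copy, so the loads of each copy
    // tend to be placed close to their driver.
    counter u_counter_a (.clk(clk), .rst(rst_a), .count(count_a));
    counter u_counter_b (.clk(clk), .rst(rst_b), .count(count_b));
endmodule

// Hypothetical load module standing in for a larger hierarchy.
module counter (
    input  wire       clk,
    input  wire       rst,
    output reg  [7:0] count
);
    always @(posedge clk) begin
        if (rst) count <= 8'd0;
        else     count <= count + 8'd1;
    end
endmodule
```

Each level of the tree adds one clock cycle of latency to the reset, which is usually acceptable for a reset but should be checked against the design requirements.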
Recommended: Using MAX_FANOUT attributes on global high fanout signals leads to suboptimal replication, similar to lowering the global fanout limit in synthesis. For this reason, Xilinx recommends using MAX_FANOUT only inside the hierarchies, on local signals with medium to low fanout.
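As an illustration of the recommended usage, the following sketch applies MAX_FANOUT to a register that is local to one module. The module name (local_enable), the signal names, the width of 128, and the limit of 32 are all assumptions made for the example. Choose a limit based on the timing reports for your design.

```verilog
// Hypothetical module: names, WIDTH, and the fanout limit are illustrative.
module local_enable #(
    parameter WIDTH = 128
) (
    input  wire             clk,
    input  wire             en_in,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // Local enable register whose loads all live inside this hierarchy.
    // MAX_FANOUT asks synthesis to replicate the driver when its fanout
    // exceeds the stated limit.
    (* max_fanout = 32 *) reg en_local;

    always @(posedge clk) en_local <= en_in;

    // WIDTH data registers share the enable, giving it a medium fanout.
    always @(posedge clk) begin
        if (en_local) q <= d;
    end
endmodule
```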

Do not replicate registers used for synchronizing signals that cross clock domains. The presence of the ASYNC_REG attribute on these registers prevents the tool from replicating them. If the synchronizing chain has a very high fanout and replication is needed to meet timing, add an extra register after the synchronization chain that does not have the ASYNC_REG attribute.
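A minimal sketch of this arrangement is shown below, assuming a single-bit signal and a two-stage synchronizer. The module and signal names are hypothetical. Only the two synchronizer registers carry ASYNC_REG. The third register does not, so the tools remain free to replicate it.

```verilog
// Hypothetical module: names and the two-stage depth are illustrative.
module sync_with_fanout_stage (
    input  wire clk_dst,    // destination clock domain
    input  wire async_in,   // signal arriving from another clock domain
    output wire sync_out    // drives the high fanout loads
);
    // Synchronizer chain: ASYNC_REG marks these registers as synchronizers
    // and prevents the tools from replicating them.
    (* ASYNC_REG = "TRUE" *) reg sync_ff1;
    (* ASYNC_REG = "TRUE" *) reg sync_ff2;

    // Extra stage without ASYNC_REG: this is the register that can be
    // replicated to reduce fanout.
    reg sync_fanout;

    always @(posedge clk_dst) begin
        sync_ff1    <= async_in;
        sync_ff2    <= sync_ff1;
        sync_fanout <= sync_ff2;
    end

    assign sync_out = sync_fanout;
endmodule
```

The extra register adds one destination clock cycle of latency to the synchronized signal.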

The following table provides guidelines on the number of fanouts that might be acceptable for your design.

Table 1. Fanout Guidelines for Medium Performance 7 Series Devices
| Condition | Fanout > 5000 | Fanout > 200 | Fanout > 100 |
| Low Frequency (1 to 125 MHz) | Few logic levels between synchronous logic. <13 levels of logic at maximum frequency. | N/A | N/A |
| Medium Frequency (125 to 250 MHz) | If the design does not meet timing, you might need to reduce fanout and/or logic levels. | <6 levels of logic at maximum frequency. (Driver and load types impact performance.) | N/A |
| High Frequency (> 250 MHz) | Not recommended for most designs. | Small number of logic levels is typically necessary for higher speeds. | Advanced pipelining methods required. Careful logic replication. Compact functions. Low logic levels required. (Driver and load types impact performance.) |
Tip: If the timing reports indicate that high-fanout signals are limiting the design performance, consider replicating the signals using the implementation tool options, such as opt_design -hier_fanout_limit, place_design, and phys_opt_design.
Tip: When replicating registers, consider using a naming convention for the registers, such as <original_name>_a, <original_name>_b, and so on. This makes the intent of the replication easier to understand and the RTL code easier to maintain.