Register replication can increase the speed of critical paths by making copies of registers to reduce the fanout of a given signal. This gives the implementation tools more flexibility in placing and routing the different loads and associated logic. Synthesis tools use this technique extensively.
Most synthesis tools use a fanout threshold to decide automatically whether to duplicate a register. Lowering this global threshold causes more high-fanout nets to be duplicated automatically, but it gives no control over which registers are duplicated or how their loads are grouped. In addition, the global replication mechanism does not assess timing slack accurately, which can lead to unnecessary replicated cells, higher logic utilization, and potentially higher power consumption.
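The drawback described above can be illustrated with a toy model. The sketch below is not how any synthesis tool works internally; the `Net` class, the net names, the slack values, and the threshold of 500 are all hypothetical. It only contrasts a purely fanout-based decision with one that also considers timing slack.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    fanout: int
    slack_ns: float  # worst timing slack on the net; negative means failing


def replicated_by_threshold(nets, max_fanout):
    """Nets that a purely fanout-based global threshold would replicate."""
    return [n.name for n in nets if n.fanout > max_fanout]


def replicated_if_timing_critical(nets, max_fanout):
    """Nets that exceed the threshold and also fail timing."""
    return [n.name for n in nets if n.fanout > max_fanout and n.slack_ns < 0.0]


# Hypothetical nets, for illustration only.
nets = [
    Net("rst_q", fanout=4000, slack_ns=-0.35),
    Net("cfg_enable", fanout=900, slack_ns=2.10),
    Net("data_valid", fanout=120, slack_ns=-0.05),
]

by_threshold = replicated_by_threshold(nets, max_fanout=500)
by_slack = replicated_if_timing_critical(nets, max_fanout=500)
print(by_threshold)
print(by_slack)
```

In this model, `cfg_enable` is replicated by the global threshold even though it has positive slack, which is the kind of unnecessary duplication that increases utilization and power.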
For high-frequency designs, a better approach to reducing fanout is to use a balanced tree for the high-fanout signals. Consider manually replicating registers based on the design hierarchy, because the cells included in a hierarchy are often placed together. For example, in the balanced reset tree shown in the following figure, the high-fanout reset FF RST2 is replicated in RTL to balance the fanout across the different modules. If required, physical synthesis can perform further replication to improve WNS based on placement information.
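As a rough planning aid, the hierarchy-based grouping can be sketched as follows. This is a minimal sketch under stated assumptions: loads are given as hierarchical paths whose first component is the module, the function `plan_replicas`, the load paths, and the replica naming scheme are hypothetical, and the result is only a plan for replicas that would still have to be written by hand in RTL.

```python
import math
from collections import defaultdict


def plan_replicas(source, loads, max_fanout):
    """Group loads by top-level module, then split each group into replicas.

    loads: hierarchical load paths such as "dma/ff0" (hypothetical format).
    Returns a dict mapping each replica name to the loads it would drive,
    with no replica driving more than max_fanout loads.
    """
    by_module = defaultdict(list)
    for path in loads:
        by_module[path.split("/")[0]].append(path)

    plan = {}
    for module in sorted(by_module):
        group = by_module[module]
        count = math.ceil(len(group) / max_fanout)  # replicas for this module
        size = math.ceil(len(group) / count)        # balanced loads per replica
        for i in range(count):
            plan[f"{source}_{module}_{i}"] = group[i * size:(i + 1) * size]
    return plan


# Hypothetical loads: five in module "dma", two in module "mac".
loads = [f"dma/ff{i}" for i in range(5)] + [f"mac/ff{i}" for i in range(2)]
plan = plan_replicas("rst2", loads, max_fanout=3)
for name, group in plan.items():
    print(name, len(group))
```

Grouping by module first keeps each replica close to loads that are likely to be placed together, and splitting evenly within a module keeps the tree balanced.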
Do not replicate registers used for synchronizing signals that cross clock domains. The presence of the ASYNC_REG attribute on these registers prevents the tool from replicating them. If the synchronizing chain has a very high fanout and replication is needed to meet timing, add an extra register after the synchronization chain that does not have the ASYNC_REG attribute, and replicate that register instead.
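The rule can be expressed as a simple filter over a register list. The sketch below is only an illustration: the function `replication_candidates`, the register names, and the representation of attributes as a dictionary are assumptions, not a tool API.

```python
def replication_candidates(registers):
    """Return the names of registers that may be replicated.

    registers: list of (name, attributes) pairs, where attributes is a dict
    of property names to values, for example {"ASYNC_REG": "TRUE"}.
    Registers marked ASYNC_REG are excluded.
    """
    return [
        name
        for name, attrs in registers
        if str(attrs.get("ASYNC_REG", "FALSE")).upper() != "TRUE"
    ]


# Hypothetical chain: two synchronizer stages plus one extra fanout stage.
chain = [
    ("sync_ff1", {"ASYNC_REG": "TRUE"}),
    ("sync_ff2", {"ASYNC_REG": "TRUE"}),
    ("sync_out", {}),  # extra register without ASYNC_REG; drives the loads
]
candidates = replication_candidates(chain)
print(candidates)
```

Only the extra register after the chain is a candidate, so the synchronizer stages themselves stay intact.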
The following table provides guidelines on the fanout that might be acceptable for your design.
| Clock frequency | Fanout > 5000 | Fanout > 200 | Fanout > 100 |
|---|---|---|---|
| Low Frequency 1 to 125 MHz | Few logic levels between synchronous logic. <13 levels of logic at maximum frequency. | | |
| Medium Frequency 125 to 250 MHz | If the design does not meet timing, you might need to reduce fanout and/or logic levels. | <6 levels of logic at maximum frequency. (Driver and load types impact performance.) | |
| High Frequency > 250 MHz | Not recommended for most designs. | Small number of logic levels is typically necessary for higher speeds. | Advanced pipelining methods required. Careful logic replication. Compact functions. Low logic levels required. (Driver and load types impact performance.) |
When replicating registers in RTL, use a consistent naming convention such as <original_name>_a, <original_name>_b, etc., to make the intent of the replication easier to understand and the RTL code easier to maintain.