During physical synthesis, the placer can perform various physical optimizations that
will optimize the netlist for later placement phases based on the initial placement of
the design after the floorplanning stage. For example, for fanout-based replication, the
replicated driver can be co-located with its loads because the initial placement is
known. This alleviates congestion that can be introduced when replication is done
without knowledge of placement prior to place_design. Optimizations are
considered based on internal parameters. For timing-based optimizations, the timing is
evaluated and the optimization is committed if timing improves. The following
optimizations are available, as shown in the following figure.
- LUT Decomposition and Combining
- LUT Decomposition breaks LUT shapes if doing so improves timing (only LUTs with the SOFT_HLUTNM property are considered). LUT Combining combines LUTs if doing so improves utilization.
- Very High-Fanout Optimization
- Very High-Fanout Optimization replicates registers driving high-fanout nets (fanout > 1000, slack < 2.0 ns).
- Critical Cell Optimization
- Critical Cell Optimization replicates cells in failing paths. If the loads on a specific cell are placed far apart, the cell might be replicated with new drivers placed closer to load clusters. This optimization often applies to nets driving large block RAM or URAM arrays or a large number of DSPs, because the sites for these blocks are spread over a wider area of the device. High fanout is not a requirement for this optimization to occur (slack < 0.5 ns).
- Fanout Optimization
- Nets with a MAX_FANOUT property value that is less than the actual fanout of
the net are considered for fanout optimization. The user can force the
replication of a register or a LUT driving a net by adding the FORCE_MAX_FANOUT
property to the net. The value of FORCE_MAX_FANOUT specifies the maximum
physical fanout the nets should have after the replication optimization.
Physical fanout in this case refers to the actual site pin loads, not the
logical loads. For example, if the replica drives multiple LUTRAM loads that are
all grouped in the same slice, the combined fanout is 1 for all of the
LUTRAMs in that slice. FORCE_MAX_FANOUT forces the replication during
physical synthesis regardless of the slack of the signal. The user can also force
replication based on physical device attributes with the MAX_FANOUT_MODE
property, which can take the value CLOCK_REGION, SLR, or MACRO. For
example, MAX_FANOUT_MODE with a value of CLOCK_REGION replicates
the driver based on the physical clock region, so that loads placed in the same
clock region are clustered together. The MAX_FANOUT_MODE property takes
precedence over the FORCE_MAX_FANOUT property. Physical synthesis tries to honor
both by applying the MAX_FANOUT_MODE-based optimization first; the replicated
drivers then inherit the FORCE_MAX_FANOUT property for further replication
within a clock region. This is illustrated in the following figure, where
a register drives four loads: two registers and two MACRO loads (block RAM,
UltraRAM, or DSP). Replication provides separate drivers for the register loads
and the MACRO loads, and then the driver for the MACRO loads is replicated until
the FORCE_MAX_FANOUT property value is satisfied.
Figure 2. Applying MAX_FANOUT_MODE with value MACRO together with FORCE_MAX_FANOUT
- DSP Register Optimization
- DSP Register Optimization can move registers out of the DSP cell into the logic array or from logic to DSP cells if it improves the delay on the critical path.
- Shift Register to Pipeline Optimization
- Shift Register to Pipeline Optimization turns a fixed-length shift register into a dynamically adjusted register pipeline and places the pipeline optimally to improve timing. Only SRLs with the PHYS_SRL2PIPELINE attribute set to TRUE are considered for this optimization. The pull/push of FFs happens on the SRL's Q-pin. The SRL length must be fixed; dynamic SRLs are not supported for this optimization.
- Shift Register Optimization
- Shift Register Optimization improves timing on negative-slack paths between shift register cells (SRLs) and other logic cells.
- Block RAM Register Optimization
- Block RAM Register Optimization can move registers out of the block RAM cell into the logic array or from logic to block RAM cells if it improves the delay on the critical path.
- URAM Register Optimization
- UltraRAM Register Optimization can move registers out of the UltraRAM cell into the logic array or from logic to UltraRAM cells if it improves the delay on the critical path.
- Dynamic/Static Region Interface Net Replication
- This optimization replicates drivers on boundary paths between the static design and reconfigurable modules in the DFX flow.
- Equivalent Driver Rewire Optimization
- This optimization redistributes loads between logically equivalent drivers to minimize routing overlap and improve the co-location of drivers and loads. This helps reduce utilization and congestion and allows later placer stages to move drivers and loads more effectively to improve QoR.

For more information on these optimizations, see Available Physical Optimizations in the Physical Optimization section.

Physical synthesis in the placer is run by default in all of the placer directives. At the end of the physical synthesis phase, a table shows the summary of optimizations.
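The distinction between physical and logical fanout described under Fanout Optimization can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the placer's actual algorithm: the `Load` record, the cell, site, and pin names, and the greedy chunking in `replicate` are all hypothetical. It shows that several loads entering a site through the same site pin count once toward physical fanout, and that a clock-region split applied first can be followed by a per-replica fanout limit.

```python
from collections import defaultdict
from typing import NamedTuple


class Load(NamedTuple):
    name: str          # logical load cell (hypothetical names)
    site: str          # site in which the load is placed
    pin: str           # site pin through which the net enters the site
    clock_region: str  # clock region containing the site


def physical_fanout(loads):
    """Count distinct site-pin loads rather than logical loads."""
    return len({(ld.site, ld.pin) for ld in loads})


def replicate(loads, force_max_fanout, by_clock_region=False):
    """Partition loads into one group per driver replica (toy model)."""
    # Step 1 (MAX_FANOUT_MODE-like): optionally cluster loads by clock region.
    regions = defaultdict(list)
    for ld in loads:
        regions[ld.clock_region if by_clock_region else ""].append(ld)

    # Step 2 (FORCE_MAX_FANOUT-like): within each cluster, give every replica
    # at most force_max_fanout distinct site pins.
    replicas = []
    for region in sorted(regions):
        by_pin = defaultdict(list)
        for ld in regions[region]:
            by_pin[(ld.site, ld.pin)].append(ld)
        pins = sorted(by_pin)
        for i in range(0, len(pins), force_max_fanout):
            chunk = pins[i:i + force_max_fanout]
            replicas.append([ld for p in chunk for ld in by_pin[p]])
    return replicas


# Three LUTRAMs share one site pin in a single slice; three registers sit
# in separate slices. All placement data here is made up for illustration.
loads = [
    Load("ram0", "SLICE_X0Y0", "WE", "X0Y0"),
    Load("ram1", "SLICE_X0Y0", "WE", "X0Y0"),
    Load("ram2", "SLICE_X0Y0", "WE", "X0Y0"),
    Load("ff0", "SLICE_X1Y0", "CE", "X0Y0"),
    Load("ff1", "SLICE_X2Y0", "CE", "X0Y0"),
    Load("ff2", "SLICE_X50Y60", "CE", "X1Y1"),
]

print("logical fanout:", len(loads))
print("physical fanout:", physical_fanout(loads))

groups = replicate(loads, force_max_fanout=2, by_clock_region=True)
for g in groups:
    print([ld.name for ld in g], "physical fanout", physical_fanout(g))
```

In this model the three LUTRAM loads contribute a single unit of physical fanout, so they can stay on one replica together with another load even when the limit is 2. The real tool also weighs timing and placement, which this sketch ignores.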