The Vivado implementation tools use a special algorithm to partition logic into multiple SLRs. For challenging designs, you can improve timing closure for designs that target SSI technology devices using the following guidelines.
To improve timing closure and compile times, you can use Pblocks to assign logic to each SLR and validate that individual SLRs do not have excessive utilization across all fabric resource types. For example, a design with block RAM utilization of 70% might cause issues with timing closure if the block RAM resources are not balanced across SLRs and one SLR is using over 85% block RAM.
resize_pblock pblock_SLR0 -add SLR0).
The following example utilization report for a vu160 shows that the overall block RAM utilization is 56% with 59% in SLR0, 40% in SLR1, and 58% in SLR2. The block RAM utilization is evenly distributed across SLRs with reasonable utilization in each SLR, which allows the Vivado implementation commands more flexibility to meet timing.
Xilinx recommends assigning block RAM and DSP groups to SLR Pblocks to minimize SLR crossings of shared signals. For example, an address bus that fans out to a group of block RAMs that are spread out over multiple SLRs can make timing closure more difficult to achieve, because the SLR crossing incurs additional delay for the timing critical signals.
Device resource location or user I/O selection anchors IP to SLRs, for example, GT, ILKN, PCIe, and CMAC dedicated block or memory interface controllers. Xilinx recommends the following:
- Pay special attention to dedicated block location and pinout selection to avoid data flow crossing SLR boundaries multiple times.
- Keep tightly interconnected modules and IP within the same SLR. If that is not possible, you can add pipeline registers to allow the placer more flexibility to find a good solution despite the SLR crossing between logic groups.
- Keep critical logic within the same SLR. By ensuring that main modules are properly pipelined at their interfaces, the placer is more likely to find SLR partitions with flip-flop to flip-flop SLR crossings.
In the following figure, a memory interface that is constrained to SLR0 needs to drive user logic in SLR1. An AXI4-Lite slave interface connects to the memory IP backend, and the well-defined boundary between the memory IP and the AXI4-Lite slave interface provides a good transition from SLR0 to SLR1.