Improving Skew in Versal Devices - 2021.2 English

Versal ACAP System Integration and Validation Methodology Guide (UG1388)


The following are general recommendations for reducing clock skew when working with the Versal architecture. For more information, see the Versal ACAP Hardware, IP, and Platform Development Methodology Guide (UG1387).

  • Avoid using an MMCM, XPLL, or DPLL to perform simple division of a BUFG_GT clock. BUFG_GT cells can divide down the input clock. When providing more than one simple division of the BUFG_GT clock to the fabric, the MBUFG_GT cells can divide down the clock using leaf-level division to minimize resource usage and improve QoR. The following figure shows how to save an MMCM resource and implement balanced clock trees for two clocks originating from a GT*_QUAD cell using the MBUFG_GT.
    Figure 1. Implementing Balanced Clock Trees Using Versal MBUFG_GTs

  • If frequency synthesis of a GT clock is necessary, a DPLL is available in the clock regions that contain GT*_QUAD resources.
  • For timing paths between synchronous clocks, use the MBUFGCE, MBUFGCE_DIV, MBUFGCTRL, MBUFG_PS, and MBUFG_GT primitives to take advantage of leaf-level division, minimize resource usage, and improve timing QoR.
  • When using parallel clock buffers, set the CLOCK_DELAY_GROUP property on the driver nets of critical synchronous clocks to force CLOCK_ROOT and route matching during placement and routing. The buffers of the clocks must be driven by the same cell for this constraint to be honored.
    Note: This optimization technique is automatically applied by the report_qor_suggestions Tcl command.
  • If a path fails timing and its skew is larger than expected, the path might be crossing a resource column or a clock region boundary. In that case, use physical constraints such as Pblocks to force the source and destination into a single clock region or to prevent the path from crossing a resource column, such as a network on chip (NoC), 100G multirate Ethernet MAC (MRMAC), or integrated block for PCIe (Gen4 x16).
  • Verify that clock nets with the CLOCK_DEDICATED_ROUTE=FALSE constraint are routed with global clocking resources. Use ANY_CMT_REGION instead of FALSE to ensure that clock nets with routing waivers are routed with dedicated clocking resources only. If a clock net is routed with fabric interconnect, identify the design change or clocking placement constraint needed to make the implementation tools use global clocking resources instead. Clock paths routed with fabric interconnect can have high clock skew or be affected by switching noise, leading to poor performance or non-functional designs.
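
The constraint-related recommendations above can be sketched as an XDC/Tcl fragment. This is a minimal illustration under stated assumptions, not an excerpt from the guide: every net, cell, group, and Pblock name, as well as the clock region X1Y2, is a hypothetical placeholder that must be replaced with objects from the actual design.

```tcl
# All object names below are hypothetical; substitute names from your design.

# CLOCK_DELAY_GROUP: request matched clock root and routing for two
# synchronous clocks whose buffers are driven by the same cell.
set_property CLOCK_DELAY_GROUP grp_sync_clks [get_nets {clk_div1_net clk_div2_net}]

# Pblock: keep a skew-critical source and destination in one clock region.
# The clock region coordinates (X1Y2) are an assumption for illustration.
create_pblock pblock_skew
resize_pblock [get_pblocks pblock_skew] -add {CLOCKREGION_X1Y2:CLOCKREGION_X1Y2}
add_cells_to_pblock [get_pblocks pblock_skew] [get_cells {src_reg dst_reg}]

# CLOCK_DEDICATED_ROUTE: use ANY_CMT_REGION instead of FALSE so the net
# with a routing waiver stays on dedicated clocking resources.
set_property CLOCK_DEDICATED_ROUTE ANY_CMT_REGION [get_nets clk_in_net]
```

After implementation, running report_clock_utilization and re-checking the skew of the affected paths in the timing report is one way to confirm whether these constraints had the intended effect.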