Synchronous CDC - 2023.2 English

Versal Adaptive SoC Hardware, IP, and Platform Development Methodology Guide (UG1387)

Document ID: UG1387
Release Date: 2023-11-15
Version: 2023.2 English

When the design includes synchronous CDC paths between clocks that originate from the same MMCM/XPLL/DPLL, you can use the following techniques to better control the clock insertion delays and skew, and therefore the slack, on those paths.

Important: If the CDC paths are between clocks that originate from different MMCM/XPLL/DPLLs, the clock insertion delays across the MMCM/XPLL/DPLLs are more difficult to control. In this case, AMD recommends that you treat these clock domain crossings as asynchronous and make design changes accordingly.
Important: If the CDC paths are between the input clock and the output clock, or between two or more output clocks of an MMCM or XPLL primitive, see the Versal Adaptive SoC Clocking Resources Architecture Manual (AM003) to ensure that the configuration of the primitive allows safe timing between the clocks. For the DPLL, see the corresponding information in the same manual.

When a path is timed between two clocks that originate from different output pins of the same MMCM/XPLL/DPLL, the MMCM/XPLL/DPLL phase error adds to the clock uncertainty for the path. For designs using high clock frequencies, the phase error can cause timing closure issues for both setup and hold.
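As a rough sketch, one way to check whether the phase error contributes to a particular crossing is to report timing between the two clocks and inspect the clock uncertainty breakdown of each reported path, which lists a phase error term. The clock names clk_out1 and clk_out2 are hypothetical placeholders, not names taken from this guide.

```tcl
# Hypothetical clock names; substitute the clocks generated by your MMCM/XPLL/DPLL.
set src_clk [get_clocks clk_out1]
set dst_clk [get_clocks clk_out2]

# Report setup (max) and hold (min) timing between the two clock domains.
# Check the clock uncertainty section of each path for the phase error contribution.
report_timing -from $src_clk -to $dst_clk -delay_type min_max -max_paths 1
```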

The following figure shows an example of paths both with and without the phase error. Path 1 is a CDC path clocked by two buffers connected to the same MMCM output and does not include the phase error. Path 2 is clocked by two clocks that originate from two different MMCM outputs and does include the phase error.

Figure 1. MMCM and Phase Error

When two synchronous clocks from the same MMCM/XPLL/DPLL have a simple period ratio (/2 /4 /8), you can prevent the phase error between the two clock domains by using a single MMCM/XPLL/DPLL output connected to either a single MBUFGCE or two BUFGCE_DIV buffers. The MBUFGCE cell can perform simple clock division (/1 /2 /4 /8) and simple clock multiplication (*2). The BUFGCE_DIV buffer can perform simple clock division (/1 /2 /4 /8). The BUFGCE_DIV can also provide other division ratios (/3 /5 /6 /7), but these ratios modify the clock duty cycle and make mixed-edge timing paths more challenging.

The following figure shows a single MBUFGCE cell that divides the CLKOUT0 clock by 1 on the O1 pin and by 2 on the O2 pin. The MBUFGCE cell does not require any additional clock constraints on the logical output nets, because the net is routed on a single clock track until it reaches the leaf-level dividers.

Figure 2. Synchronous CDC with MBUFGCE Connected to One MMCM Output

The following figure shows two BUFGCE_DIVs that divide the CLKOUT0 clock by 1 and by 2, respectively.

Figure 3. Synchronous CDC with BUFGCE_DIVs Connected to an MMCM Output
Note: Because the BUFGCE and BUFGCE_DIV do not have the same cell delays, AMD recommends using the same clock buffer for both synchronous clocks (e.g., two BUFGCE or two BUFGCE_DIV buffers).
Important: To ensure safe timing between parallel BUFGCE_DIV cells where the BUFGCE_DIVIDE property is set to a value greater than 1, both buffers must use the same enable signal (CE) and the same reset signal (RST). Otherwise, the divided clocks might become phase shifted from one another in hardware, which is not reported by the Vivado tools.
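Because the tools do not report this condition, a small Tcl check can help catch it before hardware bring-up. The following is a minimal sketch only: the cell names are hypothetical, it assumes the reset input appears as the CLR pin on the BUFGCE_DIV primitive, and it assumes both buffers sit at the same hierarchy level so that a shared signal resolves to a single net object.

```tcl
# Hypothetical cell names for the two parallel BUFGCE_DIV buffers.
set div_bufs [get_cells {clk_div1_bufg clk_div2_bufg}]

foreach pin {CE CLR} {
    # Collect the nets attached to this pin on both buffers.
    set pins [get_pins -of_objects $div_bufs -filter "REF_PIN_NAME == $pin"]
    set nets [get_nets -of_objects $pins]

    # A shared signal is expected to resolve to exactly one net (assumption:
    # same hierarchy level, no per-buffer constant tie-offs).
    if {[llength $nets] != 1} {
        puts "WARNING: BUFGCE_DIV $pin pins are not driven by a single net: $nets"
    }
}
```

If the buffers are in different hierarchies, the comparison would need to be made on the top-level net segment or on the driving pin instead.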

To automatically balance several clocks that originate from the same MMCM or PLL, set the same CLOCK_DELAY_GROUP property value on the nets driven by the clock buffers that need to be balanced. The following are additional recommendations:

  • Avoid setting the CLOCK_DELAY_GROUP constraint on too many clocks, because this stresses the clock placer, resulting in suboptimal solutions or errors.
  • Use the GCLK_DESKEW constraint with value OFF in combination with the CLOCK_DELAY_GROUP constraint to minimize and match insertion delay on clock nets.
  • Review the critical synchronous CDC paths in the Timing Summary Report to determine which clocks must be delay matched to meet timing.
  • Limit the use of the CLOCK_DELAY_GROUP constraint to groups of synchronous clocks with tight requirements and identical clocking topologies.
    Important: AMD recommends using the Clocking Wizard for creating optimal clocking structures, which use a mix of BUFGCEs and BUFGCE_DIVs along with related clock grouping constraints.
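The constraints described above might look like the following XDC sketch. The group name and net names are hypothetical; the nets are assumed to be the ones driven directly by the two clock buffers being balanced.

```tcl
# Hypothetical names for the nets driven by the two clock buffers.
set balanced_nets [get_nets {clk_fast_net clk_slow_net}]

# Place both clock nets in the same delay group so that their insertion
# delays are balanced against each other.
set_property CLOCK_DELAY_GROUP sync_cdc_grp $balanced_nets

# Combine with GCLK_DESKEW set to OFF, as recommended above, to minimize
# and match insertion delay on these clock nets.
set_property GCLK_DESKEW OFF $balanced_nets
```

Keeping the group limited to the two clocks that actually share critical synchronous CDC paths follows the recommendation to avoid over-constraining the clock placer.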