Multi-Clock Buffers

Versal ACAP Clocking Resources Architecture Manual (AM003)

1.4 English

For high-speed designs in previous architectures, it was common to use several clocks with power-of-2 frequency ratios, all generated by the same MMCM. In such cases, users either generated the clocks on separate MMCM outputs and buffered them with BUFGCE cells, or generated only the fastest clock with the MMCM and created the divided clocks with BUFGCE_DIV cells. Those clock buffers were placed close to the clock managers and often caused timing challenges because of the uncertainty introduced on the paths between source and destination clocks. In designs with synchronous paths between clocks, the routing delay matching implementation option (CLOCK_DELAY_GROUP) was needed to minimize the clock-domain-crossing skew and limit the impact on setup and hold fixing. In larger devices, timing closure between clock domains could be very challenging, so users often chose to add logic and FIFOs and treat the paths between the clocks as asynchronous.

With Versal® ACAP, you can use a multi-clock buffer (MBUFG) to generate up to four clocks close to the leaf clock pins instead of at the XPLL, MMCM, or DPLL. This greatly reduces the clock pessimism on synchronous clock-domain-crossing paths and makes design timing easier to close. A single clock route runs to the BUFDIV_LEAF cells, which perform the division at the leaf driver level. Each MBUFG-type buffer uses the site of its BUFG counterpart; for example, an MBUFGCE is placed on a BUFGCE site.

The MBUFG clock buffers have the same attributes as their BUFG counterparts, plus one extra MODE attribute that can be set to PERFORMANCE or POWER. The output clock frequencies depend on the MODE setting, as explained below.

When the MODE attribute of an MBUFG-type clock buffer is set to PERFORMANCE, the outputs of that clock buffer behave as follows:

  • Output O1 = Input I
  • Output O2 = Input I divided by 2
  • Output O3 = Input I divided by 4
  • Output O4 = Input I divided by 8

When the MODE attribute of an MBUFG-type clock buffer is set to POWER, the outputs of that clock buffer behave as follows:

  • Output O1 = Input I multiplied by 2
  • Output O2 = Input I
  • Output O3 = Input I divided by 2
  • Output O4 = Input I divided by 4
Note: Only output O1 can be used to drive clock loads in the XPIO clock region. O2/O3/O4 cannot be used.
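
The two MODE lists above can be summarized as a small lookup table. The following Python sketch is only an illustrative frequency calculator, not a vendor tool or API: the names `MBUFG_RATIOS` and `mbufg_output_mhz` are hypothetical, and the ratios are transcribed directly from the lists above.

```python
from fractions import Fraction

# Output-to-input frequency ratios per MODE, transcribed from the lists above.
# MBUFG_RATIOS and mbufg_output_mhz are hypothetical names for illustration.
MBUFG_RATIOS = {
    "PERFORMANCE": {
        "O1": Fraction(1),
        "O2": Fraction(1, 2),
        "O3": Fraction(1, 4),
        "O4": Fraction(1, 8),
    },
    "POWER": {
        "O1": Fraction(2),
        "O2": Fraction(1),
        "O3": Fraction(1, 2),
        "O4": Fraction(1, 4),
    },
}


def mbufg_output_mhz(input_mhz, mode):
    """Return a dict mapping each output name to its frequency in MHz."""
    if mode not in MBUFG_RATIOS:
        raise ValueError(f"MODE must be PERFORMANCE or POWER, got {mode!r}")
    return {
        name: float(input_mhz * ratio)
        for name, ratio in MBUFG_RATIOS[mode].items()
    }


if __name__ == "__main__":
    # Example: a 400 MHz input clock in each mode.
    for mode in ("PERFORMANCE", "POWER"):
        print(mode, mbufg_output_mhz(400, mode))
```

With a 400 MHz input, this model gives 400, 200, 100, and 50 MHz on O1 through O4 in PERFORMANCE mode, and 800, 400, 200, and 100 MHz in POWER mode.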

There are five types of MBUFG, or multi-clock buffers: MBUFGCTRL, MBUFGCE, MBUFGCE_DIV, MBUFG_PS, and MBUFG_GT, as shown in the following figure. The multi-clock buffers have the same input pins as their equivalent BUFGCTRL, BUFGCE, BUFGCE_DIV, BUFG_PS, and BUFG_GT clock buffers (also described in this manual), but they have four different outputs.

The multi-clock buffers are logical-only clock buffers that use the equivalent BUFGs plus the physical-only BUFDIV_LEAF clock buffers to divide the clock closer to the loads. This removes the insertion delay from the clock buffer to the leaf clocks and reduces clock skew on synchronous clock domain crossing (CDC) paths. Many designs use multiple simple clock buffers and divide-by-n clock buffers, where n can be 2, 4, or 8; a single MBUFG can replace these multiple instantiated buffers. The CLRB_LEAF input on the MBUFG primitives is used to asynchronously reset the BUFDIV_LEAF dividers. For more information on CLRB_LEAF reset requirements for MBUFG, see the Versal ACAP Hardware, IP, and Platform Development Methodology Guide (UG1387).

Figure 1. MBUFG Type Clock Buffers

MBUFGCTRL: This primitive is designed as a synchronous/asynchronous glitch-free multiplexer with two clock inputs and multiple outputs. If clock multiplexing is not necessary, use the MBUFGCE component.
MBUFGCE: This primitive is designed as a clock buffer with enable/disable capability, a single clock input, and multiple outputs.
MBUFG_GT: This primitive is designed as a clock buffer driven by the gigabit transceivers to distribute clocks to other parts of the design with minimal clock pessimism. In addition to the MODE attribute setting, the behavior of the outputs also changes based on the value of pins DIV[2:0]. The divide value is the pin setting plus 1; for example, setting 3'b000 means a divide value of 1 and 3'b111 means a divide value of 8.
MBUFG_PS: This primitive is designed as a multi-output, high-fanout buffer for low-skew distribution of the PS clock signals.
MBUFGCE_DIV: This primitive is designed as a multi-output clock buffer with an enable, clear, and divide function. In addition to the MODE attribute setting, setting the BUFGCE_DIVIDE attribute to a value other than 1 results in further division of the output clock by that factor.
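
The divide settings described for MBUFG_GT and MBUFGCE_DIV can be sketched in a few lines of Python. This is a rough model rather than a description of the hardware: the function names `gt_divide_value` and `further_divided_ratio` are hypothetical, and the sketch assumes the extra divide factor applies identically to whichever output ratio is passed in, which the text above does not spell out.

```python
from fractions import Fraction


def gt_divide_value(div_setting):
    """Decode a DIV[2:0] pin setting: divide value = setting + 1."""
    if not 0 <= div_setting <= 0b111:
        raise ValueError("DIV[2:0] is a 3-bit value (0 to 7)")
    return div_setting + 1


def further_divided_ratio(mode_ratio, divide_factor):
    """Apply an extra divide factor on top of a MODE-dependent output ratio.

    Assumption: the factor divides the given output ratio directly.
    """
    if divide_factor < 1:
        raise ValueError("divide factor must be at least 1")
    return Fraction(mode_ratio) / divide_factor


if __name__ == "__main__":
    print(gt_divide_value(0b000))  # divide value for setting 3'b000
    print(gt_divide_value(0b111))  # divide value for setting 3'b111
    # An output that is I/2 from the MODE setting, with an extra divide of 2.
    print(further_divided_ratio(Fraction(1, 2), 2))
```

Under these assumptions, an output that would be the input divided by 2 becomes the input divided by 4 when the divide factor is 2, and stays unchanged when the factor is 1.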