Clock Structure

UltraScale Architecture Clocking Resources User Guide (UG572)

Document ID
UG572
Release Date
2023-02-01
Revision
1.10.2 English

The basic device architecture is composed of blocks of CRs. CRs are organized into tiles and thus build columns and rows. Each CR contains slices (CLBs), DSPs, and 36K block RAM blocks. The mix of slice, DSP, and block RAM columns in each CR can be different, but are always identical when stacked in the vertical direction, thus building columns of those resources for the entire device. I/O and GT columns are then inserted with columns of CRs. In addition, there is a single column that contains the configuration logic, SYSMON, and PCIe blocks. An HCS runs horizontally through the device in the center of each row of CRs, I/Os, and GTs. The HCS contains the horizontal routing and distribution tracks as well as leaf clock buffers and clock network interconnects between horizontal/vertical routing and distribution. Vertical tracks of routing and distribution connect all CRs in a column, while vertical routing spans an entire I/O column. There are 24 horizontal routing and 24 distribution tracks ( This Figure ), and 24 vertical routing and 24 distribution tracks ( This Figure ). The purpose of the clock routing resources is to route a clock from the global clock buffers to a central point from where it is connected to the loads via the distribution resources. This central point of the clock network is called a clock root in the UltraScale architecture. The root can be in any CR in a device from where it is routed to the loads via the clock distribution resources. This architecture optimized clock skew. Routing and distribution resources can either connect to adjacent CRs or disconnect (isolated) at the border of the CR as needed. This concept extends to SSI devices as well.

Figure 2-1: Horizontal Clocking

X-Ref Target - Figure 2-1

X16662-horizontal-clocking.jpg

Figure 2-2: Vertical Clocking

X-Ref Target - Figure 2-2

X16663-vertical-clocking.jpg

The clocks can be distributed from their sources in one of two ways ( This Figure ):

The clocks can go onto routing tracks that take the clocks to a central point in a CR without going to any loads. The clocks can then drive the distribution tracks unidirectionally from which the clock networks fan out. In this way, the clock buffers can drive to a specific point in the CRs from which the clock buffers travel vertically and then horizontally on the distribution tracks to drive the clocking points. The clocking points are driven via leaf clocks with clock enable (CE) in that CR and adjacent CRs, if needed. Distribution tracks cannot drive routing tracks.

This distribution scheme is used to move the root for all the loads to be at a specific location for improved, localized skew. Furthermore, both routing and distribution tracks can drive into horizontally or vertically adjacent CRs in a segmented fashion. Routing tracks can drive both routing and distribution tracks in the adjacent CRs while the distribution tracks can drive other horizontal distribution tracks in adjacent CRs. The CR boundary segmentation allows construction of either truly global, device-wide clock networks or more local clock networks of variable sizes by reusing clocking tracks.

Alternatively, clock buffers can drive straight onto the distribution tracks and distribute the clock in that manner. This reduces the clock insertion delay.

Figure 2-3: Clock Region Clocking

X-Ref Target - Figure 2-3

X16681-clock-region-block.jpg

Each of the four bytes in the XIPHY BITSLICE have six connections from the HCS to their global clocking pins. Therefore, only six BUFGs can drive the BITSLICE clocking pins in either half of an I/O bank (a maximum of 6 clocks can drive any half of an I/O bank).