Asynchronous Mode Support - 1.0 English

Advanced IO Wizard LogiCORE IP Product Guide (PG320)

Document ID
PG320
Release Date
2022-10-19
Version
1.0 English

In asynchronous (Beta) mode, there is no incoming clock/strobe associated with the data. A CDR (clock data recovery) module is provided in the wizard-generated wrapper to enable data capture. Data_Out is the actual data from the asynchronous mode design, and Data_Valid is the signal that indicates that Data_Out is valid: Data_Out is valid when Dataout_Valid is high. The Advanced IO Wizard IP supports the following two types of CDR modes:

  • CDR with PPM difference
  • CDR with Zero PPM

If the application is set to asynchronous mode, the CDR with PPM difference module is used by default. To use the Zero PPM CDR, select the Enable ZERO PPM CDR option. This option is grayed out by default and becomes available when the application is set to asynchronous.

Figure 1. Asynchronous Mode Structure

CDR with PPM Difference

Note: In this mode, the I/O pins are required to be differential.

The purpose of the CDR is to ensure that each UI (unit interval) of an asynchronous signal is always sampled at its center. Data is received through a differential pair into the bitslices. The sampling clock runs at the same frequency as the data rate. For example, if the SGMII data rate is 1250 Mb/s, the RX and TX PLL clock frequency should be 1250 MHz. Thus, each UI is sampled twice: once in the center of the UI and once at the edge of the UI.

The sample from the center of the UI is the valid data, and the other sample is used to keep the clock in the center of the data by updating the delay line.
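The timing relationship can be checked with a short calculation. The following Python sketch is illustrative only: it assumes that samples are taken on both edges of the PLL clock (inferred from the PCLK/NCLK naming in this section), and the function name ui_timing is hypothetical.

```python
def ui_timing(data_rate_mbps: float, clock_mhz: float) -> dict:
    """Relate the unit interval (UI) to the sample spacing.

    Assumption: the data is sampled on both edges of the clock,
    so there are two samples per clock period.
    """
    ui_ps = 1e6 / data_rate_mbps               # one UI in picoseconds
    sample_spacing_ps = 1e6 / (2 * clock_mhz)  # two samples per clock period
    return {
        "ui_ps": ui_ps,
        "sample_spacing_ps": sample_spacing_ps,
        "samples_per_ui": ui_ps / sample_spacing_ps,
    }


timing = ui_timing(data_rate_mbps=1250, clock_mhz=1250)
print(timing)
```

Under these assumptions, a 1250 Mb/s link has an 800 ps UI and the samples are 400 ps apart, which gives the two samples per UI (one center, one edge) described above.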

When the clock is in the center of the UI for one of the bitslices, data capture starts and the data is forwarded to the next stage. When there is a PPM difference, the delay line moves towards either 0 taps or 100 taps. In either scenario, the reference data samples must move to the next available UI so that no data is lost. The logic then either puts out the extra data continuously (when the data rate is faster than the clock rate) or waits for an extra cycle (when the data rate is slower than the clock rate), so that no data is dropped and no garbage data is provided.

Blocks:

  • Phase detector
  • Delay line tracking
  • Overflow underflow filter
  • Data path

Phase Detector

The samples from the master delay line and the slave delay line are fed into an Alexander bang bang phase detector circuit to determine whether the delay line value should be increased or decreased. For each UI, two samples are taken from each bitslice. Depending on whether the clock is early or late, the delay is incremented or decremented, respectively.
Figure 2. PCLK and NCLK Sampling Same Data

In the above figure, both PCLK and NCLK are sampling the same UI, so the delay value is incremented.

Figure 3. PCLK and NCLK Sampling Different Data

In the above figure, PCLK and NCLK are sampling different UIs, so the delay value is decremented.

Hence, the rules are: if X = D, increment the delay; if X != D, decrement the delay. For the master bitslice, consider the P data to be X and the N data to be D, which makes the master bitslice N centered. For the slave bitslice, consider the N data to be X and the P data to be D, which makes the slave bitslice P centered.
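The increment/decrement rule can be expressed compactly in software form. The following Python sketch is a behavioral illustration only, not the RTL in the generated wrapper; the function names and the +1/-1 encoding of the decision are assumptions.

```python
def phase_decision(x: int, d: int) -> int:
    """Return +1 (increment delay) if X == D, else -1 (decrement delay)."""
    return 1 if x == d else -1


def master_decision(p_sample: int, n_sample: int):
    """Master bitslice: X is the P data, D is the N data (N centered)."""
    return phase_decision(x=p_sample, d=n_sample), n_sample


def slave_decision(p_sample: int, n_sample: int):
    """Slave bitslice: X is the N data, D is the P data (P centered)."""
    return phase_decision(x=n_sample, d=p_sample), p_sample


# Each call returns (delay decision, data sample D).
print(master_decision(p_sample=1, n_sample=1))  # same value sampled: increment
print(slave_decision(p_sample=1, n_sample=0))   # different values: decrement
```

Each helper returns the delay decision together with the sample treated as data (D), which reflects the different centering of the master and slave bitslices.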

Delay Line Tracking

This module keeps track of the delay values of each bitslice. Depending on the phase detector output, the delay line values are updated after a certain number of cycles (the loop bandwidth, which allows the outcome of the previous decision to be observed). Once the respective D samples received from the PHY are in the center of the UI (visible as dithering of the delay line count value), the particular bitslice is considered locked until it reaches the delay line boundaries.

Once the boundary of the delay line is reached, overflow or underflow signals are generated to inform the data path that the delay line can no longer track the particular UI and must move to the next available UI. This is done by overriding the INC/DEC decision from the phase detector until the bitslice reaches the next available UI. During this period, the lock signal is pulled low until tracking of the next UI can start.
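The tracking behavior can be modeled as a bounded counter that is updated once per decision window. The following Python sketch is a simplified model under stated assumptions: the class name, the majority vote over the window, the default window length of 8 cycles, and the starting tap are all hypothetical, and the 0 to 100 tap range is taken from the example earlier in this section. Lock detection and the INC/DEC override are not modeled.

```python
class DelayLineTracker:
    """Bounded delay counter updated once every `update_period` cycles."""

    def __init__(self, max_tap=100, update_period=8, start_tap=50):
        self.max_tap = max_tap              # upper delay line boundary
        self.update_period = update_period  # assumed decision window length
        self.tap = start_tap
        self.votes = 0                      # sum of +1/-1 decisions in the window
        self.cycles = 0

    def step(self, decision):
        """Apply one +1/-1 phase detector decision.

        Returns (tap, overflow, underflow) for this cycle.
        """
        self.votes += decision
        self.cycles += 1
        overflow = underflow = False
        if self.cycles == self.update_period:
            # Assumed policy: move one tap in the majority direction.
            move = (self.votes > 0) - (self.votes < 0)
            new_tap = self.tap + move
            if new_tap > self.max_tap:
                overflow = True             # cannot track this UI any further
            elif new_tap < 0:
                underflow = True
            else:
                self.tap = new_tap
            self.votes = 0
            self.cycles = 0
        return self.tap, overflow, underflow


tracker = DelayLineTracker(update_period=2, start_tap=99)
for _ in range(4):
    print(tracker.step(+1))
```

In this model, the tap value saturates at the boundary and the overflow or underflow flag is raised instead of wrapping the counter.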

Overflow Underflow Filter

This module generates a single overflow or underflow pulse for the data path from the multiple bursts of underflow or overflow signals generated by the delay line tracking logic. The delay line tracking logic can assert the overflow or underflow signal for a long time, and even before moving to the next UI, these signals toggle because of drift caused by the PPM difference between the transmit clock and the receive clock. Hence, a filtering mechanism is required to generate a single overflow or underflow pulse.
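This guide does not describe the internal filtering mechanism. As one possible illustration, the following Python sketch collapses a burst of flags into a single pulse by ignoring further flags for a fixed number of cycles after each pulse. The hold-off approach, the function name, and the hold-off length are assumptions, not the method used in the IP.

```python
def single_pulse(flags, holdoff):
    """Emit one pulse per burst of flags.

    After a pulse is emitted, further flags are ignored for `holdoff`
    cycles (an assumed filtering policy).
    """
    out = []
    quiet = 0  # remaining cycles in which flags are ignored
    for flag in flags:
        if quiet > 0:
            out.append(False)
            quiet -= 1
        elif flag:
            out.append(True)
            quiet = holdoff
        else:
            out.append(False)
    return out


# A toggling burst followed, much later, by a new event.
raw = [0, 1, 1, 0, 1, 0, 0, 0, 1]
print(single_pulse(raw, holdoff=4))
```

With a hold-off of four cycles, the toggling burst at the start produces one pulse, and the later isolated flag produces a second pulse.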

Data Path

This module is responsible for correctly selecting the data from the PHY and providing it to the output. Once both delay lines are locked, they are naturally ½ UI apart, because one bitslice is PCLK centered and the other is NCLK centered. When lock happens, the bitslice whose delay line is consuming less delay becomes the active bitslice (data is taken from that bitslice). The other bitslice is the monitor and is always ½ UI ahead or behind, depending on the drift. When overflow or underflow happens for the active bitslice, the data path switches to the monitor bitslice for data, whereas if overflow or underflow happens for the monitor, only its reference pointer (D_loc) is updated.

D_loc is used as the reference pointer for both the active and monitor bitslices to ensure that no UI is ever lost when switching from the active bitslice to the monitor. The 4-bit D data of both master and slave is stored in a 12-bit shift register. D_loc can range from 0 to 8, where 0 selects bits 0:3 as the output data and 8 selects bits 8:11. This data is then stored in a buffer that is 2x8 in size. Whenever 8-bit data is available in the buffer, the data is put out and DATA_VALID is asserted with RX_DATA. When D_loc moves from 0 to 4 in the underflow condition, there is no data to put out, and hence there are 2-cycle gaps in DATA_VALID. When D_loc moves from 8 to 4, there is extra data to put out, and DATA_VALID is asserted continuously. In all other scenarios, DATA_VALID is asserted on alternate cycles.
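The D_loc windowing and the resulting DATA_VALID pattern can be illustrated with a small behavioral model. The following Python sketch is an approximation under stated assumptions: the bit ordering, the names select_window and ByteAssembler, and the modeling of a missing or extra nibble as a shorter or longer push are hypothetical and are not taken from the IP.

```python
def select_window(shift_reg, d_loc):
    """Return the 4-bit window shift_reg[d_loc : d_loc + 4].

    shift_reg is a list of 12 bits; d_loc ranges from 0 (bits 0:3)
    to 8 (bits 8:11).
    """
    if len(shift_reg) != 12:
        raise ValueError("shift register must hold 12 bits")
    if not 0 <= d_loc <= 8:
        raise ValueError("D_loc must be in the range 0 to 8")
    return shift_reg[d_loc:d_loc + 4]


class ByteAssembler:
    """Collect incoming bits and release them 8 at a time."""

    def __init__(self):
        self.pending = []

    def push(self, bits):
        """Add this cycle's bits; return (data_valid, byte_or_None)."""
        self.pending.extend(bits)
        if len(self.pending) >= 8:
            byte, self.pending = self.pending[:8], self.pending[8:]
            return True, byte
        return False, None


reg = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
asm = ByteAssembler()
print(asm.push(select_window(reg, 0)))  # first nibble: not yet valid
print(asm.push(select_window(reg, 4)))  # second nibble: 8 bits available
```

In this model, pushing four bits per cycle asserts the valid flag on alternate cycles. A cycle with no new bits stretches the gap to two cycles, and a cycle with eight bits keeps the valid flag asserted on consecutive cycles, which matches the behavior described above.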

Debug control signals are enabled when the ENABLE_CDR_DEBUG parameter is turned on. They provide bitwise access to the internal signals of the CDR.

Figure 4. Data Rate is Slower than Receive Clock

Figure 5. Data Rate is Faster than Receive Clock

Figure 6. Block Diagram for CDR

CDR with Zero PPM

In this case, the TX and RX clocks must be generated by the same source. In addition, the I/O pins can be single-ended or differential. The CDR algorithm differs for single-ended and differential I/O pins, as described in the following sections.

CDR for Single-Ended IO Design

The purpose of the CDR block is to ensure that UI sampling is always at the center for asynchronous signals. The sampling of the UI is performed at the same frequency as the data rate. For example, if the interface speed is 1250 Mb/s, the PLL clock frequency should be 1250 MHz. Thus, each UI is sampled twice: once in the center of the UI and once at the edge of the UI. The sample from the center of the UI is valid data, and the edge sample is used to keep the clock in the center of the data by updating the delay line. The block diagram of the CDR block is shown in the following figure.

Figure 7. Block Diagram of CDR Implementation for Single-Ended IO Designs

The samples from the delay line are fed into the phase detector circuit to determine if the delay line value should be increased or decreased. For each UI, two samples are taken from each bitslice. Depending on whether the clock is early or late, the delay is incremented or decremented.

Depending on the phase detector output, delay line values are updated after a certain number of cycles. Once the respective D samples received from the PHY are in the center of the UI, the particular bitslice is considered locked. Once the bitslice is locked, four of the eight bits given by the PHY are passed to the RX gearbox. These four bits are selected depending on whether the data is N centered or P centered.
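The four-of-eight selection can be pictured as follows. This guide does not specify the bit layout of the 8-bit PHY output, so the following Python sketch assumes that P and N samples are interleaved, with P samples at even positions and N samples at odd positions. Both this layout and the function name are hypothetical.

```python
def select_data_bits(phy_bits, centered_on):
    """Pick the four data (center) samples out of eight PHY bits.

    Assumed layout: P samples at even indices, N samples at odd indices.
    """
    if len(phy_bits) != 8:
        raise ValueError("expected eight bits from the PHY")
    if centered_on == "P":
        return phy_bits[0::2]   # P samples carry the data
    if centered_on == "N":
        return phy_bits[1::2]   # N samples carry the data
    raise ValueError("centered_on must be 'P' or 'N'")


bits = [1, 0, 1, 1, 0, 0, 0, 1]
print(select_data_bits(bits, "P"))
print(select_data_bits(bits, "N"))
```

The remaining four bits in this model are the edge samples, which are used for delay line tracking rather than as data.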

CDR for Differential IO Design

In the differential case, two lane outputs (both P and N) from the PHY are fed to the CDR block. The sampling of the UI is done at half the frequency of the data rate. For example, if the interface speed is 1250 Mb/s then the PLL clock frequency should be 625 MHz. Thus, you are sampling each UI only once on one lane. The same UI is sampled again on another lane. The block diagram of the CDR block for differential IOs is shown in the following figure.

Figure 8. Block Diagram of CDR Implementation for Differential IO Designs

For differential IOs, the CDR algorithm is different from single-ended IOs. Here two lanes are fed into the Alexander bang bang detector as opposed to the single-ended algorithm. For the Alexander bang bang detector to work, one lane should be edge aligned and other lane should be center aligned. This is achieved according to the following flow chart.

Figure 9. Flow Chart for Centering the ‘N’ Lane

As described in the preceding flow chart, the P lane is center-aligned and the N lane is edge-aligned. After this, both bitslices are considered locked and both lanes are fed to the Alexander bang bang detector for VT tracking. Depending on whether the clock is early or late, the delay is incremented or decremented, respectively, for both lanes (P and N). Once the bitslices are locked, the 8-bit N channel output is given to the gearbox.

Gearbox

The gearbox converts data from one width to another, for example, from 8-bit data to 10-bit data.
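A width conversion of this kind can be modeled as a bit accumulator. The following Python sketch is a behavioral illustration only: the class name, the list-of-bits representation, and the first-in, first-out bit order are assumptions and do not describe the gearbox RTL.

```python
class Gearbox:
    """Repack a stream of fixed-width input words into output words."""

    def __init__(self, in_width=8, out_width=10):
        self.in_width = in_width
        self.out_width = out_width
        self.bits = []  # bits received but not yet emitted

    def push(self, word_bits):
        """Accept one input word; return a list of completed output words."""
        if len(word_bits) != self.in_width:
            raise ValueError("unexpected input word width")
        self.bits.extend(word_bits)
        out = []
        while len(self.bits) >= self.out_width:
            out.append(self.bits[:self.out_width])
            del self.bits[:self.out_width]
        return out


gearbox = Gearbox()
stream = [i % 2 for i in range(40)]          # 40 bits = five 8-bit words
words = []
for i in range(5):
    words += gearbox.push(stream[8 * i:8 * i + 8])
print(len(words))  # number of 10-bit words produced
```

Five 8-bit input words carry the same 40 bits as four 10-bit output words, so in this model no bits are added or dropped by the conversion.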