Performance - 3.3 English

40G/50G High Speed Ethernet Subsystem v3.3 Product Guide (PG211)

Document ID: PG211

Release Date: 2023-11-01

Version: 3.3 English

Performance of the timestamping logic is tested as shown in the previously referenced diagrams. On the RX side, the jitter test circuit records the system timer value at the exact time a start-of-frame packet enters the test circuit. Some time later, the timestamp is captured by the RX PCS and is eventually output on the system-side AXI4-Stream interface (rx_ptp_tstamp_out[79:0]). The variation of the difference between these two time captures, dt, is defined as the "jitter" performance of the timestamping logic. The TX test is similar, using tx_ptp_tstamp_out[79:0].
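As an illustration of the dt measurement described above, the following minimal Python sketch computes the mean and the peak-to-peak spread of dt from paired captures. The function name and the sample values are hypothetical, and peak-to-peak is only one possible measure of "variation"; the guide does not state which statistic the test circuit uses.

```python
def jitter_stats(reference_times_ns, timestamps_ns):
    """Return (mean dt, peak-to-peak dt) in nanoseconds.

    reference_times_ns: system timer values noted by the test circuit.
    timestamps_ns: the corresponding timestamps reported by the core.
    Both names are illustrative assumptions, not signals from the guide.
    """
    if not reference_times_ns or len(reference_times_ns) != len(timestamps_ns):
        raise ValueError("need two non-empty sequences of equal length")
    dts = [ts - ref for ref, ts in zip(reference_times_ns, timestamps_ns)]
    mean_dt = sum(dts) / len(dts)
    peak_to_peak = max(dts) - min(dts)  # spread of dt, one possible jitter measure
    return mean_dt, peak_to_peak


# Hypothetical captures for four frames (values in ns)
mean_dt, jitter_pp = jitter_stats([1000, 2000, 3000, 4000],
                                  [1100, 2102, 3101, 4103])
```

The mean of dt is a fixed latency that can be calibrated out; only the spread limits the timestamp accuracy.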

The 40G/50G subsystem timestamping logic is theoretically capable of determining the time of crossing the SOP capture plane to within the granularity of the 80-bit system timer input. Therefore, if the system timer has a 1 nsec period, the timestamp is accurate to within a jitter of 1 nsec. 1 nsec is also the granularity of the least significant bit of the 80-bit field as defined by IEEE 1588.
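The IEEE 1588 timestamp type consists of a 48-bit seconds field and a 32-bit nanoseconds field. The sketch below splits an 80-bit value into those two fields, assuming the seconds occupy bits [79:32] and the nanoseconds occupy bits [31:0]. That bit mapping is an assumption made for illustration and should be checked against the port descriptions for the operating mode in use; the function names are hypothetical.

```python
NS_BITS = 32                     # assumed width of the nanoseconds field
NS_MASK = (1 << NS_BITS) - 1
SEC_MASK = (1 << 48) - 1         # assumed width of the seconds field


def split_timestamp(raw80):
    """Split an 80-bit timestamp into (seconds, nanoseconds).

    Assumes seconds in bits [79:32] and nanoseconds in bits [31:0].
    """
    if not 0 <= raw80 < (1 << 80):
        raise ValueError("timestamp must fit in 80 bits")
    seconds = (raw80 >> NS_BITS) & SEC_MASK
    nanoseconds = raw80 & NS_MASK
    return seconds, nanoseconds


def to_total_ns(raw80):
    """Convert the 80-bit timestamp to a single integer count of nanoseconds."""
    seconds, nanoseconds = split_timestamp(raw80)
    return seconds * 1_000_000_000 + nanoseconds


# Hypothetical value: 5 s and 250 ns
raw = (5 << 32) | 250
total_ns = to_total_ns(raw)
```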

In practice, additional factors limit the accuracy achievable in a real system.

Clock Domain

In practice, the system timer input can only be resolved to the granularity of the SerDes clock, so the clock domain crossing of the system timer input should be taken into account. For example, if the SerDes clock has a frequency of about 390 MHz (a period of 2.56 ns corresponds to 390.625 MHz), the system timer has an effective granularity of 2.56 ns, because the SerDes clock is also the clock that captures the timestamp. Hence an additional variation of 2.56 ns can be expected.
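The relationship between the SerDes clock frequency and the resulting timestamp granularity is a simple reciprocal. The sketch below uses 390.625 MHz, which is the frequency that corresponds exactly to the 2.56 ns period quoted above; the function name is hypothetical.

```python
def clock_period_ns(freq_hz):
    """Return the period, in nanoseconds, of a clock running at freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / freq_hz


# 390.625 MHz is assumed here because its period is exactly 2.56 ns
granularity_ns = clock_period_ns(390.625e6)
```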

Transceiver

The addition of a SerDes in the datapath does not impact the jitter performance of the 40G/50G subsystem but might result in asymmetry.

In a 1588 clock application, the RX + TX SerDes latency becomes part of the loop delay and is therefore measured by the 1588 protocol. For maximum accuracy of the slave clock, it is desirable to take loop asymmetry (the difference between the RX and TX SerDes latencies) into account. AMD can provide the transceiver latency for various SerDes settings specific to your device. To account for asymmetry in your PTP system, you might also need to contact the vendor of the other transceiver in the datapath for its latency characteristics.

AMD UltraScale™ and AMD UltraScale+™ transceivers have the ability to report the RX latency. Variation of the RX transceiver latency is mainly due to the fill level of its internal elastic buffer. Refer to the transceiver guide for more details.
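To show why asymmetry matters, the following sketch applies the standard IEEE 1588 delay request-response arithmetic with an optional asymmetry correction. It assumes the IEEE 1588 sign convention, in which the asymmetry is positive when the master-to-slave path is longer than the slave-to-master path. The function names and the example latencies are hypothetical and are not taken from this guide.

```python
def delay_asymmetry_ns(master_to_slave_ns, slave_to_master_ns):
    """Half the difference between the two one-way path latencies."""
    return (master_to_slave_ns - slave_to_master_ns) / 2


def offset_from_master_ns(t1, t2, t3, t4, asymmetry_ns=0.0):
    """Return (offset, mean path delay) from the four PTP timestamps.

    t1: Sync departure (master clock)    t2: Sync arrival (slave clock)
    t3: Delay_Req departure (slave)      t4: Delay_Req arrival (master)
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = (t2 - t1) - mean_path_delay - asymmetry_ns
    return offset, mean_path_delay


# Hypothetical link: slave clock is 50 ns ahead, paths are 120 ns and 80 ns
asym = delay_asymmetry_ns(120, 80)                             # 20 ns
uncorrected, _ = offset_from_master_ns(0, 170, 1000, 1030)
corrected, _ = offset_from_master_ns(0, 170, 1000, 1030, asym)
```

In this example the uncorrected estimate is wrong by exactly the asymmetry value, which is why the individual RX and TX latencies of every element in the loop are needed, not just their sum.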

Forward Error Correction

Forward Error Correction (for both Clause 74 and Clause 91) takes place on the line side of the timestamp capture. Therefore, the addition of FEC does not impact the accuracy of the timestamp capture in a 40G/50G subsystem. Similar to the SerDes case discussed previously, the additional total (RX + TX) latency of the FEC is measured by the 1588 protocol.

Note: The SOP is not visible in a transmitted FEC frame until it has been decoded by the RX FEC function.

For maximum 1588 slave clock accuracy, it is useful to know the asymmetry of the FEC latency in the RX and TX directions. Contact support for the RX and TX latency of the specific FEC and its configuration. You might also need to obtain this information from the vendor of the link partner in the PTP system if the link partner uses an FEC implementation from another vendor.

Receive Skew Correction

In a multi-lane system such as 40G and 50G, the packet corresponding to the SOP can occur on any lane. Furthermore, lanes can have skew relative to each other. The 40G/50G subsystem provides the ability to take the arrival lane of an SOP frame into account by reporting the SOP lane and its skew. Correction can be performed by hardware or software. The recommended steps are as follows.

Consider the example cases below, for 40G and 50G. The procedure is the same for both except that the number of PMD lanes is 4 and 2 respectively.

Figure 1. 40G Fill Level Correction Example

Figure 2. 50G Fill Level Correction Example

Figure 3. 40G Skew Correction Example

Figure 4. 50G Skew Correction Example

Figure 5. Timestamp Skew Correction Logic

The first step is to take a time average of the alignment buffer fill levels, because the granularity of these signals is one SerDes clock cycle. While the skew remains relatively constant over time (for example, minutes or hours), the alignment buffer levels show short-term fluctuations of one or more SerDes clock cycles due to sampling quantization. Therefore, the actual skew can be obtained to a high degree of accuracy (for example, sub-nanosecond) by taking a time average of each of the fill levels.
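The averaging step can be sketched as follows. This is a minimal illustration that assumes software has already collected periodic readings of the per-lane fill levels; the function name, the data layout, and the sample values are hypothetical.

```python
def average_fill_levels(samples):
    """Average per-lane fill levels over time.

    samples: a list of readings, each a sequence with one fill level
    (in SerDes clock cycles) per lane.
    Returns a list of per-lane averages, still in clock cycles.
    """
    if not samples:
        raise ValueError("at least one reading is required")
    lanes = len(samples[0])
    totals = [0] * lanes
    for reading in samples:
        if len(reading) != lanes:
            raise ValueError("every reading must cover the same lanes")
        for lane, level in enumerate(reading):
            totals[lane] += level
    return [total / len(samples) for total in totals]


# Hypothetical readings for a two-lane (50G) link
averages = average_fill_levels([(4, 6), (5, 6), (4, 7), (5, 7)])
```

Because the skew changes slowly, a long averaging window costs little and yields fractional-cycle resolution.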

Assuming the fill levels are accurately determined as above, the following formula is used to correct for skew:

correction = ((RX_LANE_ALIGNER_FILL_n) - (RX_LANE_ALIGNER_FILL_0)) * SerDes clock period
corrected timestamp = RX_PTP_TSTAMP_OUT + correction

The corrected timestamp is the captured timestamp adjusted for lane skew. It must be kept in step with the corresponding packet data.

RX_PTP_TSTAMP_OUT is the captured timestamp.

RX_LANE_ALIGNER_FILL_0 is the alignment buffer fill level for the lane on which the timestamp was taken, usually lane 0 (check with AMD technical sales support for updates).

RX_LANE_ALIGNER_FILL_n is the alignment buffer fill level for the lane containing the start of the frame (the SOP).

The units of all the calculations need to be consistent. Because fill levels are provided in terms of clock cycles, they might have to be converted to nanoseconds or whatever units are consistent with the calculation.
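The correction formula, including the conversion from clock cycles to nanoseconds, can be sketched as follows. The function and parameter names are hypothetical; the fill levels are assumed to be time-averaged values in SerDes clock cycles, and the timestamp is assumed to be already expressed in nanoseconds.

```python
def skew_corrected_timestamp_ns(tstamp_ns, fill_n_cycles, fill_0_cycles,
                                serdes_period_ns):
    """Apply the skew correction to a captured timestamp.

    tstamp_ns:        RX_PTP_TSTAMP_OUT converted to nanoseconds
    fill_n_cycles:    averaged RX_LANE_ALIGNER_FILL_n (lane holding the SOP)
    fill_0_cycles:    averaged RX_LANE_ALIGNER_FILL_0 (timestamping lane)
    serdes_period_ns: SerDes clock period in nanoseconds
    """
    # Convert the fill-level difference from clock cycles to nanoseconds
    correction_ns = (fill_n_cycles - fill_0_cycles) * serdes_period_ns
    return tstamp_ns + correction_ns


# Hypothetical values: a 2-cycle difference at a 2.56 ns clock period
corrected_ns = skew_corrected_timestamp_ns(1000.0, 6.5, 4.5, 2.56)
```

When the SOP arrives on the timestamping lane itself, the two fill levels are equal and the correction is zero.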

For additional information, see the IEEE Standard 1588-2008, "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems".