4:2:0

Video Processing Subsystem Product Guide (PG231)

Document ID: PG231
Release Date: 2024-02-21
Version: 2.4 English

4:2:0 encoding contains horizontally and vertically sub-sampled chroma. Horizontal and vertical chroma positions are co-sited with alternate luma samples on alternate scanlines. The sampling positions are shown in the following figure.

Figure 1. YUV 4:2:0 Format
Implementation
The three supported sub-sampling formats (4:4:4, 4:2:2, and 4:2:0) give six possible conversions. Each conversion uses a FIR filter approach. Depending on the conversion, filtering is required in the horizontal dimension only, in the vertical dimension only, or in both dimensions. The following table lists the conversions along with their default filter configurations.
Table 1. Chroma Resampling Configuration
Converter       | Filter Configuration
4:4:4 to 4:2:2  | Horizontal anti-aliasing
4:4:4 to 4:2:0  | Separable 2-D anti-aliasing
4:2:2 to 4:4:4  | Horizontal interpolation
4:2:2 to 4:2:0  | Vertical anti-aliasing
4:2:0 to 4:4:4  | Separable 2-D interpolation
4:2:0 to 4:2:2  | Vertical interpolation

Three implementation options are offered for each conversion operation:

  • A DSP48-based filter with programmable coefficients and a programmable number of taps. 2D filters must be separable. Coefficients are in the range [-8, 8), represented in 16-bit signed, fixed-point format with four integer bits and 12 fractional bits.
  • A predefined, non-programmable filter with fixed power-of-two coefficients. Filtering uses only shifts and additions, so no DSP48s are used. The default coefficients implement linear interpolation for both the interpolation filters and the anti-aliasing low-pass filters.
  • Dropping (decimation) or replicating (interpolation) samples, which is the simplest, lowest-footprint solution. For down-sampling, some samples are passed directly to the output and the others are dropped. For up-sampling, the previous input sample is replicated.
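The drop/replicate option can be illustrated with a minimal Python sketch on a single row of chroma samples. The function names are hypothetical, and the choice to keep the even-indexed samples is an assumption made for illustration; the guide does not state which samples the core retains.

```python
def drop_chroma(samples):
    """2:1 decimation by dropping: keep every other sample, discard the rest.

    Keeping the even-indexed samples is an assumption for this sketch.
    """
    return list(samples[::2])


def replicate_chroma(samples):
    """1:2 interpolation by replication: each input sample is emitted twice,
    so every missing output position repeats the previous input sample."""
    out = []
    for s in samples:
        out.extend([s, s])
    return out
```

For example, dropping turns `[10, 20, 30, 40]` into `[10, 30]`, and replicating `[10, 30]` gives `[10, 10, 30, 30]`. No arithmetic is involved, which is why this option has the smallest footprint.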
Convert 4:2:2 to 4:4:4
This conversion is a 1:2 horizontal interpolation operation, implemented using a two-phase polyphase FIR filter. One of the two output pixels is co-sited with one of the input samples. The ideal output for that pixel is achieved simply by replicating this input sample.

To evaluate output pixel $o_{x,y}$, the FIR filter convolves the input pixels with the coefficients COEFk_HPHASEp_x, where k is the coefficient index, $i_{x,y}$ are pixels from the input image, $p_x$ is the interpolation phase (0 or 1, depending on x), and $[\,\cdot\,]_{m}^{M}$ represents rounding with clipping at M and clamping at m. DW is the Data Width or number of bits per video component. $N_{taps}$ is the number of filter taps.

Figure 2.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{taps}-1} i_{x-k,\,y} \cdot \mathrm{COEF}k\_\mathrm{HPHASE}p_x \right]_{0}^{2^{DW}-1}$$
In phase 1, COEF00_HPHASE1 is the coefficient applied to the most recent input sample in the filter aperture. The following figure illustrates coefficient use for a four-tap filter example, with simplified nomenclature a = COEF00_HPHASE1, b = COEF01_HPHASE1, c = COEF02_HPHASE1, and d = COEF03_HPHASE1.
Figure 3. 4:2:2 to 4:4:4 Coefficient Configuration
The predefined filters replicate the input sample for Phase 0. The Phase 1 filter is [0.5 0.5].
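As a rough illustration of the predefined filter, the following Python sketch upsamples one row of 4:2:2 chroma. The function name, the choice of which two neighbors feed the [0.5 0.5] phase, and the replication of the last sample at the right edge are assumptions for this sketch, not details confirmed by the guide.

```python
def upsample_422_to_444(chroma, dw=8):
    """1:2 horizontal interpolation with the predefined two-phase filter.

    Phase 0 replicates the co-sited input sample.
    Phase 1 averages two neighboring inputs with [0.5 0.5].
    """
    max_val = (1 << dw) - 1
    last = len(chroma) - 1
    out = []
    for n, c in enumerate(chroma):
        nxt = chroma[min(n + 1, last)]      # assumed: replicate the edge sample
        out.append(c)                        # phase 0: replication
        interp = (c + nxt + 1) >> 1          # phase 1: add half an LSB, truncate
        out.append(min(max(interp, 0), max_val))
    return out
```

With 8-bit data, the input `[100, 200, 50]` yields `[100, 150, 200, 125, 50, 50]`. Because both coefficients are powers of two, the phase 1 result needs only an addition and a shift.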
Convert 4:4:4 to 4:2:2
This conversion is a horizontal 2:1 decimation operation, implemented using a low-pass FIR filter to suppress chroma aliasing. To evaluate output pixel $o_{x,y}$, the FIR filter in the core convolves the input pixels with the coefficients COEFk_HPHASE0, where k is the coefficient index, $i_{x,y}$ are pixels from the input image, and $[\,\cdot\,]_{m}^{M}$ represents rounding with clipping at M and clamping at m. DW is the Data Width or number of bits per video component. $N_{taps}$ is the number of filter taps.
Figure 4.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{taps}-1} i_{x-k,\,y} \cdot \mathrm{COEF}k\_\mathrm{HPHASE}0 \right]_{0}^{2^{DW}-1}$$
In phase 0, COEF00_HPHASE0 is the coefficient applied to the most recent input sample in the filter. The following figure illustrates coefficient use for a five-tap filter example, with simplified nomenclature a = COEF00_HPHASE0, b = COEF01_HPHASE0, c = COEF02_HPHASE0, d = COEF03_HPHASE0, and e = COEF04_HPHASE0.
Figure 5. 4:4:4 to 4:2:2 Coefficient Configuration
The predefined filter coefficients are [0.25 0.5 0.25].
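The predefined anti-aliasing filter can be sketched in Python as follows. The function name is hypothetical, and centering the [0.25 0.5 0.25] kernel on the retained (even-indexed) sample is an assumption made for illustration; edge samples are replicated as described under Edge Padding.

```python
def downsample_444_to_422(chroma, dw=8):
    """2:1 horizontal decimation with the predefined [0.25 0.5 0.25] filter."""
    max_val = (1 << dw) - 1
    last = len(chroma) - 1
    out = []
    for x in range(0, len(chroma), 2):       # assumed: keep even positions
        left = chroma[max(x - 1, 0)]         # replicate the left edge
        right = chroma[min(x + 1, last)]     # replicate the right edge
        acc = left + 2 * chroma[x] + right   # coefficients scaled by 4
        out.append(min(max((acc + 2) >> 2, 0), max_val))  # round, then clip
    return out
```

For the row `[100, 100, 200, 200]` this produces `[100, 175]`: the second output blends the edge between the two flat regions instead of simply picking the sample at position 2.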
Convert 4:2:0 to 4:2:2
This conversion is a 1:2 vertical interpolation operation, implemented using a two-phase polyphase FIR filter. One of the two output pixels is co-sited with one of the input samples. The ideal output for that pixel is achieved simply by replicating this input sample.
To evaluate output pixel $o_{x,y}$, the FIR filter in the core convolves the input pixels with the coefficients COEFk_VPHASEp_y, where k is the coefficient index, $p_y$ is the interpolation phase, $i_{x,y}$ are pixels from the input image, and $[\,\cdot\,]_{m}^{M}$ represents rounding with clipping at M and clamping at m. DW is the Data Width or number of bits per video component. $N_{taps}$ is the number of filter taps.
Figure 6.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{taps}-1} i_{x,\,y-k} \cdot \mathrm{COEF}k\_\mathrm{VPHASE}p_y \right]_{0}^{2^{DW}-1}$$
In phase 1, COEF00_VPHASE1 is the coefficient applied to the most recent input sample in the filter. The following figure illustrates coefficient use for a four-tap filter example, with simplified nomenclature a = COEF00_VPHASE1, b = COEF01_VPHASE1, c = COEF02_VPHASE1, and d = COEF03_VPHASE1.
Figure 7. 4:2:0 to 4:2:2 Coefficient Configuration
The predefined filters use the coefficients [0.5 0.5] to interpolate one of the output samples. The other output sample is a replication of the input sample.
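The vertical case works on whole chroma lines rather than neighboring samples. The Python sketch below is an illustration only: the function name, the pairing of each line with the line below it, and the replication of the last line at the bottom edge are assumptions.

```python
def upsample_420_to_422(chroma_rows):
    """1:2 vertical interpolation with the predefined filter.

    Each input line is replicated (co-sited output line), then followed by
    a line interpolated with [0.5 0.5] from two neighboring input lines.
    """
    last = len(chroma_rows) - 1
    out = []
    for y, row in enumerate(chroma_rows):
        below = chroma_rows[min(y + 1, last)]   # assumed: replicate bottom edge
        out.append(list(row))                    # replicated line
        out.append([(a + b + 1) >> 1 for a, b in zip(row, below)])  # rounded average
    return out
```

For the two-line input `[[10, 20], [30, 40]]` the result is `[[10, 20], [20, 30], [30, 40], [30, 40]]`. In hardware, a vertical filter of this kind implies buffering at least one previous line.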
Convert 4:2:2 to 4:2:0
This conversion is a vertical 2:1 decimation operation, implemented using a low-pass FIR filter to suppress chroma aliasing. To evaluate output pixel $o_{x,y}$, the FIR filter in the core convolves the input pixels with the coefficients COEFk_VPHASE0, where k is the coefficient index, $i_{x,y}$ are pixels from the input image, and $[\,\cdot\,]_{m}^{M}$ represents rounding with clipping at M and clamping at m. DW is the Data Width or number of bits per video component. $N_{taps}$ is the number of filter taps.
Figure 8.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{taps}-1} i_{x,\,y-k} \cdot \mathrm{COEF}k\_\mathrm{VPHASE}0 \right]_{0}^{2^{DW}-1}$$
In phase 0, COEF00_VPHASE0 is the coefficient applied to the most recent input sample in the filter. The following figure illustrates coefficient use for a five-tap filter example, with simplified nomenclature a = COEF00_VPHASE0, b = COEF01_VPHASE0, c = COEF02_VPHASE0, d = COEF03_VPHASE0, and e = COEF04_VPHASE0.
Figure 9. 4:2:2 to 4:2:0 Coefficient Configuration
The predefined filter coefficients are [0.25 0.5 0.25].
Convert 4:2:0 to 4:4:4
This conversion performs interpolation both vertically and horizontally. This is equivalent to a 2D separable filter implemented by cascading the 4:2:0 to 4:2:2 block and the 4:2:2 to 4:4:4 block. Quantized vertical filter results are filtered by the horizontal filter, which in turn quantizes results back to the [0 .. 2^DW - 1] range. (DW is the Data Width or number of bits per video component.)

Intermediate 4:2:2 chroma values $t_{x,y}$ are computed with the vertical interpolation filter described in Convert 4:2:0 to 4:2:2. The resulting computation is shown in the following equation.

Figure 10.
$$t_{x,y} = \left[ \sum_{k=0}^{N_{Vtaps}-1} i_{x,\,y-k} \cdot \mathrm{COEF}k\_\mathrm{VPHASE}p_y \right]_{0}^{2^{DW}-1}$$

Next, the intermediate values are filtered with the horizontal interpolation filter described in Convert 4:2:2 to 4:4:4. The resulting computation is shown in the following equation.

Figure 11.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{Htaps}-1} t_{x-k,\,y} \cdot \mathrm{COEF}k\_\mathrm{HPHASE}p_x \right]_{0}^{2^{DW}-1}$$
The predefined filter coefficients are the same as defined in Convert 4:2:0 to 4:2:2 and Convert 4:2:2 to 4:4:4. In the vertical direction, one output sample replicates the co-sited input sample, and the other is interpolated with the filter [0.5 0.5]. The same then happens in the horizontal direction.
Convert 4:4:4 to 4:2:0
This conversion performs decimation by 2 both vertically and horizontally. This is equivalent to a 2D separable filter implemented by cascading the 4:4:4 to 4:2:2 block and the 4:2:2 to 4:2:0 block. Quantized horizontal filter results are filtered by the vertical filter, which in turn quantizes results back to the [0 .. 2^DW - 1] range. (DW is the Data Width or number of bits per video component.)

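Intermediate 4:2:2 chroma values $t_{x,y}$ are computed with the horizontal anti-aliasing filter described in Convert 4:4:4 to 4:2:2. The resulting computation is shown in the following equation.

Figure 12.
$$t_{x,y} = \left[ \sum_{k=0}^{N_{Htaps}-1} i_{x-k,\,y} \cdot \mathrm{COEF}k\_\mathrm{HPHASE}0 \right]_{0}^{2^{DW}-1}$$

Next, the intermediate values are filtered with the vertical anti-aliasing filter described in Convert 4:2:2 to 4:2:0. The resulting computation is shown in the following equation.

Figure 13.
$$o_{x,y} = \left[ \sum_{k=0}^{N_{Vtaps}-1} t_{x,\,y-k} \cdot \mathrm{COEF}k\_\mathrm{VPHASE}0 \right]_{0}^{2^{DW}-1}$$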
The predefined filter coefficients are the same as defined in Convert 4:4:4 to 4:2:2 and Convert 4:2:2 to 4:2:0. In the horizontal direction, decimation is performed with the filter [0.25 0.5 0.25]. The same then happens in the vertical direction.
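The cascade can be sketched in Python by applying the same one-dimensional [0.25 0.5 0.25] decimator first along rows and then along columns, quantizing after each pass. The helper names are hypothetical, and the kernel centering and edge replication are assumptions made for illustration.

```python
def lowpass_decimate(samples):
    """2:1 decimation with [0.25 0.5 0.25], rounded to integers.

    Assumes the kernel is centered on even positions and that edge
    samples are replicated.
    """
    last = len(samples) - 1
    return [
        (samples[max(i - 1, 0)] + 2 * samples[i] + samples[min(i + 1, last)] + 2) >> 2
        for i in range(0, len(samples), 2)
    ]


def downsample_444_to_420(plane):
    """Separable 2-D decimation: horizontal pass, then vertical pass."""
    # 4:4:4 -> 4:2:2: filter every row; results are quantized integers.
    horizontal = [lowpass_decimate(row) for row in plane]
    # 4:2:2 -> 4:2:0: filter every column of the quantized intermediate.
    columns = [lowpass_decimate(list(col)) for col in zip(*horizontal)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*columns)]
```

Because the intermediate 4:2:2 values are rounded before the vertical pass, the result can differ by an LSB from a single full-precision 2-D filter; this matches the cascaded structure described above rather than an ideal 2-D convolution.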
Resampling Filters
The upsampling and downsampling performed during chroma format conversion are implemented with low-pass filters for interpolation and anti-aliasing.

The chroma resampling function offers a horizontal filter with a maximum of 10 taps and two phases, as well as a vertical filter with a maximum of 10 taps and two phases. For conversions requiring up/down sampling in both horizontal and vertical directions, 2D separable filters are offered.

The number of taps selected must be even (4, 6, 8, or 10). Depending on the conversion type and filter size selected, a subset of the coefficients can be used by setting the unnecessary coefficients to zero.

Each coefficient has 16 bits in 2's complement format: 4 integer bits (including the sign bit) and 12 fractional bits. The sign bit is the MSB. For example, a coefficient with a value of 1 is represented by the bit vector [0001000000000000].

The coefficients should sum to exactly 1 to achieve unity gain. If they sum to less than 1, some loss of dynamic range is observed.
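The coefficient format and the unity-gain condition can be checked with a short Python sketch. The helper names are hypothetical and the rounding of the scaled value uses Python's built-in round(), which is an assumption; the guide does not say how software should quantize coefficients.

```python
FRAC_BITS = 12  # 12 fractional bits, 4 integer bits including the sign


def to_q4_12(value):
    """Encode a real coefficient as a 16-bit two's complement word."""
    scaled = round(value * (1 << FRAC_BITS))
    if not -(1 << 15) <= scaled <= (1 << 15) - 1:
        raise ValueError("coefficient outside the representable range [-8, 8)")
    return scaled & 0xFFFF


def from_q4_12(word):
    """Decode a 16-bit two's complement word back to a real coefficient."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / (1 << FRAC_BITS)


def has_unity_gain(words):
    """True when the encoded coefficients sum to exactly 1.0 (4096 in Q4.12)."""
    total = sum(w - 0x10000 if w & 0x8000 else w for w in words)
    return total == 1 << FRAC_BITS
```

Here `format(to_q4_12(1.0), "016b")` gives `0001000000000000`, matching the bit vector above, and the predefined kernel [0.25 0.5 0.25] encodes to 1024, 2048, 1024, which sums to 4096. Checking the sum on the quantized words rather than on the real values catches gain errors introduced by rounding individual coefficients.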

Computation Bit Width Growth
Full precision (DATA_WIDTH + 16 + log2(NTaps) bits) is maintained during the horizontal and/or vertical FIR convolution operation.

FIR filter outputs are rounded to DATA_WIDTH bits by adding half an output LSB in the full precision domain prior to truncation. Clipping and clamping of the output data prevents overflows and underflows. Data is clipped at 2^DATA_WIDTH - 1 and clamped at 0.
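The rounding and clipping steps can be sketched in Python using integer arithmetic with Q4.12 coefficients. The function names are hypothetical, and rounding log2(NTaps) up to a whole number of bits is an assumption for tap counts that are not powers of two.

```python
import math

COEFF_FRAC_BITS = 12  # fractional bits in each 16-bit coefficient


def accumulator_bits(data_width, ntaps):
    """Full-precision width: DATA_WIDTH + 16 + log2(NTaps), rounded up (assumed)."""
    return data_width + 16 + math.ceil(math.log2(ntaps))


def fir_output(window, coeffs_q12, data_width=8):
    """One FIR output sample: accumulate, round, truncate, then clip and clamp."""
    acc = sum(s * c for s, c in zip(window, coeffs_q12))  # full-precision sum
    acc += 1 << (COEFF_FRAC_BITS - 1)                     # add half an output LSB
    result = acc >> COEFF_FRAC_BITS                       # truncate fractional bits
    return min(max(result, 0), (1 << data_width) - 1)     # clamp at 0, clip at max
```

With 8-bit data, the window `[100, 200, 100]` and coefficients `[1024, 2048, 1024]` give 150. A filter with a gain of 2 on full-scale input saturates at 255 instead of wrapping, and a negative sum is clamped to 0.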

Edge Padding
The edge pixels of images are replicated prior to filtering to avoid image artifacts.
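Edge replication on one line of samples might look like the following Python sketch. The function name is hypothetical, and padding by half the tap count on each side is an assumption; the guide does not state how many samples the core replicates.

```python
def pad_edges(samples, ntaps):
    """Replicate the first and last samples so the filter aperture never
    reads outside the image. Assumes a non-empty line and a pad of
    ntaps // 2 samples per side."""
    pad = ntaps // 2
    return [samples[0]] * pad + list(samples) + [samples[-1]] * pad
```

For a four-tap filter, `[1, 2, 3]` becomes `[1, 1, 1, 2, 3, 3, 3]`. Replicating the edge value, rather than padding with zeros, avoids pulling edge chroma toward an artificial value.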
Note: Configure input and output resolutions of the Video Processing Subsystem IP (all modes of the core) to enable the bypass mode.