Video Buffer Format - 2022.2 English

Zynq UltraScale+ MPSoC ZCU106 Video Codec Unit Targeted Reference Design User Guide (UG1250)

The TRD uses two layers (or planes) for DisplayPort TX and up to eight layers for the HDMI TX Subsystem. These layers get alpha-blended inside the display subsystem, which sends a single video stream to the DisplayPort controller or HDMI Transmitter Subsystem. The bottom layer is used for video frames and the top layer is used for graphics. The graphics layer consists of the GUI and is rendered by the GPU. It overlays certain areas of the video frame with GUI control elements while other parts of the frame are transparent. A mechanism called pixel alpha is used to control the opacity of each pixel in the graphics plane.
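The effect of pixel alpha on the two layers can be illustrated with a small per-channel blend. This is a sketch only: it assumes straight (non-premultiplied) 8-bit alpha and round-to-nearest integer arithmetic, neither of which is stated in this guide, and the function names are hypothetical.

```python
def blend_channel(top: int, bottom: int, alpha: int) -> int:
    """Blend one 8-bit channel of the graphics (top) layer over the video (bottom) layer.

    alpha is the 8-bit pixel alpha of the top layer: 0 = transparent, 255 = opaque.
    Assumption: straight (non-premultiplied) alpha, rounded to nearest.
    """
    return (top * alpha + bottom * (255 - alpha) + 127) // 255


def blend_pixel(top_rgb, bottom_rgb, alpha):
    """Blend an (R, G, B) graphics pixel over an (R, G, B) video pixel."""
    return tuple(blend_channel(t, b, alpha) for t, b in zip(top_rgb, bottom_rgb))


if __name__ == "__main__":
    gui_red = (255, 0, 0)
    video_blue = (0, 0, 255)
    for a in (0, 128, 255):
        print(a, blend_pixel(gui_red, video_blue, a))
```

With alpha 0 the video pixel shows through unchanged, and with alpha 255 the GUI pixel fully covers it.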

The pixel format used for the graphics plane is called ARGB8888 or AR24. It is a packed format that uses 32 bits to store the data value of one pixel (32 bits per pixel or BPP), with 8 bits per component (BPC), also called color depth or bit depth. The individual components are: alpha value (A), red color (R), green color (G), blue color (B). The alpha component describes the opacity of the pixel: an alpha value of 0% means the pixel is fully transparent (invisible); an alpha value of 100% means the pixel is fully opaque.
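As an illustration of the packed layout, the sketch below packs and unpacks one ARGB8888 pixel. It assumes the Linux DRM fourcc convention for AR24, in which the 32-bit word is little-endian with A in bits 31:24, R in 23:16, G in 15:8, and B in 7:0. The helper names are hypothetical.

```python
import struct


def pack_argb8888(a: int, r: int, g: int, b: int) -> int:
    """Pack four 8-bit components into one 32-bit ARGB8888 word."""
    for c in (a, r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("each component must fit in 8 bits")
    return (a << 24) | (r << 16) | (g << 8) | b


def unpack_argb8888(pixel: int):
    """Split a 32-bit ARGB8888 word into (A, R, G, B)."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)


def to_memory_bytes(pixel: int) -> bytes:
    """Byte order in memory, assuming a little-endian 32-bit word (B, G, R, A)."""
    return struct.pack("<I", pixel)


if __name__ == "__main__":
    px = pack_argb8888(0x80, 0xFF, 0x00, 0x10)   # roughly 50% opaque
    print(hex(px), unpack_argb8888(px), to_memory_bytes(px).hex())
```

An alpha byte of 0x00 corresponds to 0% (fully transparent) and 0xFF to 100% (fully opaque).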

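The pixel formats used for the video plane are NV12, NV16, XV15, and XV20. NV12 and XV15 are two-plane versions of the YUV 4:2:0 format; NV16 and XV20 are two-plane versions of the YUV 4:2:2 format. The three components are separated into two sub-images or planes: one plane holds the luma (Y) samples and the other holds the interleaved chroma (Cb and Cr) samples.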

In NV12 and XV15 formats, the chroma components are sub-sampled in both the horizontal and vertical dimensions by a factor of 2. That is to say, for a 2x2 square of pixels, there are 4 Y samples but only 1 U sample and 1 V sample. The bit depth of each sample is 8 bits for NV12 and 10 bits for XV15. The Y plane is first in memory. A combined CbCr plane immediately follows the Y plane in memory.
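The NV12 layout described above can be sketched as follows. The code covers only the 8-bit NV12 case, because the packing of 10-bit XV15 samples is not described in this section. It assumes one byte per sample, a stride that defaults to the width in bytes, and Cb stored before Cr in each chroma pair. The function names are hypothetical.

```python
def nv12_layout(width: int, height: int, stride: int = None) -> dict:
    """Plane sizes and offsets (in bytes) for an 8-bit NV12 buffer."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 needs even width and height")
    if stride is None:
        stride = width                      # one byte per Y sample
    y_size = stride * height
    cbcr_size = stride * (height // 2)      # half as many chroma lines
    return {"y_offset": 0, "y_size": y_size,
            "cbcr_offset": y_size, "cbcr_size": cbcr_size,
            "total": y_size + cbcr_size}


def nv12_sample_offsets(x: int, y: int, stride: int, height: int):
    """Byte offsets of the Y, Cb, and Cr samples that describe pixel (x, y)."""
    y_off = y * stride + x
    cbcr_base = stride * height             # CbCr plane follows the Y plane
    cb_off = cbcr_base + (y // 2) * stride + (x // 2) * 2
    return y_off, cb_off, cb_off + 1


if __name__ == "__main__":
    print(nv12_layout(1920, 1080))
    # All four pixels of a 2x2 square share one Cb/Cr pair.
    for xy in ((0, 0), (1, 0), (0, 1), (1, 1)):
        print(xy, nv12_sample_offsets(*xy, stride=1920, height=1080))
```

Under these assumptions a 1920 x 1080 NV12 frame occupies 1.5 bytes per pixel on average.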

In NV16 and XV20 formats, the chroma components are sub-sampled only in the horizontal dimension by a factor of 2. Thus, the chroma plane has the same number of lines as the luma plane. For a 2x2 group of pixels, there are 4 Y samples, 2 U samples, and 2 V samples. The bit depth of each sample is 8 bits for NV16 and 10 bits for XV20. The Y plane is first in memory. A combined CbCr plane immediately follows the Y plane in memory. The CbCr plane has the same width in bytes and the same number of lines as the Y plane.
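For comparison, the same calculation for 8-bit NV16 is sketched below. As before, this assumes one byte per sample, a stride that defaults to the width in bytes, and Cb stored before Cr; 10-bit XV20 is left out because its sample packing is not described here. The function names are hypothetical.

```python
def nv16_layout(width: int, height: int, stride: int = None) -> dict:
    """Plane sizes and offsets (in bytes) for an 8-bit NV16 buffer."""
    if width % 2:
        raise ValueError("4:2:2 needs an even width")
    if stride is None:
        stride = width                      # one byte per Y sample
    y_size = stride * height
    cbcr_size = stride * height             # same line count as the Y plane
    return {"y_offset": 0, "y_size": y_size,
            "cbcr_offset": y_size, "cbcr_size": cbcr_size,
            "total": y_size + cbcr_size}


def nv16_sample_offsets(x: int, y: int, stride: int, height: int):
    """Byte offsets of the Y, Cb, and Cr samples that describe pixel (x, y)."""
    y_off = y * stride + x
    cbcr_base = stride * height
    cb_off = cbcr_base + y * stride + (x // 2) * 2   # every line has its own chroma
    return y_off, cb_off, cb_off + 1


if __name__ == "__main__":
    print(nv16_layout(1920, 1080))
    # Horizontal neighbours share chroma; vertical neighbours do not.
    print(nv16_sample_offsets(0, 0, 1920, 1080))
    print(nv16_sample_offsets(1, 0, 1920, 1080))
    print(nv16_sample_offsets(0, 1, 1920, 1080))
```

Under these assumptions a 1920 x 1080 NV16 frame occupies 2 bytes per pixel on average.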

Aside from the pixel format, a video buffer is further described by a number of other parameters (see Figure 4-2). For this design, the relevant parameters are width, height, and stride, because the PS display pipeline does not allow for setting an x or y offset.

Figure 4-2:      Video Buffer Area

The active area is the part of the video buffer that is visible on the screen. The active area is defined by the height and width parameters, also called the video dimensions. These are typically expressed in pixels rather than bytes because the number of bits per pixel depends on the pixel format, as explained above.

The stride or pitch is the number of bytes from the first pixel of a line to the first pixel of the next line of video. In the simplest case, the stride equals the width multiplied by the bits per pixel, converted to bytes. For example, AR24 requires 32 BPP which is four bytes per pixel. A video buffer with an active area of 1920 x 1080 pixels therefore has a stride of 4 x 1920 = 7,680 bytes. Some DMA engines require the stride to be a power of two to optimize memory accesses. In this design, the stride always equals the width in bytes.
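The stride arithmetic above can be checked with a short sketch. The function names are hypothetical, and the power-of-two rounding only illustrates what a DMA engine with that constraint would need; it is not used in this design, where the stride equals the width in bytes.

```python
def stride_bytes(width_px: int, bits_per_pixel: int) -> int:
    """Simplest-case stride: line width in pixels converted to bytes (rounded up)."""
    return (width_px * bits_per_pixel + 7) // 8


def pixel_offset(x: int, y: int, stride: int, bits_per_pixel: int) -> int:
    """Byte offset of pixel (x, y) in a packed buffer with whole-byte pixels."""
    if bits_per_pixel % 8:
        raise ValueError("this sketch handles whole-byte pixels only")
    return y * stride + x * (bits_per_pixel // 8)


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, for DMA engines that require such a stride."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    stride = stride_bytes(1920, 32)           # AR24: 4 bytes per pixel
    print(stride)                             # 7680
    print(pixel_offset(0, 1, stride, 32))     # first pixel of the second line
    print(next_power_of_two(stride))          # padded stride for a power-of-two DMA
```

For the 1920 x 1080 AR24 example, a power-of-two constraint would pad each line from 7,680 to 8,192 bytes, leaving 512 unused bytes per line.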