Video Decompression

Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313)

Document ID: PG313
Release Date: 2023-11-01
Version: 1.0 English

Video compression standards often use block-based motion estimation. To decode a 16x16 pixel block in the current frame, the decoder may receive a motion vector pointing into a past or future reference frame, from which the best approximation of the current block is fetched. As a result, most DRAM accesses made by a video decoder are rectangular block fetches from an image frame buffer at an arbitrary offset and alignment, which depend on the speed and direction of the motion in the video.
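The fetch pattern described above can be illustrated with a minimal Python sketch. It assumes integer-pixel motion vectors and 16x16 blocks; the names `Rect`, `reference_fetch`, and `is_aligned` are hypothetical and not part of any codec or Xilinx API. The point it shows is that a block sitting on the 16x16 grid in the current frame is generally fetched from an unaligned position in the reference frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int  # left edge in the reference frame, in pixels
    y: int  # top edge in the reference frame, in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def reference_fetch(block_x: int, block_y: int, mv_x: int, mv_y: int, size: int = 16) -> Rect:
    """Region read from the reference frame to predict one size x size block.

    Integer-pel motion only (an assumption of this sketch): real codecs also
    interpolate sub-pixel positions, which widens the fetched region slightly.
    """
    return Rect(block_x + mv_x, block_y + mv_y, size, size)


def is_aligned(r: Rect, grid: int) -> bool:
    """True if the region starts on a multiple of `grid` in both directions."""
    return r.x % grid == 0 and r.y % grid == 0


# The current block is on the 16x16 grid; its reference region is not.
fetch = reference_fetch(64, 32, mv_x=-5, mv_y=3)
print(fetch, is_aligned(fetch, 16))  # Rect(x=59, y=35, w=16, h=16) False
```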

To maximize the efficiency of video decode traffic, the address mapping should shape DRAM pages into rectangular image regions rather than ‘raster’ image lines. An optimal mapping has the following properties:
  • DRAM pages are shaped into rectangular image regions, referred to as tiles.
  • Each tile is large enough to maximize the probability of a block fetch being fully contained within the tile.
  • Adjacent tiles in any image direction (left, right, top, bottom) belong to alternate bank groups.
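The first and third properties can be illustrated with a small Python model. It assumes one byte per pixel, a 2 KB DRAM page, a 2048-byte line stride, and 64x32-pixel tiles (so one tile fills one page); all of these sizes and all function names are hypothetical. The checkerboard bank-group assignment uses only two bank groups for simplicity, which is enough to show that every horizontal or vertical neighbor lands in the other group.

```python
PAGE_BYTES = 2048        # hypothetical DRAM page size in bytes
STRIDE = 2048            # line stride in bytes (1 byte per pixel assumed)
TILE_W, TILE_H = 64, 32  # hypothetical tile: 64 x 32 pixels = 2048 bytes = one page


def pages_raster(x: int, y: int, w: int, h: int) -> set[int]:
    """Pages touched by a w x h fetch when image lines are stored consecutively."""
    pages = set()
    for row in range(y, y + h):
        first = (row * STRIDE + x) // PAGE_BYTES
        last = (row * STRIDE + x + w - 1) // PAGE_BYTES
        pages.update(range(first, last + 1))
    return pages


def pages_tiled(x: int, y: int, w: int, h: int) -> set[tuple[int, int]]:
    """Tiles (one page each) touched by the same fetch in a tiled layout."""
    return {
        (tx, ty)
        for ty in range(y // TILE_H, (y + h - 1) // TILE_H + 1)
        for tx in range(x // TILE_W, (x + w - 1) // TILE_W + 1)
    }


def tile_bank_group(tx: int, ty: int) -> int:
    """Checkerboard assignment over two bank groups: every left, right,
    top and bottom neighbour lands in the other group."""
    return (tx + ty) % 2


# A 16x16 fetch at (59, 35): one page per line in raster order, two tiles when tiled.
print(len(pages_raster(59, 35, 16, 16)), len(pages_tiled(59, 35, 16, 16)))  # 16 2
```

Under these assumptions every line of a 16x16 fetch falls in a different page in raster order, while the tiled layout touches between one and four pages.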

The following figure illustrates video decode frame buffer tiling. Fifty 8x8 blocks are placed at random positions in the buffer to model motion-compensated block fetches by a video decoder.

Figure 1. Video Decode Frame Buffer Tiling

Most blocks fall within a single tile and therefore access DRAM very efficiently. Some blocks span two tiles, either horizontally or vertically, and are somewhat less efficient. Occasionally, a block falls on the corner where four tiles meet, and its efficiency is lower still.
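The experiment in the figure can be approximated with a short Monte Carlo sketch in Python. The 64x32-pixel tile, the 1920x1080 frame, and the function names are assumptions chosen for illustration; the figure itself does not state a tile size. The closed-form part follows from the fact that, for a uniformly random integer offset, a block of width b fits inside a tile of width T along that axis for T - b + 1 of every T positions.

```python
import random
from collections import Counter

TILE_W, TILE_H = 64, 32      # hypothetical tile size in pixels
FRAME_W, FRAME_H = 1920, 1080
BLOCK = 8                    # 8x8 blocks, as in the figure


def tiles_spanned(x: int, y: int, w: int = BLOCK, h: int = BLOCK) -> int:
    """Number of tiles a w x h block at (x, y) overlaps: 1, 2 or 4 when w, h <= tile size."""
    nx = (x + w - 1) // TILE_W - x // TILE_W + 1
    ny = (y + h - 1) // TILE_H - y // TILE_H + 1
    return nx * ny


def simulate(n: int = 50, seed: int = 1) -> Counter:
    """Place n blocks uniformly inside the frame and count tiles spanned."""
    rng = random.Random(seed)
    return Counter(
        tiles_spanned(rng.randrange(FRAME_W - BLOCK + 1), rng.randrange(FRAME_H - BLOCK + 1))
        for _ in range(n)
    )


def expected_distribution(w: int = BLOCK, h: int = BLOCK) -> dict[int, float]:
    """Probability of spanning 1, 2 or 4 tiles for a uniformly random offset."""
    px1 = (TILE_W - w + 1) / TILE_W  # fits in one tile horizontally
    py1 = (TILE_H - h + 1) / TILE_H  # fits in one tile vertically
    px2, py2 = 1 - px1, 1 - py1
    return {1: px1 * py1, 2: px1 * py2 + px2 * py1, 4: px2 * py2}


print(simulate())
print({k: round(v, 3) for k, v in expected_distribution().items()})
```

For these assumed sizes, roughly 70% of blocks fit in one tile, about 28% span two tiles, and about 2.4% span four, which matches the qualitative picture in the figure.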

Such a mapping imposes restrictions on how the video frame buffer is allocated and used. Key requirements include:
  • The frame buffer start address must be page-aligned.
  • If the video decode line size is not a power of 2 (as in HD, where it is 1920), the line length is rounded up to the next power of 2 (2048 in this example), leaving part of each line unused. The resulting distance from the start of one line to the start of the next, called the Line Stride, is therefore larger than the line length.
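Both requirements can be checked with a few lines of Python. This is a sketch under stated assumptions: the 2 KB page size and one byte per pixel are hypothetical, and `frame_layout` and `pixel_address` are illustrative names. One way to see why a power-of-two stride helps: it keeps the x and y coordinates in separate address bit fields, so a bit-selecting address mapping can route them to column, bank-group, and row bits independently.

```python
PAGE_BYTES = 2048  # hypothetical DRAM page size, used only for the alignment check


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def frame_layout(line_len: int, base_addr: int) -> tuple[int, int]:
    """Return (line_stride, unused_per_line) for a frame buffer at base_addr."""
    if base_addr % PAGE_BYTES:
        raise ValueError("frame buffer start address must be page-aligned")
    stride = next_pow2(line_len)
    return stride, stride - line_len


def pixel_address(base_addr: int, x: int, y: int, stride: int) -> int:
    """Byte address of pixel (x, y), assuming 1 byte per pixel and 0 <= x < stride.

    With a power-of-two stride, y occupies the bits above log2(stride) and x
    the bits below, so y * stride + x is a plain bit concatenation.
    """
    return base_addr + ((y << (stride.bit_length() - 1)) | x)


stride, unused = frame_layout(1920, 0x8000_0000)
print(stride, unused)  # 2048 128
```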

The following table shows how sensitive video decode traffic is to the address mapping. The optimal mapping, D4_64t4k, achieves the highest efficiency.
Table 1. Video Decompression Address Mapping

Address Mapping                           Efficiency [%]
BG-optimized RBC (16R-2B-1BG-7C-1BG-3C)   64
RBC (16R-2B-2BG-10C)                      64
D4_64t4k (14R-2B-5C-1BG-3R-2C-1BG-3C)     75
D4_64t2k (15R-2B-5C-1BG-2R-2C-1BG-3C)     58
RCB (16R-7C-2B-2BG-3C)                    40

In each mapping string, nR, nB, nBG, and nC denote n row, bank, bank-group, and column address bits.
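The mapping strings in Table 1 can be read as lists of address bit fields. The Python sketch below decodes an address under such a string. It rests on assumptions not stated in this section: fields are listed from the most significant address bit down, a component split across several fields (for example the two 1BG entries) is concatenated with the earlier field supplying the higher-order bits, and any byte-offset bits have already been removed from the address. `parse_mapping` and `decode` are hypothetical helpers, not part of any Xilinx tool.

```python
import re

FIELD = re.compile(r"(\d+)(BG|R|B|C)")


def parse_mapping(spec: str) -> list[tuple[str, int]]:
    """'16R-2B-2BG-10C' -> [('R', 16), ('B', 2), ('BG', 2), ('C', 10)], MSB first (assumed)."""
    fields = []
    for part in spec.split("-"):
        m = FIELD.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised field {part!r} in {spec!r}")
        fields.append((m.group(2), int(m.group(1))))
    return fields


def decode(addr: int, spec: str) -> dict[str, int]:
    """Split addr into row, bank, bank-group and column numbers."""
    fields = parse_mapping(spec)
    shift = sum(width for _, width in fields)  # total address width
    out = {"R": 0, "B": 0, "BG": 0, "C": 0}
    for name, width in fields:  # walk from the most significant field down
        shift -= width
        bits = (addr >> shift) & ((1 << width) - 1)
        # Split fields are concatenated, higher-order part first.
        out[name] = (out[name] << width) | bits
    return out


addr = (5 << 14) | (1 << 12) | (2 << 10) | 3
print(decode(addr, "16R-2B-2BG-10C"))  # {'R': 5, 'B': 1, 'BG': 2, 'C': 3}

# In the BG-optimized variant, setting address bit 3 (addr = 8) already changes the bank group.
print(decode(8, "16R-2B-1BG-7C-1BG-3C")["BG"])  # 1
```

Under these assumptions, the BG-optimized RBC mapping switches bank group every eight addresses, whereas plain RBC keeps the same bank group across all 1024 column addresses.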