Accelerator - 2020.2 English

Versal ACAP VCK190 Base Targeted Reference Design (UG1442)

Document ID
UG1442
Release Date
2021-01-08
Version
2020.2 English
The accelerator GStreamer plugins are designed to implement memory-to-memory functions that can easily and seamlessly interface with other GStreamer elements such as video sources and sinks. The following figure shows the general architecture of accelerator plugins. The gray-colored boxes are components developed by Xilinx whereas the white boxes are open-source components.
Figure 1. GStreamer Plugin Architecture

An accelerator element has one source and one sink pad; it can consume N temporal input frames on its sink pad and produce one output frame on its source pad. All accelerator plugins inherit from a generic base class, which in turn inherits from the GStreamer video transform class. The base class provides common infrastructure that is shared across all accelerators. It also provides a generic filter-mode property that allows the user to switch between a hardware-accelerated version of the algorithm and a pure software implementation. Note that it is not mandatory for accelerator plugins to implement both modes. Accelerator plugins can implement additional accelerator-specific properties. The allocator class wraps the low-level memory allocation and dmabuf routines. The plugins launch the PL-based kernels or data movers generated by the Vitis software platform.
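The filter-mode dispatch described above can be sketched in plain Python. This is an illustrative software model, not the actual Xilinx base-class code; the class and method names (`AcceleratorBase`, `transform_hw`, `transform_sw`) are assumptions made for this sketch:

```python
class AcceleratorBase:
    """Minimal stand-in for the accelerator base class: routes each
    transform to the hardware path or the software fallback based on
    a generic filter-mode property."""

    FILTER_MODES = ("hw", "sw")

    def __init__(self, filter_mode="hw"):
        if filter_mode not in self.FILTER_MODES:
            raise ValueError(f"unknown filter-mode: {filter_mode}")
        self.filter_mode = filter_mode

    def transform(self, frame):
        # Dispatch on the filter-mode property; a plugin is free to
        # implement only one of the two paths.
        if self.filter_mode == "hw":
            return self.transform_hw(frame)
        return self.transform_sw(frame)

    def transform_hw(self, frame):
        raise NotImplementedError("no hardware path implemented")

    def transform_sw(self, frame):
        raise NotImplementedError("no software path implemented")


class InvertElement(AcceleratorBase):
    """Toy accelerator element: inverts 8-bit samples in both modes."""

    def transform_hw(self, frame):
        # In the real plugin this would launch a PL kernel via XRT;
        # here it is modeled in software.
        return [255 - p for p in frame]

    def transform_sw(self, frame):
        return [255 - p for p in frame]
```

Switching the property selects the path without changing the caller: `InvertElement(filter_mode="sw").transform(frame)` produces the same result through the software implementation.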

The PL-based kernel uses the Xilinx Vitis Vision libraries. These libraries provide hardware-optimized implementations of a subset of the OpenCV functions. They are implemented in C code that is then synthesized to the PL using high-level synthesis (HLS).

The AIE-based kernel uses Xilinx AI Engine intrinsics. The AI Engine program is implemented in C code that is then compiled to target the AI Engines using aiecompiler. A data mover, implemented in C code and synthesized to the PL using high-level synthesis (HLS), transfers data to and from the AI Engine.

The XRT and HLS libraries are used for memory allocation and for memory and hardware interface generation.

Filter 2D Plugin

In this example, a 2D convolution filter is implemented in three different versions:
  • A software implementation using the OpenCV library
  • A PL implementation using the Xilinx Vitis Vision library
  • An AIE implementation based on the Versal ACAP AI Engine Programming Environment User Guide (UG1076). The AIE implementation also needs a data mover in the PL that the plugin configures.

The kernel implements a transform function that takes an input frame and produces an output frame. It also exposes an interface that allows the user to program the kernel coefficients (not available in the AIE implementation).
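The transform-plus-coefficients interface can be modeled in plain Python. This is a software sketch of a 3x3 2D convolution over a luma plane, not the Vitis kernel code; the function name `convolve2d` and the border/saturation behavior are assumptions made for illustration:

```python
def convolve2d(luma, width, height, coeffs):
    """Software model of a 3x3 2D convolution over a luma plane.

    luma   -- flat list of height*width 8-bit samples
    coeffs -- programmable 3x3 kernel as a flat list of 9 weights
    Border pixels use edge replication; results saturate to 0..255.
    """
    out = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            acc = 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    # Clamp coordinates at the image border.
                    sy = min(max(y + ky, 0), height - 1)
                    sx = min(max(x + kx, 0), width - 1)
                    acc += coeffs[(ky + 1) * 3 + (kx + 1)] * luma[sy * width + sx]
            # Saturate to the 8-bit range, as a hardware kernel would.
            out[y * width + x] = min(max(int(acc), 0), 255)
    return out
```

Programming the identity kernel `[0, 0, 0, 0, 1, 0, 0, 0, 0]` passes the frame through unchanged, which is a convenient sanity check for any of the three implementations.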

The PL-based implementation uses three hardware-accelerated functions to process the video. The AIE-based implementation likewise uses three hardware-accelerated functions:
  • The first function, read_f2d_input, extracts the luma component from the input image to prepare it for the next, main processing step, which operates on luma only. The luma component is streamed into an AI Engine.
  • The AI Engine performs the main processing function with a 3x3 window size and a fixed resolution of 720x1280, and streams the processed data out to the data mover.
  • As the final step, the write_f2d_output function merges the unmodified UV component with the modified luma component produced by the main processing function.
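The three-stage flow above (luma extraction, luma-only processing, UV merge) can be modeled in plain Python on an NV12-style frame, i.e. a Y plane followed by interleaved UV samples. The function names mirror the ones in the text, but this is only an illustrative software model; the processing step is a stand-in, not the actual AI Engine filter:

```python
def read_f2d_input(nv12, width, height):
    """Split an NV12-style buffer into its luma (Y) and chroma (UV) parts."""
    y_size = width * height
    return nv12[:y_size], nv12[y_size:]

def process_luma(luma):
    """Stand-in for the AI Engine main processing step (here: invert)."""
    return [255 - p for p in luma]

def write_f2d_output(luma, uv):
    """Merge the processed luma back with the untouched UV component."""
    return luma + uv

def filter2d_aie_model(nv12, width, height):
    # End-to-end model of the data-mover flow: extract, process, merge.
    y, uv = read_f2d_input(nv12, width, height)
    return write_f2d_output(process_luma(y), uv)
```

Only the luma samples are transformed; the UV samples pass through the pipeline untouched, matching the merge step described above.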