Introduction

Beamforming Implementation on AI Engine (XAPP1352)

Document ID: XAPP1352
Release Date: 2021-01-11
Revision: 1.0 English

5G wireless communication systems enhance multiple-input multiple-output (MIMO) technology by employing a larger number of antennas, which yields higher spectral efficiency than previous generations (3GPP Std TS 38.212). In MIMO systems, spatially uncorrelated data streams can be transmitted and received simultaneously in the same spectrum, as if the communication channel were layered into many independent subchannels. The following figure shows beamforming in an orthogonal frequency-division multiplexing (OFDM) system with four layers and six antennas.

Figure 1. Beamforming in OFDM Systems
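The per-subcarrier operation that the figure depicts can be sketched as a small matrix-vector product: each antenna output is a weighted sum of the layer symbols. This is a minimal scalar C illustration, not the AI Engine kernel code from this application note. The function name `beamform_subcarrier`, the row-major weight layout, and the use of floating-point complex types are assumptions made for readability; the AI Engine itself operates on 16-bit fixed-point vectors.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: beamform the layer symbols of ONE subcarrier.
 *
 *   y[a] = sum over l of w[a * num_layers + l] * x[l]
 *
 * w is a row-major (num_antennas x num_layers) weight matrix (assumed layout),
 * x holds one complex symbol per layer, y receives one sample per antenna.
 * The cost is num_antennas * num_layers complex multiply-accumulates,
 * which is why complexity grows with the product of the two counts. */
void beamform_subcarrier(size_t num_antennas, size_t num_layers,
                         const float complex *w,
                         const float complex *x,
                         float complex *y)
{
    assert(w != NULL && x != NULL && y != NULL);

    for (size_t a = 0; a < num_antennas; ++a) {
        float complex acc = 0.0f;
        for (size_t l = 0; l < num_layers; ++l) {
            acc += w[a * num_layers + l] * x[l];
        }
        y[a] = acc;
    }
}
```

For the four-layer, six-antenna case in the figure, the call would be `beamform_subcarrier(6, 4, w, x, y)`, repeated for every subcarrier of every OFDM symbol.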

A wireless base station is made up of baseband and radio units, whose complexity is proportional to the number of layers and the number of antennas, respectively. The beamforming module sits between the baseband and radio units, and its complexity is proportional to the product of the number of layers and the number of antennas. Because 5G MIMO systems use more antennas to support more layers, beamforming complexity can reach 320 times that of 4G LTE, as the following table shows, which makes it one of the major challenges in system design.

Table 1. Complexity Comparison Between 4G LTE and 5G NR Carriers
| Carrier Type           | 4G LTE          | 5G NR   |
|------------------------|-----------------|---------|
| Channel Bandwidth      | 20 MHz          | 100 MHz |
| Number of Antennas     | 8               | 64      |
| Number of Layers       | 2               | 16      |
| Radio Complexity       | Normalized to 1 | 40x     |
| Baseband Complexity    | Normalized to 1 | 40x     |
| Beamforming Complexity | Normalized to 1 | 320x    |
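The ratios in the table can be reproduced with a simple cost model. The sketch below assumes that each complexity figure also scales linearly with channel bandwidth, in addition to the antenna and layer counts described in the text. That bandwidth scaling is an assumption, chosen because it is the reading that yields the table's 40x and 320x values. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Carrier parameters as listed in Table 1. */
struct carrier {
    unsigned bandwidth_mhz;
    unsigned antennas;
    unsigned layers;
};

/* Assumed cost models: bandwidth times the count(s) named in the text. */
double radio_cost(struct carrier c)
{
    return (double)c.bandwidth_mhz * c.antennas;
}

double baseband_cost(struct carrier c)
{
    return (double)c.bandwidth_mhz * c.layers;
}

double beamforming_cost(struct carrier c)
{
    return (double)c.bandwidth_mhz * c.antennas * c.layers;
}

/* Complexity of the new carrier relative to the reference carrier. */
double complexity_ratio(double new_cost, double ref_cost)
{
    assert(ref_cost > 0.0);
    return new_cost / ref_cost;
}
```

With `lte = {20, 8, 2}` and `nr = {100, 64, 16}`, the radio and baseband ratios are each 5 x 8 = 40, and the beamforming ratio is 5 x 8 x 8 = 320.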

The Xilinx AI Engine is designed for compute-intensive applications including, but not limited to, 5G wireless. One AI Engine tile consists of one AI Engine, 32 KB of data memory, and two DMA engines for automatic data movement. Every AI Engine is equipped with a vector processor capable of 32 real-by-real 16-bit multiply-and-accumulate (MAC) operations per clock cycle. To match the capability of the vector processor, the memory access unit inside the AI Engine reads 512 bits of operands and writes 256 bits of results every clock cycle. A single Versal™ AI Core device contains hundreds of AI Engine tiles, interconnected through cascade buses, AXI streams, and shared local memory according to the dataflow that the user defines at compilation time. For more detailed information about AI Engines, see Xilinx AI Engines and Their Applications (WP506).

Figure 2. Block Diagram of an AI Engine Tile

On traditional programmable devices, 5G NR beamformers are built with thousands of DSPs and tens of thousands of look-up tables (LUTs) and flip-flops (FFs), and developing such complicated systems can easily take several months. This application note shows that the same functionality can be built on tens of AI Engine tiles with three kernels that can be coded in the C programming language within days. The AI Engine design is guaranteed to run at a minimum of 1 GHz on Versal™ AI Core devices, with no need to worry about timing closure. When the system specification changes, the scalability of the AI Engine design means that only slight modifications to the C code are required.
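As a rough plausibility check on the "tens of AI Engine tiles" figure, the following sketch estimates an idealized lower bound on the tile count from the numbers quoted above (32 real 16-bit MACs per cycle, 1 GHz clock, 64 antennas, 16 layers). Several inputs are assumptions not stated in this document: a per-antenna sample rate of 122.88 Msps for a 100 MHz carrier, four real MACs per complex MAC, and perfect utilization of every tile. The function names are hypothetical.

```c
#include <assert.h>

/* Real MACs per second needed to beamform every sample.
 * Assumption: one complex MAC is counted as four real MACs. */
double required_real_macs_per_sec(unsigned antennas, unsigned layers,
                                  double sample_rate_hz)
{
    return 4.0 * antennas * layers * sample_rate_hz;
}

/* Smallest whole number of tiles whose combined peak MAC rate covers
 * the requirement, assuming 100% utilization (an idealization). */
unsigned long min_tiles(double required_macs_per_sec,
                        double macs_per_cycle, double clock_hz)
{
    assert(macs_per_cycle > 0.0 && clock_hz > 0.0);

    double tiles = required_macs_per_sec / (macs_per_cycle * clock_hz);
    unsigned long whole = (unsigned long)tiles;

    /* Round up without depending on libm. */
    if ((double)whole < tiles) {
        ++whole;
    }
    return whole;
}
```

Under these assumptions, `min_tiles(required_real_macs_per_sec(64, 16, 122.88e6), 32.0, 1e9)` evaluates to 16. A real design needs more tiles than this bound because of data movement and imperfect utilization, which is in line with the "tens of tiles" stated above.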