Using Burst Data Transfers - 2022.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)


Transferring data in bursts hides the memory access latency and improves bandwidth usage and the efficiency of the memory controller. To confirm that bursts were inferred, check the HLS report for burst information.
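To see why bursting hides latency, consider a deliberately simplified cost model. It is an assumption for illustration, not a formula from the Vitis tools: each memory transaction pays a fixed access latency once, and each data beat then takes one more clock cycle. The sketch below computes the total cycles needed to move a number of words for a given burst length. The function names and the 100-cycle latency are hypothetical.

```cpp
#include <cstdint>

// Simplified cost model (an assumption for illustration, not a Vitis formula):
// each transaction pays `latency` cycles once, then one cycle per data beat.
// Precondition: burst_len > 0.
constexpr std::uint64_t total_cycles(std::uint64_t words,
                                     std::uint64_t burst_len,
                                     std::uint64_t latency) {
    // ceil(words / burst_len) transactions, plus one cycle per word moved.
    return ((words + burst_len - 1) / burst_len) * latency + words;
}

// Average cycles per word under the same model. Precondition: words > 0.
constexpr double cycles_per_word(std::uint64_t words,
                                 std::uint64_t burst_len,
                                 std::uint64_t latency) {
    return static_cast<double>(total_cycles(words, burst_len, latency)) /
           static_cast<double>(words);
}

// Hypothetical numbers: 1024 words, 100-cycle access latency.
static_assert(total_cycles(1024, 1, 100) == 103424, "one word per transaction");
static_assert(total_cycles(1024, 16, 100) == 7424, "16-beat bursts");
```

In this model, single-word transactions take 103,424 cycles, while 16-beat bursts take 7,424 cycles (7.25 cycles per word), because the latency is paid once per burst instead of once per word. Real memory controllers also overlap outstanding transactions, so treat these numbers as a trend rather than a prediction.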

When burst data transfers occur, the detailed kernel trace reflects the higher burst rate as a larger burst length, as shown in the following figure:

Figure 1. Burst Data Transfer with Detailed Kernel Trace

The previous figure also shows that the memory data transfers downstream of the AXI interconnect are implemented differently, with a shorter transaction time. If you hover over these transactions, you can see that the AXI interconnect has packed the 16 x 4-byte transaction into a single 1 x 64-byte transaction. This uses the available AXI4 bandwidth more effectively. The next section focuses on this optimization technique in more detail.
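The packing seen in the trace is simple byte arithmetic: 16 x 4 bytes is 64 bytes, which is exactly one beat on a 512-bit data bus. The sketch below assumes a 32-bit kernel-side port and a 512-bit memory-side bus (inferred from the 64-byte figure, not stated in the trace) and computes how many beats each side needs. The `beats` helper is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: number of data beats needed to move `bytes` over an
// AXI data bus that is `bus_bits` wide, rounding up for a partial last beat.
// Precondition: bus_bits is a non-zero multiple of 8.
constexpr std::uint32_t beats(std::uint32_t bytes, std::uint32_t bus_bits) {
    return (bytes + bus_bits / 8 - 1) / (bus_bits / 8);
}

// The transfer seen in the trace: 16 beats of 4 bytes each.
constexpr std::uint32_t kTransferBytes = 16 * 4;  // 64 bytes

// Assumed 32-bit kernel-side port: one 4-byte word per beat.
static_assert(beats(kTransferBytes, 32) == 16, "16 beats on a 32-bit bus");
// Assumed 512-bit memory-side bus: all 64 bytes fit in a single beat.
static_assert(beats(kTransferBytes, 512) == 1, "1 beat on a 512-bit bus");
```

The checks run at compile time, so the file compiles only if the arithmetic holds.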

Burst inference depends heavily on coding style and access patterns. However, you can ease burst detection and improve performance by isolating data transfer from computation, as shown in the following code snippet:

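// T is the element type; read, process, and write are user-defined functions.
void kernel(T in[1024], T out[1024]) {
    T tmpIn[1024];           // on-chip buffer for the input
    T tmpOut[1024];          // on-chip buffer for the output
    read(in, tmpIn);         // AXI input -> tmpIn
    process(tmpIn, tmpOut);  // computation on internal buffers only
    write(tmpOut, out);      // tmpOut -> AXI output
}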

In short, the read function reads from the AXI input into an internal variable (tmpIn). The process function implements the computation, working only on the internal variables tmpIn and tmpOut. The write function takes the produced output and writes it to the AXI output. For more information on bursts, see AXI Burst Transfers in the Vitis HLS User Guide (UG1399).
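A minimal, self-contained version of the read/process/write structure might look as follows. The element type (`std::int32_t`), the size constant `N`, the placeholder computation in `process`, and the use of `#pragma HLS PIPELINE` on each loop are assumptions for illustration; interface pragmas are omitted. A standard C++ compiler ignores unrecognized pragmas (possibly with a warning), so the same code can be compiled and tested on a host before synthesis.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element type and array size for illustration.
typedef std::int32_t T;
const int N = 1024;

// AXI input -> local buffer: one simple loop with a constant trip count and
// a sequential index, the pattern burst inference handles most readily.
static void read(const T in[N], T tmpIn[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        tmpIn[i] = in[i];
    }
}

// Placeholder computation (an assumption): it works only on local buffers
// and never touches the AXI interfaces directly.
static void process(const T tmpIn[N], T tmpOut[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        tmpOut[i] = tmpIn[i] * 2 + 1;
    }
}

// Local buffer -> AXI output, again as a single sequential loop.
static void write(const T tmpOut[N], T out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = tmpOut[i];
    }
}

extern "C" void kernel(const T in[N], T out[N]) {
    T tmpIn[N];   // on-chip buffer for the input
    T tmpOut[N];  // on-chip buffer for the output
    read(in, tmpIn);
    process(tmpIn, tmpOut);
    write(tmpOut, out);
}
```

Because only read and write touch the AXI ports, the computation in process can be changed or optimized without disturbing the access patterns that burst inference relies on.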

Isolating the read and write functions from the computation results in:

  • Simple control structures (loops) in the read and write functions, which make burst detection simpler.
  • Isolation of the computational function from the AXI interfaces, which simplifies potential kernel optimizations. See Optimizing C++ Kernels for more information.
  • Internal variables mapped to on-chip memory, which allows faster access than AXI transactions. Acceleration platforms supported in the Vitis core development kit can have as much as 10 MB of on-chip memory that can be used as pipes, local memories, and private memories. Using these resources effectively can greatly improve the efficiency and performance of your applications.
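For a rough sense of scale, the two 1024-element buffers in the kernel() snippet are small compared with the on-chip capacity quoted above. The sketch below assumes a 4-byte element type and reads "10 MB" as 10 MiB; both are assumptions, and real usage also depends on how each buffer is mapped onto the on-chip RAM resources of the target platform.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical element type and depth, matching the 1024-element
// tmpIn/tmpOut buffers in the kernel() snippet.
typedef std::int32_t T;
constexpr std::size_t kDepth = 1024;

// On-chip bytes used by tmpIn and tmpOut together.
constexpr std::size_t kLocalBytes = 2 * kDepth * sizeof(T);

// Upper bound from the text ("as much as 10 MB"), read here as 10 MiB;
// the actual amount depends on the target platform.
constexpr std::size_t kOnChipBudget = std::size_t(10) * 1024 * 1024;

static_assert(kLocalBytes == 8 * 1024, "two 1024 x 4-byte buffers use 8 KiB");
static_assert(kLocalBytes < kOnChipBudget, "buffers fit in the on-chip budget");
```

Under these assumptions the two buffers use 8 KiB, about 1/1280 of the quoted upper bound.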