Pipelining for Throughput - 2023.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID

XD099

Release Date

2023-11-13

Version

2023.2 English

High-level synthesis can be very conservative by default. For example, the instructions in a loop body are executed in full at each iteration rather than in a staggered fashion. The staggered style of execution is explicitly enabled by the PIPELINE pragma, which reduces the initiation interval (II) of a function or loop (in this tutorial, it is applied to loops) by allowing its operations to execute concurrently. A pipelined function or loop can then process new inputs every N clock cycles, where N is the II of the loop or function. The default II for the PIPELINE pragma is 1, which processes a new input every clock cycle. You can also specify the initiation interval with the II option.

Pipelining a loop allows its operations to be implemented so that they execute concurrently, as shown in the following animated figure. In that example, there are by default three clock cycles between each input read (II=3), and it takes 12 clock cycles to fully execute the loop, compared to 6 when the pragma is used.
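The 12-cycle and 6-cycle figures can be reproduced with a simple idealized model, sketched below. It assumes a loop body that takes three clock cycles, four loop iterations (a trip count inferred from the 12-cycle total, not stated in the text), and no stalls. The helper name loop_cycles is hypothetical and is not part of any Vitis API.

```cpp
#include <cstdint>

// Idealized cycle count for a loop (illustrative model only, not a tool report).
//   trip_count : number of loop iterations
//   depth      : clock cycles needed by one iteration from start to finish
//   ii         : initiation interval, cycles between the start of two iterations
// The first iteration takes `depth` cycles; each further iteration adds `ii`.
constexpr std::uint64_t loop_cycles(std::uint64_t trip_count,
                                    std::uint64_t depth,
                                    std::uint64_t ii) {
    return trip_count == 0 ? 0 : (trip_count - 1) * ii + depth;
}

// Assumed example: 4 iterations, 3-cycle body.
static_assert(loop_cycles(4, 3, 3) == 12, "no pipelining: II equals the body latency");
static_assert(loop_cycles(4, 3, 1) == 6, "pipelined with II=1");
```

Under this model, the benefit of pipelining grows with the trip count, because each additional iteration costs II cycles instead of the full body latency.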

Figure: Pipeline

If the Vitis high-level synthesis tool cannot create a design with the user-specified II, it issues a warning and creates a design with the lowest achievable II. You can then analyze this design with the warning message to determine what steps must be taken to create a design that satisfies the required initiation interval.

To enable the pragma in the C source, insert it within the body of the function or loop.

 #pragma HLS pipeline II=<int> enable_flush rewind

II=<int> specifies the desired initiation interval for the pipeline, that is, the number of clock cycles between the start of consecutive inputs. The HLS tool tries to meet this request, but depending on data dependencies, the actual result might have a larger initiation interval. The enable_flush modifier is optional; with it, the pipeline keeps pushing out the data it already holds if the data valid at the input of the pipeline goes inactive. The rewind modifier is also optional and enables continuous loop pipelining with no pause between one loop iteration ending and the next iteration starting. Rewinding is effective only if there is one single loop (or a perfect loop nest) inside the top-level function.
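As an illustration of how a data dependency can limit the achievable II, the sketch below shows a loop in which every iteration needs the result of the previous one. The function accumulate and its arguments are hypothetical and do not come from the tutorial kernel. Whether the tool actually meets II=1 here depends on the target device and clock, because a floating-point addition may take more than one cycle; treat the outcome as something to check in the synthesis report rather than a guaranteed result.

```cpp
// Hypothetical example: running sum with a loop-carried dependency on `acc`.
// A standard C++ compiler ignores the HLS pragma, so the code also runs in
// plain software simulation.
float accumulate(const float *in, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
#pragma HLS pipeline II=1
        // Iteration i + 1 cannot start its addition until this one has
        // produced `acc`, which may force the tool to use a larger II.
        acc += in[i];
    }
    return acc;
}
```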

In the following example, the function foo is pipelined with an II of 1:

void foo(a, b, c, d) {
  #pragma HLS pipeline II=1
  ...
}
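Because this tutorial applies the pragma to loops rather than to whole functions, a loop-level placement may be a more useful reference. The following is a minimal sketch; the function vadd and its parameters are hypothetical and are not taken from the tutorial kernel. The pragma sits inside the loop body, so it applies to that loop only.

```cpp
// Hypothetical element-wise addition with the loop pipelined at II=1.
// Each iteration is independent of the others, so there is no loop-carried
// dependency that would prevent a new iteration from starting every cycle.
void vadd(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS pipeline II=1
        out[i] = a[i] + b[i];
    }
}
```

A standard C++ compiler ignores the HLS pragma, so the same source can be used for software simulation before synthesis.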

Take a look at the kernel source code and notice how the PIPELINE pragma/directive is applied to several loops in the code. Because Vitis HLS automatically pipelines the innermost loops, the results are no different from those seen in the previous module (the baseline).