Sequential Loops - 2021.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
English (United States)
Release Date
2021.2 English
If there are multiple loops in the design, by default they do not overlap, and execute sequentially. This section introduces the concept of dataflow optimization for sequential loops. Consider the following code example:
void adder(unsigned int *in, unsigned int *out, int inc, int size) {

  unsigned int in_internal[MAX_SIZE];
  unsigned int out_internal[MAX_SIZE];
  mem_rd: for (int i = 0 ; i < size ; i++){
    #pragma HLS PIPELINE
    // Reading from the input vector "in" and saving to internal variable
    in_internal[i] = in[i];
  compute: for (int i=0; i<size; i++) {
  #pragma HLS PIPELINE
    out_internal[i] = in_internal[i] + inc;

  mem_wr: for(int i=0; i<size; i++) {
  #pragma HLS PIPELINE
    out[i] = out_internal[i];

In the previous example, three sequential loops are shown: mem_rd, compute, and mem_wr.

  • The mem_rd loop reads input vector data from the memory interface and stores it in internal storage.
  • The main compute loop reads from the internal storage and performs an increment operation and saves the result to another internal storage.
  • The mem_wr loop writes the data back to memory from the internal storage.

This code example is using two separate loops for reading and writing from/to the memory input/output interfaces to infer burst read/write.

By default, these loops are executed sequentially without any overlap. First, the mem_rd loop finishes reading all the input data before the compute loop starts its operation. Similarly, the compute loop finishes processing the data before the mem_wr loop starts to write the data. However, the execution of these loops can be overlapped, allowing the compute (or mem_wr) loop to start as soon as there is enough data available to feed its operation, before the mem_rd (or compute) loop has finished processing its data.

The loop execution can be overlapped using dataflow optimization as described in Dataflow Optimization.