# Loop Parallelism - 2022.1 English

## Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399
Release Date: 2022-06-07
Version: 2022.1 English

Vitis HLS schedules logic and functions as early as possible to reduce latency while keeping the estimated clock period below the user-specified period. To achieve this, it schedules as many logic operations and functions as possible in parallel. It does not schedule loops to execute in parallel.

If the following code example is synthesized, loop `SUM_X` is scheduled first and loop `SUM_Y` is scheduled after it. Loop `SUM_Y` does not depend on `SUM_X` and does not need to wait for it to complete, but it still begins only after `SUM_X` has finished.

``````
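#include "loop_sequential.h"

void loop_sequential(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                     dsel_t xlimit, dsel_t ylimit) {

  dout_t X_accum = 0;
  dout_t Y_accum = 0;
  int i;

  // Running sum of A, written to X. Scheduled first.
  SUM_X: for (i = 0; i < xlimit; i++) {
    X_accum += A[i];
    X[i] = X_accum;
  }

  // Running sum of B, written to Y. Independent of SUM_X,
  // but scheduled only after SUM_X completes.
  SUM_Y: for (i = 0; i < ylimit; i++) {
    Y_accum += B[i];
    Y[i] = Y_accum;
  }
}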
``````

Because the loops have different bounds (`xlimit` and `ylimit`), they cannot be merged. Placing the loops in separate functions, as shown in the following code example, achieves identical functionality and allows both loops (inside the functions) to be scheduled in parallel.

``````
#include "loop_functions.h"

// Running sum of the first `limit` elements of I, written to O.
void sub_func(din_t I[N], dout_t O[N], dsel_t limit) {
  int i;
  dout_t accum = 0;

  SUM: for (i = 0; i < limit; i++) {
    accum += I[i];
    O[i] = accum;
  }
}

void loop_functions(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                    dsel_t xlimit, dsel_t ylimit) {

  // The two calls share no data, so they can be scheduled in parallel.
  sub_func(A, X, xlimit);
  sub_func(B, Y, ylimit);
}
``````

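If the `loop_functions` example is synthesized, the two calls to `sub_func` can execute in parallel, so the overall latency is set by the longer of the two loops rather than by their sum. When `xlimit` and `ylimit` are similar, this is approximately half the latency of the sequential loops example.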
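The claim that both versions are functionally identical can be checked in plain C++ before synthesis. The following is a minimal sketch, not part of the original design files: the value of `N` and the `din_t`, `dout_t`, and `dsel_t` typedefs are assumptions (the headers `loop_sequential.h` and `loop_functions.h` are not shown here), and `count_mismatches` and `self_check` are hypothetical helpers introduced only for illustration.

```cpp
// Software-only equivalence check for the two examples above.
// ASSUMPTIONS: N and the three typedefs stand in for the unseen headers.
#include <cassert>

const int N = 32;     // assumed array depth
typedef int din_t;    // assumed input element type
typedef int dout_t;   // assumed output/accumulator type
typedef int dsel_t;   // assumed loop-limit type

// Sequential version: both loops in one function body.
void loop_sequential(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                     dsel_t xlimit, dsel_t ylimit) {
  dout_t X_accum = 0;
  dout_t Y_accum = 0;
  for (int i = 0; i < xlimit; i++) { X_accum += A[i]; X[i] = X_accum; }
  for (int i = 0; i < ylimit; i++) { Y_accum += B[i]; Y[i] = Y_accum; }
}

// Function-based version: one loop per call.
void sub_func(din_t I[N], dout_t O[N], dsel_t limit) {
  dout_t accum = 0;
  for (int i = 0; i < limit; i++) { accum += I[i]; O[i] = accum; }
}

void loop_functions(din_t A[N], din_t B[N], dout_t X[N], dout_t Y[N],
                    dsel_t xlimit, dsel_t ylimit) {
  sub_func(A, X, xlimit);
  sub_func(B, Y, ylimit);
}

// Hypothetical helper: runs both versions on the same inputs and
// returns the number of output elements that differ.
int count_mismatches(dsel_t xlimit, dsel_t ylimit) {
  din_t A[N], B[N];
  dout_t X1[N] = {0}, Y1[N] = {0}, X2[N] = {0}, Y2[N] = {0};
  for (int i = 0; i < N; i++) { A[i] = i + 1; B[i] = 2 * i - 5; }

  loop_sequential(A, B, X1, Y1, xlimit, ylimit);
  loop_functions(A, B, X2, Y2, xlimit, ylimit);

  int mismatches = 0;
  for (int i = 0; i < N; i++) {
    if (X1[i] != X2[i]) mismatches++;
    if (Y1[i] != Y2[i]) mismatches++;
  }
  return mismatches;
}

// Hypothetical helper: checks equal and unequal loop limits.
void self_check() {
  assert(count_mismatches(N, N) == 0);
  assert(count_mismatches(5, 20) == 0);
}
```

Calling `self_check()` from a C simulation test bench confirms only functional equivalence. It says nothing about latency, which has to be read from the synthesis report.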

The `dataflow` optimization could also be used in the sequential loops example. The principle of capturing loops in functions to exploit parallelism is presented here for cases in which `dataflow` optimization cannot be used. For example, in a larger design, `dataflow` optimization is applied to all loops and functions at the top level, and memories are placed between every top-level loop and function.
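As a rough illustration of the `dataflow` alternative, the sequential loops example could carry the pragma at the top of the function body. This is a sketch under stated assumptions: `N` and the typedefs are placeholders for the unseen header, the function name `loop_sequential_dataflow` is hypothetical, and whether the tool accepts this exact form depends on the `dataflow` coding restrictions described elsewhere in this guide.

```cpp
// Sketch only: the sequential loops example with the dataflow pragma.
// ASSUMPTIONS: N and the typedefs stand in for loop_sequential.h.
const int N = 32;     // assumed array depth
typedef int din_t;    // assumed input element type
typedef int dout_t;   // assumed output/accumulator type
typedef int dsel_t;   // assumed loop-limit type

void loop_sequential_dataflow(din_t A[N], din_t B[N],
                              dout_t X[N], dout_t Y[N],
                              dsel_t xlimit, dsel_t ylimit) {
// Requests task-level parallelism between the loops in this function.
// A standard C++ compiler ignores this pragma.
#pragma HLS dataflow

  dout_t X_accum = 0;
  dout_t Y_accum = 0;

  // Running sum of A, written to X.
  SUM_X: for (int i = 0; i < xlimit; i++) {
    X_accum += A[i];
    X[i] = X_accum;
  }

  // Running sum of B, written to Y. Shares no data with SUM_X.
  SUM_Y: for (int i = 0; i < ylimit; i++) {
    Y_accum += B[i];
    Y[i] = Y_accum;
  }
}
```

The pragma does not change the C++ behavior, so the function still compiles and simulates as ordinary code. The synthesis report is the place to confirm whether the two loops actually overlap.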