Unrolling Loops - 2021.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2022-03-29
Version: 2021.1 English

Unrolling a loop lets the tool exploit the full parallelism of the model. To do this, mark the loop to be unrolled, and the tool creates the implementation with the most parallelism possible. In OpenCL, a loop is marked for unrolling with the opencl_unroll_hint attribute:

__attribute__((opencl_unroll_hint))

Or a C/C++ loop can use the unroll pragma:

#pragma HLS UNROLL

For more information, see Loop Unrolling.
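The following is a minimal sketch of the C/C++ pragma in place. It is not the example used in the figures below. The function names vadd_full_unroll and vadd_partial_unroll and the size VADD_N are hypothetical. The pragma sits at the top of the loop body. The second function uses the factor option to unroll only partially, replicating the loop body four times so that the loop runs VADD_N/4 iterations. A standard C compiler ignores HLS pragmas, so both functions compute the same results in software. The pragmas only change the hardware that Vitis HLS generates.

```c
/* Hypothetical example: element-wise addition of two arrays.
 * VADD_N and the function names are illustrative assumptions. */
#define VADD_N 16

/* Fully unrolled: every iteration becomes its own adder in hardware. */
void vadd_full_unroll(const int a[VADD_N], const int b[VADD_N], int out[VADD_N])
{
    for (int i = 0; i < VADD_N; i++) {
#pragma HLS UNROLL
        out[i] = a[i] + b[i];
    }
}

/* Partially unrolled: the body is replicated 4 times, so the loop
 * executes VADD_N / 4 iterations, trading less area for less speed. */
void vadd_partial_unroll(const int a[VADD_N], const int b[VADD_N], int out[VADD_N])
{
    for (int i = 0; i < VADD_N; i++) {
#pragma HLS UNROLL factor=4
        out[i] = a[i] + b[i];
    }
}
```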

When unrolling is applied to this example, the Schedule Viewer in the HLS project shows the following:

Figure 1. Schedule Viewer

The following figure shows the estimated performance:

Figure 2. Performance Estimates

The total latency therefore improves considerably, to 127 cycles, and, as expected, the computational hardware grows to 4845 LUTs, because the same computation is now performed in parallel.

However, if you analyze the for-loop, you might ask why this algorithm cannot be implemented in a single cycle, since each addition is completely independent of the previous loop iteration. The reason is that the variable out is implemented as a memory. By default, the Vitis core development kit maps an array to dual-port memory, which means that at most two values can be written to it per cycle. Thus, to obtain a fully parallel implementation, you must specify that the variable out be kept in registers, as in this example:

#pragma HLS array_partition variable=out complete dim=0

For more information, see pragma HLS array_partition.
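The two-writes-per-cycle limit can be made concrete with a simplified, hypothetical cost model that ignores pipeline depth and control overhead. Writing n elements through a memory that accepts at most ports writes per cycle needs at least ceil(n / ports) cycles. The helper min_write_cycles is illustrative only and is not part of any Vitis API.

```c
/* Hypothetical cost model (not a Vitis API): lower bound on the cycles
 * needed to write n array elements when the storage accepts at most
 * `ports` writes per cycle. Assumes ports >= 1; pipeline depth and
 * control overhead are ignored. */
unsigned min_write_cycles(unsigned n, unsigned ports)
{
    return (n + ports - 1) / ports; /* integer ceiling of n / ports */
}
```

For example, min_write_cycles(128, 2) is 64: 128 writes to a dual-port array need at least 64 cycles. Treating each register of a completely partitioned array as its own write port (ports equal to n) brings the bound down to a single cycle, so the memory no longer limits the schedule.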

The results of this transformation can be observed in the following Schedule Viewer:

Figure 3. Transformation Results in Schedule Viewer

The associated estimates are:

Figure 4. Transformation Results Performance Estimates

Accordingly, this code can be implemented as a combinatorial function that completes in only a fraction of a clock cycle.
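Combining both directives, a minimal sketch of a fully parallel loop might look as follows. The function vadd_fully_parallel and the size PAR_N are hypothetical. The sketch assumes the arrays are ordinary function arguments rather than ports bound to an AXI interface. Because the inputs in this sketch are also arrays, they are partitioned as well. Otherwise their own dual-port reads would impose the same two-accesses-per-cycle limit on the unrolled loop.

```c
/* Hypothetical sketch: fully unrolled loop over completely partitioned
 * arrays. PAR_N and the function name are illustrative assumptions. */
#define PAR_N 8

void vadd_fully_parallel(const int a[PAR_N], const int b[PAR_N], int out[PAR_N])
{
    /* Split each array (all dimensions, dim=0) into PAR_N separate
     * registers so every element can be read and written at once. */
#pragma HLS array_partition variable=a complete dim=0
#pragma HLS array_partition variable=b complete dim=0
#pragma HLS array_partition variable=out complete dim=0
    for (int i = 0; i < PAR_N; i++) {
#pragma HLS UNROLL
        out[i] = a[i] + b[i]; /* each addition is independent */
    }
}
```

In software the pragmas are ignored and the loop runs sequentially. Only the synthesized hardware changes.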