Loop Parallelism - 2022.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
Release Date
2022.1 English

Loops are the basic C/C++/OpenCL API method of representing repetitive algorithmic code. The following example illustrates various implementation aspects of a loop structure:

  for(int i = 0; i<255; i++) {
    out[i] = in[i]+in[i+1];
  out[255] = in[255];

This code iterates over an array of values and adds consecutive values, except the last value. If this loop is implemented as written, each loop iteration requires two cycles for implementation, which results in a total of 510 cycles for implementation. This can be analyzed in detail through the Schedule Viewer in the HLS Project:

Figure 1. Implemented Loop Structure in Schedule Viewer

This can also be analyzed in terms of total numbers and latency through the Vivado synthesis results:

Figure 2. Synthesis Results Performance Estimates

The key numbers here are the latency numbers and total LUT usage. For example, depending on the configuration, you could get latency of 511 and total LUT usage of 47. As a result, these values can vary based on the implementation choices. While this implementation will require very little area, it results in significant latency.