Flattening Nested Loops to Improve Latency

Flattening Nested Loops to Improve Latency - 2022.1 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID

UG1399

Release Date

2022-06-07

Version

2022.1 English

In a similar manner to the consecutive loops discussed in the previous section, it requires additional clock cycles to move between rolled nested loops. It requires one clock cycle to move from an outer loop to an inner loop and from an inner loop to an outer loop.

In the small example shown here, this implies 200 extra clock cycles to execute loop Outer.

void foo_top { a, b, c, d} {
 ...
 Outer: while(j<100)
 Inner: while(i<6) // 1 cycle to enter inner
 ...
 LOOP_BODY
 ...
 } // 1 cycle to exit inner
 }
 ...
}

Vitis HLS provides the set_directive_loop_flatten command to allow labeled perfect and semi-perfect nested loops to be flattened, removing the need to re-code for optimal hardware performance and reducing the number of cycles it takes to perform the operations in the loop.

Perfect loop nest: Only the innermost loop has loop body content, there is no logic specified between the loop statements and all the loop bounds are constant.
Semi-perfect loop nest: Only the innermost loop has loop body content, there is no logic specified between the loop statements but the outermost loop bound can be a variable.

For imperfect loop nests, where the inner loop has variables bounds or the loop body is not exclusively inside the inner loop, designers should try to restructure the code, or unroll the loops in the loop body to create a perfect loop nest.

When the directive is applied to a set of nested loops it should be applied to the inner most loop that contains the loop body.

set_directive_loop_flatten top/Inner

Loop flattening can also be performed using the directive tab in the IDE, either by applying it to individual loops or applying it to all loops in a function by applying the directive at the function level.