This section outlines the various optimization techniques you can use to direct Vitis™ HLS to produce a micro-architecture that satisfies the desired performance and area goals. Using Vitis HLS, you can apply different optimization directives to the design, including:
- Pipelining tasks, allowing the next execution of the task to begin before the current execution is complete.
- Specifying a target latency for the completion of functions, loops, and regions.
- Specifying a limit on the number of resources used.
- Overriding the inherent or implied dependencies in the code to permit specific operations. For example, if it is acceptable to discard or ignore the initial data values, such as in a video stream, allow a memory read before write if it results in better performance.
- Specifying the I/O protocol to ensure function arguments can be connected to other hardware blocks with the same I/O protocol.
  Note: Vitis HLS automatically determines the I/O protocol used by any sub-functions. You cannot control these ports except to specify whether the port is registered.
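As a minimal sketch of how several of these directives might appear together in source code, the hypothetical function below applies INTERFACE, LATENCY, ALLOCATION, and PIPELINE pragmas. The function name `scale_accumulate`, the constant `kSamples`, and all pragma values are assumptions chosen for illustration, and the pragma spellings assume the `mode=` keyword form of INTERFACE. A standard C++ compiler ignores the `#pragma HLS` lines (possibly with a warning), so the code can also be run as ordinary software.

```cpp
constexpr int kSamples = 16;  // hypothetical buffer length

// Hypothetical top-level function: scales every input sample by `gain`,
// writes the scaled samples to `out`, and returns their sum.
int scale_accumulate(const int in[kSamples], int out[kSamples], int gain) {
    // I/O protocol: request a memory-style interface for the array arguments.
#pragma HLS INTERFACE mode=ap_memory port=in
#pragma HLS INTERFACE mode=ap_memory port=out
    // Target latency for the function, in clock cycles (illustrative values).
#pragma HLS LATENCY min=1 max=64
    // Resource limit: allow at most one multiplier instance in this function.
#pragma HLS ALLOCATION operation instances=mul limit=1

    int acc = 0;
    for (int i = 0; i < kSamples; ++i) {
        // Pipelining: ask for a new iteration to start every clock cycle.
#pragma HLS PIPELINE II=1
        int scaled = in[i] * gain;
        out[i] = scaled;
        acc += scaled;
    }
    return acc;
}
```

Because each iteration performs a single multiplication, a limit of one multiplier should not conflict with the requested initiation interval of 1 in this sketch. Whether the latency target can be met depends on the device and clock constraint.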
It helps to understand the process used to synthesize an RTL hardware description from C/C++ source code. The section Understanding High-Level Synthesis Scheduling and Binding describes some of the important details of this process to help you better understand how to optimize for it.
You can add optimization directives directly to the source code as HLS pragmas, or you can use Tcl set_directive commands in a Tcl script that a solution applies during compilation, as discussed in Adding Pragmas and Directives. The following table lists the optimization directives provided by Vitis HLS as either a pragma or a Tcl directive.
|AGGREGATE||The AGGREGATE pragma groups all the elements of a struct into a single wide vector so that all members of the struct can be read and written simultaneously.|
|ALIAS||The ALIAS pragma enables data dependence analysis in Vitis HLS by defining the distance between multiple pointers accessing the same DRAM buffer.|
|ALLOCATION||Specify a limit for the number of operations, implementations, or functions used. This can force the sharing of hardware resources and may increase latency.|
|ARRAY_PARTITION||Partitions large arrays into multiple smaller arrays or into individual registers, to improve access to data and remove block RAM bottlenecks.|
|ARRAY_RESHAPE||Reshape an array from one with many elements to one with greater word-width. Useful for improving block RAM accesses without using more block RAM.|
|BIND_OP||Define a specific implementation for an operation in the RTL.|
|BIND_STORAGE||Define a specific implementation for a storage element, or memory, in the RTL.|
|DATAFLOW||Enables task-level pipelining, allowing functions and loops to execute concurrently. Used to optimize throughput and/or latency.|
|DEPENDENCE||Used to provide additional information that can overcome loop-carried dependencies and allow loops to be pipelined (or pipelined with lower intervals).|
|DISAGGREGATE||Break a struct down into its individual elements.|
|EXPRESSION_BALANCE||Allows automatic expression balancing to be turned off.|
|INLINE||Inlines a function, removing function hierarchy at this level. Used to enable logic optimization across function boundaries and improve latency/interval by reducing function call overhead.|
|INTERFACE||Specifies how RTL ports are created from the function description.|
|LATENCY||Allows a minimum and maximum latency constraint to be specified.|
|LOOP_FLATTEN||Allows nested loops to be collapsed into a single loop with improved latency.|
|LOOP_MERGE||Merge consecutive loops to reduce overall latency, increase sharing and improve logic optimization.|
|LOOP_TRIPCOUNT||Used for loops that have variable bounds. Provides an estimate for the loop iteration count. This has no impact on synthesis, only on reporting.|
|OCCURRENCE||Used when pipelining functions or loops, to specify that the code in a location is executed at a lesser rate than the code in the enclosing function or loop.|
|PERFORMANCE||Specify the desired transaction interval for a loop and let the tool determine the best way to achieve the result.|
|PIPELINE||Reduces the initiation interval by allowing the overlapped execution of operations within a loop or function.|
|PROTOCOL||This command specifies a region of code, a protocol region, in which no clock operations will be inserted by Vitis HLS unless explicitly specified in the code.|
|RESET||This directive is used to add or remove reset on a specific state variable (global or static).|
|STABLE||Indicates that a variable input or output of a dataflow region can be ignored when generating the synchronizations at entry and exit of the dataflow region.|
|STREAM||Specifies that a specific array is to be implemented as a FIFO or RAM memory channel during dataflow optimization. When using hls::stream, the STREAM optimization directive is used to override the configuration of the hls::stream.|
|TOP||The top-level function for synthesis is specified in the project settings. This directive may be used to specify any function as the top-level for synthesis. This allows different solutions within the same project to use different top-level functions for synthesis without needing to create a new project.|
|UNROLL||Unroll for-loops to create multiple instances of the loop body and its instructions that can then be scheduled independently.|
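To illustrate how a few of the directives in the table can be combined, the sketch below applies ARRAY_PARTITION, PIPELINE, and UNROLL to a hypothetical dot-product function. The names `dot_product` and `kLen`, the local buffers, and the factor of 4 are assumptions made for this example, and the pragma spellings assume the `type=` keyword form of ARRAY_PARTITION. A standard C++ compiler ignores the `#pragma HLS` lines, so the function behaves as plain software outside the tool.

```cpp
constexpr int kLen = 32;  // hypothetical vector length, a multiple of 4

// Hypothetical function: copies both inputs into local buffers, then
// computes their dot product with a partially unrolled loop.
int dot_product(const int a[kLen], const int b[kLen]) {
    int a_buf[kLen];
    int b_buf[kLen];
    // Split each buffer into 4 interleaved banks so that the 4 copies of the
    // unrolled loop body below can each read a different bank.
#pragma HLS ARRAY_PARTITION variable=a_buf type=cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b_buf type=cyclic factor=4 dim=1

    for (int i = 0; i < kLen; ++i) {
        // Overlap successive copy iterations.
#pragma HLS PIPELINE II=1
        a_buf[i] = a[i];
        b_buf[i] = b[i];
    }

    int sum = 0;
    for (int i = 0; i < kLen; ++i) {
        // Create 4 instances of the loop body that can be scheduled
        // independently; the factor matches the partition factor above.
#pragma HLS UNROLL factor=4
        sum += a_buf[i] * b_buf[i];
    }
    return sum;
}
```

Matching the unroll factor to the cyclic partition factor is a common pairing, since an unrolled loop can otherwise be limited by the number of memory ports available on a single array. The actual schedule still depends on the tool and target device.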
In addition to the optimization directives, Vitis HLS provides a number of configuration commands that can influence the performance of synthesis results. Details on using configuration commands can be found in Setting Configuration Options. The following table lists some of these commands.
|Config Array Partition||Determines how arrays are partitioned, including global arrays and if the partitioning impacts array ports.|
|Config Compile||Controls synthesis-specific optimizations such as automatic loop pipelining and floating-point math optimizations.|
|Config Dataflow||Specifies the default memory channel and FIFO depth in dataflow optimization.|
|Config Interface||Controls I/O ports not associated with the top-level function arguments and allows unused ports to be eliminated from the final RTL.|
|Config Op||Configures the default latency and implementation of specified operations.|
|Config RTL||Provides control over the output RTL including file and module naming, and reset controls.|
|Config Schedule||Determines the effort level to use during the synthesis scheduling phase and the verbosity of the output messages.|
|Config Storage||Configures the default latency and implementation of specified storage types.|
|Config Unroll||Configures the default tripcount threshold for unrolling loops.|