Analyzing PL Kernel Performance in Simulation - 2021.2 English

Versal ACAP System Integration and Validation Methodology Guide (UG1388)


All kernels developed by HLS can be optimized using compiler directives and HLS pragmas. The Vitis HLS compiler generates detailed reports containing Fmax, resource utilization, and performance information. In addition to the summary reports, the schedule viewer provides a visual representation of how the design is built and how the operations are scheduled. You can use this view to help identify suboptimal portions of the synthesized design.

You can supplement these compile-time reports by running the HLS cosimulation flow. When you run this flow, Vitis HLS automatically extracts performance data from the simulation results and reports additional performance information, such as minimum, maximum, and average running times and FIFO high watermarks. Xilinx recommends using all of these analysis capabilities before integrating the HLS kernel into the system.
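Cosimulation reuses the C/C++ test bench written for C simulation, and a nonzero return value from the test bench is treated as a failure. The following is a minimal sketch of a self-checking test bench, not a definitive implementation. The names `vadd_kernel`, `count_mismatches`, `run_testbench`, and `kLen` are hypothetical and stand in for your own kernel and checks. In a real project, the logic in `run_testbench` would typically be the body of `main`.

```cpp
#include <cassert>
#include <cstdio>

constexpr int kLen = 64;  // hypothetical vector length (assumption)

// Hypothetical kernel under test; stands in for the real HLS top function.
void vadd_kernel(const int a[kLen], const int b[kLen], int out[kLen]) {
    for (int i = 0; i < kLen; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Counts element-wise differences between kernel output and a golden reference.
int count_mismatches(const int* got, const int* expected, int n) {
    assert(n > 0);
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (got[i] != expected[i]) {
            std::printf("mismatch at %d: got %d, expected %d\n",
                        i, got[i], expected[i]);
            ++errors;
        }
    }
    return errors;
}

// Test bench body: builds stimulus, runs the kernel, and compares results.
// Returns 0 on success and 1 on any mismatch, which is the convention the
// simulation flows rely on to decide pass or fail.
int run_testbench() {
    int a[kLen], b[kLen], out[kLen], golden[kLen];
    for (int i = 0; i < kLen; ++i) {
        a[i] = i;
        b[i] = 2 * i;
        out[i] = 0;
        golden[i] = a[i] + b[i];  // reference model computed in software
    }
    vadd_kernel(a, b, out);
    return count_mismatches(out, golden, kLen) == 0 ? 0 : 1;
}
```

Because the same test bench drives both C simulation and cosimulation, keeping it self-checking makes the performance numbers reported after cosimulation trustworthy: they are only meaningful when the RTL also produces correct results.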

Note: A kernel that does not meet its performance targets in a standalone context will not meet them in the complete system.

Many factors influence the performance of an HLS kernel, including interface properties, loop-level parallelism, and task-level parallelism. In particular, understanding the concepts of initiation interval (II) and dataflow is essential to achieving good results.

Initiation interval is measured in clock cycles and indicates how often a particular loop or process restarts. For example, if a loop is successfully synthesized with II=1, then in the resulting RTL a new loop iteration starts every clock cycle. II is closely related to throughput, a key performance metric.

Dataflow is a performance optimization that takes advantage of task-level parallelism. Whenever possible, dataflow allows different sub-processes in the design to run concurrently instead of sequentially. Achieving optimal results with dataflow requires a suitable code structure. For more information about initiation interval, dataflow, and other HLS performance optimizations, see the Vitis High-Level Synthesis User Guide (UG1399).
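To make these two concepts concrete, the following sketch shows a hypothetical three-stage kernel that requests II=1 on each loop and task-level parallelism across the stages, followed by a first-order cycle estimate for a pipelined loop. The kernel name `scale_kernel`, the stage functions, the size `N`, and the helper `pipelined_loop_cycles` are assumptions made for illustration. The pragmas express a request only: whether the tool achieves II=1 or overlaps the stages depends on the actual design, and a standard C++ compiler ignores them.

```cpp
#include <cassert>
#include <cstdint>

constexpr int N = 256;  // hypothetical frame size (assumption)

// Stage 1: copy the input into a local buffer.
void load(const int in[N], int buf[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        buf[i] = in[i];
    }
}

// Stage 2: apply a simple per-element operation.
void compute(const int buf_in[N], int buf_out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        buf_out[i] = buf_in[i] * 3 + 1;
    }
}

// Stage 3: write the results to the output.
void store(const int buf[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = buf[i];
    }
}

// Top-level function. Each intermediate buffer has exactly one producer and
// one consumer, which is the code structure that dataflow relies on.
void scale_kernel(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int stage1[N];
    int stage2[N];
    load(in, stage1);
    compute(stage1, stage2);
    store(stage2, out);
}

// First-order estimate of the cycles needed by a pipelined loop: the first
// iteration takes iteration_latency cycles, and each later iteration starts
// ii cycles after the previous one.
std::uint64_t pipelined_loop_cycles(std::uint64_t trip_count,
                                    std::uint64_t iteration_latency,
                                    std::uint64_t ii) {
    assert(ii >= 1 && iteration_latency >= 1);
    if (trip_count == 0) {
        return 0;
    }
    return iteration_latency + ii * (trip_count - 1);
}
```

Under this model, a 1024-iteration loop with an iteration latency of 4 cycles takes about 1027 cycles at II=1 and about 2050 cycles at II=2, which illustrates why II dominates throughput for long-running loops.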


All RTL kernels you develop must be simulated at the block level, either using custom RTL test benches or using the Xilinx® LogiCORE™ AXI Verification IP (VIP) provided in the Vivado® IP Catalog. For more information, see the AXI Verification IP LogiCORE IP Product Guide (PG267).

Tip: Additional performance counters can be written in RTL to count cycles in the PL and calculate latency and throughput to/from the AI Engines.
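Once such counters exist, their raw values still need to be converted into latency and throughput figures. The following host-side sketch shows that arithmetic under stated assumptions: a free-running 32-bit cycle counter sampled at the start and end of a transfer, a known PL clock frequency, and a known byte count. The function names are hypothetical and are not part of any Xilinx API.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed cycles between two samples of a free-running 32-bit counter.
// Unsigned subtraction is performed modulo 2^32, so a single counter
// wraparound between the two samples is handled correctly.
std::uint32_t elapsed_cycles(std::uint32_t start, std::uint32_t end) {
    return end - start;
}

// Converts a cycle count into nanoseconds for a given clock frequency in Hz.
double latency_ns(std::uint32_t cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(cycles) * 1e9 / clock_hz;
}

// Average throughput in bytes per second for a transfer of `bytes` bytes
// that took `cycles` clock cycles at `clock_hz`.
double throughput_bytes_per_s(std::uint64_t bytes,
                              std::uint32_t cycles,
                              double clock_hz) {
    assert(cycles > 0 && clock_hz > 0.0);
    return static_cast<double>(bytes) * clock_hz / static_cast<double>(cycles);
}
```

For example, 4096 bytes moved in 1024 cycles at a 250 MHz clock corresponds to 4 bytes per cycle, or roughly 1 GB/s. If a measurement can span more than 2^32 cycles, a wider counter is needed because the wraparound handling above covers only one wrap.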