Viewing Profiling Results using Vitis Analyzer

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2021-12-17
Version: 2021.2 English

To view the profiling information from the XRT flow in the Vitis Analyzer, use the following command:

vitis_analyzer xrt.run_summary

To view the profiling information from the XSDB flow in the Vitis Analyzer, use the following command:

vitis_analyzer aie_trace_profile.run_summary
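If a script has to handle both flows, a small Bash helper can select whichever run summary exists before starting the tool. This is a minimal sketch: the function name `pick_run_summary` is hypothetical, and it assumes the summary files keep the default names used in the commands above and are written to a single directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Vitis tools): print the path of the run
# summary to open, preferring the XRT flow file and falling back to the XSDB
# flow file. Assumes both would be written to the directory given as $1
# (default: the current directory).
pick_run_summary() {
  local dir="${1:-.}"
  if [ -f "$dir/xrt.run_summary" ]; then
    echo "$dir/xrt.run_summary"
  elif [ -f "$dir/aie_trace_profile.run_summary" ]; then
    echo "$dir/aie_trace_profile.run_summary"
  else
    echo "no run summary found in $dir" >&2
    return 1
  fi
}

# Usage (requires vitis_analyzer on PATH):
#   summary=$(pick_run_summary) && vitis_analyzer "$summary"
```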

Example of Core_metrics: heat_map and Memory_metrics: conflicts

The following figure shows, for ten tiles of an example design, the active time, stall time, cumulative instruction count, and vector_instruction_count reported by the heat_map core metrics, together with the memory conflict time and cumulative memory error time reported by the conflicts memory metrics.

Figure 1. Example of Core_metrics: heat_map and Memory_metrics: conflicts

Note: Click the icon in the upper-right corner of the view to enable or disable charts.

Consider the AI Engine at tile (24,2). Its stall time (0.043 ms) is 20% of its active time (0.214 ms). During this active time, it executes 179,200 vector instructions, which represent 95% of the active time. This is excellent performance and indicates a well-optimized core.
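The 20% figure is a simple ratio of the two times reported for the tile. The following minimal Bash/awk sketch reproduces the calculation; `stall_percent` is a hypothetical helper name, and its inputs are the values read from the heat_map table for tile (24,2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: express a stall time as a whole-number percentage of
# the active time. Both arguments are in the same unit (ms in this example).
stall_percent() {
  local stall_ms="$1" active_ms="$2"
  awk -v s="$stall_ms" -v a="$active_ms" 'BEGIN {
    if (a <= 0) { print "active time must be positive" > "/dev/stderr"; exit 1 }
    printf "%.0f\n", 100 * s / a
  }'
}

stall_percent 0.043 0.214   # tile (24,2): 0.043 / 0.214 is about 20%
```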

Example of Core_metrics: stalls and Memory_metrics: dma_locks

The following figure shows, for ten tiles of an example design, the memory stall time, stream stall time, cascade stall time, and lock stall time reported by the stalls core metrics, together with the cumulative DMA activity time and cumulative DMA locks count reported by the dma_locks memory metrics.

Figure 2. Example of Core_metrics: stalls and Memory_metrics: dma_locks

For tile (24,2), the DMA was active for 70.645 ms (77.8 million instructions) but stalled 298 times. This count does not mean that only 298 instructions were affected by stalls, because a single stall can last multiple clock cycles.
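To see why a stall count is not the same as the time spent stalled, the following sketch sums the lengths of a few stall events. The stall lengths are hypothetical, illustrative values and are not taken from Figure 2.

```shell
#!/usr/bin/env bash
# Hypothetical data: each entry is the length, in clock cycles, of one stall
# event. These values are illustrative only, not read from the profile.
stall_lengths=(1 4 12 3)

stall_events=${#stall_lengths[@]}
stall_cycles=0
for len in "${stall_lengths[@]}"; do
  stall_cycles=$((stall_cycles + len))
done

# Four stall events cost 20 cycles, so the event count understates the
# number of cycles lost to stalls.
echo "events=$stall_events cycles=$stall_cycles"
```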

Example of Core_metrics: execution and Memory_metrics: conflicts

The following figure shows, for ten tiles of an example design, the cumulative instruction count, vector instruction count, load instruction count, and store instruction count reported by the execution core metrics, together with the memory conflict time and cumulative memory error time reported by the conflicts memory metrics.

Figure 3. Example of Core_metrics: execution and Memory_metrics: conflicts

Core (24,2) experiences some memory conflicts. Although they are minor, their source should be identified. Because they occur so rarely, they might be caused by interference from a DMA or from another kernel accessing the same memory.

Example of Core_metrics: stream_put_get and Memory_metrics: dma_stalls_s2mm

The following figure shows, for ten tiles of an example design, the stream read instruction count, cascade read instruction count, and cascade write instruction count reported by the stream_put_get core metrics, together with the S2MM channel 0 stall time and S2MM channel 1 stall time reported by the dma_stalls_s2mm memory metrics.

Figure 4. Example of Core_metrics: stream_put_get and Memory_metrics: dma_stalls_s2mm

The graph shows that core (25,1) writes to the cascade stream 3% of the time, and core (24,1) reads from this cascade stream for the same amount of time.
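A rough consistency check for a cascade connection is to compare the producer's cascade write instruction count with the consumer's cascade read instruction count. The sketch below assumes that, over a run that completes cleanly, each cascade write is matched by exactly one cascade read. The helper name `check_cascade_link` and the counts are hypothetical and are not read from Figure 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a producer's cascade write count with its
# consumer's cascade read count. Assumes one read per write over a full run.
check_cascade_link() {
  local writer="$1" writes="$2" reader="$3" reads="$4"
  if [ "$writes" -eq "$reads" ]; then
    echo "$writer -> $reader: balanced ($writes transfers)"
  else
    echo "$writer -> $reader: mismatch (writes=$writes, reads=$reads)"
    return 1
  fi
}

# Illustrative counts only.
check_cascade_link "(25,1)" 1200 "(24,1)" 1200
```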

Example of Core_metrics: heat_map and Memory_metrics: dma_locks

The following figure shows, for ten tiles of an example design, the active time, stall time, cumulative instruction count, and vector_instruction_count reported by the heat_map core metrics, together with the cumulative DMA activity time and cumulative DMA locks count reported by the dma_locks memory metrics.

Figure 5. Example of Core_metrics: heat_map and Memory_metrics: dma_locks

Together, the cumulative DMA activity time and the cumulative DMA locks count let you check for discrepancies between the number of lock acquisitions and the amount of data transferred through the DMAs. The relative lock counts across tiles also indicate the relative number of iterations each core performs.
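The following sketch normalizes per-tile lock counts to the largest count, which gives the kind of relative comparison described above. It assumes that every iteration of a kernel acquires the same number of locks on each tile. The helper name `relative_lock_counts` and the counts are hypothetical and are not taken from Figure 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "tile count" pairs on stdin and print each tile's
# lock count as a ratio of the largest count, as a rough proxy for relative
# iteration counts (assumes a fixed number of locks per iteration).
relative_lock_counts() {
  awk '
    { tile[NR] = $1; count[NR] = $2; if ($2 > max) max = $2 }
    END {
      if (max == 0) { print "no lock activity" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= NR; i++) printf "%s %.2f\n", tile[i], count[i] / max
    }'
}

# Illustrative counts only: (25,1) ran about half as many iterations.
printf '%s\n' "(24,1) 400" "(24,2) 400" "(25,1) 200" | relative_lock_counts
```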