Improving Performance in the AI Engine - 2020.2 English

Versal ACAP System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2021-02-04
Version: 2020.2 English

You can use the Xilinx Runtime (XRT) APIs to measure performance metrics such as platform I/O port bandwidth, graph throughput, and graph latency. Use these APIs in the host application code with the AI Engine graph object, which is used to initialize, run, update, and exit graphs. You can also use these APIs to profile graph objects and measure bandwidth, throughput, and latency. For more information, see "Run Time Event API for Performance Profiling" in the Versal ACAP AI Engine Programming Environment User Guide (UG1076).
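The following host-code sketch illustrates this flow using the run-time event API referenced above. It is a minimal example under stated assumptions, not a complete application: the graph class name (mygraph), its output port (gr.out), the header file name, the transfer size, the iteration count, and the 1 GHz AI Engine clock are placeholders for your design.

```cpp
// Minimal sketch: profile graph throughput from the host application.
// Assumes a user-defined ADF graph class "mygraph" with an output port "out"
// declared in "graph.h" (placeholders for your design).
#include "graph.h"
using namespace adf;

mygraph gr;   // AI Engine graph object

int main(void)
{
    const long long out_bytes = 2048;   // expected bytes on the output port (assumption)

    gr.init();                          // load and initialize the graph

    // Count cycles from the first output sample until out_bytes have been transferred.
    event::handle h = event::start_profiling(
        gr.out, event::io_stream_start_to_bytes_transferred_cycles, out_bytes);

    gr.run(16);                         // run the graph for 16 iterations (assumption)
    gr.wait();                          // wait for the graph to complete

    long long cycles = event::read_profiling(h);
    event::stop_profiling(h);

    // Throughput in bytes/s, assuming a 1 GHz AI Engine clock (design dependent).
    double throughput = (double)out_bytes / (cycles * 1e-9);

    gr.end();                           // exit the graph
    return 0;
}
```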

AI Engine performance analysis typically involves system performance issues such as missing or mismatched locks, buffer overruns, and incorrect programming of direct memory access (DMA) buffers. It also includes memory and core stalls, deadlocks, and hot-spot analysis. The AI Engine architecture has direct support for generating, collecting, and streaming events as trace data during simulation, hardware emulation, or hardware execution. This data can then be analyzed for functional issues, latency problems between kernels, memory stalls, deadlocks, and similar conditions. For more information, see "Performance Analysis and Event Tracing" in the Versal ACAP AI Engine Programming Environment User Guide (UG1076).