To achieve the highest performance on the AI Engine, the primary goal of single kernel programming is to keep the utilization of the vector processor as close as possible to its theoretical maximum. Vectorizing the algorithm is important, but managing the vector registers, memory accesses, and software pipelining is also required. Because the vector processor can perform an operation every clock cycle, the programmer must strive to have the data for the next operation loaded while the current operation executes.
When implementing an algorithm for the AI Engine, start vectorization from the data types and the vector intrinsic functions that operate on those data types. Depending on the data type, the intrinsic functions operate on two or more elements at the same time. When the inner loop has sequential or loop-carried dependencies, it might be possible to unroll an outer loop and compute multiple values in parallel. There are many creative ways to use the vector intrinsic functions to solve problems.
When implementing an algorithm for a Versal® ACAP, it is important to understand what the AI Engine does well and what would be better implemented in the other engines, for example, the Scalar, Adaptable, and DSP engines.
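The outer-loop unrolling idea mentioned above can be illustrated with a minimal, portable C++ sketch. It does not use AI Engine intrinsic functions: the lane count `kLanes`, the function names, and the flat channel-major data layout are all assumptions made for illustration, and `std::array` merely stands in for a vector register. The recurrence `acc = acc / 2 + x[n]` carries a dependency from one sample to the next, so it cannot be vectorized along the sample index. Several independent channels can, however, be advanced in lockstep, one channel per lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count standing in for the width of a vector register.
constexpr std::size_t kLanes = 4;

// Reference version: one channel at a time. The inner loop carries a
// dependency through `acc`, so consecutive samples cannot run in parallel.
// Data layout (an assumption): x[c * len + n] is sample n of channel c.
std::vector<int> leaky_sum_reference(const std::vector<int>& x,
                                     std::size_t channels, std::size_t len) {
    assert(x.size() == channels * len);
    std::vector<int> y(x.size());
    for (std::size_t c = 0; c < channels; ++c) {
        int acc = 0;
        for (std::size_t n = 0; n < len; ++n) {
            acc = acc / 2 + x[c * len + n];  // depends on the previous sample
            y[c * len + n] = acc;
        }
    }
    return y;
}

// Outer loop unrolled by kLanes: each lane holds the state of one channel.
// The innermost loop over lanes has no dependency between iterations, so it
// maps naturally onto a single vector operation per sample.
std::vector<int> leaky_sum_unrolled(const std::vector<int>& x,
                                    std::size_t channels, std::size_t len) {
    assert(x.size() == channels * len);
    assert(channels % kLanes == 0);  // sketch only: no remainder handling
    std::vector<int> y(x.size());
    for (std::size_t c = 0; c < channels; c += kLanes) {
        std::array<int, kLanes> acc{};  // one accumulator per lane, zeroed
        for (std::size_t n = 0; n < len; ++n) {
            for (std::size_t lane = 0; lane < kLanes; ++lane) {
                const std::size_t idx = (c + lane) * len + n;
                acc[lane] = acc[lane] / 2 + x[idx];
                y[idx] = acc[lane];
            }
        }
    }
    return y;
}
```

Both functions produce the same output. The unrolled form only reorders the work so that the sequential dependency runs along the sample index while the parallelism comes from the channel index. In a real kernel, the lane loop would be replaced by the vector intrinsic functions appropriate to the data type.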
The Vitis IDE supports AI Engine single kernel development in addition to its traditional processor support. It provides a single-node graph example that can be used as a starting point for single kernel development. The Vitis IDE also has a debug view that displays registers, variables, available breakpoints, variable-to-register/memory mapping, internal/external memory contents, and an instruction pipeline (pipeline view) for each individual kernel.
--profile option enabled after the emulation run.