From the AI Engine core profiling data, tile(25,0) has a much larger number of Store Instructions. This is an indication to check the tile source code and see whether the number of Store Instructions can be lowered to improve performance.
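As a rough illustration of how such an outlier could be flagged automatically, the sketch below compares each tile's store count against the median across tiles. The function name `flag_store_heavy_tiles`, the `ratio` threshold, and the counts are all hypothetical; the numbers are placeholders, not values taken from the profiling report.

```python
from statistics import median


def flag_store_heavy_tiles(store_counts, ratio=2.0):
    """Return the tiles whose store count exceeds `ratio` times the median.

    store_counts maps a (column, row) tile coordinate to its Store
    Instructions count. Both the mapping layout and the default ratio
    are assumptions made for this sketch.
    """
    typical = median(store_counts.values())
    return sorted(
        tile for tile, count in store_counts.items() if count > ratio * typical
    )


# Placeholder numbers for illustration only, not measured profiling results.
example_counts = {(24, 0): 1000, (25, 0): 5000, (26, 0): 1100}
flagged = flag_store_heavy_tiles(example_counts)
print(flagged)
```

A median-based threshold is used here because a single very large tile would pull a mean upward and could hide itself; any other outlier rule would work equally well.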
From the AI Engine Memory profiling data, there are no Memory Conflict Time values, i.e., there are no memory violations in the source code. If any are reported, run the AI Engine simulator, check for memory access violations, and resolve them.
From the AI Engine Memory profiling data, tile(24,0) has a longer Cumulative DMA Lock Stalls Time. This suggests checking the PLIO input/output interface to see whether the PLIO frequency and PLIO width are set properly. AMD suggests using the integrated logic analyzer (ILA) to check the PLIO input/output states during runtime.
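When reviewing the PLIO settings, a quick arithmetic check can help: the peak throughput of a PLIO port is its width in bytes multiplied by its clock frequency. The sketch below is a minimal illustration of that calculation. The function names and the `required_mb_per_s` parameter are hypothetical, and the example width and frequency are not taken from this design.

```python
def plio_bandwidth_mb_per_s(width_bits, freq_mhz):
    """Peak PLIO throughput in MB/s.

    One transfer of width_bits / 8 bytes per clock cycle, at freq_mhz
    million cycles per second, gives bytes-per-cycle * MHz = MB/s.
    """
    return (width_bits / 8) * freq_mhz


def plio_meets_demand(width_bits, freq_mhz, required_mb_per_s):
    """Return True if the peak PLIO throughput covers the required rate."""
    return plio_bandwidth_mb_per_s(width_bits, freq_mhz) >= required_mb_per_s


# Example values chosen for illustration only.
print(plio_bandwidth_mb_per_s(128, 312.5))
print(plio_meets_demand(32, 250, 2000))
```

If the peak figure falls below what the kernel consumes or produces, the DMA waits on locks while data arrives or drains, which would be consistent with long lock stall times. This is a peak estimate only; it does not replace observing the interface with the ILA.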