When multiple kernels fit in a single AI Engine, two or more consecutive kernels can communicate through a common buffer in the shared memory. Only a single buffer is needed because the kernels execute one after another in round-robin fashion, so the producer and the consumer never access the buffer at the same time.
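The single-buffer case can be modeled in plain C++ to show why one buffer is enough. This is only a host-side sketch, not AI Engine kernel code: `kernel_a`, `kernel_b`, `run_single_core`, and the block size `kBlock` are hypothetical names chosen for illustration. Because the two kernels run back to back on the same core, the consumer always sees a complete block and no locking is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical block size; not a value taken from the text.
constexpr std::size_t kBlock = 8;
using Buffer = std::array<int32_t, kBlock>;

// Hypothetical producer kernel: fills the shared buffer with a ramp.
void kernel_a(Buffer& shared, int32_t iteration) {
    for (std::size_t i = 0; i < kBlock; ++i) {
        shared[i] = iteration * static_cast<int32_t>(kBlock) +
                    static_cast<int32_t>(i);
    }
}

// Hypothetical consumer kernel: sums the contents of the shared buffer.
int64_t kernel_b(const Buffer& shared) {
    int64_t sum = 0;
    for (int32_t v : shared) {
        sum += v;
    }
    return sum;
}

// Models one core running both kernels in round-robin order.
// kernel_a always finishes before kernel_b starts, so a single
// buffer is never read and written at the same time.
int64_t run_single_core(int iterations) {
    assert(iterations >= 0);
    Buffer shared{};  // the one common buffer
    int64_t total = 0;
    for (int it = 0; it < iterations; ++it) {
        kernel_a(shared, it);        // producer writes the block
        total += kernel_b(shared);   // consumer reads the same block
    }
    return total;
}
```

Calling `run_single_core(2)` runs each kernel twice in alternation and returns the sum of the values 0 through 15.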
When the kernels are in separate but neighboring AI Engines, they can communicate through the shared memory module using ping-pong buffers. These buffers are placed on separate memory banks so that access conflicts are avoided. Synchronization is done through locks: the locks associated with each buffer ensure that a kernel's input and output buffers are ready before the kernel uses them. This type of communication saves routing resources and eliminates data transfer latency because DMA and the AXI4-Stream interconnect are not needed.
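The ping-pong scheme with one lock per buffer can be sketched as a small single-threaded simulation. This is an illustrative model under stated assumptions, not the hardware lock interface: `BufferLock`, `PingPong`, `producer_step`, `consumer_step`, `run_ping_pong`, and `kBlockSize` are hypothetical names, and a real design would run the two kernels concurrently on separate AI Engines. The sketch shows the producer running at most one buffer ahead and stalling when both buffers are full.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical number of samples per buffer.
constexpr std::size_t kBlockSize = 4;

// Software stand-in for a hardware lock. It only records whether the
// buffer it guards currently holds unread data.
struct BufferLock {
    bool full = false;
    // Non-blocking acquire: succeeds only if the buffer is in the wanted state.
    bool try_acquire(bool want_full) const { return full == want_full; }
    // Release the buffer in its new state (full after a write, empty after a read).
    void release(bool now_full) { full = now_full; }
};

// Two buffers (ping and pong), each with its own lock.
struct PingPong {
    std::array<std::array<int32_t, kBlockSize>, 2> data{};
    std::array<BufferLock, 2> lock{};
};

// Producer kernel: writes block b into buffer b % 2 if that buffer is empty.
// Returns false to indicate a stall.
bool producer_step(PingPong& pp, int b) {
    const std::size_t side = static_cast<std::size_t>(b % 2);
    if (!pp.lock[side].try_acquire(false)) {
        return false;  // buffer still holds unread data
    }
    pp.data[side].fill(static_cast<int32_t>(b));
    pp.lock[side].release(true);
    return true;
}

// Consumer kernel: reads block b from buffer b % 2 if that buffer is full.
bool consumer_step(PingPong& pp, int b, int64_t& total) {
    const std::size_t side = static_cast<std::size_t>(b % 2);
    if (!pp.lock[side].try_acquire(true)) {
        return false;  // data not ready yet
    }
    for (int32_t v : pp.data[side]) {
        total += v;
    }
    pp.lock[side].release(false);
    return true;
}

// Interleaves the two kernels. The producer gets two attempts per round,
// so it can fill the second buffer while the first awaits the consumer.
int64_t run_ping_pong(int blocks) {
    PingPong pp;
    int64_t total = 0;
    int produced = 0;
    int consumed = 0;
    while (consumed < blocks) {
        for (int attempt = 0; attempt < 2; ++attempt) {
            if (produced < blocks && producer_step(pp, produced)) {
                ++produced;
            }
        }
        if (consumer_step(pp, consumed, total)) {
            ++consumed;
        }
    }
    return total;
}
```

In this model each block is filled with its own index, so `run_ping_pong(4)` accumulates four samples each of the values 0, 1, 2, and 3.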