There are two types of interfaces: windows and streams. Access to the memory where windows are stored offers much higher bandwidth than streams: 2x40 GBps vs. 2x5 GBps (@1.25 GHz). Even though the memory bandwidth seen by the processor is high, the windows must still be filled, either by another AI Engine (bandwidth 40 GBps) or by streams (2x5 GBps). Either way, somewhere in the cascade of kernels, the origin of the data will be outside of the AI Engine array (PL, DDR, …), implying a stream source.
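To make the bandwidth figures above concrete, the following minimal sketch converts them into bytes per clock cycle and into a per-stream sample rate. The 4-byte sample size is an assumption chosen for illustration (for example, 16-bit complex data); it is not stated in the text, and the helper name `bytes_per_cycle` is hypothetical.

```python
# Per-cycle interface widths implied by the quoted bandwidth figures.
CLOCK_HZ = 1.25e9  # AI Engine clock frequency quoted in the text


def bytes_per_cycle(bandwidth_gbps: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a bandwidth in GB/s (1 GB = 1e9 bytes) to bytes per clock cycle."""
    return bandwidth_gbps * 1e9 / clock_hz


memory_port = bytes_per_cycle(40.0)  # one memory access port: 32 bytes (256 bits)
stream_port = bytes_per_cycle(5.0)   # one stream port: 4 bytes (32 bits)

# Assumption: 4-byte samples (not stated in the text).
SAMPLE_BYTES = 4
stream_msps = 5.0e9 / SAMPLE_BYTES / 1e6  # sample rate carried by one stream, in Msps

print(f"memory port: {memory_port:.0f} bytes/cycle")
print(f"stream port: {stream_port:.0f} bytes/cycle")
print(f"one stream : {stream_msps:.0f} Msps with {SAMPLE_BYTES}-byte samples")
```

Under that assumption, a single stream carries one sample per clock cycle, which is why the stream that ultimately feeds the array, not the memory port, sets the ceiling on throughput.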
Window interfaces are used in a ‘ping-pong’ manner to allow continuous data transfer alongside continuous processing. When multiple kernels mapped to the same AI Engine communicate through windows, these windows use a single buffer because the kernels do not run at the same time. Ping-pong buffering means that data is processed only once the buffer is completely filled, incurring a minimum latency equal to the time it takes to fill this buffer. When an AI Engine kernel uses window interfaces, it must acquire a lock to gain ownership of this memory. Lock acquisition and release take a minimum of seven cycles per lock, which reduces the time available for processing.
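The two costs described above, buffer-fill latency and lock overhead, can be estimated with a short calculation. This is a rough sketch rather than a timing model: the window size (1024 samples), the sample rate (500 Msps), and the count of two locks per kernel iteration (one input window, one output window) are all assumptions picked for illustration, and the function names are hypothetical.

```python
CLOCK_HZ = 1.25e9  # AI Engine clock frequency quoted in the text
LOCK_CYCLES = 7    # minimum acquire/release cost per lock, from the text


def fill_latency_us(window_samples: int, sample_rate_sps: float) -> float:
    """Minimum ping-pong latency: the time needed to fill one buffer, in microseconds."""
    return window_samples / sample_rate_sps * 1e6


def lock_overhead_fraction(window_samples: int, sample_rate_sps: float,
                           n_locks: int) -> float:
    """Fraction of the per-window cycle budget spent on lock handling (lower bound)."""
    cycles_per_window = window_samples / sample_rate_sps * CLOCK_HZ
    return n_locks * LOCK_CYCLES / cycles_per_window


# Assumed example: 1024-sample window at 500 Msps, two locks per iteration.
latency = fill_latency_us(1024, 500e6)
overhead = lock_overhead_fraction(1024, 500e6, n_locks=2)
print(f"fill latency : {latency:.3f} us")
print(f"lock overhead: {overhead:.2%} of the cycle budget")
```

The lock cost is fixed per window, so it weighs more heavily as windows get smaller or sample rates get higher, since both shrink the cycle budget per window.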
As a rule of thumb, 900 Msps (@1.25 GHz) is the maximum sample rate for which window interfaces are a viable solution. When the kernel processing duration is just a fraction of the time it takes to fill the input window, the utilization ratio is below 1 and multiple kernels can be mapped onto a single AI Engine.
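The utilization ratio can be sketched as the kernel processing time divided by the window fill time. The example below is an idealized illustration: the kernel cycle count, window size, and sample rate are assumed values, and the sharing check (ratios summing to at most 1) is a simplification that ignores lock and scheduling overhead. Both function names are hypothetical.

```python
CLOCK_HZ = 1.25e9  # AI Engine clock frequency quoted in the text


def utilization_ratio(kernel_cycles: float, window_samples: int,
                      sample_rate_sps: float, clock_hz: float = CLOCK_HZ) -> float:
    """Kernel processing cycles divided by the cycles needed to fill the input window."""
    fill_cycles = window_samples / sample_rate_sps * clock_hz
    return kernel_cycles / fill_cycles


def can_share_engine(ratios: list[float]) -> bool:
    """Idealized check: kernels fit on one AI Engine if their ratios sum to at most 1."""
    return sum(ratios) <= 1.0


# Assumed example: a kernel needing 640 cycles per 1024-sample window at 500 Msps.
ratio = utilization_ratio(640, 1024, 500e6)
print(f"utilization ratio: {ratio:.2f}")
print("three such kernels fit on one engine:", can_share_engine([ratio] * 3))
```

In practice some margin below 1 is advisable, because the per-lock overhead and kernel switching also consume cycles.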
In this tutorial, the goal is to achieve the highest-performance filter implementation, which leads to streaming interfaces at the input and the output.