Window-Based Access - 2022.2 English

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2022-10-19
Version: 2022.2 English

A window is a block or frame of data, stored in local memory, on which an AI Engine kernel can operate. The data can reside in the local memory of the kernel's own tile or in the local memory of an adjacent tile. The block can be produced by a kernel on the same tile or on another tile, or it can come from the PL or the PS through the AI Engine array interface. When a kernel has a window on its input side, it waits for the window of data to be fully available before it starts execution, and it can then access the contents of the window either randomly or linearly. Conversely, a kernel can write a block of data to local memory that other kernels can use after it finishes execution.

When the source of a window is a stream, the stream is sliced into contiguous blocks, which are stored one by one into windows, as illustrated.
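The slicing step can be sketched in plain C++ as a host-side illustration (this is not AI Engine code; the function name, element type, and window size are illustrative choices):

```cpp
#include <cstddef>
#include <vector>

// Host-side sketch: cut a contiguous stream into fixed-size windows,
// the way the array interface fills windows one by one from a stream.
std::vector<std::vector<int>> slice_into_windows(const std::vector<int>& stream,
                                                 std::size_t window_size) {
    std::vector<std::vector<int>> windows;
    for (std::size_t i = 0; i + window_size <= stream.size(); i += window_size) {
        windows.emplace_back(stream.begin() + i, stream.begin() + i + window_size);
    }
    return windows;
}
```

Each resulting window is then fully available to a kernel before that kernel starts executing on it.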
Figure 1. Data stream slicing into windows

The view that a kernel has of incoming blocks of data is called an input window. An input window is typed: the type of the data it carries must be declared before the kernel can operate on it. The following example declares an input window carrying complex integers in which both the real and imaginary parts are 16 bits wide.

input_window<cint16> myFirstWindow;

The view that a kernel has of outgoing blocks of data is called an output window. Output windows are also typed. The following example declares an output window carrying 32-bit integers.

output_window<int32> myOtherWindow;

These window data structures are automatically inferred by the AI Engine compiler from the data flow graph connections and are automatically declared in the wrapper code implementing the graph control. The kernel functions merely operate on pointers to the window data structures that are passed to them as arguments. There is no need to declare these window data structures in the data flow graph or kernel program.
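Inside a kernel, the window pointers passed as arguments are accessed through the window read/write API (for example, window_readincr and window_writeincr). The sketch below shows the shape of such a kernel. The stand-in window types, the scale_kernel name, the WINDOW_SIZE constant, and the doubling operation are all hypothetical, added only so the fragment compiles and runs outside the AI Engine toolchain; on the device, <adf.h> provides the real types and functions.

```cpp
#include <cstdint>

// Minimal stand-ins for the ADF window types so the sketch builds on a host.
// On the device these come from <adf.h> and are managed by the compiler.
template <typename T> struct input_window  { const T* ptr; };
template <typename T> struct output_window { T* ptr; };

template <typename T>
T window_readincr(input_window<T>* w) { return *w->ptr++; }

template <typename T>
void window_writeincr(output_window<T>* w, T v) { *w->ptr++ = v; }

constexpr int WINDOW_SIZE = 8; // illustrative window length, in samples

// Hypothetical kernel: reads a full input window, writes a scaled copy.
void scale_kernel(input_window<std::int32_t>* in,
                  output_window<std::int32_t>* out) {
    for (int i = 0; i < WINDOW_SIZE; ++i) {
        std::int32_t v = window_readincr(in);  // linear read through the window
        window_writeincr(out, v * 2);          // linear write to the output window
    }
}
```

The kernel itself never allocates or declares the windows; it only consumes the pointers handed to it by the graph wrapper code.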

When two kernels (k1, k2) communicate through windows (the output window of k1 is connected to an input window of k2), the compiler attempts to place them on tiles that share at least one AI Engine memory module.

  • If the two kernels are located on the same tile, the compiler uses a single memory area for the communication. Because kernels within an AI Engine execute sequentially, the two kernels never run simultaneously, so access conflicts on the shared memory area cannot occur.
    Figure 2. Same Tile and Memory Sharing Placement Example
  • If the two kernels are placed on different tiles that share an AI Engine memory module, the compiler infers a ping-pong window, allowing the two kernels to write and read at the same time, but never to the same memory area.
  • If the two kernels are placed on distant tiles, the compiler automatically infers a ping-pong window at the output of k1 and another at the input of k2. The two ping-pongs are connected by a DMA, which copies the contents of the output window of k1 into the input window of k2 over a data stream.
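The ping-pong scheme in the bullets above can be modeled with a simple host-side sketch: two buffers alternate roles, so the producer (k1) always writes the buffer the consumer (k2) is not reading. The PingPong type and its lockstep behavior are illustrative only; on the device, the compiler inserts the double buffer and the lock handshake automatically.

```cpp
#include <array>
#include <vector>

// Conceptual ping-pong buffer. The real lock-based synchronization is
// elided: here the two sides simply run in lockstep.
struct PingPong {
    std::array<std::vector<int>, 2> buf;
    int write_idx = 0;  // which buffer the producer writes next

    void produce(const std::vector<int>& data) {
        buf[write_idx] = data;  // k1 fills one buffer...
        write_idx ^= 1;         // ...then the buffers swap roles
    }
    const std::vector<int>& consume() const {
        return buf[write_idx ^ 1];  // k2 reads the buffer just filled
    }
};
```

While k2 drains one buffer, k1 is free to fill the other, which is what lets the two kernels overlap in time without ever touching the same memory area.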

Figure 3. Distant Tiles Placement Example

When multiple windows or streams converge on a kernel, the various paths can have very different latencies, which can potentially lead to a deadlock. To avoid this problem, you can insert a FIFO between the two kernels. The compiler generates the same type of architecture as in the distant-tile case, except that a FIFO is inserted in the middle of the stream connection.
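The effect of the FIFO can be illustrated on the host: it buffers the output of the shorter-latency path until the longer path catches up, so the joining kernel always finds matched data and neither side stalls the other. This is only a conceptual sketch (join_paths and the names here are hypothetical); in an actual graph, the FIFO is specified as a depth on the connection itself.

```cpp
#include <queue>
#include <vector>

// Conceptual sketch: a FIFO decouples a fast path from a slow path so the
// downstream kernel can consume matched pairs once both inputs are present.
std::vector<int> join_paths(const std::vector<int>& fast_path,
                            const std::vector<int>& slow_path) {
    std::queue<int> fifo;                 // buffers the fast path's output
    for (int v : fast_path) fifo.push(v);

    std::vector<int> out;
    for (int v : slow_path) {             // slow path data arrives later
        out.push_back(fifo.front() + v);  // join matched elements
        fifo.pop();
    }
    return out;
}
```

Without the buffering, the fast path would block once its downstream ping-pong filled up, which in turn could stall the producer feeding both paths.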