Window vs. Stream in Data Communication - 2020.2 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

Document ID: UG1079
Release Date: 2021-02-04
Version: 2020.2 English

AI Engine kernels in the data flow graph operate on data streams, which are infinitely long sequences of typed values. These data streams can be broken into separate blocks, called windows, that are processed by a kernel. Kernels consume input blocks of data and produce output blocks of data. An initialization function can be specified to run before the kernel starts processing input data. The kernel can read scalars or vectors from memory; however, the vector length for each read and write operation must be either 128 or 256 bits.

The input data window and the output buffer are locked for a kernel before it executes. Because the input window must be filled with input data before the kernel starts, a window interface has higher latency than a stream interface. The kernel can perform random access within a window of data, and a window margin can be specified for algorithms that require some number of bytes from the previous sample.
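The windowing and margin behavior described above can be illustrated with a small host-side model. This is a minimal sketch in plain C++, not the AI Engine window API: the function names `split_into_windows` and `moving_sum3` are hypothetical, the margin is counted in samples rather than bytes, and the first window's margin is assumed to be zero-filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical names, not the AI Engine API).
// Splits a finite prefix of a stream into blocks of `window_size` samples.
// Each block is preceded by `margin` samples copied from the end of the
// previous block; the first block's margin is zero-filled (an assumption).
std::vector<std::vector<int>> split_into_windows(const std::vector<int>& stream,
                                                 std::size_t window_size,
                                                 std::size_t margin) {
    assert(window_size > 0 && margin <= window_size);
    std::vector<std::vector<int>> windows;
    std::vector<int> history(margin, 0);  // margin carried between blocks
    // Only complete blocks are emitted: a window must be full before use.
    for (std::size_t start = 0; start + window_size <= stream.size();
         start += window_size) {
        std::vector<int> w(history);  // margin samples come first
        w.insert(w.end(), stream.begin() + start,
                 stream.begin() + start + window_size);
        // Keep the last `margin` samples for the next window.
        history.assign(w.end() - margin, w.end());
        windows.push_back(w);
    }
    return windows;
}

// Example "kernel" using random access within one window: a 3-tap moving
// sum that reaches back into the margin for the two preceding samples.
std::vector<int> moving_sum3(const std::vector<int>& window, std::size_t margin) {
    assert(margin >= 2 && window.size() >= margin);
    std::vector<int> out;
    for (std::size_t i = margin; i < window.size(); ++i) {
        out.push_back(window[i] + window[i - 1] + window[i - 2]);
    }
    return out;
}
```

With a stream of 1 through 8, a window size of 4, and a margin of 2, the second window begins with the samples 3 and 4 from the first block, so the moving sum is continuous across the block boundary. Note that no window is produced until all of its samples have arrived, which is the source of the extra latency relative to a stream.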

Kernels can also access data streams sample by sample. Streams are used for continuous data and are read and written with blocking or non-blocking calls. Cascade streams support only blocking access. The AI Engine supports two 32-bit stream input ports and two 32-bit stream output ports. The vector length for reading or writing data streams must be either 32 or 128 bits. Packet streams are useful when the number of independent data streams in the program exceeds the number of hardware stream channels or ports available. The AI Engine connects to and from the PL through streams.
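The difference between blocking and non-blocking stream access can be sketched with a bounded FIFO. This is a host-side model in plain C++ under stated assumptions, not the AI Engine stream API: the class `StreamModel`, its methods, and the FIFO depth are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Host-side model only (hypothetical names, not the AI Engine API).
// A bounded FIFO of 32-bit samples standing in for one stream connection.
class StreamModel {
public:
    explicit StreamModel(std::size_t depth) : depth_(depth) {
        assert(depth > 0);
    }

    // Non-blocking write: returns false when the FIFO is full, which is
    // how back pressure shows up to the producer in this model.
    bool try_write(std::int32_t value) {
        if (fifo_.size() >= depth_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Non-blocking read: returns false when no sample is available and
    // leaves `value` unchanged in that case.
    bool try_read(std::int32_t& value) {
        if (fifo_.empty()) return false;
        value = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<std::int32_t> fifo_;
};
```

In this model, a blocking call corresponds to the caller stalling until `try_write` or `try_read` would succeed, whereas a non-blocking call returns immediately and lets the kernel do other work. Samples are consumed strictly in order, so there is no random access as there is with a window.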

The following table summarizes the differences in window and stream connections between kernels.

Table 1. Window vs. Stream Connections
| Connection | Margin | Packet Switching | Back Pressure | Lock | Max throughput by VLIW (per cycle) | Multicast as a Source |
|---|---|---|---|---|---|---|
| Window | Yes | Yes (1) | No | Yes | 2 × 256-bit load + 1 × 256-bit store | No |
| Stream | No | Yes (1) | Yes | No | 2 × 32-bit read + 1 × 32-bit write, or 1 × 32-bit read + 2 × 32-bit write | Yes |

(1) Packet switching is only supported between AI Engine kernels and PL kernels.
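To put the per-cycle figures in Table 1 in perspective, the following sketch computes a lower bound on the cycles needed to read a block of data through each interface. It is plain arithmetic on the table's peak numbers only. The constant and function names are hypothetical, and real kernels will generally need more cycles because of stalls, lock acquisition, and compute.

```cpp
#include <cassert>
#include <cstdint>

// Peak input bits per cycle taken from Table 1 (names are hypothetical).
const std::uint64_t kWindowLoadBitsPerCycle = 2 * 256;  // 2 x 256-bit loads
const std::uint64_t kStreamReadBitsPerCycle = 2 * 32;   // 2 x 32-bit reads

// Lower bound on cycles needed to move `bytes` bytes at the given peak
// rate, rounded up to whole cycles. Ignores stalls and all other overhead.
inline std::uint64_t min_cycles(std::uint64_t bytes,
                                std::uint64_t bits_per_cycle) {
    assert(bits_per_cycle > 0);
    return (bytes * 8 + bits_per_cycle - 1) / bits_per_cycle;
}
```

For example, `min_cycles(1024, kWindowLoadBitsPerCycle)` gives 16 cycles for a 1024-byte block, while `min_cycles(1024, kStreamReadBitsPerCycle)` gives 128 cycles. The window interface has the higher peak bandwidth, at the cost of the fill latency and locking noted earlier.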

Graph code is written in C++ and kept in a file separate from the kernel source files. The compiler places the AI Engine kernels into the AI Engine array, taking care of the memory requirements and making all the necessary connections for data flow. Multiple kernels with low core usage can be placed into a single tile.

For a complete overview of graph programming with AI Engine tools, refer to the Versal ACAP AI Engine Programming Environment User Guide (UG1076).