Buffer ports provide a way for a kernel to operate on a block of data, and each port operates in a single direction (input or output). The view that a kernel has of an incoming block of data is called an input buffer, and the view of an outgoing block of data is called an output buffer. Both are defined by a type: the type of the data contained in the buffer must be declared before the kernel can operate on it. The example below declares a kernel named simple with an input buffer named in, containing complex integers whose real and imaginary parts are each 16-bit wide signed integers, and an output buffer named out, containing 32-bit wide signed integers.
void simple(input_buffer<cint16> & in, output_buffer<int32> &out);
The example below shows how input and output buffer port sizes are declared using the adf::extents template parameter.
void simple(input_buffer<cint16, adf::extents<INPUT_SAMPLE_SIZE>> & in, output_buffer<int32, adf::extents<OUTPUT_SAMPLE_SIZE>> & out);
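For context, a kernel body that consumes and produces such buffers might look like the following sketch. It assumes the adf buffer iterator API (begin() and size()) and the Vitis AI Engine toolchain headers, and the widening operation shown is purely illustrative, not part of the original example:

```cpp
#include <adf.h>

// Illustrative only: copy each sample's real part, widened to 32 bits.
// The buffer sizes are fixed by the graph, so in.size() samples are
// processed per kernel invocation.
void simple(adf::input_buffer<cint16> &in, adf::output_buffer<int32> &out)
{
    auto inIt  = in.begin();   // scalar read iterator over the input block
    auto outIt = out.begin();  // scalar write iterator over the output block
    for (unsigned i = 0; i < in.size(); ++i) {
        cint16 s = *inIt++;
        *outIt++ = static_cast<int32>(s.real);  // assumed operation
    }
}
```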
These buffer data structures are automatically inferred by
aiecompiler from the data flow graph connections and are
automatically declared in the wrapper code implementing the graph control. The kernel
functions merely operate on pointers to the buffer data structures that are passed to
them as arguments. There is no need to declare these buffer data structures in the data
flow graph or kernel program.
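As a hedged sketch of where those connections come from, a graph might wire two kernels together as shown below. The kernel functions producer and consumer, the source file names, and the runtime ratios are illustrative assumptions, and the code requires the Vitis AI Engine toolchain:

```cpp
#include <adf.h>

class mygraph : public adf::graph {
    adf::kernel k1, k2;
public:
    mygraph() {
        k1 = adf::kernel::create(producer);   // hypothetical kernel functions
        k2 = adf::kernel::create(consumer);
        adf::source(k1) = "producer.cc";      // illustrative file names
        adf::source(k2) = "consumer.cc";
        // Connect k1's output buffer port to k2's input buffer port; the
        // compiler infers the shared buffer (single or ping-pong) and
        // declares it in the generated wrapper code.
        adf::connect(k1.out[0], k2.in[0]);
        adf::runtime<adf::ratio>(k1) = 0.5;
        adf::runtime<adf::ratio>(k2) = 0.5;
    }
};
```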
When two kernels (k1, k2) communicate through buffers (the output buffer of k1 is connected to the input buffer of k2), the compiler attempts to place them into tiles that can share at least one AI Engine memory module.
- If the two kernels are located on the same tile, the compiler uses a single memory area to communicate because they are not executed simultaneously (see k1 and k2 in tile (8,0) and the single shared memory block in (7,0) in the following figure). Because the execution of multiple kernels within an AI Engine is sequential, access conflicts are avoided when using the same memory area.
- If the two kernels are placed in different tiles sharing an AI Engine memory module, the compiler will infer a
ping-pong buffer, allowing the two kernels to write and read at the same time but
not to the same memory area (see k1 in tile (10,0), k2 in tile (11,0) and the shared
buffer implemented as a ping-pong buffer in (10,0) in the following figure).Figure 1. Same Tile and Memory Sharing Placement Example
- If your system performance allows it, you can switch this ping-pong buffer to single buffering by applying the single_buffer(<port>) constraint to the kernel ports.
- If the two kernels are placed in distant tiles, the compiler automatically infers a ping-pong buffer at the output of k1, and another one at the input of k2. The two ping-pong buffers are connected through a DMA, which automatically copies the content of the output buffer of k1 into the input buffer of k2 over a data stream.
Figure 2. Distant Tiles Placement Example
- When multiple buffers/streams converge onto a kernel, the various paths may have very different latencies, which can potentially lead to a deadlock. To avoid this kind of problem, you can insert a FIFO between the two kernels. The compiler will generate the same type of architecture as the distant tile case, except that a FIFO is inserted in the middle of the stream connection.
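The two tuning knobs mentioned above, single buffering and FIFO insertion, are expressed as graph constraints. The snippet below is a sketch based on the documented single_buffer and fifo_depth constraints; the kernel and net names are illustrative, the depth value is an example, and the exact connect syntax depends on your tool version:

```cpp
// Inside the graph constructor (requires the Vitis AI Engine toolchain).

// Case 1: the kernels share a memory module and performance permits it:
// collapse the inferred ping-pong into a single buffer on k2's input port.
adf::single_buffer(k2.in[0]);

// Case 2: balance path latencies on a stream connection by inserting
// a FIFO on the net between the two kernels.
adf::connect net0(k1.out[0], k2.in[0]);  // illustrative net
adf::fifo_depth(net0) = 32;              // depth of 32 is an example value
```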