With an input data rate of 2 Gsps on each AI Engine and a 32-tap filter, the data stream and the coefficients can each be split into eight phases, because each AI Engine can process a 4-tap filter. This gives an aggregate input sample rate of 2 Gsps x 8 phases = 16 Gsps. You will now design the filter for maximum performance.
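The phase split can be illustrated with a small host-side sketch. This is plain C++, not AI Engine kernel code, and all names in it (`split_into_phases`, `aggregate_rate_gsps`, the constants) are hypothetical. It assumes the usual polyphase convention in which phase q takes every eighth coefficient starting at index q.

```cpp
#include <array>
#include <cstddef>

// Figures taken from the text: 32 taps, 8 phases, 2 Gsps per AI Engine.
constexpr std::size_t kTaps = 32;
constexpr std::size_t kPhases = 8;
constexpr std::size_t kTapsPerPhase = kTaps / kPhases;  // 4 taps per kernel
constexpr double kRatePerEngineGsps = 2.0;

static_assert(kTaps % kPhases == 0, "taps must divide evenly into phases");

using Taps = std::array<float, kTaps>;
using PhaseBank = std::array<std::array<float, kTapsPerPhase>, kPhases>;

// Assumed polyphase split: phase q holds h[q], h[q + 8], h[q + 16], h[q + 24].
inline PhaseBank split_into_phases(const Taps& h) {
    PhaseBank bank{};
    for (std::size_t q = 0; q < kPhases; ++q) {
        for (std::size_t j = 0; j < kTapsPerPhase; ++j) {
            bank[q][j] = h[kPhases * j + q];
        }
    }
    return bank;
}

// Aggregate input rate when every phase runs at the per-engine rate.
inline double aggregate_rate_gsps() {
    return kRatePerEngineGsps * static_cast<double>(kPhases);
}
```

Each of the eight coefficient phases ends up with exactly four taps, which matches the 4-tap capacity of one AI Engine stated above.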
The same recommendations as in the previous section apply:
Each AI Engine in a column should receive the same data.
Every other row has its cascade stream running in the opposite direction, which leads to different sets of streams for even and odd rows.
The kernels above the diagonal (lower left to upper right) should discard one element in the stream.
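The column and diagonal rules above can be checked with a host-side reference model. The sketch below is not AI Engine kernel code and makes several assumptions that the text does not state: the filter is written in the form y[n] = sum over k of h[k] * x[n + k], rows are numbered from the bottom and correspond to output phases, and columns correspond to input data phases. Under those assumptions, a kernel lies above the lower-left to upper-right diagonal when its column index is smaller than its row index. All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumPhases = 8;    // 8 x 8 kernel grid
constexpr std::size_t kKernelTaps = 4;   // taps handled by one kernel
constexpr std::size_t kFilterTaps = kNumPhases * kKernelTaps;  // 32

// Reference filter (assumed form): y[n] = sum_k h[k] * x[n + k].
std::vector<long long> direct_fir(const std::vector<long long>& h,
                                  const std::vector<long long>& x,
                                  std::size_t num_out) {
    assert(h.size() == kFilterTaps);
    assert(x.size() >= num_out + kFilterTaps - 1);
    std::vector<long long> y(num_out, 0);
    for (std::size_t n = 0; n < num_out; ++n)
        for (std::size_t k = 0; k < kFilterTaps; ++k)
            y[n] += h[k] * x[n + k];
    return y;
}

// Kernels above the diagonal (row 0 at the bottom) drop one input sample.
bool discards_one_sample(std::size_t row, std::size_t col) {
    return col < row;
}

// Grid model: every kernel in a column reads the same data phase (col);
// each row accumulates one output phase, as a cascade chain would.
std::vector<long long> ssr_fir(const std::vector<long long>& h,
                               const std::vector<long long>& x,
                               std::size_t num_out) {
    assert(h.size() == kFilterTaps);
    assert(num_out % kNumPhases == 0);
    assert(x.size() >= num_out + kFilterTaps - 1);
    std::vector<long long> y(num_out, 0);
    for (std::size_t row = 0; row < kNumPhases; ++row) {
        for (std::size_t col = 0; col < kNumPhases; ++col) {
            // Coefficient phase used by the kernel at (row, col).
            const std::size_t q = (col + kNumPhases - row) % kNumPhases;
            const std::size_t skip = discards_one_sample(row, col) ? 1 : 0;
            for (std::size_t m = 0; m < num_out / kNumPhases; ++m)
                for (std::size_t j = 0; j < kKernelTaps; ++j)
                    y[kNumPhases * m + row] +=
                        h[kNumPhases * j + q] *
                        x[kNumPhases * (m + j + skip) + col];
        }
    }
    return y;
}
```

Calling `direct_fir` and `ssr_fir` with the same 32 coefficients and the same input gives identical outputs in this model, which is one way to see why the kernels above the diagonal need the one-sample discard. The alternating cascade direction on even and odd rows affects only placement and routing, so it is not modeled here.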