You might wonder why the design implements the 1:4/4:1 packet switching scheme. It works around an AI Engine architecture limit on the number of simultaneous input and output AXI-Streams allowed per AI Engine column. The AI Engine array has 50 columns, and each column contains 8 AI Engine tiles. Each AI Engine column is allowed a maximum of six 32-bit AXI-Stream inputs and four 32-bit AXI-Stream outputs.
In the design, each nbody() kernel is mapped to an AI Engine tile, so each column of 8 AI Engine tiles has 9 input streams and 8 output streams, which violates these constraints.
With the 1:4/4:1 packet switching scheme, you can combine 4 streams into 1. Because packet switching is applied on the w_input_i ports, the number of input streams into a single AI Engine column is reduced to three:
input_i stream that goes to tiles 0-3 in a column
input_i stream that goes to tiles 4-7 in a column
input_j stream that is broadcast to all the columns
On the output side, the number of output streams is reduced to two:
output_i stream coming from tiles 0-3 in a column
output_i stream coming from tiles 4-7 in a column
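
The stream counts above can be checked with a little arithmetic. The following is a minimal sketch in plain C++, not code from the design: the constants come from the figures quoted in this section, while the names (`streams_per_column`, `fits`, `route_for_tile`) are hypothetical. The tile-to-slot mapping in `route_for_tile` is an assumption made for illustration (tiles 0-3 on the first packet-switched stream, tiles 4-7 on the second, with the slot taken as the tile index modulo 4); the actual packet IDs used by the design may differ.

```cpp
#include <cassert>

// Figures quoted in the text above.
constexpr int kTilesPerColumn   = 8;  // AI Engine tiles per column
constexpr int kMaxInputsPerCol  = 6;  // 32-bit AXI-Stream inputs allowed per column
constexpr int kMaxOutputsPerCol = 4;  // 32-bit AXI-Stream outputs allowed per column
constexpr int kFanout           = 4;  // 1:4 split on inputs, 4:1 merge on outputs

struct StreamCount {
    int inputs;
    int outputs;
};

// Streams required by one column when `tiles_per_stream` tiles share a single
// i-stream. The extra input is the input_j stream that every column receives.
constexpr StreamCount streams_per_column(int tiles_per_stream) {
    return StreamCount{kTilesPerColumn / tiles_per_stream + 1,
                       kTilesPerColumn / tiles_per_stream};
}

// True when a column's stream count stays within the per-column limits.
constexpr bool fits(StreamCount c) {
    return c.inputs <= kMaxInputsPerCol && c.outputs <= kMaxOutputsPerCol;
}

// One stream per tile: 9 inputs and 8 outputs, which exceeds the limits.
static_assert(!fits(streams_per_column(1)), "direct mapping should not fit");
// Four tiles per stream: 3 inputs and 2 outputs, which fits.
static_assert(fits(streams_per_column(kFanout)), "packet switching should fit");

// Hypothetical routing helper: which packet-switched stream serves a tile,
// and which of the four slots on that stream the tile occupies.
struct Route {
    int stream;  // 0 for tiles 0-3, 1 for tiles 4-7
    int slot;    // 0..3 within that stream (assumed, not taken from the design)
};

inline Route route_for_tile(int tile) {
    assert(tile >= 0 && tile < kTilesPerColumn);
    return Route{tile / kFanout, tile % kFanout};
}
```

Under these assumptions, `streams_per_column(1)` yields the 9 inputs and 8 outputs of the direct mapping, and `streams_per_column(4)` yields the 3 inputs and 2 outputs listed above. Both `static_assert` checks are evaluated at compile time, so the snippet fails to build if the numbers stop agreeing with the per-column limits.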