You might wonder why the 1:4/4:1 packet switching scheme is needed. It works around an AI Engine architecture limit on the number of simultaneous input and output AXI-Streams allowed per AI Engine column. There are 50 AI Engine columns in the AI Engine array, and each column contains 8 AI Engine tiles. Each AI Engine column is allowed a maximum of six 32-bit AXI-Stream inputs and four 32-bit AXI-Stream outputs.
In the design, each nbody() kernel is mapped to an AI Engine tile. Without packet switching, each column of 8 AI Engine tiles would therefore need 9 input streams and 8 output streams, which exceeds both limits (9 > 6 inputs, 8 > 4 outputs).
- 8 w_input_i input streams
- 1 w_input_j input stream
- 8 w_output_i output streams
With the 1:4/4:1 packet switching scheme, you can combine 4 streams into 1. Because packet switching is applied on the w_input_i ports, the number of input streams into a single AI Engine column is reduced to three:
- 1 input_i stream that goes to tiles 0-3 in a column
- 1 input_i stream that goes to tiles 4-7 in a column
- 1 input_j stream that is broadcast to all the columns
On the output side, the number of output streams is reduced to two:
- 1 output_i stream coming from tiles 0-3 in a column
- 1 output_i stream coming from tiles 4-7 in a column
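
The stream-count arithmetic above can be checked with a small standalone C++ sketch. It is not part of the design or of the AI Engine tools. The names `StreamCount`, `streams_per_column`, and `fits` are hypothetical helpers introduced only for illustration, and the model assumes one shared input_j stream per column plus one input_i and one output_i stream per group of tiles.

```cpp
// Per-column limits and tile count quoted in the text.
constexpr int kMaxInputsPerColumn  = 6;  // 32-bit AXI-Stream inputs
constexpr int kMaxOutputsPerColumn = 4;  // 32-bit AXI-Stream outputs
constexpr int kTilesPerColumn      = 8;

// Hypothetical container for the streams one column needs.
struct StreamCount {
    int inputs;
    int outputs;
};

constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// group_size is the number of tiles sharing one packet-switched stream:
// 1 means no packet switching, 4 models the 1:4/4:1 scheme.
// Assumption: one input_i and one output_i stream per group,
// plus a single input_j stream shared by the whole column.
constexpr StreamCount streams_per_column(int tiles, int group_size) {
    return StreamCount{ceil_div(tiles, group_size) + 1,
                       ceil_div(tiles, group_size)};
}

// True when the column stays within both architectural limits.
constexpr bool fits(StreamCount s) {
    return s.inputs <= kMaxInputsPerColumn &&
           s.outputs <= kMaxOutputsPerColumn;
}

// Without packet switching: 9 inputs and 8 outputs, over both limits.
static_assert(!fits(streams_per_column(kTilesPerColumn, 1)),
              "dedicated streams should exceed the per-column limits");

// With 1:4/4:1 packet switching: 3 inputs and 2 outputs, within limits.
static_assert(fits(streams_per_column(kTilesPerColumn, 4)),
              "packet-switched streams should fit the per-column limits");
```

Both checks are evaluated at compile time, so the snippet compiles only if the counts in the text (9 and 8 before, 3 and 2 after) are consistent with the stated limits.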