Data Communication via AXI4-Stream Interconnect - 2021.2 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

AI Engines can communicate directly through the AXI4-Stream interconnect without any DMA or memory interaction. Data can be sent from one AI Engine to another, or broadcast to several, through the streaming interface. Each streaming connection carries 32 bits per cycle and provides built-in handshake and backpressure mechanisms.
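The handshake and backpressure behavior described above can be illustrated with a small host-side model. This is only a sketch in plain C++ and not AI Engine code: the names Stream32, try_send, try_recv, and cycles_to_transfer are hypothetical, and the buffering depth and consumer rate are assumed parameters chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Host-side model only: a bounded FIFO standing in for one 32-bit stream link.
// Stream32, try_send, and try_recv are hypothetical names, not AI Engine APIs.
struct Stream32 {
    std::size_t capacity;            // assumed buffering depth of the link
    std::deque<std::uint32_t> fifo;  // words currently in flight

    // Producer side: the word is accepted only if the link has room.
    // A false return models backpressure; the producer must retry later.
    bool try_send(std::uint32_t word) {
        if (fifo.size() >= capacity) return false;
        fifo.push_back(word);
        return true;
    }

    // Consumer side: a word is delivered only if one is available.
    bool try_recv(std::uint32_t& word) {
        if (fifo.empty()) return false;
        word = fifo.front();
        fifo.pop_front();
        return true;
    }
};

// Counts the cycles needed to move n_words when the producer offers one word
// per cycle and the consumer accepts a word only every consumer_period cycles.
std::size_t cycles_to_transfer(std::size_t n_words, std::size_t depth,
                               std::size_t consumer_period) {
    assert(depth > 0 && consumer_period > 0);
    Stream32 link{depth, {}};
    std::size_t sent = 0, received = 0, cycle = 0;
    while (received < n_words) {
        std::uint32_t w = 0;
        // The consumer drains first, so a freed slot can be refilled this cycle.
        if (cycle % consumer_period == 0 && link.try_recv(w)) ++received;
        // The producer stalls (sent is not advanced) whenever the link is full.
        if (sent < n_words && link.try_send(static_cast<std::uint32_t>(sent))) ++sent;
        ++cycle;
    }
    return cycle;
}
```

In this model a slower consumer lengthens the transfer, but no word is dropped or overwritten: the producer simply waits until the link can accept data again.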

For streaming input and output interfaces, when performance is limited by the number of streams, the AI Engine can use two streaming inputs or two streaming outputs in parallel instead of one. To use two parallel streams, use the following pairs of macros, where idx1 and idx2 are the two streams. Add the __restrict keyword to the stream ports so that the compiler can optimize them for parallel processing.

READINCR(SS_rsrc1, idx1) and READINCR(SS_rsrc2, idx2) 
READINCRW(WSS_rsrc1, idx1) and READINCRW(WSS_rsrc2, idx2) 
WRITEINCR(MS_rsrc1, idx1, val) and WRITEINCR(MS_rsrc2, idx2, val) 
WRITEINCRW(WMS_rsrc1, idx1, val) and WRITEINCRW(WMS_rsrc2, idx2, val)
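As a sketch of the output-side pair, the fragment below writes alternate words to two output streams using WRITEINCR with MS_rsrc1 and MS_rsrc2. So that it compiles on a host without the AI Engine toolchain, it supplies stand-in definitions for output_stream_int32, MS_rsrc1, MS_rsrc2, and WRITEINCR; these stand-ins are assumptions for illustration only and must be dropped when building with the real tools. The function name split_even_odd and its array input are also hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-only stand-ins (assumptions): the real type and macros come from the
// AI Engine toolchain. Here a stream is just a vector that records each word.
struct output_stream_int32 { std::vector<std::int32_t> words; };
#define MS_rsrc1 0
#define MS_rsrc2 1
#define WRITEINCR(rsrc, stream, val) ((void)(rsrc), (stream)->words.push_back(val))

// Hypothetical kernel body: even-indexed words go to out0 through MS_rsrc1,
// odd-indexed words go to out1 through MS_rsrc2, so both output streams are
// written in each loop iteration.
void split_even_odd(const std::int32_t* in, int n,
                    output_stream_int32* __restrict out0,
                    output_stream_int32* __restrict out1) {
    assert(n % 2 == 0);  // this sketch handles complete pairs only
    for (int i = 0; i < n; i += 2) {
        WRITEINCR(MS_rsrc1, out0, in[i]);      // first stream resource
        WRITEINCR(MS_rsrc2, out1, in[i + 1]);  // second stream resource
    }
}
```

Pairing each stream with a different resource name (MS_rsrc1 and MS_rsrc2) follows the macro pairs listed above; the host stand-in ignores the resource argument and only records the data.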

The following sample code uses two parallel input streams to achieve a pipelined loop with an interval of 1. An interval of 1 means that two reads, one add, and one write are performed in every cycle.

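void simple(input_stream_int32 * __restrict data0,
            input_stream_int32 * __restrict data1,
            output_stream_int32 * __restrict out) {
	for (int i = 0; i < 1024; i++)
	chess_prepare_for_pipelining
	{
		int32_t d = READINCR(SS_rsrc1, data0);  // read from the first input stream
		int32_t e = READINCR(SS_rsrc2, data1);  // read from the second input stream
		WRITEINCR(MS_rsrc1, out, d + e);        // one add and one write per iteration
	}
}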

A stream connection can be unicast or multicast. In multicast communication, the data is sent to all destination ports at the same time, and only when all destinations are ready to receive it.
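The multicast rule can be illustrated with a short host-side model. This is a sketch in plain C++, not AI Engine code: Port and multicast_send are hypothetical names, and each destination's capacity is an assumed parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Host-side model only: one destination port with an assumed buffer capacity.
struct Port {
    std::size_t capacity;
    std::deque<std::uint32_t> fifo;
    bool ready() const { return fifo.size() < capacity; }
};

// Delivers word to every destination in the same step, but only if all
// destinations are ready. Otherwise nothing is delivered and false is returned.
bool multicast_send(std::vector<Port>& dests, std::uint32_t word) {
    for (const Port& p : dests)
        if (!p.ready()) return false;  // one busy destination stalls the source
    for (Port& p : dests)
        p.fifo.push_back(word);
    return true;
}
```

In this model a single slow destination holds back every other destination of the same multicast connection, which is worth keeping in mind when the receivers consume data at different rates.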