AXI4-Stream Interconnect

Versal ACAP AIE-ML Architecture Manual (AM020)

Document ID: AM020
Release Date: 2022-09-28
Revision: 1.0 (English)

Each AIE-ML tile has an AXI4-Stream interconnect (also called a stream switch), which is a fully programmable, 32-bit AXI4-Stream crossbar that is statically configured through the memory-mapped AXI4 interconnect. It handles backpressure and supports the full AXI4-Stream bandwidth. Figure 1 is a high-level block diagram of the AXI4-Stream switch. The switch has master ports (data flowing out of the switch) and slave ports (data flowing into the switch). The building blocks of the AXI4-Stream interconnect are as follows:

  • Port handlers
  • FIFOs
  • Arbiters
  • Stream switch configuration registers

Key features of the AXI4-Stream interconnect include the following:

  • AIE-ML supports 1-to-1 loopback, in which only ports with the same ID are connected to each other
  • There are 25 slave ports and 23 master ports
  • The switch has one FIFO that is 16 deep and 34 bits wide (32 data bits + 1 parity bit + 1 TLAST bit)
Figure 1. AXI4-Stream Switch High-level Block Diagram
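The FIFO described in the feature list can be sketched in Python as a bounded queue that applies backpressure instead of dropping data. This is a behavioral sketch, not the hardware design. The names `StreamFifo` and `FifoEntry` are hypothetical. The bit ordering in `to_bits` is also an illustrative assumption, as is the choice of odd parity covering only the 32 data bits, because the document specifies odd parity only for the packet header.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

FIFO_DEPTH = 16  # entries, from the feature list


def odd_parity_bit(data: int) -> int:
    """Parity bit that makes the total count of 1s (data + parity) odd.

    Assumption: the FIFO parity bit uses odd parity over the 32 data bits.
    """
    return 0 if bin(data).count("1") % 2 == 1 else 1


@dataclass(frozen=True)
class FifoEntry:
    data: int    # 32-bit payload
    parity: int  # 1 parity bit
    tlast: int   # 1 bit, marks the last word of a packet

    def to_bits(self) -> int:
        # Hypothetical layout {parity, tlast, data[31:0]}: 34 bits in total.
        return (self.parity << 33) | (self.tlast << 32) | self.data


class StreamFifo:
    """Bounded FIFO: a full FIFO refuses new words (backpressure)."""

    def __init__(self, depth: int = FIFO_DEPTH) -> None:
        self.depth = depth
        self._entries: deque = deque()

    def ready(self) -> bool:
        # True while the FIFO can accept another word.
        return len(self._entries) < self.depth

    def push(self, data: int, tlast: int = 0) -> bool:
        if not 0 <= data <= 0xFFFF_FFFF:
            raise ValueError("data must fit in 32 bits")
        if not self.ready():
            return False  # backpressure: the sender must hold the word and retry
        self._entries.append(FifoEntry(data, odd_parity_bit(data), tlast & 1))
        return True

    def pop(self) -> Optional[FifoEntry]:
        return self._entries.popleft() if self._entries else None


fifo = StreamFifo()
accepted = [fifo.push(word) for word in range(17)]
print(accepted.count(True), accepted[-1])  # 16 False
```

The seventeenth word is refused rather than lost. Once the receiver pops an entry, the sender can retry.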

In AIE-ML, the ports are divided into external and local ports. The external ports are South, West, North, and East. The local ports are AIE-ML, DMA, FIFO, and trace. The ports have the following characteristics:

  • External ports have 2-cycle latency and a 4-deep FIFO
  • Local slave ports have 2-cycle latency and a 4-deep FIFO
  • Local master ports have one register slice with 1-cycle latency and a 2-deep FIFO

Therefore, the latency and buffering for a path across the switch (excluding packet-switch arbitration overhead) are as follows:

  • Local slave to local master: 3-cycle latency and 6-deep FIFO
  • Local slave to external master: 4-cycle latency and 8-deep FIFO
  • External slave to local master: 3-cycle latency and 6-deep FIFO
  • External slave to external master: 4-cycle latency and 8-deep FIFO
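Each of the four path figures above equals the slave-port value plus the master-port value. The Python sketch below reproduces them. The dictionary and function names are illustrative. Like the list, the sketch ignores packet-switch arbitration overhead.

```python
# (latency in cycles, FIFO depth in words) per port, from the port list above.
SLAVE_PORTS = {"local": (2, 4), "external": (2, 4)}
MASTER_PORTS = {"local": (1, 2), "external": (2, 4)}


def path_cost(slave: str, master: str) -> tuple:
    """Latency and buffering for a slave -> master path through the switch.

    Excludes packet-switch arbitration overhead.
    """
    s_latency, s_depth = SLAVE_PORTS[slave]
    m_latency, m_depth = MASTER_PORTS[master]
    return s_latency + m_latency, s_depth + m_depth


for slave in SLAVE_PORTS:
    for master in MASTER_PORTS:
        latency, depth = path_cost(slave, master)
        print(f"{slave} slave -> {master} master: {latency} cycles, {depth}-deep")
```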

Each stream port can be configured for either circuit-switched or packet-switched streams (never both at the same time) using a packet-switching bit in the configuration register. A circuit-switched stream is a one-to-many stream. It has exactly one source port and an arbitrary number of destination ports, and all data entering the stream at the source is streamed to all destinations. The latency of a word transmitted on a circuit-switched stream is deterministic; if bandwidth is limited, the built-in backpressure degrades performance. A packet-switched stream can share ports (and therefore physical wires) with other logical streams. Because packet-switched streams can contend for resources with one another, they do not provide deterministic latency.
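A circuit-switched stream can be modeled as one source fanning out to several destinations. In the Python sketch below, `CircuitStream` and the port names are hypothetical. The model also makes an assumption that the text above does not state: a word leaves the source only when every destination can accept it. Under that assumption, backpressure from one slow destination throttles the whole stream.

```python
from collections import deque


class CircuitStream:
    """Simplified one-to-many circuit-switched stream.

    Assumption: a word is forwarded only when every destination buffer
    has room, so a single full destination stalls the source.
    """

    def __init__(self, destinations, depth=4):
        self.buffers = {name: deque() for name in destinations}
        self.depth = depth

    def try_send(self, word: int) -> bool:
        if any(len(buf) >= self.depth for buf in self.buffers.values()):
            return False  # backpressure: the source stalls; nothing is dropped
        for buf in self.buffers.values():
            buf.append(word)  # every destination receives every word
        return True


stream = CircuitStream(["dma_s2mm_0", "north_0"], depth=2)
print([stream.try_send(w) for w in (1, 2, 3)])  # [True, True, False]
stream.buffers["dma_s2mm_0"].popleft()          # only one destination drains
print(stream.try_send(3))                       # False: north_0 is still full
stream.buffers["north_0"].popleft()
print(stream.try_send(3))                       # True
```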

A packet-switched stream is identified by a 5-bit ID, which must be unique among all streams that share ports with it. The stream ID also identifies the destination of the packet. A destination can be any number of master ports, so packet-switched streams can realize every combination of single or multiple master and slave ports in a given stream.
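The two ID rules in the paragraph above are that an ID fits in 5 bits and is unique among the streams sharing a port. Both can be checked mechanically. The Python sketch below uses a hypothetical `find_id_conflicts` helper with illustrative port names. It only checks IDs and does not model routing.

```python
from collections import defaultdict

STREAM_ID_BITS = 5
MAX_STREAM_ID = (1 << STREAM_ID_BITS) - 1  # 31


def find_id_conflicts(streams):
    """Check packet-switched stream IDs.

    streams: iterable of (name, stream_id, ports) tuples, where ports is a
    collection of port names used by that stream (names are hypothetical).
    Returns (port, stream_id, first_stream, second_stream) for every port on
    which two streams use the same ID.
    """
    owner = defaultdict(dict)  # port -> {stream_id: stream name}
    conflicts = []
    for name, stream_id, ports in streams:
        if not 0 <= stream_id <= MAX_STREAM_ID:
            raise ValueError(f"{name}: ID {stream_id} does not fit in 5 bits")
        for port in sorted(ports):
            previous = owner[port].get(stream_id)
            if previous is None:
                owner[port][stream_id] = name
            else:
                conflicts.append((port, stream_id, previous, name))
    return conflicts


streams = [
    ("a", 3, {"south_0", "dma_mm2s_0"}),
    ("b", 3, {"south_0", "core_0"}),  # shares south_0 with "a" and reuses ID 3
    ("c", 3, {"north_1"}),            # same ID, but no shared port: allowed
]
print(find_id_conflicts(streams))  # [('south_0', 3, 'a', 'b')]
```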

A packet-switched packet has:

  • Packet header: routing and control information for the packet
  • Data: the actual data in the packet
  • TLAST: the last word in the packet must have TLAST asserted to mark the end of the packet
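The packet layout above can be sketched as a list of (word, TLAST) beats. In the Python sketch below, `frame_packet` and `split_packets` are hypothetical helpers, and the header is passed in as an already-encoded 32-bit word. The sketch shows that TLAST alone is enough to find packet boundaries in a stream of words.

```python
def frame_packet(header: int, payload: list) -> list:
    """Header word first, then data words; TLAST is 1 only on the last beat."""
    words = [header, *payload]
    for w in words:
        if not 0 <= w <= 0xFFFF_FFFF:
            raise ValueError("stream words are 32 bits")
    last = len(words) - 1
    return [(w, 1 if i == last else 0) for i, w in enumerate(words)]


def split_packets(beats):
    """Regroup a beat stream into packets using TLAST as the delimiter.

    Returns (complete_packets, words_of_an_unterminated_packet).
    """
    packets, current = [], []
    for word, tlast in beats:
        current.append(word)
        if tlast:
            packets.append(current)
            current = []
    return packets, current


beats = frame_packet(0xA, [1, 2]) + frame_packet(0xB, [3])
print(split_packets(beats))  # ([[10, 1, 2], [11, 3]], [])
```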

Table 1 shows the packet header format.

Table 1. Packet Header
Bits     Field
[31]     Odd parity
[30:28]  3'b000
[27:21]  Source column
[20:16]  Source row
[15]     1'b0
[14:12]  Packet type
[11:5]   7'b0000000
[4:0]    Stream ID
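Table 1 can be turned into a small encoder and decoder. The Python sketch below reads "odd parity" in the usual way: bit 31 is chosen so that the full 32-bit word contains an odd number of 1s. It also treats the 3'b000, 1'b0, and 7'b0000000 fields as bits that must be zero. The function names are hypothetical, and the sketch does not interpret packet type values.

```python
# Field positions from Table 1: name -> (low bit, width).
FIELDS = {
    "stream_id": (0, 5),     # [4:0]
    "packet_type": (12, 3),  # [14:12]
    "src_row": (16, 5),      # [20:16]
    "src_col": (21, 7),      # [27:21]
}
ZERO_MASK = 0x7000_8FE0  # bits [30:28], [15], and [11:5] are fixed at 0


def encode_header(stream_id, packet_type, src_row, src_col):
    values = {"stream_id": stream_id, "packet_type": packet_type,
              "src_row": src_row, "src_col": src_col}
    word = 0
    for name, (low, width) in FIELDS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    # Odd parity: set bit 31 so the whole 32-bit word has an odd number of 1s.
    if bin(word).count("1") % 2 == 0:
        word |= 1 << 31
    return word


def decode_header(word):
    if not 0 <= word <= 0xFFFF_FFFF:
        raise ValueError("header must be a 32-bit word")
    if bin(word).count("1") % 2 != 1:
        raise ValueError("header parity error")
    if word & ZERO_MASK:
        raise ValueError("fixed-zero header bits are set")
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in FIELDS.items()}


header = encode_header(stream_id=5, packet_type=0, src_row=1, src_col=2)
print(hex(header))  # 0x80410005
```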

Table 2 summarizes the AXI4-Stream tile interconnect bandwidth for -1L speed-grade devices.

Table 2. AIE-ML AXI4-Stream Tile Interconnect Bandwidth
Connection Type      Number of Connections  Data Width (bits)  Clock Domain    Bandwidth per Connection (GB/s)  Aggregate Bandwidth (GB/s)
To North/From South  6                      32                 AIE-ML (1 GHz)  4                                24
To South/From North  4                      32                 AIE-ML (1 GHz)  4                                16
To West/From East    4                      32                 AIE-ML (1 GHz)  4                                16
To East/From West    4                      32                 AIE-ML (1 GHz)  4                                16
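Every figure in Table 2 follows from moving one 32-bit word per AIE-ML clock cycle and multiplying by the number of connections. The Python sketch below reproduces the table, assuming 1 GB/s = 10^9 bytes/s. The names are illustrative.

```python
CLOCK_HZ = 1_000_000_000  # AIE-ML clock used in Table 2 (-1L speed grade)
DATA_WIDTH_BITS = 32

# Number of stream connections per direction, from Table 2.
CONNECTIONS = {
    "To North/From South": 6,
    "To South/From North": 4,
    "To West/From East": 4,
    "To East/From West": 4,
}


def per_connection_gb_per_s(width_bits=DATA_WIDTH_BITS, clock_hz=CLOCK_HZ):
    # One word per cycle: bytes per cycle times cycles per second, in GB/s.
    return width_bits / 8 * clock_hz / 1e9


for direction, count in CONNECTIONS.items():
    each = per_connection_gb_per_s()
    print(f"{direction}: {each:g} GB/s each, {count * each:g} GB/s aggregate")
```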