Packetization Overhead

Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313)

Document ID: PG313
Release Date: 2022-12-14
Version: 1.0 English

Packetization overhead affects the amount of data that can be moved through the NoC data paths. Some flits transport data, while others carry protocol-related information such as the transaction address or the response.

The following figure shows the packet types and the overhead incurred by each type. The NoC packet-domain data paths are all 128 bits (16 bytes) wide. As shown in (a), each read request consumes one header flit in the request path and n data flits in the response path. As shown in (b), a write request consumes one header flit and n data flits in the request path; for example, a 64-byte write that is aligned to the burst start consumes five request-path flits (one header flit and four data flits). The write response then consumes one flit in the response path.

Figure 1. Packet Flit Overheads
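As a minimal sketch of the flit accounting in Figure 1, the Python below counts request-path and response-path flits for a single read or write. It assumes 16-byte flits and a transfer that is aligned to the burst start and fits within one chop; the function names are illustrative and are not part of any AMD tool or API.

```python
# Flit accounting for single NoC transactions (an illustrative sketch).
# Assumptions: 16-byte packet-domain flits, transfer aligned to the burst
# start and no larger than one chop. Function names are hypothetical.
import math

FLIT_BYTES = 16  # NoC packet-domain data path width: 128 bits


def data_flits(transfer_bytes: int) -> int:
    """Number of 16-byte data flits needed to carry transfer_bytes."""
    return math.ceil(transfer_bytes / FLIT_BYTES)


def read_flits(transfer_bytes: int) -> tuple[int, int]:
    """(request-path flits, response-path flits) for one read."""
    # One header flit carries the read request; the data returns as n flits.
    return 1, data_flits(transfer_bytes)


def write_flits(transfer_bytes: int) -> tuple[int, int]:
    """(request-path flits, response-path flits) for one write."""
    # One header flit plus n data flits go out; one response flit comes back.
    return 1 + data_flits(transfer_bytes), 1


if __name__ == "__main__":
    print(read_flits(64))   # (1, 4)
    print(write_flits(64))  # (5, 1)
```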

If an AXI request is smaller than the chop size (the minimum of 256 bytes and the interleave granularity, when memory channels are interleaved), the transfer ends early and the unused flits are available for another command.
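The chop-size rule can be sketched as follows, assuming 16-byte flits. The sketch also counts how many flit slots a short request leaves free compared with a full chop, which is one way to read "unused flits" in the text above. `chop_size` and `data_flits_in_first_packet` are hypothetical helpers, not NoC settings.

```python
# Chop size and flit usage for a short AXI request (an illustrative sketch).
# Assumptions: 16-byte flits; chop size = min(256 B, interleave granularity)
# for interleaved channels, else 256 B. Helper names are hypothetical.
import math
from typing import Optional

FLIT_BYTES = 16
DEFAULT_CHOP_BYTES = 256


def chop_size(interleave_granularity: Optional[int] = None) -> int:
    """Chop size in bytes: 256 B, or the interleave granularity if smaller."""
    if interleave_granularity is None:
        return DEFAULT_CHOP_BYTES
    return min(DEFAULT_CHOP_BYTES, interleave_granularity)


def data_flits_in_first_packet(request_bytes: int, chop_bytes: int) -> int:
    """Data flits used by the first packet; a short request ends early."""
    return math.ceil(min(request_bytes, chop_bytes) / FLIT_BYTES)


if __name__ == "__main__":
    chop = chop_size()                           # 256
    used = data_flits_in_first_packet(64, chop)  # 4
    # Flit slots a full 256-byte chop would have used but this request did not.
    print(chop, used, chop // FLIT_BYTES - used)  # 256 4 12
```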

A data path with a raw bandwidth of X GB/s has an effective bandwidth of $\frac{n}{n+1} X$ GB/s, where $n$ is the burst length. The burst length is set in the QoS tab of the NoC GUI, under the advanced features. To minimize overhead, select a burst length such that burst length × data width (in bytes) = chop size.
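A short sketch of the two rules above: the $\frac{n}{n+1}$ efficiency and the burst length that fills one chop. It assumes the data width equals the 16-byte flit width, so the burst length n is also the number of data flits per header flit; the function names are hypothetical.

```python
# Effective bandwidth after header overhead, and the burst length that
# fills one chop (an illustrative sketch of the rules in the text).
# Assumption: data width == 16-byte flit, so n = data flits per header.


def effective_bandwidth(raw_gbps: float, n: int) -> float:
    """Bandwidth left after one header flit per n data flits: n/(n+1) * X."""
    return raw_gbps * n / (n + 1)


def burst_length_for_chop(chop_bytes: int, data_width_bytes: int) -> int:
    """Burst length such that burst_length * data_width == chop size."""
    if chop_bytes % data_width_bytes:
        raise ValueError("chop size must be a multiple of the data width")
    return chop_bytes // data_width_bytes


if __name__ == "__main__":
    raw = 16.0  # 16-byte path at 1 GHz gives 16 GB/s raw
    for n in (2, 4, 16):
        print(n, round(effective_bandwidth(raw, n), 2))
    # prints 2 10.67, 4 12.8, 16 15.06 (the 0/100 column of Table 1)
    print(burst_length_for_chop(256, 16))  # 16
```

With these assumptions, a burst length of 16 fills a 256-byte chop and keeps the header overhead to one flit in 17.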

Each write request generates one response flit. The NoC NMU has a default chop size of 256 bytes, or the interleave granularity if that is smaller, so a write request can be chopped into multiple write bursts, each with its own write response. The NMU coalesces these responses so that each original AXI write burst sees only one write response.
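The chopping and response coalescing described above can be modeled as a simple count. This sketch assumes the write starts on a chop boundary; `chop_write` and `responses` are hypothetical helpers that only count bursts and responses and do not model the NMU hardware.

```python
# Write chopping and response coalescing (an illustrative counting sketch).
# Assumption: the write starts on a chop boundary. Helper names are
# hypothetical and do not correspond to any AMD API.


def chop_write(total_bytes: int, chop_bytes: int = 256) -> list[int]:
    """Split one AXI write into NoC write bursts of at most chop_bytes."""
    bursts = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, chop_bytes)
        bursts.append(size)
        remaining -= size
    return bursts


def responses(total_bytes: int, chop_bytes: int = 256) -> tuple[int, int]:
    """(write-response flits on the NoC, responses seen by the AXI master)."""
    noc_responses = len(chop_write(total_bytes, chop_bytes))  # one per burst
    axi_responses = 1  # the NMU coalesces them into a single AXI response
    return noc_responses, axi_responses


if __name__ == "__main__":
    print(chop_write(1024))  # [256, 256, 256, 256]
    print(responses(1024))   # (4, 1)
    print(chop_write(640))   # [256, 256, 128]
```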

On a full-duplex NoC link that sends a mix of read and write requests in one direction and receives the responses in the other, the extra header and response flits reduce the peak bandwidth.

The following figure shows an example: the request and response paths of one physical link, with the order in which flits are sent in each direction. The example uses 64-byte transactions with a 50/50 read/write mix. For this traffic mix and transaction size, flits travel in groups of six in the request path: one read request flit, one write request header flit, and four write data flits. In the response path there are four read data flits and one write response flit, followed by one dead cycle. Because the read and write bandwidths are equal in a 50/50 mix, there are five response flits for every six request flits, so the request path is the bottleneck and the response path has some spare capacity.

Figure 2. Link Utilization for Mixed Read/Write Traffic
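The flit groups in Figure 2 can be counted with a short sketch. It assumes 16-byte flits, transfers that are a multiple of 16 bytes, and a group made of exactly one read and one write; `group_flits` is a hypothetical helper.

```python
# Flits per group on one full-duplex link for a 1:1 read/write mix
# (an illustrative sketch of the 64-byte, 50/50 example).
# Assumptions: 16-byte flits; one read plus one write per group.
FLIT_BYTES = 16


def group_flits(transfer_bytes: int) -> tuple[int, int]:
    """(request-path flits, response-path flits) for one read + one write."""
    n = transfer_bytes // FLIT_BYTES  # data flits per transaction
    request = 1 + (1 + n)             # read request + write header + write data
    response = n + 1                  # read data + write response
    return request, response


if __name__ == "__main__":
    req, rsp = group_flits(64)
    print(req, rsp, req - rsp)        # 6 5 1 (one dead response cycle)
    # Request path is the bottleneck: 64 B each way per 6 cycles at 1 GHz.
    print(round(64 / req, 2))         # 10.67 (GB/s read and write)
```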
Note: The NoC compiler does not necessarily route read and write traffic from one master over the same physical link. The following examples illustrate the bandwidth achievable when read and write traffic is mixed on the same physical link. The NoC compiler factors in flit overheads when calculating available bandwidth; this section is provided to help you understand possible bottlenecks when contention occurs.
The following table provides the expected peak bandwidths for several read/write traffic mixes. These examples assume a 1 GHz NoC clock frequency, and the numbers scale linearly with the NoC clock.
Table 1. Read/Write Bandwidth for Mixed Traffic at a 1 GHz NoC Clock Frequency
Columns give the read/write mix (%Rd/%Wr); each cell gives read/write bandwidth in GB/s.

Bytes per Transaction @ Frequency  0/100     30/70        50/50         70/30        100/0
32B @ 1000 MHz                     0/10.67   4.0/9.33     8.0/8.0       13.17/5.65   16.0/0
64B @ 1000 MHz                     0/12.8    5.05/11.79   10.67/10.67   14.45/6.19   16.0/0
256B @ 1000 MHz                    0/15.06   6.30/14.69   14.22/14.22   15.58/6.68   16.0/0
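The table values can be reproduced from flit counts alone. This sketch assumes 16-byte flits, aligned transfers no larger than one chop, and that each path carries at most one flit per cycle, so whichever path (request or response) saturates first limits throughput. `mixed_bandwidth` is a hypothetical helper, not an AMD tool.

```python
# Reproducing Table 1 from flit counts (an illustrative sketch).
# Assumptions: 16-byte flits, aligned transfers within one chop, one flit
# per cycle per path; the busier path limits the transaction rate.
from fractions import Fraction

FLIT_BYTES = 16


def mixed_bandwidth(transfer_bytes: int, read_pct: int,
                    clock_ghz: float = 1.0) -> tuple[float, float]:
    """Return (read GB/s, write GB/s) for a given percentage of reads."""
    n = transfer_bytes // FLIT_BYTES
    r = Fraction(read_pct, 100)  # fraction of transactions that are reads
    w = 1 - r
    # Average flits per transaction on each path.
    request = r * 1 + w * (1 + n)  # read header; write header + data
    response = r * n + w * 1       # read data; write response
    # Transactions per cycle, set by the busier path.
    rate = 1 / max(request, response)
    read = float(rate * r * transfer_bytes) * clock_ghz
    write = float(rate * w * transfer_bytes) * clock_ghz
    return read, write


if __name__ == "__main__":
    mixes = (0, 30, 50, 70, 100)
    for size in (32, 64, 256):
        cells = []
        for pct in mixes:
            rd, wr = mixed_bandwidth(size, pct)
            cells.append(f"{rd:.2f}/{wr:.2f}")
        print(f"{size}B", "  ".join(cells))
```

Under these assumptions the printed values agree with Table 1 to within 0.01 GB/s, and passing a different `clock_ghz` scales them linearly, matching the note above the table.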