Automatic Port Width Resizing

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399

Release Date: 2023-12-18

Version: 2023.2 English

In the Vitis tool flow, Vitis HLS can automatically resize m_axi interface ports to 512 bits to improve burst access. However, automatic port width resizing supports only standard C data types and structs whose size is a power of two, and the pointer must be aligned to the expected widened byte size. If the tool cannot automatically widen the port, you can change the port width manually by using a vector or arbitrary precision (AP) type as the data type of the port.
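As a rough illustration of the manual alternative, the sketch below uses a 16-element vector of int (16 * 32 = 512 bits) as the port data type. It assumes the hls::vector class from hls_vector.h; when that header is not available (for example, when compiling outside Vitis HLS), a hypothetical plain C++ stand-in with the same element access is used so the code still compiles. The names wide_t, LANES, vadd_wide, and the bundle names are illustrative and do not come from this guide.

```cpp
#include <cstddef>

constexpr std::size_t LANES = 16;  // 16 lanes * 32-bit int = 512 bits

// Use the Vitis HLS vector type when its header can be found.
#if defined(__has_include)
#if __has_include(<hls_vector.h>)
#include <hls_vector.h>
#define HAVE_HLS_VECTOR 1
#endif
#endif

#ifdef HAVE_HLS_VECTOR
using wide_t = hls::vector<int, LANES>;
#else
// Hypothetical stand-in for software-only builds: 64-byte aligned, 512 bits.
struct alignas(64) wide_t {
    int data[LANES];
    int &operator[](std::size_t i) { return data[i]; }
    const int &operator[](std::size_t i) const { return data[i]; }
};
#endif

// Each pointer access moves one 512-bit word, so the port is already wide
// and does not depend on automatic widening.
void vadd_wide(const wide_t *a, const wide_t *b, wide_t *c, int num_words) {
#pragma HLS INTERFACE mode=m_axi port=a bundle=gmem0
#pragma HLS INTERFACE mode=m_axi port=b bundle=gmem1
#pragma HLS INTERFACE mode=m_axi port=c bundle=gmem0
    for (int w = 0; w < num_words; ++w) {
#pragma HLS PIPELINE II = 1
        wide_t va = a[w];
        wide_t vb = b[w];
        wide_t vc;
        for (std::size_t l = 0; l < LANES; ++l) {
#pragma HLS UNROLL
            vc[l] = va[l] + vb[l];  // lane-wise add
        }
        c[w] = vc;
    }
}
```

The caller is then responsible for packing 16 int values per word and for keeping the buffers aligned to 64 bytes.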

Important: Structs on the interface prevent automatic widening of the port. You must break the struct into individual elements to enable this feature.

Vitis HLS controls automatic port width resizing using the following two commands:

syn.interface.m_axi_max_widen_bitwidth=<N>: Directs the tool to automatically widen bursts on M-AXI interfaces up to the specified bit-width. The value of <N> must be a power of two between 0 and 1024; a value of 0 disables widening.
syn.interface.m_axi_alignment_byte_size=<N>: Directs the tool to assume that pointers mapped to m_axi interfaces are aligned to at least the specified width in bytes (a power of two). Burst widening requires strong alignment guarantees, so this setting can help automatic burst widening.

In the Vitis Kernel flow, automatic port width resizing is enabled by default with the following settings:

syn.interface.m_axi_max_widen_bitwidth=512
syn.interface.m_axi_alignment_byte_size=64

In the Vivado IP flow, this feature is disabled by default:

syn.interface.m_axi_max_widen_bitwidth=0
syn.interface.m_axi_alignment_byte_size=1
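The two sets of defaults are consistent with each other: a 512-bit port is 64 bytes wide, and with widening off a 1-byte alignment is enough. The sketch below restates the stated value constraint and that bit-to-byte relationship. The helper names are hypothetical and not part of any Vitis API, and the tool performs its own validation.

```cpp
#include <cstdint>

// True when n is a non-zero power of two.
constexpr bool is_pow2(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Mirrors the constraint stated for m_axi_max_widen_bitwidth:
// 0 (widening off) or a power of two no greater than 1024.
constexpr bool valid_widen_bitwidth(std::uint32_t bits) {
    return bits == 0 || (is_pow2(bits) && bits <= 1024);
}

// Byte alignment that matches a widened port of `bits` bits.
// Widths below one byte (including 0) fall back to 1-byte alignment.
constexpr std::uint32_t matching_alignment_bytes(std::uint32_t bits) {
    return bits < 8 ? 1 : bits / 8;
}
```

For example, a design that widens to 256 bits would pair that setting with a 32-byte alignment under this relationship.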

Automatic port width resizing only resizes the port if the tool can identify a burst access. Therefore, all the preconditions needed for bursting, as described in AXI Burst Transfers, are also needed for port resizing. These conditions include:

Accesses must be in monotonically increasing order, both in terms of the memory location being accessed and in time. You cannot access a memory location that lies between two previously accessed locations; in other words, accesses must not overlap.
The access pattern from the global memory should be in sequential order, with the following additional requirements:
- The sequential accesses must be on a primitive type or on a non-vector aggregate type whose size is a power of two
- The start of the sequential accesses must be aligned to the widened word size
- The length of the sequential accesses must be divisible by the widening factor
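The last two requirements are simple arithmetic checks. The sketch below is one way to express them, assuming the widening factor is the widened word size divided by the element size. The BurstAccess struct and the function name are hypothetical, and the tool's own analysis covers more than these two checks.

```cpp
#include <cstdint>

// Hypothetical description of one run of sequential accesses.
struct BurstAccess {
    std::uint64_t start_byte_addr;  // byte address of the first element
    std::uint32_t elem_bits;        // element width in bits, e.g. 32 for int
    std::uint64_t length;           // number of sequential elements accessed
};

// Checks the alignment and divisibility requirements for a widened word
// of `widened_bits` bits. Returns false for inconsistent widths.
bool meets_widening_requirements(const BurstAccess &acc,
                                 std::uint32_t widened_bits) {
    if (acc.elem_bits == 0 || widened_bits % 8 != 0 ||
        widened_bits < acc.elem_bits || widened_bits % acc.elem_bits != 0) {
        return false;
    }
    const std::uint32_t widened_bytes = widened_bits / 8;
    const std::uint32_t factor = widened_bits / acc.elem_bits;  // elements per word

    const bool start_aligned = acc.start_byte_addr % widened_bytes == 0;
    const bool length_divisible = acc.length % factor == 0;
    return start_aligned && length_divisible;
}
```

With 32-bit elements and a 512-bit target, the factor is 16, so a run of 128 elements starting at a 64-byte boundary passes, while a run of 100 elements does not.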

The following code example is used in the calculations that follow:

```
// Excerpt from the kernel body; the enclosing function is not shown.
vadd_pipeline:
  for (int i = 0; i < iterations; i++) {
#pragma HLS LOOP_TRIPCOUNT min = c_len/c_n max = c_len/c_n

    // Pipelining loops that access only one variable is the ideal way to
    // increase the global memory bandwidth.
  read_a:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] = a[i * N + x];
    }

  read_b:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] += b[i * N + x];
    }

  write_c:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      c[i * N + x] = result[x];
    }
  }
```
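The fragment above omits the enclosing function. A minimal sketch of a complete kernel around the same loop nest might look like the following. The function signature, the value N = 128 (chosen to match the 128 used in the calculation that follows), the derivation of iterations from len, and the bundle names are all assumptions made for illustration.

```cpp
constexpr int N = 128;  // assumed inner-loop bound

extern "C" {
// Assumed signature: three int pointers mapped to m_axi, plus a length.
void vadd(const int *a, const int *b, int *c, const int len) {
#pragma HLS INTERFACE mode=m_axi port=a bundle=gmem0
#pragma HLS INTERFACE mode=m_axi port=b bundle=gmem1
#pragma HLS INTERFACE mode=m_axi port=c bundle=gmem0
    int result[N];                    // local buffer for one chunk
    const int iterations = len / N;   // assumes len is a multiple of N

vadd_pipeline:
    for (int i = 0; i < iterations; i++) {
    read_a:
        for (int x = 0; x < N; ++x) {
#pragma HLS PIPELINE II = 1
            result[x] = a[i * N + x];   // sequential read of a
        }
    read_b:
        for (int x = 0; x < N; ++x) {
#pragma HLS PIPELINE II = 1
            result[x] += b[i * N + x];  // sequential read of b
        }
    write_c:
        for (int x = 0; x < N; ++x) {
#pragma HLS PIPELINE II = 1
            c[i * N + x] = result[x];   // sequential write of c
        }
    }
}
}
```

A standard C++ compiler ignores the HLS pragmas, so the same source can be exercised in a software test before synthesis.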

The tool determines the widened port width for the code above in three steps:

The tool checks the number of access patterns in the read_a loop. There is one access per loop iteration, so the optimization determines the interface bit-width as 32 = 32 * 1 (bit-width of the int variable * number of accesses).
The tool tries to reach the default maximum specified by syn.interface.m_axi_max_widen_bitwidth=512, using the following expression:
```
length = ceil((loop bound of inner loops) * (loop bound of outer loops)) * (number of access patterns)
```
- The tool supports imperfect loop nests such as the one in the code above. If a and b are bundled to the same port, the tool does not extend the bursts on a and b because the accesses conflict. If a and b are bundled to different ports, the tool extends the bursts on a and b to the outer loop. When the bursts cover only the inner loop, the formula shortens to:
```
length = ceil(loop bound of inner loops) * (number of access patterns)
```
  or: length = ceil(128) * 32 = 4096
The tool checks whether the calculated length is a power of two. If it is, the length is capped to the width specified by syn.interface.m_axi_max_widen_bitwidth.
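The three steps can be traced with a few lines of arithmetic. The sketch below follows the shortened formula and the numbers used above (32-bit int, one access per iteration, inner loop bound of 128, maximum of 512). The function names are hypothetical. This section does not say what happens when the length is not a power of two, so the sketch returns no value in that case rather than guessing.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// True when n is a non-zero power of two.
inline bool is_pow2(std::uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Step 1: bits moved per loop iteration.
inline std::uint64_t base_bitwidth(std::uint64_t elem_bits,
                                   std::uint64_t accesses_per_iter) {
    return elem_bits * accesses_per_iter;
}

// Step 2 (shortened form): inner loop bound times the step 1 result.
inline std::uint64_t burst_length_bits(std::uint64_t inner_bound,
                                       std::uint64_t base_bits) {
    return inner_bound * base_bits;
}

// Step 3: if the length is a power of two, cap it at the configured maximum.
// Other cases are not described in this section, so no value is returned.
inline std::optional<std::uint64_t> widened_bitwidth(std::uint64_t length_bits,
                                                     std::uint64_t max_widen_bits) {
    if (!is_pow2(length_bits)) {
        return std::nullopt;
    }
    return std::min(length_bits, max_widen_bits);
}
```

For the example, base_bitwidth(32, 1) gives 32, burst_length_bits(128, 32) gives 4096, and widened_bitwidth(4096, 512) gives 512.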

Automatic port width resizing has pros and cons that you should consider before using it. The feature improves access throughput without requiring you to change the data type size in the source code. It also uses more resources, because the tool must buffer the wide vector and might need to shift the data to match the data path size.