HBM Address Map and Protocol Considerations - 1.0 English

AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276)

Document ID: PG276
Release Date: 2022-11-02
Version: 1.0 English

To design the most efficient system architecture and user logic for accessing the HBM, it is important to understand both the physical address space of the HBM and the address map option selected in the IP configuration in the Vivado IDE. Together, these two aspects determine how the HBM protocol executes for a given user logic AXI access pattern, and protocol execution is the largest factor in overall system performance. The following table defines the HBM physical address map for 4H and 8H devices.

Table 1. Physical Address Map for 4H and 8H Devices

| HBM Arrangement            | 4H Device (4 GB per Stack)           | 8H Device (8 GB per Stack)           |
|----------------------------|--------------------------------------|--------------------------------------|
| Density per Channel        | 4 Gb                                 | 8 Gb                                 |
| Density per Pseudo Channel | 2 Gb                                 | 4 Gb                                 |
| Row Address                | RA[13:0]                             | RA[13:0]                             |
| Column Address             | CA[5:1]                              | CA[5:1]                              |
| Bank Group Address         | BA[3:0]                              | SID, BA[3:0]                         |
| Bank Arrangement           | 16 Banks: 4 Bank Groups with 4 Banks | 32 Banks: 8 Bank Groups with 4 Banks |
| Total User Address Bits    | 23                                   | 24                                   |
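The widths in Table 1 can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes each user address selects one 32-byte access, consistent with the unused AXI address bits 4:0.

```python
# Field widths from Table 1 (per pseudo channel).
ROW_BITS = 14          # RA[13:0]
COL_BITS = 5           # CA[5:1]
BANK_BITS = {"4H": 4,  # BA[3:0]
             "8H": 5}  # SID, BA[3:0]
BYTES_PER_ACCESS = 32  # assumption: 32-byte access granularity (addr[4:0] unused)


def user_address_bits(device: str) -> int:
    """Row + Column + Bank bits for one pseudo channel."""
    return ROW_BITS + COL_BITS + BANK_BITS[device]


def pseudo_channel_gbit(device: str) -> int:
    """Pseudo channel density in Gb implied by the address width."""
    total_bytes = (1 << user_address_bits(device)) * BYTES_PER_ACCESS
    return total_bytes * 8 // (1 << 30)


for dev in ("4H", "8H"):
    print(dev, user_address_bits(dev), "bits,",
          pseudo_channel_gbit(dev), "Gb per pseudo channel")
```

For a 4H device this gives 23 bits and 2 Gb, and for an 8H device 24 bits and 4 Gb, matching the table.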

The address space of one 4H stack spans 32 bits (4 GB), and that of one 8H stack spans 33 bits (8 GB). The following table describes the AXI addressing for these devices, which adds a Stack Select bit above the per-stack address.

Table 2. AXI Addressing for 4H and 8H Devices

| HBM Arrangement                    | 4H Device (4 GB per Stack) | 8H Device (8 GB per Stack) |
|------------------------------------|----------------------------|----------------------------|
| Total Address Bits                 | 33 (bits 32:0)             | 34 (bits 33:0)             |
| Stack Select (0 = Left, 1 = Right) | 32                         | 33                         |
| Destination AXI Port (0–15)        | 31:28                      | 32:29                      |
| HBM Address Bits                   | 27:5                       | 28:5                       |
| Unused Address Bits                | 4:0                        | 4:0                        |
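To make Table 2 concrete, the following Python sketch splits an AXI address into its fields. The `AxiFields` and `decode_axi` names are hypothetical helpers, not part of the IP or any Vivado API; the bit positions are taken directly from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxiFields:
    stack: int     # 0 = left, 1 = right
    port: int      # destination AXI port, 0-15
    hbm_addr: int  # HBM address bits (27:5 for 4H, 28:5 for 8H), right-justified
    offset: int    # unused bits 4:0


def decode_axi(addr: int, eight_high: bool = False) -> AxiFields:
    """Split an AXI address per Table 2 (4H by default, 8H if eight_high)."""
    hbm_bits = 24 if eight_high else 23    # 28:5 or 27:5
    port_lsb = 5 + hbm_bits                # 29 or 28
    stack_bit = port_lsb + 4               # 33 or 32
    if addr >> (stack_bit + 1):
        raise ValueError("address wider than the AXI address space")
    return AxiFields(
        stack=(addr >> stack_bit) & 1,
        port=(addr >> port_lsb) & 0xF,
        hbm_addr=(addr >> 5) & ((1 << hbm_bits) - 1),
        offset=addr & 0x1F,
    )


print(decode_axi(0x1_3000_0040))
# AxiFields(stack=1, port=3, hbm_addr=2, offset=0)
```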

HBM operation closely follows that of traditional volatile memories and is especially similar to DDR4. The basic protocol rules determine how efficiently the memory array can be accessed, so they must be considered together with the user AXI access pattern and with how the user logic drives the AXI channels during operation.

Like DDR4, HBM organizes its memory into Banks and Bank Groups, and an efficient array access pattern is achieved by exploiting this organization. 4H devices have a total of 16 Banks, arranged as 4 Bank Groups of 4 Banks each. 8H devices have 32 Banks, arranged as 8 Bank Groups of 4 Banks each.

The HBM supports one active Row per Bank. Protocol timing between Banks in different Bank Groups is shorter than between Banks within the same Bank Group, and the currently active Row within a Bank must be Precharged before a different Row within that Bank can be activated. When a Row is activated within a Bank, it is recommended to perform multiple Column accesses within that Row before changing Rows. This results in a high page hit rate, which means higher efficiency.
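The one-active-Row-per-Bank rule is what makes the page hit rate sensitive to access order. A minimal Python sketch, assuming a simplified model that ignores all timing and only tracks which Row is open in each Bank (the `count_page_hits` helper is hypothetical):

```python
from collections.abc import Hashable, Iterable


def count_page_hits(accesses: Iterable[tuple[Hashable, int]]) -> tuple[int, int]:
    """Count page hits and misses for a sequence of (bank, row) accesses.

    Models only the rule that each Bank holds at most one active Row.
    """
    open_rows = {}          # bank -> currently active Row
    hits = misses = 0
    for bank, row in accesses:
        if open_rows.get(bank) == row:
            hits += 1       # Column access to the already-active Row
        else:
            misses += 1     # needs Activate (after Precharge if another Row was open)
            open_rows[bank] = row
    return hits, misses


print(count_page_hits([(0, 5)] * 8))          # (7, 1): one Row, many Columns
print(count_page_hits([(0, 5), (0, 6)] * 4))  # (0, 8): two Rows fighting over Bank 0
```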

By default, the HBM IP uses a Row Bank Column address map with the Bank Group Interleave option enabled. With these default settings, the highest-order address bits are the Row address (RAx) bits; only one Row per Bank can be active at a time. The middle address bits are the Bank address bits, displayed as BGx for Bank Groups and BAx for Bank addresses. Below these are the Column address bits, displayed as CAx, which are accessed by Write and Read commands.

The Bank Group Interleave option places BG0, the least significant bit of the Bank Group address, at the least significant user address bit of the HBM memory map (addr[5]). With the default address map, an AXI transaction with an AxLEN of 0x1 (two beats) and an AxADDR of 0x0 executes two discrete commands on the HBM interface. The first goes to Row 0, Bank Group 0, Bank address 0, Column 0. The second goes to Row 0, Bank Group 1, Bank address 0, Column 0.
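The effect of Bank Group Interleave on the two beats of an AxLEN = 0x1 transaction can be sketched in Python. Only the placement of BG0 at addr[5] and the Row > Bank > Column ordering come from this guide; the exact positions of the remaining bits in `DEFAULT_4H` are an assumption made for illustration, as is the 32-byte beat size, and `decode` is a hypothetical helper.

```python
# ASSUMED default 4H layout (Row Bank Column + Bank Group Interleave), listed
# from addr[5] upward as (field, width). The guide places BG0 at addr[5] and
# orders Row > Bank > Column; the exact positions below are illustrative.
DEFAULT_4H = (("BG", 1), ("CA", 5), ("BG", 1), ("BA", 2), ("RA", 14))


def decode(addr: int, layout=DEFAULT_4H) -> dict:
    """Split an AXI address into HBM fields; a repeated field fills upward."""
    fields, filled = {}, {}
    bit = 5                                  # addr[4:0] is unused
    for name, width in layout:
        value = (addr >> bit) & ((1 << width) - 1)
        fields[name] = fields.get(name, 0) | (value << filled.get(name, 0))
        filled[name] = filled.get(name, 0) + width
        bit += width
    return fields


# AxADDR = 0x0, AxLEN = 0x1: two 32-byte beats at 0x00 and 0x20.
for beat_addr in (0x00, 0x20):
    print(hex(beat_addr), decode(beat_addr))
```

Under this assumed layout, the beat at 0x20 decodes to Bank Group 1 with Row, Bank, and Column all 0, matching the two commands described above.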

Placing BG0 at the least significant bit through the Bank Group Interleave option benefits sequential memory accesses. It reduces the time spent waiting on protocol timing because the controller splits consecutive accesses between two Banks in two separate Bank Groups. An AXI transaction with an AxLEN of 0x1 demonstrates this behavior, but fully exploiting these concepts for higher efficiency requires more consideration, either with longer transaction lengths or with traffic streams mapped to Bank addresses across Bank Groups.
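Why splitting accesses across Bank Groups saves time can be illustrated with a toy timing model in Python. The spacings `T_CCD_S` and `T_CCD_L` are hypothetical placeholders for the shorter different-Bank-Group and longer same-Bank-Group column-to-column delays; real values come from the device and controller, and this model ignores every other timing constraint.

```python
# Hypothetical column-to-column spacings in controller clock cycles; these are
# placeholders, not values from the HBM specification or this IP.
T_CCD_S = 2   # any two consecutive column commands (different Bank Groups)
T_CCD_L = 4   # column commands to the same Bank Group


def last_command_cycle(bank_groups: list[int]) -> int:
    """Earliest cycle at which the final column command can issue."""
    last_in_group: dict[int, int] = {}
    cycle = 0
    for i, bg in enumerate(bank_groups):
        if i > 0:
            cycle += T_CCD_S                   # minimum spacing after the previous command
            if bg in last_in_group:            # same-group spacing may dominate
                cycle = max(cycle, last_in_group[bg] + T_CCD_L)
        last_in_group[bg] = cycle
    return cycle


same_group = [0] * 8        # every beat lands in one Bank Group
interleaved = [0, 1] * 4    # BG0 toggles every beat, as with Bank Group Interleave
print(last_command_cycle(same_group), last_command_cycle(interleaved))  # 28 14
```

With these placeholder values, eight column commands confined to one Bank Group need 28 cycles, while the interleaved pattern finishes in 14.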

The default address map option is ideal for short mixed traffic because the Bank Group Interleave option alleviates some protocol exposure when AxLEN is 0x1 or larger. The default address map also supports the AXI Reordering Core, which can improve efficiency but might increase latency.

Important: For 8H HBM devices the additional Bank bit is mapped to the SID bit, which is placed at the most significant address bit within the default Row Bank Column address map.

This placement is due to a limitation of the AXI Reordering Core. When using the default Row Bank Column address map with 8H devices, manage the traffic master addressing so that the SID bit is used as the most significant Bank Group bit. When a custom address map is used, the SID bit can be placed as desired, but the AXI Reordering Core is disabled.
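For 8H devices using the default map, a traffic master can treat the SID bit like any other address bit when planning where its buffers live. The Python sketch below assumes the SID bit lands at addr[28], the top of the 8H HBM address field listed in Table 2; the helper names are hypothetical, and the placement should be confirmed against the address map displayed in the IP configuration.

```python
# Assumption: with the default Row Bank Column map on an 8H device, the SID bit
# occupies the top of the HBM address field (28:5 in Table 2), i.e. addr[28].
SID_BIT_8H = 28


def sid_of(addr: int) -> int:
    """Return the SID (which half of the 8 Bank Groups) an AXI address targets."""
    return (addr >> SID_BIT_8H) & 1


def with_sid(addr: int, sid: int) -> int:
    """Return addr re-targeted to the given SID, leaving all other bits intact."""
    return (addr & ~(1 << SID_BIT_8H)) | ((sid & 1) << SID_BIT_8H)


# A master that keeps all of its buffers in the lowest 256 MB of a pseudo
# channel never leaves SID 0. Placing a second buffer at SID 1 uses the rest:
buf_a = with_sid(0x0000_0000, 0)
buf_b = with_sid(0x0000_0000, 1)
print(hex(buf_a), hex(buf_b))   # 0x0 0x10000000
```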

When the Custom Address Map option is enabled in the HBM configuration in the Vivado IDE, more addressing options are available, but the AXI Reordering Core is disabled. The Custom Address Map option makes it possible to manually assign address bits to other locations. The following paragraphs discuss the concepts behind the Row Column Bank and Bank Row Column presets. While the presets might not be an ideal match for every application, any address bit can still be reassigned to better suit the use case. When the Custom Address Map option is enabled, the Bank Group Interleave setting changes to False, but BG0 can again be remapped to the least significant bit position of the address space to achieve the same result.
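The BG0 remapping can be pictured as moving one entry in a bit-by-bit address layout. In this Python sketch, `NO_INTERLEAVE` is an assumed custom Row Bank Column layout (widths from Table 1, order illustrative), and `move_to_lsb` and `bank_group` are hypothetical helpers rather than Vivado settings.

```python
# ASSUMED custom Row Bank Column layout for a 4H device with Bank Group
# Interleave off, listed bit by bit from addr[5] upward. Widths follow Table 1;
# the ordering is illustrative.
NO_INTERLEAVE = ([f"CA{i}" for i in range(1, 6)] + ["BG0", "BG1", "BA0", "BA1"]
                 + [f"RA{i}" for i in range(14)])


def move_to_lsb(layout: list[str], bit_name: str) -> list[str]:
    """Relocate one bit to addr[5]; bits formerly below it shift up by one."""
    return [bit_name] + [b for b in layout if b != bit_name]


def bank_group(addr: int, layout: list[str]) -> int:
    """Assemble BG[1:0] from wherever the layout places BG0 and BG1."""
    return sum(((addr >> (5 + layout.index(f"BG{i}"))) & 1) << i for i in range(2))


remapped = move_to_lsb(NO_INTERLEAVE, "BG0")
# The second beat of an AxLEN = 0x1 burst starting at 0x0 sits at 0x20.
print(bank_group(0x20, NO_INTERLEAVE), bank_group(0x20, remapped))  # 0 1
```

Without the remap, both beats of the burst stay in Bank Group 0; with BG0 moved to addr[5], the second beat moves to Bank Group 1, reproducing the interleave behavior of the default map.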

Note: When the Custom Address Map option is enabled, many of the reordering options are no longer available. Efficiency and low latency must be achieved by having well-defined traffic masters and access patterns for the physical address map.

The Row Column Bank address map option is ideal for long sequential access patterns. For long transactions, for instance an AxLEN of 0x8 or longer, most of the protocol exposure is hidden by Bank Group switching. This address map works best when AXI transactions flow in only one direction at a time. If the traffic must change direction, the accesses in both directions should target the same Row/Bank combinations to guarantee a page hit for the Column access. While a long sequence of Writes is being serviced, the user logic should not issue any Read requests, because each change of direction causes a bus turnaround that leaves the HBM interface idle and lowers efficiency. If the Write and Read accesses do not target the same Row/Bank combinations, the user logic should never switch directions until the current access sequence is complete. Otherwise, if a Write stream is executing and a Read stream tries to access a different Row/Bank combination, the controller issues multiple Precharge and Activate commands because the new traffic stream consists only of page misses. This causes a significant amount of idle time on the HBM interface and lowers efficiency.
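The cost of switching direction between Rows can be estimated by counting page hits. This Python sketch assumes a Row Column Bank layout with the four Bank Group and Bank bits directly above addr[4:0], then CA[5:1], then RA[13:0]; that ordering, the 32-byte beat size, and the helper names are assumptions, and the model ignores bus turnaround time entirely.

```python
# ASSUMED Row Column Bank layout for a 4H device, from addr[5] upward:
# BG[1:0] and BA[1:0] (4 bits), then CA[5:1] (5 bits), then RA[13:0].
BANK_BITS, COL_BITS = 4, 5


def bank_and_row(addr: int) -> tuple[int, int]:
    """Return (Bank index 0-15, Row) for an AXI address under the assumed layout."""
    unit = addr >> 5                          # drop unused addr[4:0]
    bank = unit & ((1 << BANK_BITS) - 1)      # Bank Group and Bank bits
    return bank, unit >> (BANK_BITS + COL_BITS)


def hits_and_misses(addrs: list[int]) -> tuple[int, int]:
    """Count page hits/misses with one active Row per Bank (no turnaround cost)."""
    open_rows: dict[int, int] = {}
    hits = 0
    for addr in addrs:
        bank, row = bank_and_row(addr)
        if open_rows.get(bank) == row:
            hits += 1
        open_rows[bank] = row
    return hits, len(addrs) - hits


BEATS = 512                                          # 16 Banks x 32 Columns: one Row everywhere
writes = [i * 32 for i in range(BEATS)]              # sequential Writes in Row 0
reads = [(1 << 14) + i * 32 for i in range(BEATS)]   # sequential Reads in Row 1 (RA0 = addr[14])

one_direction = writes + reads                                # finish Writes, then Reads
ping_pong = [a for pair in zip(writes, reads) for a in pair]  # W, R, W, R, ...
print(hits_and_misses(one_direction))   # (992, 32)
print(hits_and_misses(ping_pong))       # (0, 1024)
```

Completing the Write stream before starting the Read stream leaves only 32 page misses (the first access to each Bank in each Row), whereas alternating Writes and Reads that target different Rows makes every access a miss.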

The Bank Row Column address map option is ideal for designs where the user logic segments its traffic streams into separate Bank Group and Bank address ranges. Multiple traffic streams can then operate independently in their own Bank Group/Bank address combinations and use the remaining address space as required. In a simplified example, there are two streams: one mapped to BA0 and BA1 across all Bank Groups, and another mapped to BA2 and BA3 across all Bank Groups. The first stream is long and sequential; the second is short and completely random. The first stream maintains high efficiency because it has a high page hit rate. The second stream has low efficiency because of its random addressing, but because it never targets the same Bank/Row combination as the first stream, high efficiency is maintained for the first stream. Without this segmentation, if one traffic stream is long and sequential while another is random, the random stream interferes with the sequential stream and both have low efficiency.
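The two-stream example can be simulated in Python to show why segmentation protects the sequential stream. The stream shapes, the 1024-access length, the one-for-one alternation, and the fixed random seed are all illustrative assumptions, as are the helper names; only the 4 Bank Groups, 4 Banks per group, 32 Column accesses per Row, and 14-bit Row address come from Table 1.

```python
import random

BANK_GROUPS = 4      # 4H device: 4 Bank Groups of 4 Banks (Table 1)
COLS_PER_ROW = 32    # CA[5:1] gives 32 Column accesses per Row
ROWS = 1 << 14       # RA[13:0]


def stream_a(n: int):
    """Long sequential stream confined to BA0/BA1 across all Bank Groups."""
    for i in range(n):
        bank = (i % BANK_GROUPS, (i // BANK_GROUPS) % 2)       # (BG, BA)
        # Advance the Row after 32 Column accesses in each of the 8 Banks.
        yield bank, i // (BANK_GROUPS * 2 * COLS_PER_ROW)


def stream_b(n: int, bank_addrs: list[int], rng: random.Random):
    """Short random stream over the given Bank addresses in every Bank Group."""
    for _ in range(n):
        yield (rng.randrange(BANK_GROUPS), rng.choice(bank_addrs)), rng.randrange(ROWS)


def stream_a_hits(bank_addrs_b: list[int], n: int = 1024, seed: int = 0) -> int:
    """Page hits seen by stream A when A and B alternate accesses."""
    rng = random.Random(seed)
    open_rows: dict[tuple[int, int], int] = {}
    hits = 0
    for (bank_a, row_a), (bank_b, row_b) in zip(stream_a(n), stream_b(n, bank_addrs_b, rng)):
        if open_rows.get(bank_a) == row_a:
            hits += 1
        open_rows[bank_a] = row_a
        open_rows[bank_b] = row_b      # B leaves its own Row open in that Bank
    return hits


print("segmented:", stream_a_hits([2, 3]))        # B limited to BA2/BA3
print("shared:   ", stream_a_hits([0, 1, 2, 3]))  # B may land in A's Banks
```

With stream B confined to BA2/BA3, stream A reports 992 hits (1024 accesses minus 32 Row openings) regardless of the seed; letting stream B roam all four Bank addresses lowers that count because its random Rows displace the Rows stream A has open.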