Raw Throughput Evaluation

AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276)

Document ID: PG276
Release Date: 2022-11-02
Version: 1.0 English

It is important to understand the raw throughput of the HBM stacks and how the user logic must be designed to match this data rate. Each HBM stack has eight channels, and each channel has a dedicated memory controller operating in pseudo channel mode, which divides the channel into two pseudo channels. Each pseudo channel is 64 bits wide, and the data bits toggle at twice the HBM clock rate set in the IP configuration. If the HBM IP is configured with a 900 MHz clock rate, the raw throughput of a single HBM stack can be evaluated using the following formula:


(64 bits per pseudo channel) x (2 pseudo channels per memory controller) x (8 channels) x 900 MHz x 2

This results in 1,843,200 Mb per second per stack, or 230,400 MB per second per stack. Double this value to 460,800 MB per second for a dual stack device.
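The arithmetic above can be reproduced with a short Python sketch. The function name and parameter names are illustrative assumptions, not part of the HBM IP; the default values are the figures quoted in the text.

```python
def raw_throughput_mbit_per_s(hbm_clock_mhz, bits_per_pc=64,
                              pcs_per_mc=2, channels=8):
    """Raw throughput of one HBM stack in Mb/s.

    The trailing factor of 2 accounts for data toggling at twice
    the HBM clock rate (double data rate).
    """
    return bits_per_pc * pcs_per_mc * channels * hbm_clock_mhz * 2


per_stack_mbit = raw_throughput_mbit_per_s(900)   # Mb/s per stack
per_stack_mbyte = per_stack_mbit // 8             # MB/s per stack
dual_stack_mbyte = 2 * per_stack_mbyte            # MB/s, dual stack device

print(per_stack_mbit, per_stack_mbyte, dual_stack_mbyte)
# prints: 1843200 230400 460800
```

Changing the `hbm_clock_mhz` argument gives the raw figure for other configured clock rates under the same assumptions.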

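From the user logic perspective, each AXI port for the HBM is 256 bits wide, and there is one AXI port per pseudo channel in the stack. Because a 256-bit AXI port is four times as wide as a 64-bit pseudo channel that transfers data twice per clock, the AXI clock needs to run at half the HBM clock rate. From the example above, the user logic must clock the AXI ports at 450 MHz to match the data rate of the HBM when it is running at 900 MHz. This is evaluated by the following formula: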

(256 bits per AXI port) x (2 ports per memory controller) x (8 channels) x 450 MHz

This results in 1,843,200 Mb per second per stack, or 230,400 MB per second per stack. Double this value to 460,800 MB per second for a dual stack device with 32 AXI ports.
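As a minimal sketch of how the AXI clock follows from the HBM clock, the per-port data rates can be equated and solved for the AXI frequency. The function names below are hypothetical helpers for illustration only; the widths and port counts are the ones given in the text.

```python
def axi_clock_mhz_to_match(hbm_clock_mhz, hbm_bits=64, axi_bits=256):
    """AXI clock (MHz) at which one AXI port carries the same data
    rate as one pseudo channel (double data rate on the HBM side)."""
    hbm_rate_mbit = hbm_bits * hbm_clock_mhz * 2
    return hbm_rate_mbit / axi_bits


def axi_throughput_mbit_per_s(axi_clock_mhz, axi_bits=256,
                              ports_per_mc=2, channels=8):
    """Aggregate AXI-side throughput of one stack in Mb/s."""
    return axi_bits * ports_per_mc * channels * axi_clock_mhz


axi_clock = axi_clock_mhz_to_match(900)          # 450.0 MHz
axi_rate = axi_throughput_mbit_per_s(axi_clock)  # matches the HBM-side figure
print(axi_clock, axi_rate)
```

With a 900 MHz HBM clock, this yields the 450 MHz AXI clock and the 1,843,200 Mb per second per stack quoted above.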

These are the raw HBM throughput calculations, but as with all traditional volatile memories, the arrays in the HBM stacks need to be refreshed to maintain data integrity. The base refresh interval (tREFI) for the HBM stacks is 3.9 μs. For 4H devices, the refresh command period (tRFC) is 260 ns, and for 8H devices it is 350 ns. Accounting for the refresh overhead, which is roughly tRFC divided by tREFI, a 4H device loses approximately 7% of peak efficiency (260 ns / 3.9 μs) and an 8H device loses approximately 9% (350 ns / 3.9 μs). As is the case with traditional volatile memory, tREFI decreases as the temperature increases, so more available HBM interface time is lost to refresh overhead as the stacks heat up. The base rate of 3.9 μs is used for temperatures from 0°C to 85°C. Between 85°C and 95°C, tREFI is reduced to 1.95 μs, which roughly doubles the refresh overhead.
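The refresh figures can be approximated with a simple model in which the interface is unavailable for tRFC out of every tREFI. This is an assumption made for illustration: it reproduces the approximate 7% and 9% figures in the text, but it ignores any other controller or traffic-pattern effects. The names below are hypothetical.

```python
RAW_MBYTE_PER_S = 230_400   # raw MB/s per stack at 900 MHz, from the text
T_REFI_BASE_NS = 3900.0     # tREFI for 0°C to 85°C
T_REFI_HOT_NS = 1950.0      # tREFI for 85°C to 95°C


def refresh_overhead(t_rfc_ns, t_refi_ns):
    """Fraction of interface time spent in refresh (simplified model)."""
    return t_rfc_ns / t_refi_ns


for label, t_rfc_ns in (("4H", 260.0), ("8H", 350.0)):
    base = refresh_overhead(t_rfc_ns, T_REFI_BASE_NS)
    hot = refresh_overhead(t_rfc_ns, T_REFI_HOT_NS)
    # Throughput left after refresh at the base refresh interval.
    effective = RAW_MBYTE_PER_S * (1.0 - base)
    print(f"{label}: base {base:.1%}, hot {hot:.1%}, "
          f"about {effective:.0f} MB/s at base tREFI")
```

Under this model, halving tREFI at higher temperatures doubles the fraction of time lost to refresh.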