Some algorithms are memory bound, limited by the 77 GB/s bandwidth available on DDR-based Alveo cards. For those applications there are HBM (High Bandwidth Memory) based Alveo cards, providing up to 460 GB/s of memory bandwidth. For the Alveo implementation, two 16-layer HBM stacks (HBM2 specification) are incorporated into the FPGA package and connected to the FPGA fabric through an interposer. A high-level diagram of the two HBM stacks is as follows.
This implementation provides:
- 8 GB of HBM memory
- 32 HBM segments of 256 MB each, called pseudo channels (PCs)
- One independent AXI channel per pseudo channel for communication with the FPGA through a segmented crossbar switch
- One two-channel memory controller for every two PCs
- 14.375 GB/s max theoretical bandwidth per PC
- 460 GB/s (32 × 14.375 GB/s) max theoretical bandwidth for the HBM subsystem
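The subsystem totals in the list follow directly from the per-PC figures. The short Python sketch below only reproduces that arithmetic; the constant and function names are illustrative and not part of any Xilinx API, and the results are theoretical maxima rather than measured throughput.

```python
# Derive the HBM subsystem totals from the per-PC figures listed above.
NUM_PCS = 32          # pseudo channels across both HBM stacks
PC_SIZE_MB = 256      # capacity of one pseudo channel
PC_BW_GBPS = 14.375   # max theoretical bandwidth of one PC


def hbm_totals(num_pcs=NUM_PCS, pc_size_mb=PC_SIZE_MB, pc_bw=PC_BW_GBPS):
    """Return (total capacity in GB, aggregate theoretical bandwidth in GB/s)."""
    capacity_gb = num_pcs * pc_size_mb / 1024  # 32 * 256 MB = 8192 MB = 8 GB
    bandwidth = num_pcs * pc_bw                # 32 * 14.375 GB/s = 460 GB/s
    return capacity_gb, bandwidth


cap, bw = hbm_totals()
print(f"{cap:g} GB, {bw:g} GB/s")  # 8 GB, 460 GB/s
```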
Each PC has a theoretical maximum of 14.375 GB/s, which is less than the 19.25 GB/s theoretical maximum of a single DDR channel. To get better than DDR performance, designs must efficiently use multiple AXI masters into the HBM subsystem. The programmable logic has 32 HBM AXI interfaces that can access any memory location in any PC on either HBM stack through a built-in switch, giving access to the full 8 GB memory space. For more detailed information on the HBM, refer to the AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276).
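To see why multiple AXI masters are needed, the sketch below computes how many PCs, each driven by its own master, are required to match a DDR target. It assumes, as an idealization, that theoretical bandwidth adds linearly across PCs used in parallel; real throughput depends on access patterns and switch routing. The function name is hypothetical.

```python
import math

PC_BW_GBPS = 14.375       # theoretical max per HBM pseudo channel
DDR_CHANNEL_GBPS = 19.25  # theoretical max per DDR channel
DDR_CARD_GBPS = 77.0      # total DDR bandwidth of a DDR-based Alveo card


def pcs_needed(target_gbps, pc_bw=PC_BW_GBPS):
    """Smallest number of PCs, each driven by its own AXI master, whose
    combined theoretical bandwidth is at least target_gbps."""
    return math.ceil(target_gbps / pc_bw)


print(pcs_needed(DDR_CHANNEL_GBPS))  # 2
print(pcs_needed(DDR_CARD_GBPS))     # 6
```

Under this assumption, two PCs already exceed one DDR channel, and six PCs exceed the full 77 GB/s of a DDR-based card.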
Connection to the HBM is managed by the HBM Memory Subsystem (HMSS) IP, which enables all HBM PCs and automatically connects the XDMA to the HBM for host access to global memory. When used with the Vitis compiler, the HMSS is automatically customized to activate only the memory controllers and ports specified by the --connectivity.sp option, connecting both the user kernels and the XDMA to those memory controllers for optimal bandwidth and latency. Refer to the Using HBM Tutorial for additional information and examples.
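Because each memory controller serves two PCs, the set of controllers the HMSS needs to activate can be derived from the PCs a design uses. The sketch below assumes, hypothetically, that PCs 2k and 2k+1 share controller k; check PG276 for the actual controller-to-PC mapping. The function name is illustrative only.

```python
NUM_PCS = 32  # pseudo channels across both HBM stacks


def controllers_for(pcs):
    """Return the sorted indices of the memory controllers that must be active
    to serve the given PCs. Assumes (hypothetically) that PCs 2k and 2k+1
    share controller k, following the one-controller-per-two-PCs layout."""
    for pc in pcs:
        if not 0 <= pc < NUM_PCS:
            raise ValueError(f"PC index out of range: {pc}")
    # A set removes duplicates when both PCs of a pair are requested.
    return sorted({pc // 2 for pc in pcs})


print(controllers_for([0, 1, 3, 4]))    # [0, 1, 2]
print(len(controllers_for(range(32))))  # 16
```

Using both PCs of a pair costs no extra controller, so packing buffers into paired PCs keeps the number of active controllers low.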
In the following config file example, the kernel input ports in1 and in2 are connected to HBM PCs 0 and 1 respectively, and the kernel writes output buffer out to HBM PCs 3–4. Each HBM PC is 256 MB, giving this kernel access to a total of 1 GB of memory.

    [connectivity]
    sp=krnl.in1:HBM[0]
    sp=krnl.in2:HBM[1]
    sp=krnl.out:HBM[3:4]
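The 1 GB figure can be checked by counting the distinct PCs named in the `sp` directives. The minimal Python sketch below parses only the two HBM forms used in this example, `HBM[n]` and `HBM[first:last]`, and rejects anything else; the parser and its names are hypothetical and do not cover the full `--connectivity.sp` syntax.

```python
import re

PC_SIZE_MB = 256  # capacity of one HBM pseudo channel

# Matches "sp=<cu>.<arg>:HBM[<n>]" or "sp=<cu>.<arg>:HBM[<first>:<last>]" only.
SP_RE = re.compile(r"^sp=([\w.]+):HBM\[(\d+)(?::(\d+))?\]$")


def parse_sp(lines):
    """Map each kernel port to the list of HBM PCs it is connected to."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("sp="):
            continue  # skip section headers such as [connectivity] and blank lines
        m = SP_RE.match(line)
        if m is None:
            raise ValueError(f"unsupported sp directive: {line}")
        first = int(m.group(2))
        last = int(m.group(3)) if m.group(3) is not None else first
        mapping[m.group(1)] = list(range(first, last + 1))
    return mapping


config = """[connectivity]
sp=krnl.in1:HBM[0]
sp=krnl.in2:HBM[1]
sp=krnl.out:HBM[3:4]"""

ports = parse_sp(config.splitlines())
# Count each PC once, even if several ports share it.
distinct_pcs = {pc for pcs in ports.values() for pc in pcs}
total_mb = len(distinct_pcs) * PC_SIZE_MB

print(ports)     # {'krnl.in1': [0], 'krnl.in2': [1], 'krnl.out': [3, 4]}
print(total_mb)  # 1024
```

Four distinct PCs at 256 MB each give 1024 MB, which matches the 1 GB stated for this kernel.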
The HBM ports are located in the bottom SLR of the device. The HMSS automatically handles the placement and timing complexities of AXI interfaces crossing super logic regions (SLRs) in SSI technology devices. By default, without the --connectivity.sp or --connectivity.slr options on the v++ command line, all kernel AXI interfaces access HBM and all kernels are assigned to SLR0. However, you can specify the SLR assignments of kernels using the --connectivity.slr option. Refer to Assigning Compute Units to SLRs for more information.
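Memory and SLR assignments for a compute unit typically sit together in the same `[connectivity]` section. The sketch below generates such a section; it assumes the `slr=<compute_unit_name>:SLR<n>` directive form, which should be checked against Assigning Compute Units to SLRs, and the helper name and the compute unit name `krnl` are hypothetical.

```python
def make_connectivity_cfg(cu, port_pcs, slr=None):
    """Build a [connectivity] config section for one compute unit (hypothetical helper).

    port_pcs maps an argument name to an inclusive (first_pc, last_pc) range.
    slr is an SLR index, or None to omit the directive and keep the default (SLR0).
    """
    lines = ["[connectivity]"]
    for port, (first, last) in port_pcs.items():
        if not 0 <= first <= last < 32:
            raise ValueError(f"invalid PC range for {port}: {first}..{last}")
        bank = f"HBM[{first}]" if first == last else f"HBM[{first}:{last}]"
        lines.append(f"sp={cu}.{port}:{bank}")
    if slr is not None:
        # Assumed directive form; HBM itself sits in the bottom SLR, and the
        # HMSS handles the SLR crossing for AXI interfaces placed elsewhere.
        lines.append(f"slr={cu}:SLR{slr}")
    return "\n".join(lines)


print(make_connectivity_cfg("krnl", {"in1": (0, 0), "in2": (1, 1), "out": (3, 4)}, slr=1))
```

Leaving `slr` as `None` emits only the `sp` lines, so the compute unit keeps the default SLR0 placement described above.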