Some algorithms are memory bound, limited by the 77 GB/s bandwidth available on DDR-based Alveo cards. For those applications there are High Bandwidth Memory (HBM) based Alveo cards, providing up to 460 GB/s memory bandwidth. For the Alveo implementation, two 16-layer HBM (HBM2 specification) stacks are incorporated into the FPGA package and connected into the FPGA fabric with an interposer. A high-level diagram of the two HBM stacks is as follows.
This implementation provides:
- 8 GB HBM for Alveo U50 and 16 GB HBM for Alveo U55C
- 256 MB HBM segments, called pseudo channels (PCs), for Alveo U50
- 512 MB PCs for Alveo U55C
- An independent AXI channel for communication with the FPGA through a segmented crossbar switch per pseudo channel
- A two-channel memory controller per two PCs
- 14.375 GB/s max theoretical bandwidth per PC
- 460 GB/s (32 * 14.375 GB/s) max theoretical bandwidth for the HBM subsystem
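The arithmetic behind these figures can be checked with a short sketch. It assumes 32 PCs in total, as implied by the 32 * 14.375 GB/s product, and uses the per-card PC sizes listed above. The function and variable names are illustrative only and are not part of any Xilinx tool.

```python
# Figures taken from the list above; all names are illustrative.
NUM_PCS = 32                 # pseudo channels across both HBM stacks
PC_BANDWIDTH_GBPS = 14.375   # max theoretical bandwidth per PC, in GB/s
PC_SIZE_MB = {"U50": 256, "U55C": 512}


def hbm_bandwidth_gbps(num_pcs=NUM_PCS):
    """Aggregate theoretical bandwidth when num_pcs PCs are driven in parallel."""
    return num_pcs * PC_BANDWIDTH_GBPS


def hbm_capacity_gb(card):
    """Total HBM capacity in GB for the given card (32 PCs of equal size)."""
    return NUM_PCS * PC_SIZE_MB[card] / 1024


print(hbm_bandwidth_gbps())     # 460.0
print(hbm_capacity_gb("U50"))   # 8.0
print(hbm_capacity_gb("U55C"))  # 16.0
```

These are theoretical ceilings. Achieved bandwidth depends on how evenly the design spreads its traffic across the PCs.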
Each PC has a theoretical maximum bandwidth of 14.375 GB/s, which is less than the theoretical maximum of 19.25 GB/s for a DDR channel. To get better than DDR performance, designs must efficiently use multiple AXI masters into the HBM subsystem. The programmable logic has 32 HBM AXI interfaces. Through a built-in switch, each interface can access any memory location in any of the PCs on either of the HBM stacks, giving access to the full memory space: 8 GB for Alveo U50 and 16 GB for Alveo U55C. For more detailed information on the Alveo U50 and U55C, refer to Alveo U50 Data Center Accelerator Cards Data Sheet (DS965) and Alveo U55C Data Center Accelerator Cards Data Sheet (DS978), respectively. For more detailed information on the HBM, refer to AXI High Bandwidth Controller LogiCORE IP Product Guide (PG276).
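As a rough illustration of why multiple AXI masters are needed, the sketch below estimates how many PCs must be driven in parallel before the combined theoretical bandwidth exceeds a DDR target. It assumes ideal scaling, with each master targeting its own PC and no contention in the switch, which real designs will not fully achieve. The helper name is hypothetical.

```python
import math

PC_BANDWIDTH_GBPS = 14.375   # per HBM pseudo channel
DDR_CHANNEL_GBPS = 19.25     # per DDR channel
DDR_CARD_GBPS = 77.0         # total for a DDR-based Alveo card


def min_pcs_to_exceed(target_gbps):
    """Smallest PC count whose combined theoretical bandwidth exceeds target_gbps."""
    return math.floor(target_gbps / PC_BANDWIDTH_GBPS) + 1


print(min_pcs_to_exceed(DDR_CHANNEL_GBPS))  # 2
print(min_pcs_to_exceed(DDR_CARD_GBPS))     # 6
```

Under these assumptions, a kernel needs at least two PCs active at once to beat a single DDR channel, and at least six to beat the 77 GB/s of a DDR-based card.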
Connection to the HBM is managed by the HBM Memory Subsystem (HMSS) IP, which enables all HBM PCs and automatically connects the XDMA to the HBM for host access to global memory. When used with the Vitis compiler, the HMSS is automatically customized to activate only the necessary memory controllers and ports, as specified by the --connectivity.sp option, and to connect both the user kernels and the XDMA to those memory controllers for optimal bandwidth and latency. Refer to the Using HBM Tutorial for additional information and examples.
In the following config file example, the kernel input ports in1 and in2 are connected to HBM PCs 0 and 1 respectively, and the kernel writes its output out to HBM PCs 3 and 4. Each HBM PC is 256 MB, giving a total of 1 GB of memory access for this kernel.
[connectivity]
sp=krnl.in1:HBM[0]
sp=krnl.in2:HBM[1]
sp=krnl.out:HBM[3:4]
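The 1 GB figure can be reproduced by counting the distinct PCs named in the sp entries. The sketch below parses a config matching the description above (in1 on PC 0, in2 on PC 1, out on PCs 3 and 4). It is not part of the Vitis tools: the regular expression only handles the HBM[n] and HBM[n:m] forms shown here, and the function name is hypothetical.

```python
import re

PC_SIZE_MB = 256  # per-PC size stated in the text (Alveo U50)

# Matches entries such as sp=krnl.in1:HBM[0] or sp=krnl.out:HBM[3:4]
SP_PATTERN = re.compile(r"sp=(\w+)\.(\w+):HBM\[(\d+)(?::(\d+))?\]")


def pcs_per_port(config_text):
    """Map 'kernel.port' to the list of HBM PC indices it is connected to."""
    mapping = {}
    for kernel, port, first, last in SP_PATTERN.findall(config_text):
        lo = int(first)
        hi = int(last) if last else lo  # a single index means a one-PC range
        mapping[f"{kernel}.{port}"] = list(range(lo, hi + 1))
    return mapping


config = """
[connectivity]
sp=krnl.in1:HBM[0]
sp=krnl.in2:HBM[1]
sp=krnl.out:HBM[3:4]
"""

ports = pcs_per_port(config)
unique_pcs = {pc for pcs in ports.values() for pc in pcs}
total_mb = len(unique_pcs) * PC_SIZE_MB

print(ports)     # {'krnl.in1': [0], 'krnl.in2': [1], 'krnl.out': [3, 4]}
print(total_mb)  # 1024
```

Four distinct PCs at 256 MB each give 1024 MB, which matches the 1 GB stated for this kernel.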
The HBM ports are located in the bottom SLR of the device. The HMSS automatically handles the placement and timing complexities of AXI interfaces crossing super logic regions (SLR) in SSI technology devices. By default, without specifying the --connectivity.sp or --connectivity.slr options on v++, all kernel AXI interfaces access HBM[0] and all kernels are assigned to SLR0. However, you can specify the SLR assignments of kernels using the --connectivity.slr option. Refer to Assigning Compute Units to SLRs for more information.
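As a sketch of how port and SLR assignments might sit together in one config file, the helper below renders a [connectivity] section from two dictionaries. It assumes the config-file form slr=<compute unit>:<SLR> as the counterpart of --connectivity.slr, and it reuses the name krnl from the earlier example as the compute unit name. Both should be checked against the v++ documentation for your release. The helper itself is hypothetical.

```python
def connectivity_section(sp_map, slr_map):
    """Render a [connectivity] section as text.

    sp_map:  {"<cu>.<port>": "HBM[...]"}  port-to-memory assignments
    slr_map: {"<cu>": "SLR<n>"}           compute-unit-to-SLR assignments
    """
    lines = ["[connectivity]"]
    lines += [f"sp={port}:{mem}" for port, mem in sp_map.items()]
    lines += [f"slr={cu}:{slr}" for cu, slr in slr_map.items()]
    return "\n".join(lines)


section = connectivity_section(
    {"krnl.in1": "HBM[0]", "krnl.in2": "HBM[1]", "krnl.out": "HBM[3:4]"},
    {"krnl": "SLR0"},  # keep the kernel in the bottom SLR, next to the HBM ports
)
print(section)
```

Placing a kernel in SLR0 keeps it in the same SLR as the HBM ports. Kernels assigned to other SLRs rely on the HMSS to handle the SLR crossings.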