PLRAM Configuration and Use - 2021.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2022-03-29
Version: 2021.1 English

Alveo accelerator cards contain HBM DRAM and DDR DRAM memory resources. Some accelerator cards also provide PLRAM, an additional memory resource built from internal FPGA memory (UltraRAM and block RAM). Platforms that support PLRAM typically contain PLRAM instances in each SLR. The size and type of each PLRAM can be configured on the target platform before kernels or compute units are linked into the system.

You can use a Tcl script to configure the PLRAM before system linking occurs. The use of the Tcl script can be enabled on the v++ command line as follows:
v++ -l --advanced.param compiler.userPreSysLinkOverlayTcl=<path_to>/user_tcl_file.tcl
Within this user-specified Tcl script, an API is provided to let you configure the PLRAM instance or memory resource:
sdx_memory_subsystem::update_plram_specification <memory_subsystem_bdcell> <plram_resource> <plram_specification>

The <plram_specification> is a Tcl dictionary consisting of the following entries. The values shown are the defaults for each instance in the platform; the text after each # is an annotation and not part of the dictionary:

 { 
	SIZE 128K # Up to 4M 
	AXI_DATA_WIDTH 512 # Up to 512
	SLR_ASSIGNMENT SLR0 # SLR0 / SLR1 / SLR2 
	READ_LATENCY 1 # To optimise timing path 
	MEMORY_PRIMITIVE BRAM # BRAM or URAM 
}
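
One way to avoid retyping every entry is to keep the defaults in a dictionary and override only the entries that change. The following is a minimal sketch in plain Tcl, assuming the default values listed above. The variable names plram_defaults and plram_spec are illustrative and not part of the platform API. Whether update_plram_specification accepts a partial dictionary is not stated here, so the sketch always produces a complete one.

```tcl
# Default PLRAM specification, copied from the entries listed above.
# (Variable names are illustrative, not part of the platform API.)
set plram_defaults [dict create \
    SIZE             128K \
    AXI_DATA_WIDTH   512 \
    SLR_ASSIGNMENT   SLR0 \
    READ_LATENCY     1 \
    MEMORY_PRIMITIVE BRAM]

# Override only the entries that differ; dict merge keeps the rest.
set plram_spec [dict merge $plram_defaults \
    {SIZE 2M READ_LATENCY 10 MEMORY_PRIMITIVE URAM}]

puts $plram_spec
```

The resulting $plram_spec can then be passed as the <plram_specification> argument.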

In the example below, PLRAM_MEM00 is changed to be 2 MB in size and composed of UltraRAM; PLRAM_MEM01 is changed to be 4 MB in size and composed of UltraRAM. PLRAM_MEM00 and PLRAM_MEM01 correspond to the --connectivity.sp memory resources PLRAM[0] and PLRAM[1].

# Set up PLRAM
# The trailing backslash continues the command onto the next line;
# newlines inside the braced dictionary need no continuation.
sdx_memory_subsystem::update_plram_specification \
    [get_bd_cells /memory_subsystem] PLRAM_MEM00 { SIZE 2M AXI_DATA_WIDTH 512
    SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM}

sdx_memory_subsystem::update_plram_specification \
    [get_bd_cells /memory_subsystem] PLRAM_MEM01 { SIZE 4M AXI_DATA_WIDTH 512
    SLR_ASSIGNMENT SLR0 READ_LATENCY 10 MEMORY_PRIMITIVE URAM}

validate_bd_design -force
save_bd_design

The READ_LATENCY is an important attribute, because it sets the number of pipeline stages between memories cascaded in depth. This varies by design, and affects the timing QoR of the platform and the eventual kernel clock rate. In the example above for PLRAM_MEM01:

  • 4 MB of memory are required in total.
  • Each UltraRAM is 32 KB (64 bits wide). 4 MB ÷ 32 KB → 128 UltraRAMs in total.
  • Each PLRAM instance is 512 bits wide → 8 UltraRAMs are required in width.
  • 128 total UltraRAMs with 8 UltraRAMs in width → 16 UltraRAMs in depth.
  • A good rule of thumb is to pick a read latency of depth/2 + 2 → in this case, READ_LATENCY = 10.

This allows a pipeline on every second UltraRAM, resulting in the following:

  • Good timing performance between UltraRAMs.
  • Placement flexibility; not all UltraRAMs need to be placed in the same UltraRAM column for cascade.
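
The read-latency arithmetic above can be sketched as a small Tcl helper. This is an illustration only: the procedure name plram_uram_read_latency is hypothetical and not part of the platform API, and it assumes 32 KB, 64-bit-wide UltraRAMs and a size that divides evenly into whole UltraRAMs (the arithmetic uses integer division). The result is a rule-of-thumb starting point, since the best value varies by design.

```tcl
# Hypothetical helper: estimate READ_LATENCY for a URAM-based PLRAM using
# the depth/2 + 2 rule of thumb. Assumes 32 KB, 64-bit-wide UltraRAMs.
proc plram_uram_read_latency {size_bytes {axi_data_width 512}} {
    set uram_bytes [expr {32 * 1024}]   ;# capacity of one UltraRAM, in bytes
    set uram_width 64                   ;# data width of one UltraRAM, in bits

    set total [expr {$size_bytes / $uram_bytes}]      ;# UltraRAMs in total
    set width [expr {$axi_data_width / $uram_width}]  ;# UltraRAMs in width
    set depth [expr {$total / $width}]                ;# UltraRAMs cascaded in depth

    return [expr {$depth / 2 + 2}]
}

# PLRAM_MEM01: 4 MB, 512 bits wide -> 128 UltraRAMs, 8 in width, 16 in depth
puts [plram_uram_read_latency [expr {4 * 1024 * 1024}]]   ;# prints 10
```

The returned value could be used for the READ_LATENCY entry when building the specification dictionary in the user Tcl script.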