U50 Gen3x16 XDMA base_5 Platform

Alveo Data Center Accelerator Card Platforms User Guide (UG1120)

Document ID: UG1120
Release Date: 2022-08-26
Revision: 1.9 English
Platform name: xilinx_u50_gen3x16_xdma_base_5
Supported by: Vitis tools 2022.1
Platform UUID: 4465409525b4c06aec6d0b479d3febe8
Interface UUID: 16e2362f82d2feab35529da27134b76d
Release Date: April 2022
Created by: 2022.1 tools
Supported XRT versions: 2022.1, with support planned through 2022
Satellite controller (SC) FW release: Initial release 5.0.27; updated to 5.2.18 with the April 2022 update
Link speed: Gen3 x16
Target card: A-U50-P00G-PQ-G

For more information, see Alveo U50 Data Center Accelerator Card.

Release Notes
Change log and known issues for the platform and the SC and CMC firmware are available in the Alveo U50 Master Release Notes Answer Record 75163.

The platform implements the device floorplan shown in the following figure and uses resources across the multiple super logic regions (SLR) of the device. The static and dynamic regions are shown across the SLRs, along with the available HBM memory connections associated with SLR0.

Figure 1. Floorplan

To get the same information for development platforms, after you install the Vitis™ unified software platform, use the platforminfo command utility. It reports information on interfaces, clocks, valid SLRs, allocated resources, and memory in a structured format. For more information, see platforminfo Utility in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
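
As a minimal sketch of the step above, the following Python wrapper builds and runs a platforminfo query. It assumes the development platform is installed under the name xilinx_u50_gen3x16_xdma_5_202210_1 and that the Vitis environment has been sourced so that platforminfo is on PATH. The helper names are illustrative and not part of any Xilinx tool.

```python
import shutil
import subprocess
from typing import List, Optional

# Platform name as used elsewhere in this guide (assumed to be installed).
PLATFORM = "xilinx_u50_gen3x16_xdma_5_202210_1"


def platforminfo_command(platform: str = PLATFORM) -> List[str]:
    """Build the argument list for a platforminfo report on one platform."""
    return ["platforminfo", "--platform", platform]


def run_platforminfo(platform: str = PLATFORM) -> Optional[str]:
    """Run platforminfo if it is on PATH; return its stdout, or None if missing."""
    if shutil.which("platforminfo") is None:
        return None  # Vitis environment is not set up in this shell
    result = subprocess.run(
        platforminfo_command(platform),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    report = run_platforminfo()
    print(report if report is not None else "platforminfo not found on PATH")
```

The report text is returned unparsed, because its exact layout depends on the installed tool version.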

Memory

The Alveo U50 Data Center accelerator card has 8 GB of high-bandwidth memory (HBM) accessible through 32 pseudo channels. In addition, it is possible to use the device logic resources for small, fast, on-chip memory accesses as PLRAM. The following table lists the allocation of memory resources per SLR.

Note: For details on assigning kernels to HBM memory channels, see Mapping Kernel Ports to Memory.

Table 1. Available Memory Resources per SLR

| Resources | SLR0 | SLR1 |
| --- | --- | --- |
| PLRAM memory channels (system port name) | PLRAM[0:1] (128K, block RAM) | PLRAM[2:3] (128K, block RAM) |
| HBM memory channels (system port name) | HBM[0:31] (8 GB) | No connections |
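
The system port names in Table 1 are the names used when mapping kernel ports to memory in a v++ connectivity configuration. The following sketch, offered as an illustration only, validates an index range against Table 1 and formats an `sp=` line. The compute unit and argument names (`vadd_1`, `in1`, `out`) are hypothetical, and the per-channel capacity assumes that the 8 GB of HBM is split evenly across the 32 pseudo channels.

```python
from typing import Optional

# Memory resources from Table 1 (system port name -> valid channel indices).
MEMORY_RESOURCES = {
    "HBM": range(0, 32),   # 32 pseudo channels, all connected through SLR0
    "PLRAM": range(0, 4),  # PLRAM[0:1] in SLR0, PLRAM[2:3] in SLR1
}

HBM_TOTAL_MIB = 8 * 1024  # 8 GB of HBM, expressed in MiB


def hbm_mib_per_channel() -> int:
    """Capacity of one pseudo channel, assuming an even split of the 8 GB."""
    return HBM_TOTAL_MIB // len(MEMORY_RESOURCES["HBM"])


def sp_line(compute_unit: str, argument: str, resource: str,
            first: int, last: Optional[int] = None) -> str:
    """Format one sp= entry for the [connectivity] section of a v++ config."""
    valid = MEMORY_RESOURCES[resource]
    last = first if last is None else last
    if first not in valid or last not in valid or first > last:
        raise ValueError(f"{resource}[{first}:{last}] is not available on this platform")
    index = str(first) if first == last else f"{first}:{last}"
    return f"sp={compute_unit}.{argument}:{resource}[{index}]"


if __name__ == "__main__":
    print("[connectivity]")
    print(sp_line("vadd_1", "in1", "HBM", 0, 3))  # hypothetical kernel port
    print(sp_line("vadd_1", "out", "PLRAM", 2))   # hypothetical kernel port
    print(f"# each HBM pseudo channel: {hbm_mib_per_channel()} MiB")
```

Rejecting out-of-range indices before the build starts surfaces a mapping mistake earlier than a failed link step would.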

Card Thermal and Electrical Protections

With the xilinx_u50_gen3x16_xdma_5_202210_1 platform, there are protections to ensure production cards operate within electrical and thermal limits while running acceleration kernels. The following table defines the power and thermal thresholds used to trigger each protection. These protections take three forms and are triggered when the respective thresholds are crossed:

  • Clock throttling
  • Clock shutdown
  • Card shutdown

Clock throttling protection reduces the kernel clock frequencies when any sensor reaches or exceeds its clock throttling threshold, as listed in the following table. It is a dynamic process that lowers the clock frequencies while power exceeds the associated threshold. Lowering the clock frequencies reduces the required power and, in turn, the heat generated. The application clocks are restored to full performance only when all sensor values fall below their respective clock throttling thresholds.

Clock shutdown stops the kernel clocks when any sensor reaches or exceeds its clock shutdown threshold given in the following table. This causes an AXI firewall trip that can crash the application on the host. Because the card ends up in an unknown state, the XRT driver issues a command to reset the card. It typically takes a couple of minutes until the card is usable again.

Card shutdown removes power from the FPGA when any sensor reaches or exceeds its shutdown threshold, and pulls the card off the PCIe bus. Power to the SC remains on, and no AXI firewall trip is issued. A cold reboot of the server is required to recover. The shutdown thresholds listed in the following table are higher than the clock shutdown thresholds and protect the card from damage.

Tip: To determine whether a protection was activated, review the output of the Linux dmesg command. An example of the clock shutdown message follows:

[ 777.531353] clock.m clock.m.23068673: dev ffff97a9e5c3c810, clock_status_check: Critical temperature or power event, kernel clocks have been stopped.  
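
A check for this message can be scripted. The following sketch scans captured dmesg text for the clock shutdown message shown above and returns the kernel-log timestamps. It matches only the wording of that one example; the messages for the other protections are not shown here, so they are not covered. The function name is illustrative.

```python
import re
from typing import List

# Kernel-log timestamp, followed by the clock shutdown text from the example.
CLOCK_SHUTDOWN = re.compile(
    r"^\[\s*(?P<seconds>\d+\.\d+)\]\s.*clock_status_check: "
    r"Critical temperature or power event, kernel clocks have been stopped"
)


def find_clock_shutdowns(dmesg_text: str) -> List[float]:
    """Return the timestamps (seconds since boot) of clock shutdown messages."""
    hits = []
    for line in dmesg_text.splitlines():
        match = CLOCK_SHUTDOWN.search(line)
        if match:
            hits.append(float(match.group("seconds")))
    return hits


if __name__ == "__main__":
    sample = (
        "[ 777.531353] clock.m clock.m.23068673: dev ffff97a9e5c3c810, "
        "clock_status_check: Critical temperature or power event, "
        "kernel clocks have been stopped.\n"
    )
    print(find_clock_shutdowns(sample))
```

In practice the input would be the saved output of dmesg, which may require elevated privileges to read on some systems.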

Table 2. Thermal and Electrical Protection Thresholds

| Sensor Description | Clock Throttling Threshold | Clock Shutdown Threshold | Shutdown Threshold |
| --- | --- | --- | --- |
| 12V PEX power | 62 W | 65 W | N/A |
| 3V3 PEX power | 9.9 W | 11 W | N/A |
| VCCINT current | 56,000 mA | N/A | 60,000 mA |
| VCCINT temperature | 105°C | 110°C | 125°C |
| Maximum temperature of device and HBM | 92°C | 97°C | 107°C |
| QSFP temperature | N/A | 85°C (1) | 90°C (1) |

  1. Refer to QSFP module data sheet.
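
The threshold logic described above can be illustrated with a short host-side model. This is a sketch for reasoning about the table, not the firmware's implementation: the sensor key names are hypothetical, and the sketch assumes that the most severe protection whose threshold is reached applies.

```python
from enum import IntEnum
from typing import Dict


class Protection(IntEnum):
    NONE = 0
    CLOCK_THROTTLING = 1
    CLOCK_SHUTDOWN = 2
    CARD_SHUTDOWN = 3


# Thresholds from Table 2 as (clock throttling, clock shutdown, shutdown).
# None stands for N/A. Units are W, mA, or degrees C, as in the table.
# QSFP values carry the table footnote: refer to the QSFP module data sheet.
THRESHOLDS = {
    "12v_pex_power_w": (62.0, 65.0, None),
    "3v3_pex_power_w": (9.9, 11.0, None),
    "vccint_current_ma": (56000.0, None, 60000.0),
    "vccint_temp_c": (105.0, 110.0, 125.0),
    "max_device_hbm_temp_c": (92.0, 97.0, 107.0),
    "qsfp_temp_c": (None, 85.0, 90.0),
}

LEVELS = (
    Protection.CLOCK_THROTTLING,
    Protection.CLOCK_SHUTDOWN,
    Protection.CARD_SHUTDOWN,
)


def protection_for(sensor: str, value: float) -> Protection:
    """Most severe protection whose threshold the reading reaches or exceeds."""
    level = Protection.NONE
    for protection, limit in zip(LEVELS, THRESHOLDS[sensor]):
        if limit is not None and value >= limit:
            level = protection
    return level


def card_protection(readings: Dict[str, float]) -> Protection:
    """Any single sensor can trigger a protection, so take the maximum."""
    return max(
        (protection_for(sensor, value) for sensor, value in readings.items()),
        default=Protection.NONE,
    )


if __name__ == "__main__":
    readings = {"12v_pex_power_w": 63.5, "max_device_hbm_temp_c": 90.0}
    print(card_protection(readings).name)
```

Because the comparison is "reaches or exceeds", a reading exactly equal to a threshold triggers the corresponding protection.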

Clocking

The platform provides a 300 MHz default clock to run the accelerator.

Available Resources After Platform Installation

The following table lists the available resources in the dynamic region of each SLR. It represents the total device resources after subtracting those used by the static region.

Table 3. xilinx_u50_gen3x16_xdma_5_202210_1 Platform Resource Availability Per SLR

| Resource | SLR0 | SLR1 |
| --- | --- | --- |
| CLB LUT | 351K | 353K |
| CLB register | 703K | 707K |
| Block RAM tile | 552 | 564 |
| UltraRAM | 272 | 272 |
| DSP | 2352 | 2568 |
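
As a rough planning aid, the table can be used to check whether a kernel's estimated resource usage fits within a single SLR's dynamic region. The sketch below is an assumption-laden illustration: the "K" figures are rounded in the table and treated here as multiples of 1,000, the resource key names are hypothetical, and the kernel estimate would come from your own synthesis reports. Fitting within these totals is necessary but not sufficient, since placement and routing rarely reach full utilization.

```python
from typing import Dict, List

# Dynamic-region resources per SLR from Table 3 (K values taken as x1,000).
AVAILABLE = {
    "SLR0": {"lut": 351_000, "register": 703_000, "bram": 552, "uram": 272, "dsp": 2352},
    "SLR1": {"lut": 353_000, "register": 707_000, "bram": 564, "uram": 272, "dsp": 2568},
}


def slrs_that_fit(kernel: Dict[str, int]) -> List[str]:
    """SLRs whose dynamic region covers every resource the kernel needs."""
    return [
        slr
        for slr, budget in AVAILABLE.items()
        if all(kernel.get(name, 0) <= limit for name, limit in budget.items())
    ]


if __name__ == "__main__":
    # Hypothetical kernel estimate, for illustration only.
    estimate = {"lut": 100_000, "register": 150_000, "bram": 200, "dsp": 2400}
    print(slrs_that_fit(estimate))
```

Note that Table 1 shows HBM connections only in SLR0, so a kernel that fits only in SLR1 may still need cross-SLR routing to reach HBM.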

Deployment Platform Installation

To run applications with this platform, download the deployment installation packages corresponding to your OS listed in the following table. Then, use the installation procedures described in Alveo U50 Data Center Accelerator Card Installation Guide (UG1370).

Accelerated applications have software dependencies. Work with your accelerated application provider to determine which XRT version to install.

Development Platform Installation

To develop applications for the Alveo Data Center accelerator cards, you must install and use the Vitis software platform. To set up an accelerator card for use in the development environment, follow the installation steps in: