Components Available in Versal Devices - 2023.1 English

Versal Adaptive SoC System Software Developers Guide (UG1304)

The following list describes the major hardware components available in Versal devices.

Figure 1. Device-level Interconnect Architecture

Application Processing Unit
The application processing unit (APU) consists of Cortex-A72 processor cores, L1/L2 caches, and related functionality. The Cortex-A72 cores and caches are part of the Arm MPCore IP.

Versal devices use a dual-core Cortex-A72 processor system with a 1 MB L2 cache. The Cortex-A72 cores implement the Armv8-A 64-bit architecture. The Cortex-A72 MPCore does not have an integrated generic interrupt controller (GIC), so an external GIC IP is used. For more information, refer to the Application Processing Unit section in the Versal Adaptive SoC Technical Reference Manual (AM011).

NoC Interconnect
The NoC is the device-level interconnect and contains a vertical component (VNoC) and a horizontal component (HNoC).
  • HNoC is integrated in the horizontal super row/region (HSR). The HSR includes blocks such as XPIO, hard DDR memory controller, PLL, HBM, and AI Engine.
  • VNoC is integrated with the global clock column. In SSI technology devices, VNoCs are connected across super logic region (SLR) boundaries. The microbumps and buffers for these connections reside in the Thin-HNoC. Configuration data between primary and secondary SLRs travels over the NoC.
CCIX and PCIe Module (CPM)
The cache coherent interconnect for accelerators (CCIX) and PCIe® module (CPM) is the primary PCIe interface for the processing system. There are two integrated blocks for PCIe in the CPM, supporting up to Gen4 x16. You can configure both integrated blocks for PCIe as endpoints. Furthermore, you can configure each integrated block as a root port that contains a direct memory access (DMA) controller. The CPM CCIX functionality allows a PL accelerator to act as a CCIX-compliant accelerator.
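To put "Gen4 x16" in perspective, the following minimal sketch estimates the raw per-direction bandwidth of a PCIe link. The per-lane transfer rates and line encodings are taken from the PCIe specification, not from this guide, and the function name `link_bandwidth_gbytes` is hypothetical. The result ignores packet and protocol overhead, so achievable throughput is lower.

```python
# Per-lane transfer rate in GT/s for each PCIe generation (PCIe specification values).
GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}

# Fraction of line bits that carry payload: 8b/10b for Gen1/2, 128b/130b for Gen3/4.
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    gbits_per_s = GT_PER_S[gen] * ENCODING[gen] * lanes
    return gbits_per_s / 8  # bits to bytes


gen4_x16 = link_bandwidth_gbytes(4, 16)
print(f"Gen4 x16: {gen4_x16:.1f} GB/s per direction")
```

Under these assumptions, a Gen4 x16 link carries roughly 31.5 GB/s in each direction.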
AI Engine
The AI Engine contains a scalar unit, a vector unit, load units, and a memory interface. The scalar unit contains a 32-bit scalar RISC processor with register files for general purpose, pointer, configuration, and backup registers, and a 32x32-bit scalar multiplier. The AI Engine also supports non-linear functions including sine/cosine, square root, and inverse square root. Three address generator units (AGUs) are available: two dedicated to the load units, and one dedicated to the store unit. The vector unit contains a 512-bit fixed-point/integer vector unit. Devices with AI Engines also contain a single-precision floating-point vector unit. Devices with an AI Engine-ML contain a fixed-point vector unit that is also used for Bfloat16 and FP32 support. The vector units in both the AI Engine and AI Engine-ML support concurrent operation on multiple vector lanes.
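As a rough illustration of what "multiple vector lanes" means for a 512-bit vector, the sketch below computes how many elements of a given width fit in one vector. This is plain register-width arithmetic. The number of operations the hardware actually performs per cycle depends on the device and data type and is not stated here. The names `VECTOR_WIDTH_BITS` and `lanes` are hypothetical.

```python
VECTOR_WIDTH_BITS = 512  # vector width stated in the description above


def lanes(element_bits: int) -> int:
    """Number of elements of the given bit width that fit in one vector."""
    if element_bits <= 0 or VECTOR_WIDTH_BITS % element_bits:
        raise ValueError("element width must evenly divide the vector width")
    return VECTOR_WIDTH_BITS // element_bits


# Lane counts for a few common element widths (8-, 16-, and 32-bit).
lane_counts = {bits: lanes(bits) for bits in (8, 16, 32)}
print(lane_counts)
```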

Within each AI Engine is a dedicated, single-port, 16 KB program memory that is 128 bits wide and 1k deep. The program memory supports instruction compression and has ECC protection and reporting.
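The stated program memory size follows directly from its width and depth. The short check below works through the arithmetic, assuming "1k deep" means 1024 entries. The variable names are hypothetical.

```python
WIDTH_BITS = 128     # program memory word width
DEPTH_WORDS = 1024   # "1k deep", assumed to mean 1024 entries

size_bytes = (WIDTH_BITS // 8) * DEPTH_WORDS  # 16 bytes per word times 1024 words
size_kib = size_bytes // 1024

print(f"Program memory: {size_bytes} bytes ({size_kib} KB)")
```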

Real-Time Processing Unit
The real-time processing unit (RPU) is a dual-core Cortex-R5F processor with a floating-point unit, based on the Armv7-R architecture. It can run either as two independent cores or in a lock-step configuration. For more information, refer to Platform Management in the Versal Adaptive SoC Technical Reference Manual (AM011).
Cache Coherent Interconnect
The cache coherent interconnect (CCI) is based on the Arm CCI-500 with its snoop filter (SF) table feature. It provides tight memory coherency between the APU L2 cache and a PL system cache using the ACE interface protocol to support multiple heterogeneous processing environments. It is part of the FPD interconnect.

For more information on the CCI, see Chapter 44 in the Versal Adaptive SoC Technical Reference Manual (AM011).

Tightly coupled memory (TCM) in the RPU
This memory is 256 KB in size. It is mainly used by the RPU but can also be accessed by the APU.
On-chip memory (OCM) in the PS
This memory is 256 KB in size and is accessible to the RPU and APU processors through the LPD OCM interconnect switch.