Features and Functional Modes

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID: AM004
Release Date: 2022-09-11
Revision: 1.2.1 English

The DSP Engine can operate in a number of functional modes. Some highlights of the functionality include:

  • 27 × 24 + 58 two’s complement multiply-accumulator with 27-bit pre-addition and optional product negation.
  • 18 × 18 + 58 two’s complement complex multiply-accumulator using two back-to-back DSP58s. Each of the two complex inputs can be optionally conjugated.
  • Single-precision floating-point (binary32) accumulation.
  • Mixed-precision floating-point multiply-accumulator with multiplicand and multiplier statically and independently selectable to be either binary16 or binary32, and binary32 biasing and accumulation.
  • Three-element two’s complement vector dot product with accumulate or post-add in INT8 mode.
  • Power-saving 27-bit pre-adder that optimizes symmetrical filter applications and reduces DSP logic requirements.
  • 58-bit accumulator that can be cascaded to build 116-bit and larger accumulators, adders, and counters.
  • Single-instruction-multiple-data (SIMD) arithmetic unit with dual 24-bit or quad 12-bit add/subtract/accumulate.
  • 58-bit logic unit: bitwise AND, OR, NOT, NAND, NOR, XOR, and XNOR.
  • Pattern detector: terminal counts, overflow/underflow, convergent/symmetric rounding support, and 116-bit wide AND/NOR, when combined with the logic unit, to detect if the output matches a pattern.
  • Optional pipeline registers and dedicated buses for cascading multiple DSP58s in a column for hierarchical/composite functions such as systolic FIR filters.
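
The arithmetic behind two of the modes listed above can be sketched in software. The following Python model is illustrative only and is not taken from this manual: the function names (wrap_signed, mac_27x24, simd_add) and operand names (a, d, b, acc) are hypothetical, and the model assumes that the pre-adder sum wraps to 27 bits, the accumulator wraps to 58 bits, and SIMD lanes pack into a single integer with lane 0 in the least significant bits. It does not model pipeline registers, rounding, or the pattern detector.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def mac_27x24(acc: int, a: int, d: int, b: int, negate: bool = False) -> int:
    """Behavioral sketch of a 27 x 24 + 58 multiply-accumulate.

    Assumption: a and d feed a 27-bit pre-adder, b is the 24-bit operand,
    and the running sum wraps to 58 bits.
    """
    pre_add = wrap_signed(a + d, 27)         # 27-bit pre-addition
    product = pre_add * wrap_signed(b, 24)   # 27 x 24 multiply
    if negate:                               # optional product negation
        product = -product
    return wrap_signed(acc + product, 58)    # 58-bit accumulate


def simd_add(x: int, y: int, lane_bits: int, lanes: int) -> int:
    """Add packed lanes independently; carries never cross a lane boundary."""
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * lane_bits
        lane_sum = ((x >> shift) & mask) + ((y >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result
```

In this sketch, the same packed operands give different results depending on the lane width: with quad 12-bit lanes, a carry out of lane 0 is discarded, while with dual 24-bit lanes it propagates within the wider lane.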

In the UltraScale™ architecture, two DSP48E2s with configurable logic blocks (CLBs) and block RAM form the DSP48 tile. The Versal® architecture introduces the DSP58 supertile (see the following figure), which is made up of two rows and two columns of the new version of the CLB, always placed next to a DSP58, to provide:
  • 64 LUTMs: LUTs to be used as logic, distributed memory, or shift-register logic (SRL)
  • 64 LUTLs to be used as logic resources
  • 256 flip-flops

Exactly 50% of the LUTs in the new CLB are LUTRAM/SRL capable, which provides single-port distributed SRAMs to the DSP. This structure can be replicated throughout the device to ease timing closure.

Figure 1. Two Back-to-Back DSP58 Supertiles

When their DSP_MODE attributes are set to CINT18, the two back-to-back DSP58s form one complex arithmetic unit (see the following figure). The right DSP58 in a dual-DSP58 complex arithmetic unit computes the real result PRE. Concurrently, the left DSP58 computes the imaginary result PIM. Shared signals (for example, CLK and ASYNC_RST) are routed only to the interconnect interface of the left DSP58.
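
The split of work between the two DSP58s can be illustrated with a short behavioral model. The Python sketch below is an assumption-laden illustration, not a description of the hardware datapath: the function and parameter names (cint18_mac, conj_a, conj_b, acc_re, acc_im) are hypothetical, inputs are taken to be in-range 18-bit signed values, and each accumulator is assumed to wrap to 58 bits. One expression stands in for the DSP58 that produces the real result and the other for the DSP58 that produces the imaginary result.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def cint18_mac(acc_re, acc_im, a, b, conj_a=False, conj_b=False):
    """Behavioral sketch of an 18 x 18 + 58 complex multiply-accumulate.

    a and b are (real, imaginary) pairs of 18-bit signed integers.
    Returns the updated (real, imaginary) accumulators.
    """
    a_re, a_im = a
    b_re, b_im = b
    if conj_a:          # optional conjugation of the first input
        a_im = -a_im
    if conj_b:          # optional conjugation of the second input
        b_im = -b_im
    # Real part: modeled here as the work of one DSP58.
    p_re = wrap_signed(acc_re + a_re * b_re - a_im * b_im, 58)
    # Imaginary part: modeled here as the work of the other DSP58.
    p_im = wrap_signed(acc_im + a_re * b_im + a_im * b_re, 58)
    return p_re, p_im
```

For example, multiplying (1 + 2j) by (3 + 4j) with zeroed accumulators gives (-5, 10), and conjugating the second input gives (11, 2).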

Figure 2. One DSP58 Complex Mode Supertile

Details on the various functional modes are provided in the following chapters.