DSP58 Features

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID: AM004
Release Date: 2022-09-11
Revision: 1.2.1 English

Features of DSP58 are as follows.

  • Backward compatibility with DSP48E2
  • 27-bit pre-adder with D register to enhance the capabilities of the A or B path
  • A or B can be selected as pre-adder input to allow for wider multiplication coefficients
  • The result of the pre-adder can be sent to both inputs of the multiplier to provide squaring capability
  • INMODE control supports balanced pipelining when dynamically switching between multiply (A*B) and add operations (A+B) for fixed point, non-complex numbers
  • 27 × 24 two’s complement multiplier with optional product negation
  • 34-bit A input of which the lower 27 bits feed the A input of the multiplier, and the entire 34-bit input forms the upper 34 bits of the 58-bit A:B concatenated internal bus
  • Cascading A and B input:
    • Semi-independently selectable pipelining between direct and cascade paths
    • Separate clock enables for the two-deep A and B input register sets
  • Independent 58-bit C input and C register with independent reset and clock enable
  • CARRYCASCIN and CARRYCASCOUT internal cascade signals to support 116-bit accumulators/adders/subtracters in two DSP58s, and to support cascading more than two DSP58s
  • MULTSIGNIN and MULTSIGNOUT internal cascade signals with special OPMODE setting to support a 116-bit MACC extension
  • Single instruction multiple data (SIMD) mode for four-input adder/subtracter, which precludes the use of the multiplier in the first stage:
    • Dual 24-bit SIMD adder/subtracter/accumulator with two separate CARRYOUT signals
    • Quad 12-bit SIMD adder/subtracter/accumulator with four separate CARRYOUT signals
  • 58-bit logic unit:
    • Bitwise logic operations—two-input AND, OR, NOT, NAND, NOR, XOR, and XNOR
    • Logic unit mode dynamically selectable through ALUMODE and OPMODE[3:2]
  • 116-bit wide XOR selectable for XOR12, XOR22 (new), XOR24, XOR34 (new), XOR58 (new), and XOR116 (new)
    Note: XOR48 and XOR96 are supported when migrating from the UltraScale™ architecture.
  • Pattern detector:
    • Overflow/underflow support
    • Convergent rounding support
    • Terminal count detection support and auto resetting: auto resetting can give priority to clock enable
  • Cascading 58-bit P bus supports internal low-power adder cascade: 58-bit P bus allows for 12-bit quad or 24-bit dual SIMD adder cascade support
  • 23-bit right shift to enable wider multiplier implementations; a 17-bit right shift is supported when migrating from the UltraScale architecture
  • Dynamic user-controlled operating modes:
    • 9-bit OPMODE control bus provides W, X, Y, and Z multiplexer select signals
    • 5-bit INMODE control bus provides selects for the 2-deep A and B registers, pre-adder add-sub control, and mask gates for pre-adder multiplexer functions
    • 1-bit NEGATE control bit to conditionally negate the multiplier product
    • 4-bit ALUMODE control bus selects logic unit function and accumulator add-sub control
  • Carry in for the second stage adder:
    • Support for rounding
    • Support for wider add/subtracts
    • 3-bit CARRYINSEL multiplexer
  • Carry out for the second stage adder:
    • Support for wider add/subtracts
    • Available for each SIMD adder (up to four)
    • Cascaded CARRYCASCOUT and MULTSIGNOUT allow for MACC extensions up to 116 bits
  • Single clock for synchronous operation
  • Optional input, pipeline, and output/accumulate registers
  • Optional registers for control signals (OPMODE, ALUMODE, and CARRYINSEL)
  • Independent clock enable and synchronous resets with programmable polarity for greater flexibility
  • Internal multiplier and XOR logic can be gated off when unused to save power
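
The SIMD adder modes listed above can be illustrated with a small behavioral sketch. It is not a model of the hardware: the function name `simd_add` is hypothetical, only two operands are added (the DSP58 adder takes four inputs), and the 48-bit total width is simply inferred from the lane sizes (2 × 24 = 4 × 12). It shows only that each lane adds independently and produces its own carry out.

```python
def simd_add(x, y, lanes):
    """Lane-wise add of two 48-bit words split into `lanes` equal fields.

    Returns (result, carry_outs); carry_outs[i] is the carry out of lane i,
    where lane 0 is the least significant field. Hypothetical helper.
    """
    if lanes not in (2, 4):
        raise ValueError("lanes must be 2 (dual 24-bit) or 4 (quad 12-bit)")
    width = 48 // lanes
    mask = (1 << width) - 1
    result = 0
    carry_outs = []
    for i in range(lanes):
        shift = i * width
        # Each lane is added in isolation, so no carry crosses a lane boundary.
        total = ((x >> shift) & mask) + ((y >> shift) & mask)
        result |= (total & mask) << shift
        carry_outs.append(total >> width)
    return result, carry_outs
```

For example, `simd_add(a, b, 4)` treats `a` and `b` as four packed 12-bit fields and returns four separate carry bits, one per field.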

DSP58 consists of a multiplier followed by an accumulator. At least three pipeline registers are required for both multiply and multiply-accumulate operations to run at full speed. The multiply operation in the first stage generates two partial products that need to be added together in the second stage.
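
As a rough arithmetic illustration of the multiplier-plus-accumulator structure, the following sketch models one multiply-accumulate step with a 27 × 24 two's complement multiply and a 58-bit result. The helper names `to_signed` and `macc_step` are hypothetical, and the sketch deliberately ignores the pipeline registers and the two partial products; it captures only the numeric behavior, assuming the accumulator wraps at 58 bits.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def macc_step(p, a, b, negate=False):
    """One accumulate step: P <- P + A*B (or P - A*B), wrapped to 58 bits.

    `a` is truncated to 27 bits and `b` to 24 bits before the multiply.
    `negate` stands in for the optional product negation (assumed behavior).
    """
    product = to_signed(a, 27) * to_signed(b, 24)
    if negate:
        product = -product
    return to_signed(p + product, 58)
```

Calling `macc_step` repeatedly with the previous return value as `p` gives a running sum of products, which is the pattern a hardware MACC loop implements with its output register.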

When only one or two registers exist in the multiplier design, the M register should always be used to save power and improve performance.

Add/Sub and logic unit operations require at least two pipeline registers (input, output) to run at full speed.

The cascade capabilities of DSP58 are extremely efficient at implementing high-speed pipelined filters built on adder cascades instead of adder trees.
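
To make the adder-cascade idea concrete, here is a minimal behavioral sketch of a FIR filter in which each tap adds its own product to the registered sum passed along from the neighboring tap, so no adder tree is needed. The transposed-form structure and the function name `fir_adder_cascade` are choices made for this illustration, not something the manual prescribes, and bit widths are not modeled.

```python
def fir_adder_cascade(coeffs, samples):
    """Transposed-form FIR built from a chain of multiply-add stages.

    Each stage holds one registered sum (standing in for a P register) and
    adds its product to the value cascaded in from the previous stage.
    """
    taps = len(coeffs)
    p = [0] * taps  # one registered sum per stage, all cleared at start
    out = []
    for x in samples:
        # All stages update together: the right-hand side reads the sums
        # from the previous cycle, as registers clocked on one edge would.
        p = [coeffs[taps - 1 - k] * x + (p[k - 1] if k else 0)
             for k in range(taps)]
        out.append(p[-1])  # the last stage holds the filter output
    return out
```

Feeding an impulse through the chain returns the coefficients in order, which is a quick way to check that the cascade ordering is right.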

Multiplexers are controlled with dynamic control signals, such as OPMODE, ALUMODE, and CARRYINSEL, enabling a great deal of flexibility. Designs using registers and dynamic opmodes are better equipped to take advantage of the DSP58’s capabilities than combinatorial multiplies.

In general, the DSP58 supports both sequential and cascaded operations due to the dynamic OPMODE and cascade capabilities. Fast Fourier transforms (FFTs), floating-point, computation (multiply, add/sub, and divide), counters, and large bus multiplexers are some applications of DSP58.

Additional capabilities of the DSP58 include synchronous resets and clock enables, dual A input pipeline registers, pattern detection, logic unit functionality, single instruction/multiple data (SIMD) functionality, and MACC and Add-Acc extension to 116 bits. The DSP58 supports convergent and symmetric rounding, terminal count detection and auto-resetting for counters, and overflow/underflow detection for sequential accumulators. Up to a 116-bit wide XOR function can be implemented as six 12-bit and two 22-bit wide XORs, two 24-bit and two 34-bit wide XORs, or two 48/58-bit wide XORs.
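
The 116-bit extension can be pictured as two 58-bit additions linked by a carry. The sketch below is an assumption-laden illustration of that arithmetic only: `add116` is a hypothetical name, the operands are treated as unsigned, and the `carry` variable merely stands in for the role a cascaded carry signal would play between two slices.

```python
MASK58 = (1 << 58) - 1


def add116(x, y):
    """Add two unsigned 116-bit values as two 58-bit adds joined by a carry.

    The result wraps modulo 2**116.
    """
    lo = (x & MASK58) + (y & MASK58)
    carry = lo >> 58  # carry out of the lower 58-bit half
    hi = (x >> 58) + (y >> 58) + carry
    return ((hi & MASK58) << 58) | (lo & MASK58)
```

The lower half is computed first and its carry feeds the upper half, which is why the upper stage in a real cascade has to sit one step later in the chain.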