Architectural Highlights of DSP58

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID
Release Date
1.2.1 English

The DSP58 contains a pre-adder after the A and B registers with a 27-bit input vector called D. The D register can be used either as the pre-adder register or an alternate input to the multiplier. The DSP58 specific features are highlighted in the following figure.

Figure 1. Hierarchical View of the DSP58 Input Registers and Pre-Adder

Each DSP58 has a two-input multi-mode multiplier followed by multiplexers and a four-input adder/subtracter/accumulator. The DSP58 multiplier has asymmetric inputs and accepts a 24-bit two’s complement operand and a 27-bit two’s complement operand. The multiplier stage produces a 51-bit two’s complement result in the form of two partial products. These partial products are sign-extended to 58 bits in the X multiplexer and Y multiplexer and fed into four-input adder for final summation. Therefore, when the multiplier is used, the adder effectively becomes a three-input adder.

The second stage adder/subtracter accepts four 58-bit, two’s complement operands plus 1-bit CARRYIN and produces a 58-bit, two’s complement result when the multiplier is bypassed by setting USE_MULT attribute to NONE and with the appropriate OPMODE setting. In SIMD mode, the adder/subtracter also supports dual 24-bit or quad 12-bit SIMD arithmetic operations with CARRYOUT bits. In the second stage adder/subtracter, bitwise logic operations on two 58-bit binary numbers (and three 58-bit binary numbers in the special XOR3 case) are also supported with dynamic ALUMODE control signals.

Higher level DSP functions are supported by cascading individual DSP58s in a DSP column. Two datapaths (ACOUT and BCOUT) and the DSP58 outputs (PCOUT, MULTSIGNOUT, and CARRYCASCOUT) provide the cascade capability. The ability to cascade datapaths is useful in filter designs. For example, a finite impulse response (FIR) filter design can use the cascading inputs to arrange a series of input data samples and the cascading outputs to arrange a series of partial output results. The ability to cascade provides a high-performance and low-power implementation of DSP filter functions because the general routing in the internal logic is not used.

The C input allows the formation of many 3-input mathematical functions, such as 3-input addition or 2-input multiplication with an addition. One subset of this function is the valuable support of symmetrically rounding a multiplication toward zero or toward infinity. The C input together with the pattern detector also supports convergent rounding. Refer to Rounding for a discussion on using the C input to implement the different rounding modes.

For multi-precision arithmetic, DSP58 provides a right wire shift by 23 bits (or 17 bits when migrating from the UltraScale™ architecture). Thus, a partial product from one DSP58 can be right justified and added to the next partial product computed in an adjacent DSP58 above it in the same column. Using this technique, the DSP58s in a column can be used to build higher-precision multipliers.

Programmable pipelining of input operands, intermediate products, and accumulator outputs enhances throughput. The 58-bit internal bus (PCOUT/PCIN) allows for aggregation of DSPs in a single column. CLB logic is needed when spanning multiple DSP columns.

The pattern detector at the output of the DSP58 provides support for convergent rounding, overflow/underflow, block floating-point, and support for accumulator terminal count (counter auto reset). The pattern detector can detect if the output of the DSP58 matches a pattern, as qualified by a mask.