Rounding

Versal ACAP DSP Engine Architecture Manual (AM004)

Document ID: AM004
Release Date: 2022-09-11
Revision: 1.2.1 English

Arithmetic rounding is the process of quantizing a result to fewer bits in a way that limits the resulting error. Ideally, one would use the implementation that minimizes the loss of precision. In most hardware implementations, however, including those using Xilinx DSPs, each rounding technique carries overhead that must be weighed when making design trade-offs. Although the binary point placement and the bit position where rounding occurs are independent of each other, this discussion assumes that the designer’s goal is to round off the fractional bits to an integer value.

One form of rounding is simple truncation: dropping the undesired LSBs from a large result to obtain a reduced number of result bits. The problem with truncation is that, once the bits are dropped, the reduced result has an undesirable DC data shift toward more negative values. For example, if a number has the decimal value 2.8 and its fractional part is truncated, the result is 2. Here the original number is closer to 3 than to 2, so a rounded result of 3 is more desirable than the truncated result of 2.
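
The DC data shift can be illustrated with a short Python sketch that models truncation of a two's complement fixed-point value as an arithmetic right shift. The format (4 fractional bits) and the helper names `to_fixed`, `truncate`, and `mean_truncation_error` are assumptions chosen for this illustration only; they are not part of the DSP primitive or any Xilinx tool.

```python
FRAC_BITS = 4  # assumed format for illustration: 4 fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real number to the nearest fixed-point code (hypothetical helper)."""
    return round(x * (1 << frac_bits))


def truncate(raw, frac_bits=FRAC_BITS):
    """Drop the fractional LSBs.

    Python's >> on integers is an arithmetic shift, so it floors the value
    toward negative infinity, as dropping LSBs of a two's complement number does.
    """
    return raw >> frac_bits


def mean_truncation_error(frac_bits=FRAC_BITS, int_range=8):
    """Average (truncated - exact) over every code in [-int_range, int_range)."""
    scale = 1 << frac_bits
    codes = range(-int_range * scale, int_range * scale)
    errors = [truncate(c, frac_bits) - c / scale for c in codes]
    return sum(errors) / len(errors)


print(truncate(to_fixed(2.8)))    # 2
print(truncate(to_fixed(-2.8)))   # -3
print(mean_truncation_error())    # -0.46875
```

In this model, both positive and negative inputs move toward the more negative integer, so the average error over a symmetric input range is negative rather than zero. That nonzero average is the DC shift.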

The next few sections discuss other methods of quantization with a more desirable effect, including symmetric rounding and convergent rounding.