Overview of Arbitrary Precision Fixed-Point Data Types - 2021.2 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID
English (United States)
Release Date
2021.2 English

Fixed-point data types model the data as an integer and fraction bits with the format ap_fixed<W,I,Q> as explained in the table below. In the following example, the Vitis HLS ap_fixed type is used to define an 18-bit variable with 6 bits specified as representing the numbers above the binary point, and 12 bits implied to represent the value after the decimal point. The variable is specified as signed and the quantization mode is set to round to plus infinity. Because the overflow mode is not specified, the default wrap-around mode is used for overflow.

#include <ap_fixed.h>
ap_fixed<18,6,AP_RND > my_type;

When performing calculations where the variables have different number of bits or different precision, the binary point is automatically aligned.

The behavior of the C++ simulations performed using fixed-point matches the resulting hardware. This allows you to analyze the bit-accurate, quantization, and overflow behaviors using fast C-level simulation.

Fixed-point types are a useful replacement for floating point types which require many clock cycle to complete. Unless the entire range of the floating-point type is required, the same accuracy can often be implemented with a fixed-point type resulting in the same accuracy with smaller and faster hardware.

A summary of the ap_fixed type identifiers is provided in the following table.

Table 1. Fixed-Point Identifier Summary
Identifier Description
W Word length in bits
I The number of bits used to represent the integer value, that is, the number of integer bits to the left of the binary point. When this value is negative, it represents the number of implicit sign bits (for signed representation), or the number of implicit zero bits (for unsigned representation) to the right of the binary point. For example,
ap_fixed<2, 0> a = -0.5;    // a can be -0.5,

ap_ufixed<1, 0> x = 0.5;    // 1-bit representation. x can be 0 or 0.5
ap_ufixed<1, -1> y = 0.25;  // 1-bit representation. y can be 0 or 0.25
const ap_fixed<1, -7> z = 1.0/256;  // 1-bit representation for z = 2^-8
Q Quantization mode: This dictates the behavior when greater precision is generated than can be defined by smallest fractional bit in the variable used to store the result.
ap_fixed Types Description
AP_RND Round to plus infinity
AP_RND_ZERO Round to zero
AP_RND_MIN_INF Round to minus infinity
AP_RND_INF Round to infinity
AP_RND_CONV Convergent rounding
AP_TRN Truncation to minus infinity (default)
AP_TRN_ZERO Truncation to zero

Overflow mode: This dictates the behavior when the result of an operation exceeds the maximum (or minimum in the case of negative numbers) possible value that can be stored in the variable used to store the result.

ap_fixed Types Description
AP_SAT 1 Saturation
AP_SAT_ZERO 1 Saturation to zero
AP_SAT_SYM 1 Symmetrical saturation
AP_WRAP Wrap around (default)
AP_WRAP_SM Sign magnitude wrap around
N This defines the number of saturation bits in overflow wrap modes.
  1. Using the AP_SAT* modes can result in higher resource usage as extra logic will be needed to perform saturation and this extra cost can be as high as 20% additional LUT usage.

The default maximum width allowed for ap_[u]fixed data types is 1024 bits. This default may be overridden by defining the macro AP_INT_MAX_W with a positive integer value less than or equal to 4096 before inclusion of the ap_int.h header file.

Important: ROM Synthesis can take a long time when using APFixed:. Changing it to int results in a quicker synthesis. For example:

static ap_fixed<32> a[32][depth] =

Can be changed to:

static int a[32][depth] =