Fixed-point data types represent data as integer and fractional bits in two's complement, using the format ap_fixed<W,I,[Q,O,N]> described in the table below. In the following example, the Vitis HLS ap_fixed type defines an 18-bit variable in which 6 bits (including the sign bit) represent the integer part to the left of the binary point and the remaining 12 bits represent the fractional value to the right of the binary point. The variable is signed (ap_fixed rather than ap_ufixed), and the quantization mode is set to round to plus infinity (AP_RND). Because the overflow mode is not specified, the default wrap-around mode (AP_WRAP) is used for overflow.
```cpp
#include <ap_fixed.h>
// ...
ap_fixed<18,6,AP_RND> t1 = 1.5;  // internally represented as 0b00'0001.1000'0000'0000 (0x01800)
ap_fixed<18,6,AP_RND> t2 = -1.5; // 0b11'1110.1000'0000'0000 (0x3e800)
// ...
```
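The bit patterns in those comments follow from a simple rule: a value v with W - I fractional bits is stored as the W-bit two's-complement integer v * 2^(W-I). The sketch below models that rule in plain C++. The helpers `fixed_bits` and `fixed_value` are hypothetical names, not part of ap_fixed.h, and the sketch assumes the value is exactly representable, so it models neither quantization nor overflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of ap_fixed.h): the W-bit two's-complement
// pattern that a signed fixed-point number with I integer bits stores for v.
// Assumes v is exactly representable with W - I fractional bits.
uint32_t fixed_bits(double v, int W, int I) {
    assert(W > 0 && W <= 32 && I <= W);
    const int F = W - I;                                        // fractional bits
    const int64_t raw = static_cast<int64_t>(std::ldexp(v, F)); // v * 2^F
    const uint64_t mask = (uint64_t{1} << W) - 1;               // keep low W bits
    return static_cast<uint32_t>(static_cast<uint64_t>(raw) & mask);
}

// Inverse: interpret a W-bit pattern as a signed value with I integer bits.
double fixed_value(uint32_t bits, int W, int I) {
    assert(W > 0 && W <= 32 && I <= W);
    int64_t raw = bits & ((uint64_t{1} << W) - 1);
    if (raw & (int64_t{1} << (W - 1))) raw -= int64_t{1} << W; // sign-extend
    return std::ldexp(static_cast<double>(raw), -(W - I));     // raw * 2^-F
}
```

For ap_fixed<18,6>, `fixed_bits(1.5, 18, 6)` yields 0x01800 and `fixed_bits(-1.5, 18, 6)` yields 0x3e800, matching the comments in the ap_fixed example.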
When a calculation combines variables with different numbers of bits or different precision, the binary point is aligned automatically. For example, when dividing fixed-point variables of different sizes, the quotient has no more fractional bits than the dividend. To preserve more of the quotient's fractional part, cast the dividend to a type with the desired number of fractional bits before the division; casting the quotient afterward cannot recover bits that were already discarded.
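The fractional-width rule for division can be illustrated with a plain C++ model on scaled integers. This is a sketch, not the ap_fixed implementation: the `Fixed` struct and the `divide`, `widen`, and `to_double` helpers are hypothetical, the model assumes the quotient keeps exactly the dividend's fractional bits, and only non-negative operands are handled so that integer truncation agrees with the default AP_TRN flooring.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ model (not ap_fixed internals): value = raw * 2^-frac.
struct Fixed {
    int64_t raw;  // scaled two's-complement integer
    int frac;     // number of fractional bits
};

double to_double(Fixed x) {
    return static_cast<double>(x.raw) / static_cast<double>(int64_t{1} << x.frac);
}

// The quotient keeps the dividend's fractional bits:
//   q_raw = a_raw * 2^b.frac / b_raw, truncated.
Fixed divide(Fixed a, Fixed b) {
    assert(a.raw >= 0 && b.raw > 0 && b.frac >= 0);
    return Fixed{(a.raw << b.frac) / b.raw, a.frac};
}

// Model of casting the dividend to a type with more fractional bits.
Fixed widen(Fixed x, int new_frac) {
    assert(new_frac >= x.frac);
    return Fixed{x.raw << (new_frac - x.frac), new_frac};
}
```

Dividing 1.0 by 3.0 with 4 fractional bits each yields 0.3125, whereas widening the dividend to 12 fractional bits first yields 1365/4096 (about 0.33325).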
C++ simulations that use fixed-point types match the behavior of the resulting hardware. This lets you analyze bit-accurate, quantization, and overflow behavior using fast C-level simulation.
Fixed-point types are a useful replacement for floating-point types, which require many clock cycles to complete. Unless the entire range of the floating-point type is required, the same accuracy can often be achieved with a fixed-point type, resulting in smaller and faster hardware.
A summary of the ap_fixed type identifiers is provided in the following table.
| Identifier | Mode | Description |
|---|---|---|
| W | | Word length in bits. |
| I | | The number of bits used to represent the integer value, that is, the number of integer bits to the left of the binary point. When this value is negative, it represents the number of implicit sign bits (for signed representation), or the number of implicit zero bits (for unsigned representation) to the right of the binary point. |
| Q | | Quantization mode: dictates the behavior when an operation produces greater precision than the smallest fractional bit of the variable used to store the result can represent. |
| | AP_RND | Round to plus infinity |
| | AP_RND_ZERO | Round to zero |
| | AP_RND_MIN_INF | Round to minus infinity |
| | AP_RND_INF | Round to infinity |
| | AP_RND_CONV | Convergent rounding |
| | AP_TRN | Truncation to minus infinity (default) |
| | AP_TRN_ZERO | Truncation to zero |
| O | | Overflow mode: dictates the behavior when the result of an operation exceeds the maximum (or, for negative numbers, the minimum) value that can be stored in the variable used to store the result. |
| | AP_SAT | Saturation |
| | AP_SAT_ZERO | Saturation to zero |
| | AP_SAT_SYM | Symmetrical saturation |
| | AP_WRAP | Wrap around (default) |
| | AP_WRAP_SM | Sign magnitude wrap around |
| N | | The number of saturation bits in overflow wrap modes. |
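The quantization and overflow modes in the table can be modeled on scaled integers. The sketch below is a plain C++ model, not the ap_fixed.h implementation: the enum `Quant` and the functions `quantize`, `wrap`, and `saturate` are hypothetical names. It assumes the AP_RND* modes round to the nearest representable value and differ only in how exact ties are broken (convergent rounding, AP_RND_CONV, breaks ties to even). Only plain wrap-around and saturation are covered; AP_SAT_ZERO, AP_SAT_SYM, AP_WRAP_SM, and the N parameter are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names mirroring the table; a plain C++ model, not ap_fixed.h.
enum class Quant { RND, RND_ZERO, RND_MIN_INF, RND_INF, RND_CONV, TRN, TRN_ZERO };

// Requantize a two's-complement scaled integer `raw` by dropping its `d`
// lowest fractional bits (d > 0), returning the new raw value.
int64_t quantize(int64_t raw, int d, Quant q) {
    assert(d > 0 && d < 62);
    const int64_t one = int64_t{1} << d;  // weight of the new LSB in old units
    const int64_t half = one / 2;
    int64_t fl = raw / one;               // C++ division truncates toward zero,
    int64_t rem = raw % one;
    if (rem < 0) { fl -= 1; rem += one; } // so adjust to floor: raw == fl*one + rem
    const bool neg = raw < 0;

    if (q == Quant::TRN) return fl;                                   // toward -inf
    if (q == Quant::TRN_ZERO) return (neg && rem != 0) ? fl + 1 : fl; // toward zero

    // Rounding modes: go to the nearest value; they differ only on exact ties.
    if (rem < half) return fl;
    if (rem > half) return fl + 1;
    switch (q) {
        case Quant::RND:         return fl + 1;                      // tie toward +inf
        case Quant::RND_MIN_INF: return fl;                          // tie toward -inf
        case Quant::RND_ZERO:    return neg ? fl + 1 : fl;           // tie toward zero
        case Quant::RND_INF:     return neg ? fl : fl + 1;           // tie away from zero
        case Quant::RND_CONV:    return (fl % 2 == 0) ? fl : fl + 1; // tie to even
        default:                 return fl;                          // not reached
    }
}

// Wrap-around for a W-bit signed result: keep the low W bits and sign-extend.
int64_t wrap(int64_t raw, int W) {
    assert(W > 1 && W < 63);
    const uint64_t mask = (uint64_t{1} << W) - 1;
    int64_t low = static_cast<int64_t>(static_cast<uint64_t>(raw) & mask);
    if (low & (int64_t{1} << (W - 1))) low -= int64_t{1} << W;
    return low;
}

// Saturation: clamp to the most negative or most positive W-bit value.
int64_t saturate(int64_t raw, int W) {
    assert(W > 1 && W < 63);
    const int64_t hi = (int64_t{1} << (W - 1)) - 1;
    const int64_t lo = -(int64_t{1} << (W - 1));
    return raw > hi ? hi : (raw < lo ? lo : raw);
}
```

For example, 1.25 held with 2 fractional bits (raw 5) and requantized to 1 fractional bit gives raw 3 (1.5) under RND but raw 2 (1.0) under RND_CONV, while -1.25 (raw -5) gives -1.0 under RND and -1.5 under TRN. For a 4-bit result, the value 9 wraps to -7 but saturates to 7.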
The default maximum width allowed for ap_[u]fixed data types is 1024 bits. This default may be overridden by defining the macro AP_INT_MAX_W with a positive integer value less than or equal to 4096 before inclusion of the ap_int.h header file.
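Synthesizing initialized arrays (ROMs) of ap_[u]fixed can take a long time. Changing the element type to int results in quicker synthesis. For example:

```cpp
static ap_fixed<32,0> a[depth] = { /* ... */ };
```

Can be changed to:

```cpp
static int a[depth] = { /* ... */ };
```

Because ap_fixed<32,0> has 32 fractional bits, the int array holds the raw bit patterns (each value scaled by 2^32), so the initializer values and the code that reads the array must account for that scaling.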
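When an ap_fixed<32,0> table is stored as int, each value v becomes the 32-bit raw pattern v * 2^32. The hypothetical helpers below (plain C++, not part of the HLS headers) sketch how such an initializer could be generated and interpreted. They assume inputs in [-0.5, 0.5), the range of ap_fixed<32,0>, and a 32-bit int; the conversion floors, like the default AP_TRN mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers (plain C++, not part of the HLS headers).
// ap_fixed<32,0> has 32 fractional bits, so value v <-> raw pattern v * 2^32.

// Build an int ROM entry from a real value in [-0.5, 0.5), flooring like AP_TRN.
int32_t to_raw_q32(double v) {
    assert(v >= -0.5 && v < 0.5);
    return static_cast<int32_t>(std::floor(std::ldexp(v, 32)));
}

// Recover the real value that an int ROM entry stands for.
double from_raw_q32(int32_t raw) {
    return std::ldexp(static_cast<double>(raw), -32);
}
```

For instance, 0.25 is stored as 1073741824 (2^30), and -0.5 as the most negative 32-bit integer.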