Finite Word Length Considerations - 9.1 English

PG109 Fast Fourier Transform LogiCORE IP Product Guide

Document ID
PG109
Release Date
2022-05-04
Version
9.1 English

The Burst I/O architectures process an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory. A strategy must be employed to accommodate this dynamic range expansion. A full explanation of scaling strategies and their implications is beyond the scope of this document; for more information about this topic; see A Simple Fixed-Point Error Bound for the Fast Fourier Transform [Ref 3] and Theory and Application of Digital Signal Processing [Ref 4] .

For a Radix-4 DIT FFT, the values computed in a butterfly stage can experience growth by a factor of up to pg109-designing00022.jpg . This implies a bit growth of up to 3 bits.

For Radix-2, the growth is by a factor of up to inset_000024.jpg . This implies a bit growth of up to 2 bits. This bit growth can be handled in three ways:

Performing the calculations with no scaling and carrying all significant integer bits to the end of the computation

Scaling at each stage using a fixed-scaling schedule

Scaling automatically using block floating-point

All significant integer bits are retained when using full-precision unscaled arithmetic. The width of the datapath increases to accommodate the bit growth through the butterfly. The growth of the fractional bits created from the multiplication are truncated (or rounded) after the multiplication. The width of the output is (input width + log 2 (transform length) + 1). This accommodates the worst case scenario for bit growth.

Consider an unscaled Radix-2 DIT FFT: the datapath in each stage must grow by 1 bit as the adder and subtracter in the butterfly might add/subtract two full-scale values and produce a sample which has grown in width by 1 bit. This yields the log 2 (transform length) part of the increase in the output width relative to the input width. The complex multiplier preserves the magnitude of an input (as it applies a rotation on the complex plane), but can theoretically produce bit-growth when the magnitude of the input is greater than 1 (for example, 1+j has a magnitude of 1.414). This means that the complex multiplier bit growth must only be considered once in the entire FFT process, yielding the additional +1 increase in the output width relative to the input width. For example, a 1024-point transform with an input of 16 bits consisting of 1 integer bit and 15 fractional bits has an output of 27 bits with 12 integer bits and 15 fractional bits. Note that the core does not have a specific location for the binary point. The output maintains the same binary point location as the input. For the preceding example, a 16-bit input with 3 integer bits and 13 fractional bits would have an unscaled output of 27 bits with 14 integer bits and 13 fractional bits.

When using scaling, a scaling schedule is used to divide by a factor of 1, 2, 4, or 8 in each stage. If scaling is insufficient, a butterfly output might grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. The scale factor s is defined as

Equation 3-1 pg109-designing00026.jpg

where b i is the scaling (specified in bits) applied in stage i .

The scaling results in the final output sequence being modified by the factor 1/s . For the forward FFT, the output sequence X’ (k), k = 0,...,N - 1 computed by the core is defined as

Equation 3-2 inset_100028.jpg

For the inverse FFT, the output sequence is

Equation 3-3 inset_200030.jpg

If a Radix-4 algorithm scales by a factor of 4 in each stage, the factor of 1/s is equal to the factor of 1/N in the inverse FFT equation ( This Equation ). For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N .

With block floating-point, each stage applies sufficient scaling to keep numbers in range, and the scaling is tracked by a block exponent.

As with unscaled arithmetic, for scaled and block floating-point arithmetic, the core does not have a specific location for the binary point. The location of the binary point in the output data is inherited from the input data and then shifted by the scaling applied.