Detailed Implementation Tab

PG109 Fast Fourier Transform LogiCORE IP Product Guide

Document ID: PG109
Release Date: 2022-05-04
Version: 9.1

Memory Options:

° Data and Phase Factors (Burst I/O architectures): For Burst I/O architectures, data and phase factors can be stored in either block RAM or distributed RAM. Distributed RAM is available for all point sizes up to and including 1024 points.

° Data and Phase Factors (Pipelined Streaming I/O): In the Pipelined Streaming I/O solution, data can be stored partly in block RAM and partly in distributed RAM. Counting from the input side, each pipeline stage uses smaller data and phase factor memories than the stages before it. You can select how many pipeline stages use block RAM for data and phase factor storage; the later stages use distributed RAM. The default displayed in the IDE offers a good balance between the two. If output ordering is Natural Order, the reorder buffer can use either block RAM or distributed RAM; distributed RAM is available for point sizes less than or equal to 1024.

- When block floating-point is selected for the Pipelined Streaming I/O architecture, a RAM buffer is required for both natural order and bit-reversed order output data. In this case, the reorder buffer options remain available, and distributed RAM can be selected for all point sizes below 2048.
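The reorder buffer can be modeled in software to show what the Natural Order option adds. The sketch below assumes the pipelined datapath emits each frame in bit-reversed order, and shows how a buffer written in that order and read back in natural order restores bins 0 to N-1. The function names are hypothetical, and the size check only restates the distributed RAM rule given above; it does not model the core's actual buffer addressing.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(frame: list) -> list:
    """Hypothetical reorder-buffer model: frame[k] holds bin bit_reverse(k).

    Writing each sample to its bit-reversed address and reading addresses
    0..N-1 in sequence yields the bins in natural order.
    """
    n = len(frame)
    bits = n.bit_length() - 1
    if n < 2 or (1 << bits) != n:
        raise ValueError("point size must be a power of two >= 2")
    natural = [None] * n
    for k, value in enumerate(frame):
        natural[bit_reverse(k, bits)] = value
    return natural


def reorder_buffer_can_use_distributed_ram(point_size: int) -> bool:
    """Restates the rule above: distributed RAM for point sizes <= 1024 (below 2048)."""
    return point_size <= 1024
```

For an 8-point frame arriving as bins 0, 4, 2, 6, 1, 5, 3, 7, `to_natural_order` returns bins 0 through 7.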

° Hybrid Memories: Where a data, phase factor, or reorder buffer memory is stored in block RAM and is larger than one block RAM, it can be built as a hybrid of block RAM and distributed RAM: most of the data is stored in block RAMs, and the few leftover bits are stored in distributed RAM. This Hybrid Memory is an alternative to building the memory entirely from multiple block RAMs. It reduces the block RAM count at the cost of additional slice logic. Hybrid Memories are offered only when block RAM is used for one or more memories and the logic required for the Hybrid Memory implementation is below an internal threshold of 256 LUTs per memory. When these conditions are met, Hybrid Memories are made available and can be selected.
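The block RAM versus slice trade-off can be illustrated with a rough planning sketch. It assumes 18 Kb block RAM shapes from 16384x1 to 512x36, assumes one LUT holds 64 bits of distributed RAM (ignoring read multiplexers), and applies the 256-LUT threshold quoted above. `plan_memory` and its estimates are hypothetical; the core's own memory mapping and threshold calculation may differ.

```python
import math

# Assumed 18 Kb block RAM aspect ratios as (depth, width). Illustrative only;
# the primitives available depend on the target device family.
BRAM18_SHAPES = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]
LUT_BITS = 64         # assumption: one LUT implements a 64x1 distributed RAM
LUT_THRESHOLD = 256   # internal threshold quoted in the text, per memory


def bram_width_for_depth(depth: int) -> int:
    """Widest assumed block RAM shape that is at least `depth` entries deep."""
    widths = [w for d, w in BRAM18_SHAPES if d >= depth]
    if not widths:
        raise ValueError("depth exceeds a single block RAM in this model")
    return max(widths)


def plan_memory(depth: int, width: int) -> dict:
    """Compare an all-block-RAM build with a hybrid block RAM + distributed RAM build."""
    bram_width = bram_width_for_depth(depth)
    full_brams = math.ceil(width / bram_width)
    hybrid_brams, leftover_bits = divmod(width, bram_width)
    # Rough LUT estimate for the leftover bits (read-mux LUTs are ignored).
    hybrid_luts = leftover_bits * math.ceil(depth / LUT_BITS)
    hybrid_available = (
        full_brams > 1            # memory is larger than one block RAM
        and hybrid_brams >= 1     # the majority of bits stay in block RAM
        and leftover_bits > 0     # there are leftover bits to move
        and hybrid_luts < LUT_THRESHOLD
    )
    return {
        "full_brams": full_brams,
        "hybrid_brams": hybrid_brams,
        "leftover_bits": leftover_bits,
        "hybrid_luts": hybrid_luts,
        "hybrid_available": hybrid_available,
    }
```

For a 1024-deep, 20-bit memory, this model uses two block RAMs for an all-block-RAM build, or one block RAM plus 2 leftover bits (about 32 LUTs) for a hybrid build.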

Optimize Options:

° Complex Multipliers: Three options are available for the complex multiplier implementation:

- Use CLB logic: All complex multipliers are constructed using slice logic. This is appropriate for applications with low performance requirements or for target devices with few DSP slices.

- Use 3-multiplier structure (resource optimization): All complex multipliers use a structure with three real multiplications and five additions/subtractions, where the multipliers use DSP slices. This reduces the DSP slice count but uses some slice logic. The structure can use the DSP slice pre-adder to reduce or remove the need for extra slice logic and to improve performance.

- Use 4-multiplier structure (performance optimization): All complex multipliers use a structure with four real multiplications and two additions/subtractions, implemented in DSP slices. This structure yields the highest clock performance at the expense of more dedicated multipliers. In devices with DSP slices, the add/subtract operations are implemented within the DSP slices.

Note: The core might override the selected complex multiplier implementation internally to use the fewest DSP slices without affecting performance. As a result, some core configurations show no difference in DSP slice usage between the 3-multiplier and 4-multiplier options. If Use CLB logic is selected, however, slice logic is always used.
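The two DSP-based structures compute the same product, (a + jb)(c + jd). The sketch below contrasts the textbook four-multiply form with one common three-multiply, five-add/subtract arrangement; the sum a + b feeding a multiplier is the kind of operation a DSP slice pre-adder can absorb. This is an illustrative arrangement, not necessarily the one the core uses internally, and it works on exact Python integers, so it ignores the extra bit of growth that pre-additions such as a + b introduce in fixed-point hardware.

```python
def cmul_4mult(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """(a + jb)(c + jd) with four real multiplies and two add/subtracts."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def cmul_3mult(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """(a + jb)(c + jd) with three real multiplies and five add/subtracts."""
    k1 = c * (a + b)   # a + b: a natural fit for a DSP pre-adder
    k2 = a * (d - c)
    k3 = b * (c + d)
    real = k1 - k3     # ac + bc - bc - bd = ac - bd
    imag = k1 + k2     # ac + bc + ad - ac = ad + bc
    return real, imag


# (1 + 2j)(3 + 4j) = -5 + 10j with either structure.
print(cmul_4mult(1, 2, 3, 4), cmul_3mult(1, 2, 3, 4))
```

In an FFT, c + jd is typically the phase factor, so the terms d - c and c + d depend only on the phase factor and could be precomputed alongside it.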

° Butterfly Arithmetic: Two options are available for the butterfly implementation:

- Use CLB logic: All butterfly stages are constructed using slice logic.

- Use XtremeDSP Slices: This option forces all butterfly stages to be implemented using the adder/subtracters in DSP slices.
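Butterfly arithmetic is the add/subtract step that follows the complex multiply: in a radix-2 butterfly with inputs a and b and phase factor w, the outputs are a + wb and a - wb. The sketch below is a floating-point model, not the core's fixed-point datapath. It isolates the add/subtract pair in a hypothetical `butterfly` helper, chains it into a small iterative radix-2 decimation-in-time FFT, and checks the result against a direct DFT. Rounding, scaling, and bit growth are not modeled.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix-2 DIT butterfly: one complex multiply, then an add/subtract pair."""
    t = w * b              # complex multiplier stage
    return a + t, a - t    # butterfly arithmetic: the add/subtract pair


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(x: list) -> list:
    """Iterative radix-2 DIT FFT (forward, unscaled) with natural-order output."""
    n = len(x)
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # Loading the input in bit-reversed order lets the in-place stages
    # produce natural-order output.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                top, bottom = start + k, start + k + half
                data[top], data[bottom] = butterfly(data[top], data[bottom], w)
        size *= 2
    return data


def dft(x: list) -> list:
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
```

In hardware terms, the `a + t` and `a - t` lines are the operations that the Butterfly Arithmetic option maps either to slice logic or to DSP slice adder/subtracters.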