Configuration for performance vs resource - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

Simple configurations of the FFT use a single kernel. Multiple kernels are used when TP_PARALLEL_POWER > 0, TP_CASC_LEN > 1, or TP_USE_WIDGETS = 1 (the last only with stream IO or TP_PARALLEL_POWER > 0). All of these parameters exist to allow higher throughput, though TP_PARALLEL_POWER also allows larger point sizes than can be implemented in a single kernel.

If a higher throughput is required than can be achieved with a single kernel, TP_CASC_LEN should be increased in preference to TP_PARALLEL_POWER. This is because the resource (number of kernels) matches TP_CASC_LEN, whereas for TP_PARALLEL_POWER the resource increases quadratically. It is recommended that TP_PARALLEL_POWER is only increased after TP_CASC_LEN has been increased and throughput still needs to rise. TP_PARALLEL_POWER may, however, be required if the point size is greater than a single kernel can support. In this case, to keep resource use to a minimum, increase TP_PARALLEL_POWER as far as needed to support the point size in question, then increase TP_CASC_LEN to achieve the required throughput, and only then increase TP_PARALLEL_POWER again if higher throughput is still required.

TP_USE_WIDGETS can be used when either TP_API = 1 (stream IO to the FFT) or TP_PARALLEL_POWER > 0, because with TP_PARALLEL_POWER > 0 widgets are used with streams for the internal trellis connections. By default, TP_USE_WIDGETS = 0, which means that an FFT with stream input converts the incoming stream(s) to an iobuffer inside the FFT kernel, and similarly converts the output iobuffer to streams. Setting TP_USE_WIDGETS = 1 makes these conversion functions separate kernels. runtime<ratio> or location constraints can then be used to force these widget kernels onto different tiles from their parent FFT kernel and so boost performance. The resource cost rises accordingly. If the performance achieved using TP_CASC_LEN alone is close to that required (e.g. within about 20%), then TP_USE_WIDGETS may help reach the required performance with only a modest increase in resources and so should be used in preference to TP_PARALLEL_POWER. Using TP_USE_WIDGETS in conjunction with TP_PARALLEL_POWER > 0 can lead to a significant increase in tile use (up to 3x).

The maximum point size supported by a single kernel may be increased by use of the single_buffer constraint. This only applies when TP_API = 0 (windows), as the streaming implementation always uses single buffering.
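
The order in which the parameters should be raised can be sketched as a small search routine. The following is only an illustrative sketch, not part of the library: FftConfig, Limits, Choice, ThroughputFn, widgets_applicable, uses_multiple_kernels and choose_config are all hypothetical names, the limits must be supplied by the user for their data type and device, and the throughput estimate is a caller-supplied function (for example, figures gathered from simulation). It also assumes that each increment of TP_PARALLEL_POWER halves the point size that an individual kernel has to support.

```cpp
#include <functional>

// Hypothetical container for the four parameters discussed above.
struct FftConfig {
    unsigned parallel_power;  // TP_PARALLEL_POWER
    unsigned casc_len;        // TP_CASC_LEN
    unsigned api;             // TP_API: 0 = windows (iobuffer), 1 = streams
    unsigned use_widgets;     // TP_USE_WIDGETS
};

// TP_USE_WIDGETS is usable only with stream IO or TP_PARALLEL_POWER > 0.
constexpr bool widgets_applicable(const FftConfig& c) {
    return c.api == 1 || c.parallel_power > 0;
}

// True when the configuration is built from more than one kernel.
constexpr bool uses_multiple_kernels(const FftConfig& c) {
    return c.parallel_power > 0 || c.casc_len > 1 ||
           (c.use_widgets == 1 && widgets_applicable(c));
}

// User-supplied limits; the values are assumptions, not library constants.
struct Limits {
    unsigned max_single_kernel_point_size;
    unsigned max_casc_len;
    unsigned max_parallel_power;
};

// Caller-supplied throughput estimate for a configuration (not a library call).
using ThroughputFn = std::function<double(const FftConfig&)>;

struct Choice {
    bool found;
    FftConfig config;
};

Choice choose_config(unsigned point_size, unsigned api, double required,
                     const Limits& lim, const ThroughputFn& estimate) {
    // Step 1: smallest TP_PARALLEL_POWER that supports the point size
    // (assumes each increment halves the per-kernel point size).
    unsigned pp = 0;
    while ((point_size >> pp) > lim.max_single_kernel_point_size) {
        if (++pp > lim.max_parallel_power) return Choice{false, FftConfig{0, 1, api, 0}};
    }
    for (; pp <= lim.max_parallel_power; ++pp) {
        // Step 2: raise TP_CASC_LEN before raising TP_PARALLEL_POWER again.
        FftConfig c{pp, 1, api, 0};
        double best = 0.0;
        for (c.casc_len = 1; c.casc_len <= lim.max_casc_len; ++c.casc_len) {
            const double t = estimate(c);
            if (t >= required) return Choice{true, c};
            if (t > best) best = t;
        }
        // Step 3: if within about 20% of the target, try separate widget kernels.
        c.casc_len = lim.max_casc_len;
        if (widgets_applicable(c) && best >= 0.8 * required) {
            c.use_widgets = 1;
            if (estimate(c) >= required) return Choice{true, c};
        }
        // Step 4: otherwise fall through and increase TP_PARALLEL_POWER.
    }
    return Choice{false, FftConfig{0, 1, api, 0}};
}
```

A caller would pass the required point size, the chosen TP_API, the target throughput, the limits, and an estimate function, then use the returned values as the corresponding template parameters of the FFT graph. The 0.8 factor mirrors the "within about 20%" guidance above and is a rule of thumb rather than a guarantee.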