Because the floating-point pipeline of the AI Engine has no post-add lane reduction hardware, outputs are always produced on eight lanes (float) or four lanes (cfloat). The filter therefore computes eight (float) or four (cfloat) output samples in parallel and applies a single coefficient per instruction: fpmul for the first coefficient, then fpmac for each remaining coefficient, one by one.
The floating-point accumulator has a latency of two clock cycles. As a result, two fpmac instructions that target the same accumulator cannot be issued back to back and can only be issued every other cycle. To keep the pipeline busy, the code uses two accumulators in alternation and adds them together at the end to obtain the final result.
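The following is a minimal scalar C++ sketch of the two-accumulator schedule. It is a host-side model, not AI Engine code, and the name fir_block_2acc is hypothetical. Even-indexed coefficients go to accA and odd-indexed coefficients go to accB, so two consecutive multiply-accumulates never target the same accumulator. On the hardware, this alternation is what hides the two-cycle latency. The first use of each accumulator stands in for fpmul, later uses stand in for fpmac, and the two partial sums are added at the end.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical scalar model (not AI Engine code) of the two-accumulator
// schedule: y[i] = sum_k c[k] * x[i + k], with taps split by parity.
template <typename T, std::size_t LANES>
std::array<T, LANES> fir_block_2acc(const std::vector<T>& x,
                                    const std::vector<T>& c,
                                    std::size_t base) {
    std::array<T, LANES> accA{}, accB{};  // zero-initialized partial sums
    for (std::size_t k = 0; k < c.size(); ++k) {
        // Alternate accumulators so consecutive steps never reuse the same one.
        std::array<T, LANES>& acc = (k % 2 == 0) ? accA : accB;
        for (std::size_t l = 0; l < LANES; ++l) {
            T prod = c[k] * x[base + l + k];
            if (k < 2) acc[l] = prod;   // first use of each accumulator ("fpmul")
            else       acc[l] += prod;  // subsequent uses ("fpmac")
        }
    }
    // Final step: add the two partial sums lane by lane.
    std::array<T, LANES> y{};
    for (std::size_t l = 0; l < LANES; ++l)
        y[l] = accA[l] + accB[l];
    return y;
}
```

Splitting the sum this way changes the order of the floating-point additions. The result can therefore differ from a single-accumulator version in the last bits, even though both compute the same filter.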
Navigate to the FIRFilter directory. Type make all in the console and wait for the following three stages to complete:
aie
aiesim
aieviz
The last stage opens vitis_analyzer, which lets you visualize the graph of the design and the timeline of the simulation.
In this design you learned:
How to use real floating-point data and coefficients in FIR filters.
How to handle complex floating-point data and complex floating-point coefficients in FIR filters.
How to organize the compute sequence.
How to use fpmul, fpmac, and fpadd in the real and complex cases.