A finite impulse response (FIR) filter is described by the following equation, where x denotes the input, C denotes the coefficients, y denotes the output, and N denotes the length of the filter.
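In the usual convolution form, and assuming zero-based indexing with the newest input sample paired with the first coefficient (an assumption; other index conventions are equally common), the filter can be written as:

$$
y_n = \sum_{k=0}^{N-1} C_k \, x_{n-k}
$$

Each output $y_n$ is therefore a sum of $N$ products of a coefficient and an input sample.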
Following is an example of a 32-tap filter.
Each output takes 32 multiplications. With cint16 as both the data type and the coefficient type, a kernel needs 4 cycles to compute one sample, because each AI Engine can perform 8 MAC operations per cycle (32 / 8 = 4). If the data streams in through a single 32-bit stream port, each new input sample (one cint16 is 32 bits wide) is enough to produce one new output once the filter is in the middle of processing.
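The cycle count above can be checked with a small calculation. The following is a minimal sketch in plain C++, not AI Engine code; the helper names `cycles_per_output` and `is_compute_bound` are hypothetical, and the rate of one input sample per cycle on the stream is an assumption made for illustration.

```cpp
#include <cassert>

// Cycles needed to compute one output sample when the multiply-accumulates
// are issued `macs_per_cycle` at a time (ceiling division).
constexpr int cycles_per_output(int taps, int macs_per_cycle) {
    return (taps + macs_per_cycle - 1) / macs_per_cycle;
}

// A kernel is compute bound when it needs more cycles per output than the
// stream needs to deliver the one new input sample that output depends on.
// `cycles_per_input` is an assumed stream rate, not a measured value.
constexpr bool is_compute_bound(int taps, int macs_per_cycle, int cycles_per_input) {
    return cycles_per_output(taps, macs_per_cycle) > cycles_per_input;
}

// Numbers from the text: 32 taps, 8 MACs per cycle.
static_assert(cycles_per_output(32, 8) == 4, "32 taps at 8 MACs per cycle");
static_assert(is_compute_bound(32, 8, 1), "4 cycles of compute per input sample");
```

With 8 taps per kernel instead of 32, the same helper returns 1 cycle per output, which matches the input rate assumed here.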
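The input can therefore arrive faster than a single kernel can compute outputs, so the design is compute bound. You will see how to split the kernel into 4 cascaded kernels, each handling 8 of the 32 taps, to process one sample per cycle.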
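The arithmetic behind the split can be sketched in plain C++: each of the 4 stages adds its own 8 taps to an accumulator handed over from the previous stage, and the last stage holds the full 32-tap result. This is only a functional model under stated assumptions, not AI Engine kernel code. The `Cplx` struct stands in for cint16 with wide accumulation, the indexing follows the common convolution form, and all names (`partial_fir`, `fir_single`, `fir_cascaded`, `cascade_matches_single`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain complex integer with wide fields; a stand-in for cint16 data
// accumulated at higher precision (not the AI Engine type itself).
struct Cplx {
    std::int64_t re;
    std::int64_t im;
};

constexpr std::size_t kTaps = 32;
constexpr std::size_t kKernels = 4;
constexpr std::size_t kTapsPerKernel = kTaps / kKernels;  // 8 taps per stage

using Coeffs = std::array<Cplx, kTaps>;

// Add taps [first, first + count) of output n to an incoming accumulator.
// Input samples before x[0] are treated as zero.
Cplx partial_fir(const std::vector<Cplx>& x, const Coeffs& c, std::size_t n,
                 std::size_t first, std::size_t count, Cplx acc) {
    for (std::size_t k = first; k < first + count && k <= n; ++k) {
        const Cplx& a = c[k];
        const Cplx& b = x[n - k];
        acc.re += a.re * b.re - a.im * b.im;
        acc.im += a.re * b.im + a.im * b.re;
    }
    return acc;
}

// Single-kernel reference: all 32 taps in one pass.
Cplx fir_single(const std::vector<Cplx>& x, const Coeffs& c, std::size_t n) {
    return partial_fir(x, c, n, 0, kTaps, Cplx{0, 0});
}

// Cascade model: each stage adds its 8 taps to the accumulator it receives
// from the previous stage and passes the result on.
Cplx fir_cascaded(const std::vector<Cplx>& x, const Coeffs& c, std::size_t n) {
    Cplx acc{0, 0};
    for (std::size_t stage = 0; stage < kKernels; ++stage) {
        acc = partial_fir(x, c, n, stage * kTapsPerKernel, kTapsPerKernel, acc);
    }
    return acc;
}

// Compare both models on deterministic, made-up test data.
bool cascade_matches_single(std::size_t num_samples) {
    Coeffs c{};
    for (std::size_t k = 0; k < kTaps; ++k) {
        c[k] = Cplx{static_cast<std::int64_t>(k) + 1, 3 - static_cast<std::int64_t>(k)};
    }
    std::vector<Cplx> x(num_samples);
    for (std::size_t i = 0; i < num_samples; ++i) {
        x[i] = Cplx{static_cast<std::int64_t>(i % 7) - 3, static_cast<std::int64_t>(i % 5)};
    }
    for (std::size_t n = 0; n < num_samples; ++n) {
        const Cplx a = fir_single(x, c, n);
        const Cplx b = fir_cascaded(x, c, n);
        if (a.re != b.re || a.im != b.im) return false;
    }
    return true;
}
```

The model only shows that the partial sums add up to the single-kernel result. It says nothing about how samples and accumulators move between kernels on the device, or how the stages are kept aligned in time.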