One major difference between the novel filter design and the traditional method is that samples for MAC operations are read from the overlap buffer instead of the data buffer. For every eight output results, only one read operation from the input window is needed; all the other data come from the overlap buffer. During the MAC operations, the eight newly read input samples are written to the overlap memory for the next iteration. Every overlap buffer has three pointers: a read pointer, a symmetry pointer, and a write pointer. The starting locations of the overlap buffer pointers can differ in each iteration, depending on the size of the input window.
In the case of FIR89, an overlap buffer with a depth of 80 samples is needed. The following figure illustrates the behavior of each pointer. At first, the read pointer points to address 0, the symmetry pointer points to address overlap-depth - 8, the write pointer points to address overlap-size, and the input window pointer points to the beginning of the input window. In each iteration, the read pointer and the symmetry pointer move toward each other in steps of eight samples until all the data in the delay line is processed. At the beginning of the next iteration, all the pointers are reset to their initial locations with an offset of eight samples relative to the initial location of the previous iteration.
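The movement of the read and symmetry pointers within one iteration can be sketched as follows. This is a minimal host-side illustration in plain C++, not AI Engine kernel code. The function name pointer_pairs, the use of sample offsets instead of addresses, and the stopping rule (stop once the two pointers meet or cross) are assumptions made for illustration. Wrap-around of a shifted starting location is ignored here.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: lists the (read, symmetry) sample offsets visited in
// one iteration. The read pointer starts at 0 and the symmetry pointer at
// overlap_depth - step, as described above; the stopping rule is an assumption.
std::vector<std::pair<int, int>> pointer_pairs(int overlap_depth, int step) {
    std::vector<std::pair<int, int>> pairs;
    int read = 0;
    int symmetry = overlap_depth - step;
    while (read < symmetry) {
        pairs.emplace_back(read, symmetry);
        read += step;      // read pointer moves forward
        symmetry -= step;  // symmetry pointer moves backward
    }
    return pairs;
}
```

For the FIR89 case, pointer_pairs(80, 8) yields the offsets (0, 72), (8, 64), (16, 56), (24, 48), and (32, 40).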
cyclic_add() function for the pointer update. At the beginning of the second kernel execution, the pointer locations should be initialized to 8/2/3 respectively, and then the locations will be 0/10/11 again at the beginning of the third execution. This pattern keeps repeating as the data processing continues.
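The alternation between 0/10/11 and 8/2/3 is consistent with adding 8 to each starting location modulo 16. The sketch below illustrates that arithmetic in plain C++. The function name next_start, the offset of 8, and the depth of 16 are assumptions read off the numbers above, not taken from the kernel source.

```cpp
// Hypothetical helper: starting location for the next kernel execution,
// assuming an offset of 8 positions in a cyclic buffer of 16 positions.
constexpr int next_start(int location, int offset = 8, int depth = 16) {
    return (location + offset) % depth;
}
```

Applied to 0, 10, and 11, this gives 8, 2, and 3; applied once more, it returns 0, 10, and 11.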
The Versal AI Engine software tools support a function called cyclic_add in cardano.h. It can be used to implement the cyclic roll-over of the pointers when they reach the end of the buffer. For example, the following code defines an inline cyclic-increment function for a buffer with a depth of 16 × v8cint16.
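#include <cardano.h>  // declares cyclic_add, per the text above

// buffer_datatype is assumed to be defined elsewhere (one v8cint16 per element here).
struct buffer_internal
{
    buffer_datatype * restrict head;  // start address of the overlap buffer
    buffer_datatype * restrict ptr;   // current position in the overlap buffer
};

// Advance w->ptr by count elements, wrapping at the end of the 16-element buffer.
inline __attribute__((always_inline)) void buffer128_incr_v8(buffer_internal * w, int count) {
    w->ptr = cyclic_add(w->ptr, count, w->head, 16);
}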
- w represents the overlap structure instance.
- w->ptr is the current pointer to the overlap.
- w->head refers to the starting address of the overlap.
- count means how many steps (v8cint16) to increase.
- The constant, 16, means it is an overlap with a fixed 128 sample depth (16 × v8cint16).
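To make the wrap-around concrete, the sketch below models the behavior described for the cyclic increment in portable C++. It is not the AI Engine cyclic_add implementation. cyclic_add_model is a hypothetical stand-in that assumes the arguments mean (current pointer, step count, buffer start, buffer depth in elements), as the list above suggests.

```cpp
#include <cstddef>

// Hypothetical host-side model of a cyclic pointer increment.
// Assumes ptr lies in [head, head + depth) and depth > 0.
template <typename T>
T* cyclic_add_model(T* ptr, int count, T* head, int depth) {
    std::ptrdiff_t index = (ptr - head) + count;
    index %= depth;
    if (index < 0) index += depth;  // keep the result in range for negative counts
    return head + index;
}
```

With a 16-element buffer, stepping 3 elements forward from position 14 lands on position 1.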
The following figure shows the microcode of FIR89. It is observed that the inner loop is perfect, and every cycle of the inner loop contains a MAC operation as indicated in the green box. As indicated in the blue box, the overlap buffer update operation (VST in microcode) is absorbed by the cycle that also performs the MAC operation.