There are 13 VFPMACs in the generated assembly code: six for each of the even and odd columns, and one more for summing the final accumulator results. The VFPMAC instructions are not as tightly packed; that is, some VFPMACs have other instructions between them. The 13 VFPMACs appear in two separate sections, which means the outer loop body has effectively been duplicated (unrolled by two), halving the number of outer-loop iterations.
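The effect of unrolling the outer loop by two can be sketched in C. The kernel below is hypothetical and is not the source code behind the assembly discussed here. It keeps separate even-column and odd-column accumulators (six multiply-accumulates each) and then combines them. The function names `row_kernel`, `run_rolled`, and `run_unrolled` and the 12-element row layout are assumptions made for illustration. Whether a given compiler turns this into VFPMAC instructions is also an assumption. The unrolled version places two copies of the kernel in each iteration, so it runs half as many iterations and produces the same results.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 12  /* hypothetical row width: 6 even + 6 odd columns */

/* Hypothetical kernel: 6 multiply-accumulates on even columns, 6 on odd
   columns, then one step combining the two accumulators (the source
   attributes a 13th VFPMAC to that final combination). */
static float row_kernel(const float *row, const float *coef) {
    float acc_even = 0.0f, acc_odd = 0.0f;
    for (int k = 0; k < COLS / 2; k++) {
        acc_even += row[2 * k] * coef[2 * k];         /* even column */
        acc_odd  += row[2 * k + 1] * coef[2 * k + 1]; /* odd column */
    }
    return acc_even + acc_odd;                        /* combine accumulators */
}

/* Rolled outer loop: one kernel copy per iteration.
   Returns the number of outer-loop iterations executed. */
size_t run_rolled(const float *in, const float *coef, float *out, size_t rows) {
    size_t iters = 0;
    for (size_t i = 0; i < rows; i++, iters++)
        out[i] = row_kernel(in + COLS * i, coef);
    return iters;
}

/* Outer loop unrolled by two: two kernel copies per iteration, so the
   iteration count is halved. Assumes rows is even. */
size_t run_unrolled(const float *in, const float *coef, float *out, size_t rows) {
    assert(rows % 2 == 0);
    size_t iters = 0;
    for (size_t i = 0; i < rows; i += 2, iters++) {
        out[i]     = row_kernel(in + COLS * i, coef);       /* first copy */
        out[i + 1] = row_kernel(in + COLS * (i + 1), coef); /* second copy */
    }
    return iters;
}
```

For 8 rows, `run_rolled` executes 8 iterations and `run_unrolled` executes 4. Both write identical outputs because each row goes through the same `row_kernel` computation.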