The following figures show MAC with an int8 `X` buffer and an int8 `Z` buffer. The first
figure shows how data is permuted and the second figure shows how coefficients are
permuted. Note that the permute granularity is 32 bits for the `X` buffer and 16 bits
for the `Z` buffer. The `xoffsets` parameter comes in pairs.
The first hex value is an absolute offset in 32-bit units and picks up 4 x 8-bit values
(index, index+1, index+2, index+3). The second hex value is an offset, also in 32-bit
units, relative to the first value + 1, and picks up another 4 x 8-bit values. For
example, `0x00` selects index 0, 1, 2, 3 as well as 4, 5, 6, 7,
and `0x24` selects index 16, 17, 18, 19 as well as
28, 29, 30, 31 (here the first value is 4, giving 4 * 4 = 16, and the second value is 2,
giving (4 + 2 + 1) * 4 = 28).
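
The pair arithmetic above can be modeled in a few lines of Python. This is only a sketch of the index calculation, not AI Engine code: the function name `decode_xoffset_pair` is hypothetical, and it assumes the low hex digit of the pair is the "first" value, which is what the `0x24` example implies. It also ignores `xstart`, `xsquare`, and any wrap-around at the end of the buffer.

```python
def decode_xoffset_pair(pair):
    """Return the eight int8 indices selected by one xoffsets hex pair.

    Assumption: the low hex digit is the first (absolute) value and the
    high hex digit is the second (relative) value.
    """
    first = pair & 0xF           # absolute offset, in 32-bit units
    second = (pair >> 4) & 0xF   # offset relative to (first + 1), in 32-bit units
    values_per_word = 4          # four int8 values fit in one 32-bit word

    base_a = first * values_per_word
    base_b = (first + second + 1) * values_per_word

    # Each hex value picks up index, index+1, index+2, index+3.
    return [base_a + i for i in range(4)] + [base_b + i for i in range(4)]


if __name__ == "__main__":
    print(decode_xoffset_pair(0x00))
    print(decode_xoffset_pair(0x24))
```

Running the sketch on `0x00` and `0x24` reproduces the index lists given in the text.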

There is also an `xsquare` parameter
that performs 8-bit granularity twiddling after the main permute. How the `xsquare`
parameter works in this example can be seen in the center of
the following figure.

The `start` (`xstart`, `zstart`) and `step` (`xstep`, `zstep`) parameters are always in
terms of data type granularity. Hence, a value of 2 for 16-bit data is 2 * 16 bits away,
while a value of 2 for 8-bit data is 2 * 8 bits away. The `step` parameter
applies to the next block of selected data. So, if a pair of `offset`
parameters selects a 2 * 2 block, the step applies to the next 2
* 2 block. The step added to the index value must be aligned to the permute
granularity (32 bits for data, 16 bits for coefficients). For example, when working
with 8-bit data, `xstep` needs to be a multiple of
four. When working with 8-bit coefficients, `zstep` needs to be a multiple of two.
The following two figures show how `step` works for data and coefficients.
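
The alignment rule can be checked with a small amount of arithmetic. The following Python sketch is illustrative only: the names `step_distance_bits`, `step_is_aligned`, and `PERMUTE_GRANULARITY_BITS` are hypothetical and not part of any AI Engine API. It checks the step value on its own, as in the examples above, and assumes the start index is already aligned.

```python
# Permute granularity from the text: 32 bits for data (X), 16 bits for coefficients (Z).
PERMUTE_GRANULARITY_BITS = {"x": 32, "z": 16}


def step_distance_bits(step, elem_bits):
    """Distance covered by a start/step value, which counts elements, in bits."""
    return step * elem_bits


def step_is_aligned(step, elem_bits, buffer):
    """True if the step lands on a permute-granularity boundary for the buffer."""
    granularity = PERMUTE_GRANULARITY_BITS[buffer]
    return step_distance_bits(step, elem_bits) % granularity == 0


if __name__ == "__main__":
    # 8-bit data: xstep must be a multiple of four.
    print(step_is_aligned(4, 8, "x"), step_is_aligned(2, 8, "x"))
    # 8-bit coefficients: zstep must be a multiple of two.
    print(step_is_aligned(2, 8, "z"), step_is_aligned(1, 8, "z"))
```

With 8-bit elements, the check accepts `xstep` values of 4, 8, 12, and so on, and `zstep` values of 2, 4, 6, and so on, matching the rule stated above.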

Note that for coefficients in the int8 * int8 case, the 2 * 2 index block is duplicated to construct a 4 * 2 block. See how index 0, 1, 2, and 3 are duplicated in Figure 2.