An example of MAC with an int16 X buffer and an int16 Z buffer is as follows. Note that
the permute granularity for the X buffer is 32 bits. Parameters are always in terms of
data type granularity. To get a 16-bit index from the 32-bit offsets, you need to
multiply them by 2.
The xoffsets parameter comes as a pair. The first hex value is an absolute 32-bit
offset and picks up two 16-bit values (index, index+1) in the even row. The second hex
value is an offset relative to the first value + 1 (also a 32-bit offset) and picks up
two 16-bit values in the odd row. So an xoffsets pair with first value 4 and second
value 2 selects index 8, 9 for the even row and index 14, 15 for the odd row:
even: 2 * 4 -> get indices [8, 9]
odd: 2 * ( 2 + 4 + 1 ) -> get indices [14, 15]
Similarly, an xoffsets pair in which both hex values are 0 selects index 0, 1 for the
even row and index 2, 3 for the odd row.
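The index arithmetic above can be sketched in a few lines of Python. This is only a model of the calculation described in the text, not the intrinsic itself; the function name `xoffsets_pair_to_indices` and its arguments are hypothetical and chosen for illustration.

```python
def xoffsets_pair_to_indices(first, second):
    """Model the 16-bit indices selected by one xoffsets pair.

    first  -- absolute offset in 32-bit units (even row)
    second -- offset relative to first + 1, in 32-bit units (odd row)
    Both names are illustrative assumptions, not part of any real API.
    """
    # Each 32-bit unit holds two 16-bit values, hence the factor of 2.
    even_base = 2 * first
    odd_base = 2 * (second + first + 1)
    even = [even_base, even_base + 1]
    odd = [odd_base, odd_base + 1]
    return even, odd


print(xoffsets_pair_to_indices(4, 2))  # ([8, 9], [14, 15])
print(xoffsets_pair_to_indices(0, 0))  # ([0, 1], [2, 3])
```

The two calls reproduce the two cases worked through in the text.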
There is another parameter, xsquare, to perform 16-bit granularity twiddling after the
main permute. It gives an additional contribution to the index in a 2 by 2 matrix
recurring across the 8x4 matrix compute given by MUL8 in int16 x int16 mode. The value
0x2103 (read from the lower hex value to the higher hex value) puts index 3, 0 in the
even row and index 1, 2 in the odd row. How the xsquare parameter works can be seen in
the center of the following figure.
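As a rough illustration of how the xsquare value is read, the sketch below splits a four-digit hex value into its digits, lowest first, and assigns them to the even and odd rows of the 2 by 2 matrix. The helper name `decode_xsquare` is hypothetical, and the code models only the digit ordering described above, not the hardware permute.

```python
def decode_xsquare(xsquare):
    """Split a four-digit hex xsquare value into even-row and odd-row indices.

    Illustrative helper only; it is not part of any real API.
    """
    # Read the hex digits from the lowest to the highest.
    digits = [(xsquare >> (4 * i)) & 0xF for i in range(4)]
    even_row = digits[0:2]  # two lowest digits
    odd_row = digits[2:4]   # two highest digits
    return even_row, odd_row


print(decode_xsquare(0x2103))  # ([3, 0], [1, 2])
```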
The following figure is an example of the mac16 intrinsic with int16 and int16.