In all the subsequent intrinsics, the input vector(s) go through a data shuffling function that is controlled by two parameters, Start and Offset.
Let us take the fpmul function as an example:
vector<float,8> fpmul(vector<float,32> xbuf, int xstart, unsigned int xoffs, vector<float,8> zbuf, int zstart, unsigned int zoffs)
xbuf, xstart, xoffs: first buffer and shuffling parameters
zbuf, zstart, zoffs: second buffer and shuffling parameters
Start: starting offset applied to all lanes of the buffer
Offset: additional, lane-dependent offset for the buffer. Each lane's offset occupies 4 bits of the parameter, with lane 0 in the least significant 4 bits.
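For example:
vector<float,8> ret = fpmul(xbuf, 2, 0x210FEDCB, zbuf, 7, 0x76543210);
This call behaves like the following pseudocode, where xoffs[i] and zoffs[i] denote the 4-bit offset of lane i rather than an array element:
for (i = 0; i < 8; i++)
    ret[i] = xbuf[xstart + xoffs[i]] * zbuf[zstart + zoffs[i]];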
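The per-lane indexing used in the loop above can be sketched in plain C++ as follows. This is only an illustrative model, not part of the intrinsics API: the helper names lane_offset and lane_index are hypothetical, and the sketch assumes that lane 0 uses the least significant 4 bits of the offset parameter, as the example values in this section imply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an intrinsic): returns the 4-bit offset field
// that belongs to one lane of a 32-bit offset parameter.
// Assumption: lane 0 occupies the least significant 4 bits.
inline unsigned lane_offset(std::uint32_t offs, unsigned lane) {
    assert(lane < 8);                     // eight lanes, 4 bits each
    return (offs >> (4 * lane)) & 0xFu;   // shift the field down, mask 4 bits
}

// Hypothetical helper: buffer index selected for one lane,
// that is, the common start plus the lane-dependent offset.
inline int lane_index(int start, std::uint32_t offs, unsigned lane) {
    return start + static_cast<int>(lane_offset(offs, lane));
}
```

With these helpers, lane_index(2, 0x210FEDCB, 0) evaluates to 0xD and lane_index(7, 0x76543210, 7) evaluates to 0xE.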
All values in hexadecimal:
ret Index (Lane) | xbuf Start | xbuf Offset | Final xbuf Index | zbuf Start | zbuf Offset | Final zbuf Index
---|---|---|---|---|---|---
0 | 2 | B | D | 7 | 0 | 7
1 | 2 | C | E | 7 | 1 | 8
2 | 2 | D | F | 7 | 2 | 9
3 | 2 | E | 10 | 7 | 3 | A
4 | 2 | F | 11 | 7 | 4 | B
5 | 2 | 0 | 2 | 7 | 5 | C
6 | 2 | 1 | 3 | 7 | 6 | D
7 | 2 | 2 | 4 | 7 | 7 | E
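Going the other way, an offset parameter can be assembled from the desired per-lane offsets. The following is a minimal sketch under the same assumption that lane 0 maps to the least significant 4 bits. The helper pack_offsets is hypothetical and not part of the intrinsics API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an intrinsic): packs eight per-lane offsets,
// each in the range 0..15, into one 32-bit offset parameter.
// Assumption: lane 0 occupies the least significant 4 bits.
inline std::uint32_t pack_offsets(const std::array<unsigned, 8>& lane_offsets) {
    std::uint32_t word = 0;
    for (unsigned lane = 0; lane < 8; ++lane) {
        assert(lane_offsets[lane] <= 0xFu);  // each field is only 4 bits wide
        word |= static_cast<std::uint32_t>(lane_offsets[lane]) << (4 * lane);
    }
    return word;
}
```

Packing the xbuf Offset column of the table, B, C, D, E, F, 0, 1, 2 for lanes 0 through 7, gives 0x210FEDCB, and packing the zbuf Offset column gives 0x76543210.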