# Data Permute and MAC Examples - 2022.2 English

## AI Engine Kernel and Graph Programming Guide (UG1079)

- Document ID: UG1079
- Release Date: 2022-10-19
- Version: 2022.2 English

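The following example takes two `v8int32` vectors, with the real parts in `rva` and the imaginary parts in `rvb`, and creates a new complex vector. The select mask `0xaaaa` takes the even output lanes from the X side (start 0, the real parts) and the odd output lanes from the Y side (start 8 in the concatenated buffer, the imaginary parts). The offsets interleave the values so that each real part is immediately followed by its imaginary part.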

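```cpp
// Even output lanes take rva[0..7] (X side, start 0); odd lanes take rvb[0..7]
// (Y side, start 8 in the concatenated buffer), chosen by select mask 0xaaaa.
v8cint32 cv = as_v8cint32(select16(0xaaaa, concat(rva, rvb),
                                   0, 0x03020100, 0x07060504,
                                   8, 0x30201000, 0x70605040));
```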
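The following minimal scalar sketch of the lane selection is useful for checking the offsets on a host before running on the AI Engine. It assumes the 32-bit permute rule described in UG1078: output lane `i` reads `start + offset[i]`, where `offset[i]` is the `i`-th 4-bit field of the low offsets word (lanes 0 to 7) or the high word (lanes 8 to 15), and a set bit `i` in the select mask takes the Y side. `select16_model` and `interleave_example` are hypothetical helpers, not intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Buf16 = std::array<int32_t, 16>;
using Buf8 = std::array<int32_t, 8>;

// Hypothetical helper: 4-bit offset for a lane, from lo (lanes 0-7) or hi (lanes 8-15).
inline int lane_offset(uint32_t lo, uint32_t hi, int lane) {
    uint32_t word = (lane < 8) ? lo : hi;
    return static_cast<int>((word >> (4 * (lane % 8))) & 0xFu);
}

// Hypothetical scalar model of select16 reading X and Y from one 16-lane buffer,
// assuming the 32-bit lane rule from UG1078.
inline Buf16 select16_model(uint32_t select, const Buf16& buf,
                            int xstart, uint32_t xoff, uint32_t xoff_hi,
                            int ystart, uint32_t yoff, uint32_t yoff_hi) {
    Buf16 out{};
    for (int i = 0; i < 16; ++i) {
        bool use_y = ((select >> i) & 1u) != 0;  // set bit: take the Y side
        int idx = use_y ? ystart + lane_offset(yoff, yoff_hi, i)
                        : xstart + lane_offset(xoff, xoff_hi, i);
        assert(idx >= 0 && idx < 16);  // this example never needs wrap-around
        out[i] = buf[idx];
    }
    return out;
}

// Reproduces the example: concat(rva, rvb), then select16 with the same arguments.
inline Buf16 interleave_example(const Buf8& rva, const Buf8& rvb) {
    Buf16 buf{};
    for (int i = 0; i < 8; ++i) {
        buf[i] = rva[i];
        buf[8 + i] = rvb[i];
    }
    return select16_model(0xaaaau, buf, 0, 0x03020100u, 0x07060504u,
                          8, 0x30201000u, 0x70605040u);
}
```

With `rva = {re0, ..., re7}` and `rvb = {im0, ..., im7}`, the model returns `{re0, im0, re1, im1, ..., re7, im7}`, the real/imaginary pair order that `as_v8cint32` reinterprets as eight `cint32` values.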

The following example shows how to extract the real and imaginary parts of a vector `cv` of type `v8cint32` into two separate `v8int32` vectors.

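```cpp
// Gather the even 32-bit lanes (real parts) into lanes 0-7
// and the odd lanes (imaginary parts) into lanes 8-15.
v16int32 re_im = shuffle16(as_v16int32(cv), 0, 0xECA86420, 0xFDB97531);
v8int32 re = ext_w(re_im, 0); // lower 256 bits: real parts
v8int32 im = ext_w(re_im, 1); // upper 256 bits: imaginary parts
```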
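The following sketch models the de-interleave on a host. It assumes UG1078's 32-bit shuffle rule, where lane `i` reads `start + offset[i]` with 4-bit offsets. `shuffle16_model`, `ext_w_model`, and `split_complex` are hypothetical names, and the offset words are the ones used in the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

using Buf16 = std::array<int32_t, 16>;
using Buf8 = std::array<int32_t, 8>;

// Hypothetical scalar model of shuffle16 on 32-bit lanes.
inline Buf16 shuffle16_model(const Buf16& buf, int start, uint32_t off, uint32_t off_hi) {
    Buf16 out{};
    for (int i = 0; i < 16; ++i) {
        uint32_t word = (i < 8) ? off : off_hi;
        int idx = start + static_cast<int>((word >> (4 * (i % 8))) & 0xFu);
        assert(idx >= 0 && idx < 16);
        out[i] = buf[idx];
    }
    return out;
}

// Model of ext_w(v, half): half 0 returns lanes 0-7, half 1 returns lanes 8-15.
inline Buf8 ext_w_model(const Buf16& v, int half) {
    Buf8 out{};
    for (int i = 0; i < 8; ++i) out[i] = v[8 * half + i];
    return out;
}

// Splits {re0, im0, re1, im1, ...} into (real parts, imaginary parts).
inline std::pair<Buf8, Buf8> split_complex(const Buf16& cv_as_int32) {
    Buf16 re_im = shuffle16_model(cv_as_int32, 0, 0xECA86420u, 0xFDB97531u);
    return {ext_w_model(re_im, 0), ext_w_model(re_im, 1)};
}
```

The low offsets word `0xECA86420` lists the even indices 0, 2, ..., 14 from its least significant nibble upward, and `0xFDB97531` lists the odd indices 1, 3, ..., 15.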

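Shuffle intrinsic functions can reorder the elements in a vector or set all elements to the same value. Some intrinsic functions operate only on larger registers, but you can still use them with smaller vectors by inserting the smaller vector into a larger register first and extracting the result afterward. The following example sets all four elements of `v2` to the same value, the first element of `v1`: `xset_v` places `v1` in the lowest 128 bits of a 512-bit vector, `shuffle16` with all-zero offsets copies element 0 into every lane, and `ext_v` extracts the lowest 128 bits again.

``v4int32 v2 = ext_v(shuffle16(xset_v(0, v1), 0, 0, 0), 0);``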
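A host-side sketch of the three steps follows. It assumes the 32-bit shuffle rule from UG1078 and fills the lanes that `xset_v` leaves undefined with an arbitrary sentinel. `xset_v_model`, `shuffle16_model`, `ext_v_model`, and `broadcast_first` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Buf4 = std::array<int32_t, 4>;
using Buf16 = std::array<int32_t, 16>;

// Model of xset_v(0, v): v in lanes 0-3; other lanes undefined (sentinel -1 here).
inline Buf16 xset_v_model(const Buf4& v) {
    Buf16 out;
    out.fill(-1);
    for (int i = 0; i < 4; ++i) out[i] = v[i];
    return out;
}

// Hypothetical scalar model of shuffle16 on 32-bit lanes (4-bit offset per lane).
inline Buf16 shuffle16_model(const Buf16& buf, int start, uint32_t off, uint32_t off_hi) {
    Buf16 out{};
    for (int i = 0; i < 16; ++i) {
        uint32_t word = (i < 8) ? off : off_hi;
        int idx = start + static_cast<int>((word >> (4 * (i % 8))) & 0xFu);
        assert(idx >= 0 && idx < 16);
        out[i] = buf[idx];
    }
    return out;
}

// Model of ext_v(v, 0): lanes 0-3.
inline Buf4 ext_v_model(const Buf16& v) {
    return {v[0], v[1], v[2], v[3]};
}

// ext_v(shuffle16(xset_v(0, v1), 0, 0, 0), 0): all offsets zero, so every lane reads v1[0].
inline Buf4 broadcast_first(const Buf4& v1) {
    return ext_v_model(shuffle16_model(xset_v_model(v1), 0, 0u, 0u));
}
```

Because only lane 0 of the 512-bit buffer is ever read, the undefined upper lanes do not affect the result.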

The following example multiplies each element of `rva` by the first element of `rvb`. Because `zoffsets` is `0x00`, every lane reads the same element of `rvb`, which makes this an efficient way to multiply a vector by a scalar.

``v8acc80 acc = lmul8(concat(rva, undef_v8int32()), 0, 0x76543210, rvb, 0, 0x00);``
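The following host-side model shows what the `lmul8` arguments select. It assumes 4-bit offsets per lane for 32-bit data (lane `i` multiplies `x[xstart + xoff_i]` by `z[zstart + zoff_i]`) and uses 64-bit integers in place of the 80-bit accumulator lanes, which is enough to hold any 32 x 32-bit product. `lmul8_model` and `scale_by_first` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Buf16 = std::array<int32_t, 16>;
using Buf8 = std::array<int32_t, 8>;
using Acc8 = std::array<int64_t, 8>;  // stands in for v8acc80

// 4-bit offset for lane 0-7 from a 32-bit offsets word.
inline int nib(uint32_t offsets, int lane) {
    return static_cast<int>((offsets >> (4 * lane)) & 0xFu);
}

// Hypothetical scalar model of lmul8: acc[i] = x[xstart + xoff_i] * z[zstart + zoff_i].
inline Acc8 lmul8_model(const Buf16& x, int xstart, uint32_t xoffsets,
                        const Buf8& z, int zstart, uint32_t zoffsets) {
    Acc8 acc{};
    for (int i = 0; i < 8; ++i) {
        int xi = xstart + nib(xoffsets, i);
        int zi = zstart + nib(zoffsets, i);
        assert(xi >= 0 && xi < 16 && zi >= 0 && zi < 8);
        acc[i] = static_cast<int64_t>(x[xi]) * z[zi];
    }
    return acc;
}

// Vector times scalar: zoffsets 0x00 points every lane at rvb[0].
inline Acc8 scale_by_first(const Buf8& rva, const Buf8& rvb) {
    Buf16 x{};  // upper 8 lanes model undef_v8int32(); they are never read
    for (int i = 0; i < 8; ++i) x[i] = rva[i];
    return lmul8_model(x, 0, 0x76543210u, rvb, 0, 0x00u);
}
```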

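The following examples multiply each element of `rva` by the corresponding element of `rvb`. Both place `rva` in the lower half of a `v16int32` buffer; the upper half is undefined, which is harmless because `xstart` 0 with `xoffsets` `0x76543210` reads only lanes 0 to 7.

```cpp
// Variant 1: concatenate rva with an undefined upper half
acc = lmul8(concat(rva, undef_v8int32()), 0, 0x76543210, rvb, 0, 0x76543210);
// Variant 2: write rva into the lower 256 bits of an undefined v16int32
acc = lmul8(upd_w(undef_v16int32(), 0, rva), 0, 0x76543210, rvb, 0, 0x76543210);
```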
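To show that the two ways of building the X buffer give the same result, the following sketch fills the undefined lanes with different sentinel values in each model and then applies the lane rule for offsets `0x76543210` on both sides. All function names are hypothetical, and 64-bit integers stand in for the 80-bit accumulator lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Buf16 = std::array<int32_t, 16>;
using Buf8 = std::array<int32_t, 8>;
using Acc8 = std::array<int64_t, 8>;  // stands in for v8acc80

// Model of concat(rva, undef_v8int32()): undefined upper lanes hold sentinel -1.
inline Buf16 concat_undef_model(const Buf8& rva) {
    Buf16 x;
    x.fill(-1);
    for (int i = 0; i < 8; ++i) x[i] = rva[i];
    return x;
}

// Model of upd_w(undef_v16int32(), 0, rva): undefined upper lanes hold sentinel 12345.
inline Buf16 upd_w_undef_model(const Buf8& rva) {
    Buf16 x;
    x.fill(12345);
    for (int i = 0; i < 8; ++i) x[i] = rva[i];
    return x;
}

// lmul8 with start 0 and offsets 0x76543210 on both sides: lane i reads x[i] and z[i].
inline Acc8 lmul8_identity(const Buf16& x, const Buf8& z) {
    Acc8 acc{};
    for (int i = 0; i < 8; ++i) {
        int lane = static_cast<int>((0x76543210u >> (4 * i)) & 0xFu);
        assert(lane == i);  // identity permutation
        acc[i] = static_cast<int64_t>(x[lane]) * z[lane];
    }
    return acc;
}
```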

The following examples show how to perform matrix multiplication of int8 x int8 data with the `mul16` intrinsic, assuming row-major data storage.

```cpp
// Z_{2x8} * X_{8x8} = A_{2x8}
mul16(Xbuff, 0, 0x11101110, 16, 0x3120, Zbuff, 0, 0x44440000, 2, 0x3210);
// Z_{4x8} * X_{8x4} = A_{4x4}
mul16(Xbuff, 0, 0x00000000, 8, 0x3120, Zbuff, 0, 0xCC884400, 2, 0x3210);
```
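The permute parameters in these calls are easy to get wrong, so a plain row-major reference can help check a vector kernel's output on a host. The following sketch computes `A = Z * X` for int8 inputs with 32-bit accumulation. It does not model the `mul16` lane permutation, and `matmul_ref` is a hypothetical helper. For `K = 8`, 32 bits are enough: each product is at most 128 * 128 = 16384 in magnitude, so a sum of eight stays well within range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major reference: A (M x N) = Z (M x K) * X (K x N), int8 inputs, int32 sums.
inline std::vector<int32_t> matmul_ref(const std::vector<int8_t>& Z,
                                       const std::vector<int8_t>& X,
                                       std::size_t M, std::size_t K, std::size_t N) {
    assert(Z.size() == M * K && X.size() == K * N);
    std::vector<int32_t> A(M * N, 0);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            int32_t sum = 0;
            for (std::size_t k = 0; k < K; ++k)
                sum += static_cast<int32_t>(Z[m * K + k]) * X[k * N + n];
            A[m * N + n] = sum;
        }
    }
    return A;
}
```

For the two shapes above, call it as `matmul_ref(Z, X, 2, 8, 8)` and `matmul_ref(Z, X, 4, 8, 4)`, then compare against the accumulator lanes read back from the kernel.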

If a kernel uses several `mul` or `mac` intrinsics, try to keep the `xoffsets` and `zoffsets` parameters constant across calls and vary the `xstart` and `zstart` parameters instead. This helps prevent configuration registers from being spilled to the stack.
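As an illustration of this advice, the following host-side sketch models a hypothetical 4-tap sliding-window (FIR-style) loop built from `lmac8`-style calls. Both offset words are fixed constants, and only the start indices change from one call to the next. `lmac8_model` and `fir4` are hypothetical names. The model assumes the same 4-bit-per-lane offset rule as the `lmul8` examples, with 64-bit integers standing in for the 80-bit accumulator lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Buf16 = std::array<int32_t, 16>;
using Buf8 = std::array<int32_t, 8>;
using Acc8 = std::array<int64_t, 8>;  // stands in for v8acc80

// Hypothetical scalar model of lmac8: acc[i] += x[xstart + xoff_i] * z[zstart + zoff_i].
inline Acc8 lmac8_model(Acc8 acc, const Buf16& x, int xstart, uint32_t xoffsets,
                        const Buf8& z, int zstart, uint32_t zoffsets) {
    for (int i = 0; i < 8; ++i) {
        int xi = xstart + static_cast<int>((xoffsets >> (4 * i)) & 0xFu);
        int zi = zstart + static_cast<int>((zoffsets >> (4 * i)) & 0xFu);
        assert(xi >= 0 && xi < 16 && zi >= 0 && zi < 8);
        acc[i] += static_cast<int64_t>(x[xi]) * z[zi];
    }
    return acc;
}

// 4-tap sliding-window sum: the offsets never change, only the starts do.
inline Acc8 fir4(const Buf16& x, const Buf8& coeffs) {
    constexpr uint32_t kXOffsets = 0x76543210u;  // lane i reads x[start + i]
    constexpr uint32_t kZOffsets = 0x00000000u;  // every lane reads coeffs[start]
    Acc8 acc{};
    for (int k = 0; k < 4; ++k)
        acc = lmac8_model(acc, x, k, kXOffsets, coeffs, k, kZOffsets);
    return acc;  // acc[i] = sum over k of x[i + k] * coeffs[k]
}
```

For `x = {0, 1, ..., 15}` and `coeffs = {1, 2, 3, 4, 0, 0, 0, 0}`, lane `i` of the result is `20 + 10 * i`.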

For more information about vector lane permutations, refer to the Versal ACAP AI Engine Intrinsics Documentation (UG1078).