Data Permute and MAC Examples - 2021.2 English

AI Engine Kernel Coding Best Practices Guide (UG1079)

The following example takes two v8int32 vectors, rva holding the real parts and rvb holding the imaginary parts, and creates a new complex vector by using the offsets to interleave the values.

v8cint32 cv = as_v8cint32(select16(0xaaaa, concat(rva, rvb),
		0, 0x03020100, 0x07060504,  8, 0x30201000, 0x70605040));
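
To make the offset encoding easier to follow, the lane selection in the example above can be sketched as a plain C++ model that runs on a host. This is only a sketch, not the intrinsic: the names select16_model, lane_offset, and interleave_re_im are hypothetical, and it assumes 32-bit lanes, one 4-bit offset per lane (least significant nibble first), and an index that wraps at 16.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model, not the AI Engine intrinsic.
// Returns the 4-bit offset for a lane: lanes 0-7 read the low word,
// lanes 8-15 read the high word, least significant nibble first.
static int lane_offset(int lane, uint32_t offs, uint32_t offs_hi) {
    const uint32_t word = (lane < 8) ? offs : offs_hi;
    return static_cast<int>((word >> (4 * (lane % 8))) & 0xFu);
}

// Lane i reads buf[start + offset]; bit i of sel picks the x parameters (0)
// or the y parameters (1). Wrapping the index at 16 is an assumption made
// to keep this model in range.
std::array<int32_t, 16> select16_model(uint16_t sel,
                                       const std::array<int32_t, 16>& buf,
                                       int xstart, uint32_t xoffs, uint32_t xoffs_hi,
                                       int ystart, uint32_t yoffs, uint32_t yoffs_hi) {
    std::array<int32_t, 16> out{};
    for (int lane = 0; lane < 16; ++lane) {
        const bool use_y = ((sel >> lane) & 1u) != 0;
        const int idx = use_y ? ystart + lane_offset(lane, yoffs, yoffs_hi)
                              : xstart + lane_offset(lane, xoffs, xoffs_hi);
        out[lane] = buf[idx & 15];
    }
    return out;
}

// Mirrors the example: buf models concat(rva, rvb), and the result is
// laid out as re0, im0, re1, im1, ...
std::array<int32_t, 16> interleave_re_im(const std::array<int32_t, 8>& rva,
                                         const std::array<int32_t, 8>& rvb) {
    std::array<int32_t, 16> buf{};
    for (int i = 0; i < 8; ++i) {
        buf[i] = rva[i];
        buf[8 + i] = rvb[i];
    }
    return select16_model(0xaaaa, buf, 0, 0x03020100, 0x07060504,
                          8, 0x30201000, 0x70605040);
}
```

In this model the select mask 0xaaaa sends the even lanes to the x parameters (rva) and the odd lanes to the y parameters (rvb, starting at element 8), which is what produces the interleaved layout.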

The following example shows how to extract the real and imaginary portions of a vector cv with type v8cint32.

v16int32 re_im = shuffle16(as_v16int32(cv), 0, 0xECA86420, 0xFDB97531);
v8int32 re = ext_w(re_im, 0);
v8int32 im = ext_w(re_im, 1);

Shuffle intrinsic functions can be used to reorder the elements in a vector or to set all elements to the same value. Some intrinsic functions operate only on larger registers, but they are easy to use with smaller registers. The following example shows how to set all four elements of a v4int32 vector to the same value, here the first element of v1.

v4int32 v2 = ext_v(shuffle16(xset_v(0, v1), 0, 0, 0), 0);
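
The two shuffle16 examples above can be checked against a small host-side model. The following is a minimal sketch under stated assumptions, not the intrinsic itself: shuffle16_model, split_re_im, and broadcast_first are hypothetical names, lanes are 32 bits wide, each lane uses one 4-bit offset (least significant nibble first), and the undefined upper lanes produced by xset_v are modeled as zeros.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model, not the AI Engine intrinsic.
// Output lane i reads in[start + nibble i], with lanes 0-7 taking their
// nibbles from offs and lanes 8-15 from offs_hi. Wrapping the index at 16
// is an assumption made to keep this model in range.
std::array<int32_t, 16> shuffle16_model(const std::array<int32_t, 16>& in,
                                        int start, uint32_t offs, uint32_t offs_hi) {
    std::array<int32_t, 16> out{};
    for (int lane = 0; lane < 16; ++lane) {
        const uint32_t word = (lane < 8) ? offs : offs_hi;
        const int off = static_cast<int>((word >> (4 * (lane % 8))) & 0xFu);
        out[lane] = in[(start + off) & 15];
    }
    return out;
}

// Models the de-interleave example: even elements (real parts) land in
// lanes 0-7 and odd elements (imaginary parts) in lanes 8-15.
std::array<int32_t, 16> split_re_im(const std::array<int32_t, 16>& cv) {
    return shuffle16_model(cv, 0, 0xECA86420, 0xFDB97531);
}

// Models the broadcast example: with every offset set to 0, all lanes read
// element 0, and the first four lanes are returned.
std::array<int32_t, 4> broadcast_first(const std::array<int32_t, 4>& v1) {
    std::array<int32_t, 16> wide{};  // upper lanes are undefined on hardware
    for (int i = 0; i < 4; ++i) {
        wide[i] = v1[i];
    }
    const std::array<int32_t, 16> s = shuffle16_model(wide, 0, 0, 0);
    return {s[0], s[1], s[2], s[3]};
}
```

Reading the offset words from the least significant nibble, 0xECA86420 selects elements 0, 2, 4, ..., 14 and 0xFDB97531 selects elements 1, 3, 5, ..., 15, so ext_w(re_im, 0) and ext_w(re_im, 1) correspond to the lower and upper halves of the model's result.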

The following example shows how to multiply each element in rva by the first element in rvb. This is an efficient way to multiply a vector by a constant value.

v8acc80 acc = lmul8(concat(rva,undef_v8int32()),0,0x76543210,rvb,0,0x00);

The following examples show how to multiply each element in rva by its corresponding element in rvb.

acc = lmul8(concat(rva, undef_v8int32()),0,0x76543210,rvb,0,0x76543210);
acc = lmul8(upd_w(undef_v16int32(),0,rva),0,0x76543210,rvb,0,0x76543210);
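
The difference between the two zoffsets values used above (0x00 and 0x76543210) can be illustrated with a scalar model. This is a hedged sketch rather than the intrinsic: lmul8_model and nibble are hypothetical names, the model assumes one 4-bit offset per lane for both operands, and it uses int64_t in place of the 80-bit accumulator lanes, which is wide enough for a single int32 by int32 product.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: extracts the 4-bit offset for a lane,
// least significant nibble first.
static int nibble(uint32_t offs, int lane) {
    return static_cast<int>((offs >> (4 * lane)) & 0xFu);
}

// Hypothetical host-side model of an 8-lane 32-bit multiply:
// acc[i] = x[xstart + xoffs nibble i] * z[zstart + zoffs nibble i].
// Index wrapping (16 for x, 8 for z) is an assumption made to keep
// this model in range.
std::array<int64_t, 8> lmul8_model(const std::array<int32_t, 16>& x,
                                   int xstart, uint32_t xoffs,
                                   const std::array<int32_t, 8>& z,
                                   int zstart, uint32_t zoffs) {
    std::array<int64_t, 8> acc{};
    for (int lane = 0; lane < 8; ++lane) {
        const int xi = (xstart + nibble(xoffs, lane)) & 15;
        const int zi = (zstart + nibble(zoffs, lane)) & 7;
        acc[lane] = static_cast<int64_t>(x[xi]) * static_cast<int64_t>(z[zi]);
    }
    return acc;
}

// Models concat(rva, undef_v8int32()): the upper half is undefined on
// hardware and is filled with zeros here.
std::array<int32_t, 16> widen(const std::array<int32_t, 8>& rva) {
    std::array<int32_t, 16> x{};
    for (int i = 0; i < 8; ++i) {
        x[i] = rva[i];
    }
    return x;
}
```

Calling lmul8_model(widen(rva), 0, 0x76543210, rvb, 0, 0x00) makes every lane read rvb element 0, while passing 0x76543210 as the last argument pairs lane i of rva with lane i of rvb.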

The following examples show how to do matrix multiplication for int8 x int8 data types with the mul16 intrinsic, assuming that the data is stored in row-major order.

//Z_{2x8} * X_{8x8} = A_{2x8}
mul16(Xbuff, 0, 0x11101110, 16, 0x3120, Zbuff, 0, 0x44440000, 2, 0x3210);
//Z_{4x8} * X_{8x4} = A_{4x4}
mul16(Xbuff, 0, 0x00000000, 8, 0x3120, Zbuff, 0, 0xCC884400, 2, 0x3210);

If the kernel has multiple mul or mac intrinsics, try to keep the xoffsets and zoffsets parameters constant across uses and vary the xstart and zstart parameters instead. This helps prevent configuration registers from spilling onto the stack.

For more information about vector lane permutations, refer to the Versal ACAP AI Engine Intrinsics Documentation (UG1078).