The arithmetic logic unit (ALU) in the AI Engine manages the following operations. In all cases the issue rate is one instruction per cycle.
 Integer addition and subtraction: 32 bits. The operation has a one-cycle latency.
 Bitwise logical operations on 32-bit integers (BAND, BOR, BXOR). The operations have a one-cycle latency.
 Integer multiplication: 32 x 32 bits, with a 32-bit result stored in the R register file. The operation has a three-cycle latency.
 Shift operation: Both left and right shifts are supported. A positive shift amount is used for a left shift and a negative shift amount is used for a right shift. The shift amount is passed through a general-purpose register. A one-bit operand to the shift operation indicates whether a positive or negative shift is required. The operation has a one-cycle latency.
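The signed shift convention described above can be modeled in portable C++. This is a minimal sketch, not an AI Engine intrinsic: the name shift_signed is hypothetical, the shift amount is assumed to lie in the range -31 to 31, and the right shift is assumed to be arithmetic (sign-preserving), which the text does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the shift convention described above;
// this is not an AI Engine intrinsic.
// amount > 0 shifts left, amount < 0 shifts right.
inline int32_t shift_signed(int32_t value, int amount) {
    assert(amount >= -31 && amount <= 31);  // assumed legal range
    if (amount >= 0) {
        // Shift in the unsigned domain so that overflow is well defined.
        return static_cast<int32_t>(static_cast<uint32_t>(value) << amount);
    }
    // Assumption: the right shift is arithmetic (replicates the sign bit).
    return value >> -amount;
}
```

For example, shift_signed(1, 4) shifts left by four positions, while shift_signed(16, -2) shifts right by two.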
There are two types of scalar elementary functions in the AI Engine: fixed-point and floating-point. The following describes each function.
 Fixed-point nonlinear functions
 Sine and cosine:
 Input is taken from the upper 20 bits of a 32-bit input
 Output is a concatenated word with the sine in the upper 16 bits and the cosine in the lower 16 bits
 The operations have a four-cycle latency
 Absolute value (ABS): For a negative input, invert the input number and add one. The operation has a one-cycle latency
 Count leading zeroes (CLZ): Count the leading zeroes in a 32-bit input. The operation has a one-cycle latency
 Minimum/maximum (less than (LT)/greater than (GT)): Two inputs are compared to find the minimum or maximum. The operation has a one-cycle latency
 Square root, inverse square root, and inverse: These operations are implemented with floating-point precision. For a fixed-point implementation, the input must first be converted to floating-point format before it is passed to these nonlinear operations, and the floating-point output must then be converted back to fixed-point integer format. The operations have a four-cycle latency
 Floating-point nonlinear functions
 Square root: Both input and output are single-precision floating-point numbers operating on the R register file. The operation has a four-cycle latency
 Inverse square root: Both input and output are single-precision floating-point numbers operating on the R register file. The operation has a four-cycle latency
 Inverse: Both input and output are single-precision floating-point numbers operating on the R register file. The operation has a four-cycle latency
 Absolute value (ABS): The operation has a one-cycle latency
 Minimum/maximum: The operation has a one-cycle latency
There is no floating-point unit in the scalar unit. Floating-point operations are supported through emulation. In general, it is preferred to perform add and multiply operations in the vector unit.
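The input and output layout of the fixed-point sine and cosine operation can be illustrated with a small portable C++ sketch. The helper names below are hypothetical, and treating each 16-bit field as a signed value is an assumption; the text only specifies which bits are used.

```cpp
#include <cstdint>

// Hypothetical helpers; the names and the signed 16-bit interpretation
// of each field are assumptions, not part of any AI Engine API.
struct SinCos {
    int16_t sine;
    int16_t cosine;
};

// The sine/cosine operation reads only the upper 20 bits of the 32-bit input.
inline uint32_t sincos_input_bits(uint32_t input) {
    return input >> 12;
}

// Split the concatenated result: upper 16 bits sine, lower 16 bits cosine.
inline SinCos unpack_sincos(uint32_t packed) {
    SinCos r;
    r.sine = static_cast<int16_t>(packed >> 16);
    r.cosine = static_cast<int16_t>(packed & 0xFFFFu);
    return r;
}
```

Because only the upper 20 bits are consumed, the lower 12 bits of the input do not affect the result.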
The AI Engine scalar unit supports data-type conversion operations that convert input data from fixed-point to floating-point format and from floating-point to fixed-point format. The fix2float and float2fix operations support a variable decimal point, where the input is a 32-bit value. Along with the input, the decimal point position is taken as a second input. The operations scale the value up or down, if required. Both operations have a one-cycle latency.
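The effect of the decimal point position can be sketched with a portable reference model. This is not the AI Engine fix2float or float2fix implementation: the function names are hypothetical, and the rounding mode (truncation toward zero) and the handling of NaN are assumptions. The last function shows the convert, compute, convert-back flow described earlier for fixed-point square root.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference models, not the AI Engine intrinsics.
// frac_bits is the decimal point position (number of fractional bits).

inline float fix2float_model(int32_t value, int frac_bits) {
    // Scale down by 2^frac_bits.
    return static_cast<float>(std::ldexp(static_cast<double>(value), -frac_bits));
}

inline int32_t float2fix_model(float value, int frac_bits) {
    if (std::isnan(value)) return 0;  // assumption: NaN maps to 0
    // Scale up by 2^frac_bits in double precision.
    const double scaled = std::ldexp(static_cast<double>(value), frac_bits);
    // Saturate to the legal 32-bit range.
    if (scaled >= 2147483647.0) return std::numeric_limits<int32_t>::max();
    if (scaled <= -2147483648.0) return std::numeric_limits<int32_t>::min();
    return static_cast<int32_t>(scaled);  // assumption: truncate toward zero
}

// Fixed-point square root through the floating-point path.
inline int32_t sqrt_fixed_model(int32_t x, int frac_bits) {
    const float f = fix2float_model(x, frac_bits);
    return float2fix_model(std::sqrt(f), frac_bits);
}
```

With 16 fractional bits, the value 4.0 is stored as 262144, and the square root model returns 131072, which represents 2.0.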
The AI Engine floating-point implementation is not fully compliant with the IEEE standards, and some functionality is restricted. The exceptions are outlined in this section.
 When the float2fix function is called with a very large positive or negative number and the additional exponent increment is greater than zero, the instruction returns 0 instead of the correct saturation value of either 2^{31}–1 or –2^{31}. The float2fix function takes two input parameters:
 n: The floating-point input value to be converted.
 sft: A 6-bit signed number representing the number of fractional bits in the fixed-point representation (from –32 to 31).
Consider the two scenarios:
 If n*2^{sft} > 2^{129}, the output should return 0x7FFFFFFF. Instead, it returns 0x00000000.
 If n*2^{sft} < –2^{129}, the output should return 0x80000000. Instead, it returns 0x00000000.
In general, you should ensure that the floating-point input value n stays in the bug-free range of –2^{(129–sft)} < n < 2^{(129–sft)} for sft > 0. Two implementations are introduced to provide a workaround:
 float2fix_safe: This is the default mode if you specify float2fix without any option. The implementation returns the correct value for any range, but is slower.
 float2fix_fast: This implementation returns the correct value only in the bug-free range, and you need to ensure the range is valid. To choose the float2fix_fast implementation, you need to add the preprocessor definition FLOAT2FIX_FAST to the project file.
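When the fast implementation is selected, the caller is responsible for the range check. A minimal sketch of such a check follows, assuming a hypothetical helper name (in_float2fix_fast_range) that is not part of any AI Engine API. It tests the bug-free range stated above and treats sft <= 0 as unrestricted, because the restriction is stated only for sft > 0.

```cpp
#include <cmath>

// Hypothetical helper, not part of any AI Engine API.
// Returns true when n lies in the range -2^(129-sft) < n < 2^(129-sft).
inline bool in_float2fix_fast_range(float n, int sft) {
    if (sft <= 0) return true;  // the restriction is stated only for sft > 0
    // Use double: 2^(129 - sft) can exceed the largest finite float.
    const double bound = std::ldexp(1.0, 129 - sft);
    return std::fabs(static_cast<double>(n)) < bound;
}
```

A caller could evaluate this check before invoking the fast conversion and substitute the expected saturation value when it fails. NaN inputs fail the check for sft > 0, because the comparison is false.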
 A fixed-point value has a legal range of –2^{31} to 2^{31}–1. When the float2fix function returns a value of –2^{31}, the value is within range but an overflow exception is incorrectly set. There is no workaround for this overflow exception.