The arithmetic logic unit (ALU) in the AI Engine manages the following operations. In all cases the issue rate is one instruction per cycle.
- Integer addition and subtraction: 32 bits. The operation has a one cycle latency.
- Bit-wise logical operation on 32-bit integer numbers (BAND, BOR, BXOR). The operation has a one cycle latency.
- Integer multiplication: 32 x 32 bit with output result of 32 bits stored in the R register file. The operation has a three cycle latency.
- Shift operation: Both left and right shift are supported. A positive shift amount is used for left shift and a negative shift amount is used for right shift. The shift amount is passed through a general purpose register. A one bit operand to the shift operation indicates whether a positive or negative shift is required. The operation has a one cycle latency.
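The signed shift-amount convention described above can be illustrated with a small behavioral model in C. This is a host-side sketch for reasoning about the convention, not AI Engine code. The function name model_shift is hypothetical. The arithmetic (sign-extending) right shift and the handling of shift magnitudes of 32 or more are assumptions that the text above does not specify.

```c
#include <stdint.h>

/* Hypothetical host-side model of the ALU shift convention:
 * amount > 0 shifts left, amount < 0 shifts right.
 * Assumptions (not stated in the source): the right shift is arithmetic,
 * and magnitudes of 32 or more shift every original bit out. */
int32_t model_shift(int32_t value, int amount)
{
    uint32_t bits = (uint32_t)value;   /* shift on unsigned to avoid undefined behavior */

    if (amount >= 0) {
        return (amount >= 32) ? 0 : (int32_t)(bits << amount);
    }

    /* Clamp large right shifts: a shift by 31 already leaves only sign bits. */
    int n = (amount <= -32) ? 31 : -amount;
    uint32_t shifted = bits >> n;
    if (value < 0) {
        shifted |= ~(UINT32_MAX >> n); /* refill the vacated upper bits with the sign bit */
    }
    return (int32_t)shifted;
}
```

For example, model_shift(1, 4) shifts left by four positions, while model_shift(16, -2) shifts right by two.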
There are two types of scalar elementary functions in the AI Engine: fixed-point and floating-point. The following describes each function.
- Fixed-point non-linear functions
  - Sine and cosine:
    - The input is taken from the upper 20 bits of a 32-bit input
    - The output is a concatenated word with the sine in the upper 16 bits and the cosine in the lower 16 bits
    - The operations have a four cycle latency
  - Absolute value (ABS): A negative input is negated by inverting its bits and adding one. The operation has a one cycle latency
  - Count leading zeroes (CLZ): Count leading zeroes in a 32-bit input. The operation has a one cycle latency
  - Minimum/maximum (less than (LT)/greater than (GT)): Two inputs are compared to find the minimum or maximum. The operation has a one cycle latency
  - Square root, inverse square root, and inverse: These operations are implemented with floating-point precision. For a fixed-point implementation, the input must first be converted to floating-point format and then passed to these non-linear operations. The output is also in floating-point format and must be converted back to fixed-point integer format. The operations have a four cycle latency
- Floating-point non-linear functions
  - Square root: Both input and output are single precision floating-point numbers operating on the R register file. The operation has a four cycle latency
  - Inverse square root: Both input and output are single precision floating-point numbers operating on the R register file. The operation has a four cycle latency
  - Inverse: Both input and output are single precision floating-point numbers operating on the R register file. The operation has a four cycle latency
  - Absolute value (ABS): The operation has a one cycle latency
  - Minimum/maximum: The operation has a one cycle latency
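The bit layout given for the fixed-point sine and cosine operation (upper 20 bits of the input used, sine in the upper 16 bits of the result, cosine in the lower 16 bits) can be sketched as a few C helpers. The helper names are hypothetical, and treating each 16-bit half as a signed value is an assumption. The text above specifies only the bit positions.

```c
#include <stdint.h>

/* Hypothetical helpers. Bit positions follow the text; the signedness
 * of each 16-bit half is an assumption. */

/* The sine/cosine operation reads only the upper 20 bits of its 32-bit input. */
uint32_t sincos_input_bits(uint32_t input)
{
    return input >> 12;                /* keep bits 31..12, drop the low 12 bits */
}

/* The upper 16 bits of the result word carry the sine. */
int16_t sincos_sine(uint32_t result)
{
    return (int16_t)(uint16_t)(result >> 16);
}

/* The lower 16 bits of the result word carry the cosine. */
int16_t sincos_cosine(uint32_t result)
{
    return (int16_t)(uint16_t)(result & 0xFFFFu);
}
```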
There is no floating-point unit in the scalar unit. The floating-point operations are supported through emulation. In general, it is preferred to perform add and multiply in the vector unit.
The AI Engine scalar unit supports data-type conversion operations that convert input data from fixed-point to floating-point format and from floating-point to fixed-point format. The fix2float and float2fix operations support a variable decimal point position for a 32-bit input value. The decimal point position is passed as a second input, and the operations scale the value up or down as required. Both operations have a one cycle latency.
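The scaling performed by the two conversions can be sketched as a host-side reference model in C. It is not the hardware implementation. The names model_fix2float and model_float2fix are hypothetical, and the second argument is taken to be the number of fractional bits. Truncation toward zero and the NaN result are assumptions made only so the model is well defined.

```c
#include <stdint.h>

/* Multiply x by 2^e without libm; exact unless the result underflows. */
static double scale_by_pow2(double x, int e)
{
    for (; e > 0; --e) x *= 2.0;
    for (; e < 0; ++e) x *= 0.5;
    return x;
}

/* Hypothetical model: interpret value as fixed-point with sft fractional bits. */
float model_fix2float(int32_t value, int sft)
{
    return (float)scale_by_pow2((double)value, -sft);
}

/* Hypothetical model: scale by 2^sft, saturate to the 32-bit range, then
 * truncate toward zero. The rounding mode is an assumption. */
int32_t model_float2fix(float n, int sft)
{
    double scaled = scale_by_pow2((double)n, sft);
    if (scaled != scaled) return 0;    /* NaN: value chosen arbitrarily for this model */
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)scaled;
}
```

With four fractional bits, for instance, the value 1.5 corresponds to the integer 24, and converting 24 back with the same position yields 1.5.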
The AI Engine floating-point implementation is not fully compliant with the IEEE 754 standard, and some functionality is restricted. The exceptions are outlined in this section.
- When the float2fix function is called with a very large positive or negative number and the additional exponent increment is greater than zero, the instruction returns 0 instead of the correct saturation value of either 2^31 - 1 or -2^31. The float2fix function takes two input parameters:
  - n: The floating-point input value to be converted.
  - sft: A 6-bit signed number representing the number of fractional bits in the fixed-point representation (from -32 to 31).
  Consider the two scenarios:
  - If n * 2^sft > 2^129, the output should be 0x7FFFFFFF. Instead, it returns 0x00000000.
  - If n * 2^sft < -2^129, the output should be 0x80000000. Instead, it returns 0x00000000.
  In general, you should ensure that the floating-point input value n stays in the bug-free range of -2^(129 - sft) < n < 2^(129 - sft) for sft > 0. Two implementations are provided as a workaround:
  - float2fix_safe: This is the default mode if you specify float2fix without any option. The implementation returns the correct value for any range, but is slower.
  - float2fix_fast: This implementation returns the correct value only in the bug-free range, and you need to ensure the range is valid. To choose the float2fix_fast implementation, you need to add the FLOAT2FIX_FAST preprocessor definition to the project file.
- A fixed-point value has a legal range of -2^31 to 2^31 - 1. When the float2fix function returns a value of -2^31, the value is within range but an overflow exception is incorrectly set. There is no workaround for this overflow exception.
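The bug-free input range stated above for float2fix can be expressed as a simple predicate. The following C sketch is host-side and illustrative only. It is not the float2fix_safe implementation, and the function names are hypothetical. Accepting every input when sft is zero or negative is an assumption based on the restriction being stated only for sft > 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns 2^e for e >= 0, computed in double because 2^129 exceeds the float range. */
static double pow2_double(int e)
{
    double p = 1.0;
    for (; e > 0; --e) p *= 2.0;
    return p;
}

/* Hypothetical check: true when n lies strictly inside
 * -2^(129 - sft) < n < 2^(129 - sft). The source states the restriction
 * only for sft > 0, so other sft values are accepted here (assumption). */
bool float2fix_range_ok(float n, int sft)
{
    if (sft <= 0) return true;
    double bound = pow2_double(129 - sft);
    double x = (double)n;
    return x > -bound && x < bound;    /* false for NaN and infinities */
}

/* Saturation value that the text gives as the correct result outside the range. */
int32_t float2fix_saturated(float n)
{
    return (n > 0.0f) ? INT32_MAX : INT32_MIN;
}
```

A caller that opts into float2fix_fast could evaluate such a check first and substitute the saturation value when it fails. Whether that is cheaper than the default float2fix_safe depends on the kernel, and the text above does not quantify it.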