Arithmetic Logic Unit, Scalar Functions, and Data Type Conversions

Versal Adaptive SoC AI Engine Architecture Manual (AM009)

Document ID: AM009
Release Date: 2023-08-18
Revision: 1.3 English

The arithmetic logic unit (ALU) in the AI Engine manages the following operations. In all cases the issue rate is one instruction per cycle.

  • Integer addition and subtraction: 32 bits. The operation has a one cycle latency.
  • Bit-wise logical operation on 32-bit integer numbers (BAND, BOR, BXOR). The operation has a one cycle latency.
  • Integer multiplication: 32 x 32 bit with output result of 32 bits stored in the R register file. The operation has a three cycle latency.
  • Shift operation: Both left and right shift are supported. A positive shift amount is used for left shift and a negative shift amount is used for right shift. The shift amount is passed through a general purpose register. A one bit operand to the shift operation indicates whether a positive or negative shift is required. The operation has a one cycle latency.
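The signed shift convention above can be sketched in portable C. This is only an illustrative model, not AI Engine code: the function name `alu_shift_model` is hypothetical, and treating right shifts as arithmetic (sign-extending) is an assumption, because the text does not say whether right shifts are arithmetic or logical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shift convention: a positive amount shifts
 * left, a negative amount shifts right. Right shifts are modeled as
 * arithmetic (sign-extending), which is an assumption. */
int32_t alu_shift_model(int32_t value, int32_t amount)
{
    assert(amount > -32 && amount < 32); /* keep the C shifts well defined */

    uint32_t bits = (uint32_t)value;
    if (amount >= 0) {
        /* Shifting the unsigned image avoids signed-overflow issues in C. */
        return (int32_t)(bits << amount);
    }

    uint32_t n = (uint32_t)(-amount);
    uint32_t shifted = bits >> n;
    if (value < 0) {
        /* Fill the vacated upper bits with ones to extend the sign. */
        shifted |= ~(UINT32_MAX >> n);
    }
    return (int32_t)shifted;
}
```

For example, `alu_shift_model(1, 4)` shifts left by four bits, while `alu_shift_model(-16, -2)` shifts right by two bits and keeps the sign.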

There are two types of scalar elementary functions in the AI Engine: fixed-point and floating-point. The following describes each function.

  • Fixed-point non-linear functions
    • Sine and cosine:
      • Input is from the upper 20 bits of a 32-bit input
      • Output is a concatenated word with the upper 16-bit sine and the lower 16-bit cosine
      • The operations have a four cycle latency
    • Absolute value (ABS): For a negative input, invert the bits of the input number and add one (two's-complement negation). The operation has a one cycle latency
    • Count leading zeroes (CLZ): Count leading zeroes in a 32-bit input. The operation has a one cycle latency
    • Minimum/maximum (less than (LT)/greater than (GT)): Two inputs are compared to find the minimum or maximum. The operation has a one cycle latency
    • Square Root, Inverse Square Root, and Inverse: These operations are implemented with floating-point precision. For fixed-point implementation the input needs to be first converted to floating-point precision and then passed as input to these non-linear operations. Also, the output is in a floating-point format that needs to be converted back to fixed-point integer format. The operations have a four cycle latency
  • Floating-point non-linear functions
    • Square root: Both input and output are single precision floating-point numbers operating on R register file. The operation has a four cycle latency
    • Inverse square root: Both input and output are single precision floating-point numbers operating on R register file. The operation has a four cycle latency
    • Inverse: Both input and output are single precision floating-point numbers operating on R register file. The operation has a four cycle latency
    • Absolute value (ABS): The operation has a one cycle latency
    • Minimum/maximum: The operation has a one cycle latency
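As an illustration of two of the fixed-point functions listed above, the following C sketch unpacks the concatenated sine/cosine word and models a 32-bit count of leading zeroes. The names `sincos16_t`, `unpack_sincos`, and `clz32_model` are hypothetical. Interpreting each 16-bit half as a signed value and returning 32 for a zero input are assumptions that the text does not state.

```c
#include <stdint.h>

/* Hypothetical container for the two halves of the packed result. */
typedef struct {
    int16_t sine;   /* taken from the upper 16 bits */
    int16_t cosine; /* taken from the lower 16 bits */
} sincos16_t;

/* Split the concatenated word into its sine and cosine halves.
 * Each half is treated as signed 16-bit (assumption). */
sincos16_t unpack_sincos(uint32_t packed)
{
    sincos16_t r;
    r.sine = (int16_t)(uint16_t)(packed >> 16);
    r.cosine = (int16_t)(uint16_t)(packed & 0xFFFFu);
    return r;
}

/* Portable model of CLZ on a 32-bit input: scan from the most
 * significant bit until a set bit is found. Returns 32 for a zero
 * input (assumption). */
int clz32_model(uint32_t x)
{
    int count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        count++;
    }
    return count;
}
```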

There is no floating-point unit in the scalar unit. The floating-point operations are supported through emulation. In general, it is preferred to perform add and multiply in the vector unit.

The AI Engine scalar unit supports data-type conversion operations that convert input data from fixed-point to floating-point and from floating-point to fixed-point. The fix2float and float2fix operations support a variable decimal point position, where the input is a 32-bit value. The decimal point position is passed as a second input alongside the value. The operations scale the value up or down, if required. Both operations have a one cycle latency.
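The scaling performed by the two conversions can be sketched as a plain C model. This is not the AI Engine API: `fix2float_model`, `float2fix_model`, and `scale_pow2` are hypothetical names, the decimal point position is taken to be the number of fractional bits, truncation toward zero is an assumed rounding mode, and returning 0 for NaN is an assumption. The sketch models the intended arithmetic only, not latency or exception flags.

```c
#include <assert.h>
#include <stdint.h>

/* 2^e as a double; exact because only powers of two are multiplied. */
double scale_pow2(int e)
{
    double s = 1.0;
    for (int i = 0; i < (e < 0 ? -e : e); i++) {
        s = (e < 0) ? s * 0.5 : s * 2.0;
    }
    return s;
}

/* Fixed-point to floating-point: interpret v as having sft fractional bits. */
float fix2float_model(int32_t v, int sft)
{
    assert(sft >= -32 && sft <= 31);
    return (float)((double)v * scale_pow2(-sft));
}

/* Floating-point to fixed-point: scale by 2^sft, truncate toward zero
 * (assumed rounding mode), and saturate to the signed 32-bit range. */
int32_t float2fix_model(float n, int sft)
{
    assert(sft >= -32 && sft <= 31);
    double scaled = (double)n * scale_pow2(sft);
    if (scaled != scaled) return 0;                 /* NaN: assumption */
    if (scaled >= 2147483647.0) return INT32_MAX;   /* saturate high */
    if (scaled <= -2147483648.0) return INT32_MIN;  /* saturate low */
    return (int32_t)scaled;
}
```

With four fractional bits, for instance, `float2fix_model(1.5f, 4)` yields the integer 24, and `fix2float_model(24, 4)` maps it back to 1.5.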

The AI Engine floating-point implementation is not fully compliant with the IEEE standards, and some functionality is restricted. The exceptions are outlined in this section.

  • When the float2fix function is called with a very large positive or negative number and the additional exponent increment is greater than zero, the instruction returns 0 instead of the correct saturation value of either 2^31–1 or –2^31.

    The float2fix function takes two input parameters:

    • n: The floating-point input value to be converted.
    • sft: A 6-bit signed number representing the number of fractional bits in fixed-point representation (from –32 to 31).

    Consider the two scenarios:

    • If n*2^sft > 2^129, the output should return 0x7FFFFFFF. Instead, it returns 0x00000000.
    • If n*2^sft < –2^129, the output should return 0x80000000. Instead, it returns 0x00000000.

    In general, you should ensure that the floating-point input value n stays in the bug-free range of –2^(129–sft) < n < 2^(129–sft) for sft > 0.

    Two implementations are introduced to provide a workaround:

    • float2fix_safe: This is the default mode if you specify float2fix without any option. The implementation returns the correct value for any range, but is slower.
    • float2fix_fast: This implementation returns the correct value only in the bug-free range, and you need to ensure that the input stays within that range. To choose the float2fix_fast implementation, you need to add the preprocessor definition FLOAT2FIX_FAST to the project file.
  • A fixed-point value has a legal range of –2^31 to 2^31–1. When the float2fix function returns a value of –2^31, the value is within range but an overflow exception is incorrectly set. There is no workaround for this overflow exception.
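When relying on float2fix_fast, the bug-free range described above could be checked on the host before a value is passed on. The following is a minimal sketch in portable C, not part of any AI Engine library: `float2fix_in_bug_free_range` and `pow2_d` are hypothetical names, and reporting true for sft <= 0 reflects only that the text states the restriction for sft > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2^e as a double for small non-negative e. */
double pow2_d(int e)
{
    double p = 1.0;
    while (e-- > 0) {
        p *= 2.0;
    }
    return p;
}

/* Returns true when n lies strictly inside
 *   -2^(129 - sft) < n < 2^(129 - sft).
 * The restriction is documented for sft > 0 only, so sft <= 0 is
 * reported as in range (assumption). NaN inputs compare false and are
 * therefore reported as out of range. */
bool float2fix_in_bug_free_range(float n, int sft)
{
    assert(sft >= -32 && sft <= 31); /* 6-bit signed range from the text */
    if (sft <= 0) {
        return true;
    }
    /* The limit is computed in double because 2^128 exceeds float range. */
    double limit = pow2_d(129 - sft);
    double x = (double)n;
    return x > -limit && x < limit;
}
```

For sft = 31 the limit is 2^98 (roughly 3.17e29), so an input of 1.0e30f falls outside the range while 1.0e29f stays inside it.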