The following figure shows the block diagram of the bfloat16 (BF) vector datapath. It shares the 8 x 8 multipliers with half of the integer datapath and adds blocks for floating-point exponent computation and for mantissa shifting and normalization.
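As an illustration of why 8 x 8 multipliers are sufficient for bfloat16, the following minimal sketch multiplies two normal bfloat16 values using only an 8-bit by 8-bit integer product and an exponent addition. It relies on the standard bfloat16 layout (1 sign bit, 8 exponent bits, 7 fraction bits, and therefore an 8-bit significand once the hidden one is included). The function names are hypothetical, and the sketch is a behavioral model rather than a description of the actual hardware blocks.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern obtained by truncating an FP32 value."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # bfloat16 is the upper 16 bits of FP32


def bf16_fields(bits: int):
    """Split a bfloat16 bit pattern into (sign, biased exponent, fraction)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    return sign, exponent, fraction


def bf16_multiply(a_bits: int, b_bits: int) -> float:
    """Multiply two normal bfloat16 values and return the exact product."""
    sa, ea, fa = bf16_fields(a_bits)
    sb, eb, fb = bf16_fields(b_bits)
    # This sketch handles normal numbers only (no zero, denormal, inf, NaN).
    if not (0 < ea < 255 and 0 < eb < 255):
        raise ValueError("only normal bfloat16 inputs are modeled")
    sig_a = 0x80 | fa  # 8-bit significand with the hidden leading one
    sig_b = 0x80 | fb
    product = sig_a * sig_b  # 8 x 8 integer multiply, at most 16 bits
    exponent = (ea - 127) + (eb - 127)  # exponent compute: add unbiased exponents
    # Each significand carries 7 fraction bits, so the product carries 14.
    value = product * 2.0 ** (exponent - 14)
    return -value if sa ^ sb else value
```

For example, `bf16_multiply(float_to_bf16_bits(1.5), float_to_bf16_bits(-2.25))` evaluates the product from the significands 192 and 144 and the exponents 0 and 1.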

To hide the latency of the accumulation feedback loop, multiple accumulator registers are used so that floating-point MAC instructions can be issued back to back.
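The idea of using several accumulators can be sketched in software as follows. Each multiply-accumulate targets a different accumulator in round-robin order, so consecutive operations never depend on the result of the immediately preceding one, and the partial sums are combined at the end. The function name and the default of four accumulators are assumptions made for illustration; the source does not state how many accumulators are interleaved.

```python
def dot_interleaved(a, b, num_acc=4):
    """Dot product using num_acc independent accumulators (hypothetical count)."""
    acc = [0.0] * num_acc
    for i, (x, y) in enumerate(zip(a, b)):
        # Consecutive MACs write to different accumulators, so no MAC has to
        # wait for the previous MAC's result.
        acc[i % num_acc] += x * y
    # Final reduction of the partial sums.
    return sum(acc)
```

Because floating-point addition is not associative, the interleaved result can differ in the last bits from a single-accumulator sum when the products are not exactly representable.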

The BF and FP mantissa shift unit shifts down each of the 128 multiplier lanes and the 2 x 16 accumulator lanes. The accumulator unit supports addition, subtraction, and negation of accumulator registers in single-precision FP32 format. The FP normalization unit handles the cases where the mantissa coming from the post-adder is negative or outside the acceptable range.
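The shift-then-add-then-normalize flow can be modeled with integers. In this minimal sketch, every term is a signed mantissa with an exponent, the terms are shifted down to the largest exponent before being added, and the sum is then normalized by handling a negative result and by bringing the magnitude back into range. The function names, the truncating shift, and the mantissa width are assumptions for illustration; the actual widths and rounding behavior of the hardware are not stated in the text.

```python
def align_and_add(terms):
    """Add terms given as (signed_mantissa, exponent), value = m * 2**e.

    Returns (signed_sum, common_exponent).
    """
    max_exp = max(e for _, e in terms)
    total = 0
    for mantissa, exp in terms:
        # Shift down to the common (largest) exponent; low bits are dropped.
        total += mantissa >> (max_exp - exp)
    return total, max_exp


def normalize(mantissa, exponent, width=24):
    """Return (sign, magnitude, exponent) with magnitude in [2**(width-1), 2**width)."""
    sign = 1 if mantissa < 0 else 0
    magnitude = -mantissa if sign else mantissa  # negative post-adder result
    if magnitude == 0:
        return 0, 0, 0
    while magnitude >= (1 << width):  # too large: shift down
        magnitude >>= 1
        exponent += 1
    while magnitude < (1 << (width - 1)):  # too small: shift up
        magnitude <<= 1
        exponent -= 1
    return sign, magnitude, exponent
```

For instance, adding 96 and -128 with `align_and_add([(0x60, 0), (-0x100, -1)])` gives a negative sum that `normalize(..., width=8)` converts into a sign bit, an 8-bit magnitude, and an adjusted exponent.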

The AIE-ML supports several vector element-wise functions for the bfloat16 format. These functions include vector comparison, minimum, and maximum, and each operates element-wise on two vectors. The separate fixed-point vector add/compare unit is extended to handle these floating-point element-wise functions.
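One well-known reason an integer compare unit can be reused for floating-point ordering is that sign-magnitude float bit patterns can be mapped to integers that sort in the same order. The following minimal sketch shows element-wise less-than, minimum, and maximum on bfloat16 bit patterns using such a mapping. The function names are hypothetical, NaN handling is omitted, and nothing here is claimed to reflect how the AIE-ML unit is actually implemented.

```python
def bf16_order_key(bits: int) -> int:
    """Map a bfloat16 bit pattern to an unsigned integer with the same ordering."""
    if bits & 0x8000:
        return ~bits & 0xFFFF  # negative values: invert all bits
    return bits | 0x8000  # non-negative values: set the top bit


def vec_lt(a_bits, b_bits):
    """Element-wise a < b on two vectors of bfloat16 bit patterns."""
    return [bf16_order_key(x) < bf16_order_key(y) for x, y in zip(a_bits, b_bits)]


def vec_min(a_bits, b_bits):
    """Element-wise minimum of two vectors of bfloat16 bit patterns."""
    return [x if bf16_order_key(x) <= bf16_order_key(y) else y
            for x, y in zip(a_bits, b_bits)]


def vec_max(a_bits, b_bits):
    """Element-wise maximum of two vectors of bfloat16 bit patterns."""
    return [x if bf16_order_key(x) >= bf16_order_key(y) else y
            for x, y in zip(a_bits, b_bits)]
```

With `a = [0x3FC0, 0xC010]` (1.5 and -2.25) and `b = [0x3F00, 0x4040]` (0.5 and 3.0), the three functions compare each lane independently.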

The floating-point unit can issue events that correspond to standard floating-point exceptions, and the status registers keep track of these events. There are eight exception bits per floating-point functional unit. The exceptions are (from bit 0 to 7): zero, infinity, invalid, tiny (underflow), huge (overflow), inexact, huge integer, and divide-by-zero. Of the eight exceptions, tiny, huge, invalid, and divide-by-zero can be converted into an event that can be broadcast to the AIE-ML array interface and then sent to the PS/PMC as an interrupt.
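The exception bits can be modeled as a bit mask, which makes it easy to see which status bits are able to raise an event. In this minimal sketch the bit positions follow the order given in the text, with invalid assumed to occupy bit 2; the class and function names are hypothetical, and the positions should be checked against the register reference before use.

```python
from enum import IntFlag


class FpException(IntFlag):
    """Exception bits of one floating-point functional unit (assumed layout)."""
    ZERO = 1 << 0
    INFINITY = 1 << 1
    INVALID = 1 << 2  # assumed position
    TINY = 1 << 3  # underflow
    HUGE = 1 << 4  # overflow
    INEXACT = 1 << 5
    HUGE_INTEGER = 1 << 6
    DIVIDE_BY_ZERO = 1 << 7


# Exceptions that the text says can be converted into a broadcast event.
EVENT_CAPABLE = (FpException.TINY | FpException.HUGE
                 | FpException.INVALID | FpException.DIVIDE_BY_ZERO)


def event_sources(status: int) -> FpException:
    """Return the set exception bits in status that can raise an event."""
    return FpException(status & 0xFF) & EVENT_CAPABLE
```

A status value with the zero, tiny, and inexact bits set would therefore report only tiny as an event source.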

Denormalized numbers are not supported by the AIE-ML floating-point data path.
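For reference, a bfloat16 value is denormalized when its exponent field is zero and its fraction field is nonzero. The following minimal sketch detects such values and replaces them with a signed zero. Flushing to zero is a common hardware treatment and is used here purely as an assumption; the text does not state what the AIE-ML datapath does with a denormalized input. The function names are hypothetical.

```python
def is_bf16_denormal(bits: int) -> bool:
    """True if the bfloat16 bit pattern encodes a denormalized number."""
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    return exponent == 0 and fraction != 0


def flush_denormal(bits: int) -> int:
    """Replace a denormalized value with a zero of the same sign (assumed policy)."""
    if is_bf16_denormal(bits):
        return bits & 0x8000  # keep only the sign bit
    return bits
```

Software that prepares data on a host could apply a check like `is_bf16_denormal` to detect inputs that fall outside the supported range.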