Preliminaries - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100

Release Date: 2024-03-05

Version: 2023.2 English

In Part 2a, we examined the generated assembler code and found a NOP (no operation) between the VFPMAC (vector floating-point multiply-accumulate) mnemonics. With a single accumulator, this NOP is unavoidable because a floating-point accumulation requires two cycles (see Fig. 26 of AM009), so each VFPMAC must wait for the result of the previous one.

We can split the matrix-vector multiplication into two separate multiply-accumulate operations to perform a floating-point accumulation on each cycle.

Note: Instead of the “traditional” method of multiplying each row of the matrix by the column vector, we effectively scale each column of the matrix by the corresponding element in the vector with the multiply-accumulate API.

Fig. 1
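As a sketch of the column-scaling view described in the note, the product can be written as a weighted sum of the matrix columns. The symbols here are assumptions introduced for illustration and do not come from the tutorial: $M$ is the matrix, $\mathbf{m}_j$ its $j$-th column, $\mathbf{x}$ the vector, and $N$ the number of columns.

```latex
\[
\mathbf{y} = M\mathbf{x} = \sum_{j=0}^{N-1} x_j\,\mathbf{m}_j ,
\qquad\text{e.g.}\qquad
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
= x_0 \begin{bmatrix} a \\ c \end{bmatrix}
+ x_1 \begin{bmatrix} b \\ d \end{bmatrix}
\]
```

Each term $x_j\,\mathbf{m}_j$ corresponds to one vector multiply-accumulate: a whole column scaled by a single element of the vector.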

Thus, splitting the vector additions into even and odd parts allows us to perform two independent multiply-accumulate operations:

Fig. 2
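The even/odd split can be modeled in plain C++ to check that it gives the same result as a single accumulation chain. This is only a scalar sketch and not AI Engine API code: the sizes `ROWS` and `COLS`, the column-by-column storage, and the function names are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sizes chosen for illustration only; COLS must be even here.
constexpr std::size_t ROWS = 4;
constexpr std::size_t COLS = 6;

using Vec = std::array<float, ROWS>;
// Assumption: the matrix is stored column by column, so cols[j] is column j.
using Mat = std::array<std::array<float, ROWS>, COLS>;

// Column-scaling form with one accumulator: every update depends on the
// previous one, which is the dependency that forces the NOP.
Vec matvec_single(const Mat& cols, const std::array<float, COLS>& x) {
    Vec acc{};
    for (std::size_t j = 0; j < COLS; ++j)
        for (std::size_t i = 0; i < ROWS; ++i)
            acc[i] += x[j] * cols[j][i];
    return acc;
}

// Even/odd split: two accumulators that never depend on each other,
// combined with a single vector addition at the end.
Vec matvec_even_odd(const Mat& cols, const std::array<float, COLS>& x) {
    Vec acc_even{};
    Vec acc_odd{};
    for (std::size_t j = 0; j < COLS; j += 2) {
        for (std::size_t i = 0; i < ROWS; ++i) {
            acc_even[i] += x[j] * cols[j][i];          // even columns
            acc_odd[i]  += x[j + 1] * cols[j + 1][i];  // odd columns
        }
    }
    Vec y{};
    for (std::size_t i = 0; i < ROWS; ++i)
        y[i] = acc_even[i] + acc_odd[i];
    return y;
}
```

Because the two chains are independent, a multiply-accumulate from one chain can be issued while the other chain's accumulation is still completing. Note that floating-point addition is not associative, so in general the split result may differ from the single-chain result in the last bits.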

Also, the AI Engine has two load units, so the even and odd columns can be fetched in parallel. To take advantage of this, the Julia program aie_iir_2b.jl is modified to split the matrix into even and odd columns and generate two separate header files.
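The column split itself is simple to illustrate. The following C++ sketch is not the Julia program and does not generate header files; `Column` and `split_even_odd` are hypothetical names, and the matrix is assumed to be held as a list of columns.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Column = std::vector<float>;

// Hypothetical helper: returns {even-indexed columns, odd-indexed columns},
// preserving the original order within each group.
std::pair<std::vector<Column>, std::vector<Column>>
split_even_odd(const std::vector<Column>& cols) {
    std::vector<Column> even;
    std::vector<Column> odd;
    for (std::size_t j = 0; j < cols.size(); ++j)
        (j % 2 == 0 ? even : odd).push_back(cols[j]);
    return {even, odd};
}
```

Each of the two resulting groups would then be written to its own header file, so that the kernel can read them as two separate coefficient arrays.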

We start by using the AI Engine APIs.