Hardware Implementation - 2023.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID

XD099

Release Date

2023-11-13

Version

2023.2 English

To understand what kind of hardware implementation is needed given the performance constraints, you can examine the convolution kernel in some detail:

The core compute is done in a four-level nested loop, but you can break it to the compute per output pixel produced.
In terms of the output-pixels produced, it is clear from the filter source code that a single output pixel is produced when the inner two loops finish execution once.
These two loops are essentially doing the sum-of-product on a coefficient matrix and image sub-matrix. The matrix sizes are defined by the coefficient matrix, which is 15x15.
The inner two loops are performing a dot product of size 225(15x15). In other words, the two inner loops perform 225 multiply-accumulate (MAC) operations for every output pixel produced.