This tutorial is based on a C++ kernel that you will optimize for the highest throughput.
The algorithm is a common linear algebra solver: the Cholesky decomposition, or Cholesky factorization (pronounced /ʃo-LESS-key/), which decomposes a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. This solver is useful for several numerical problems, in particular for Monte Carlo simulations.
This algorithm has a serial complexity of O(n³) for an n × n matrix.
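To make the cubic cost concrete, here is a minimal reference sketch of the factorization. It is not the tutorial's kernel: the function name `cholesky_reference`, the row-major `std::vector<double>` layout, and the restriction to real symmetric matrices (where the conjugate transpose is simply the transpose) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the tutorial sources.
// Factors a real, symmetric, positive-definite n x n matrix A (row-major)
// into L * L^T, where L is lower triangular.
// Returns false if a non-positive pivot shows A is not positive-definite.
bool cholesky_reference(const std::vector<double>& A,
                        std::vector<double>& L, int n) {
    assert(A.size() == static_cast<std::size_t>(n) * n);
    L.assign(static_cast<std::size_t>(n) * n, 0.0);

    for (int j = 0; j < n; ++j) {            // one column of L per iteration
        // Diagonal entry: L[j][j] = sqrt(A[j][j] - sum_k L[j][k]^2)
        double diag = A[j * n + j];
        for (int k = 0; k < j; ++k) {
            diag -= L[j * n + k] * L[j * n + k];
        }
        if (diag <= 0.0) {
            return false;
        }
        L[j * n + j] = std::sqrt(diag);

        // Entries below the diagonal:
        // L[i][j] = (A[i][j] - sum_k L[i][k] * L[j][k]) / L[j][j]
        for (int i = j + 1; i < n; ++i) {
            double sum = A[i * n + j];
            for (int k = 0; k < j; ++k) {
                sum -= L[i * n + k] * L[j * n + k];
            }
            L[i * n + j] = sum / L[j * n + j];
        }
    }
    return true;
}
```

The three nested loops over `j`, `i`, and `k` are where the O(n³) operation count comes from. Each column also depends on the columns computed before it, which is one reason a straightforward port of such a loop nest is unlikely to reach high throughput without restructuring.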
More information on Wikipedia… Note that this solver is also included as part of the official Vitis accelerated libraries; here is a link to its documentation.
You will start with a simple description implemented in C++, and the tutorial explains how to adapt it for acceleration on an AMD Alveo™ U50 card.