PL Kernel Details - 2023.2 English

Vitis Tutorials: AI Engine

Document ID: XD100
Release Date: 2023-11-29
Version: 2023.2 English

The GeMM DSP RTL design can be divided into two main parts. The first is the core matrix multiplication functionality, which is implemented by the top-level gemm_top module. The second is the data mover logic that lets the host application write the Matrix A and Matrix B data and read back the matrix output. This logic is implemented in the ps_slave module.

In this design, the core DSP logic operates at 750 MHz, while the rest of the logic operates at 375 MHz. A synchronizer module handles the synchronization of signals that cross between these two clock domains.

gemm_large_ocm
|-gemm_top
|-ps_slave
|-synchronizer
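
The text does not describe how the synchronizer module is built. As a rough illustration of the clock-domain crossing only, the following behavioural sketch models a generic two-flip-flop synchronizer, a common technique that is assumed here and not taken from the design. The class and function names are hypothetical; only the two clock frequencies come from the text.

```python
FAST_MHZ = 750  # core DSP clock, from the text
SLOW_MHZ = 375  # clock for the rest of the logic, from the text
RATIO = FAST_MHZ // SLOW_MHZ  # fast-clock cycles per slow-clock cycle


class TwoFlopSynchronizer:
    """Behavioural model of a generic two-stage synchronizer (assumed design)."""

    def __init__(self):
        self.stage1 = 0
        self.stage2 = 0

    def clock(self, async_in):
        """Apply one destination-clock edge and return the second flop's new value."""
        # Both flops update on the same edge: stage2 takes the old stage1 value.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2


def cross_slow_to_fast(slow_samples):
    """Hold each slow-domain sample for RATIO fast cycles and synchronize it."""
    sync = TwoFlopSynchronizer()
    out = []
    for sample in slow_samples:
        for _ in range(RATIO):
            out.append(sync.clock(sample))
    return out


if __name__ == "__main__":
    print(cross_slow_to_fast([0, 1, 1, 0]))
```

A two-flop synchronizer of this kind is only suitable for single-bit signals. Multi-bit data crossing between the 375 MHz and 750 MHz domains would normally need a handshake or an asynchronous FIFO, and the source does not say which approach the design uses.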

Underneath the gemm_top module, the following modules are instantiated:

  1. FIXGEMM_WRAPPER - This module implements the systolic array of 1K DSP58 engines.

  2. row_uram - These are the URAMs that store the Matrix A data. The entire 1Kx1K Matrix A is stored in URAMs.

  3. col_uram - These are the URAMs that store the Matrix B data. The entire 1Kx1K Matrix B is stored in URAMs.

  4. partial_sum_bram - There are 64 partial-sum BRAMs (512 x 64) that store the partial sums.

  5. op_uram - These URAMs store the final output of the matrix multiplication.

  6. DSP_data_controller - This module controls the input data to, and the output data from, the DSP58 array.

  7. control_logic - This module controls writes to and reads from the URAMs.
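
The roles of the input storage, the partial-sum storage, and the output storage can be illustrated with a small software model of tiled matrix multiplication. This is only a sketch of the general idea of accumulating partial sums per output tile. The `tile` parameter and the loop order are assumptions, and the source does not state how the DSP58 engines are arranged or how data is scheduled through them.

```python
def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b by accumulating tile-sized partial products."""
    n = len(a)
    if n % tile != 0:
        raise ValueError("matrix size must be a multiple of the tile size")

    out = [[0] * n for _ in range(n)]  # stands in for the output storage
    for row0 in range(0, n, tile):      # block of output rows
        for col0 in range(0, n, tile):  # block of output columns
            # Partial-sum buffer for one output tile. It plays the role the
            # text gives to partial_sum_bram; the real layout is not modelled.
            partial = [[0] * tile for _ in range(tile)]
            for k0 in range(0, n, tile):  # walk along the shared dimension
                for i in range(tile):
                    for j in range(tile):
                        acc = 0
                        for k in range(tile):
                            acc += a[row0 + i][k0 + k] * b[k0 + k][col0 + j]
                        partial[i][j] += acc
            # The tile is final once the whole shared dimension is consumed.
            for i in range(tile):
                for j in range(tile):
                    out[row0 + i][col0 + j] = partial[i][j]
    return out


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(tiled_matmul(a, b, 1))
```

In this model, the result for an output tile is written out only after every slice of the shared dimension has been accumulated, which is why intermediate storage for partial sums is needed whenever the compute array is smaller than the full matrix.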

Underneath FIXGEMM_WRAPPER, the FIXGEMM entity is instantiated, which in turn contains the DSP_GW instantiations.
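
The full hierarchy described in this section can be captured as a nested structure, which is convenient when looking up where a module sits. The sketch below uses only the module and entity names given in the text. The helper names `render` and `find_path` are hypothetical, instance names are not modelled, and each module appears once even where the design instantiates it many times.

```python
# Module hierarchy as described in the text (module names, not instance paths).
HIERARCHY = {
    "gemm_large_ocm": {
        "gemm_top": {
            "FIXGEMM_WRAPPER": {"FIXGEMM": {"DSP_GW": {}}},
            "row_uram": {},
            "col_uram": {},
            "partial_sum_bram": {},
            "op_uram": {},
            "DSP_data_controller": {},
            "control_logic": {},
        },
        "ps_slave": {},
        "synchronizer": {},
    }
}


def render(tree, depth=0):
    """Return the hierarchy as indented lines in the '|-' style used above."""
    lines = []
    for name, children in tree.items():
        prefix = "" if depth == 0 else "  " * (depth - 1) + "|-"
        lines.append(prefix + name)
        lines.extend(render(children, depth + 1))
    return lines


def find_path(tree, target, trail=()):
    """Return the slash-separated path to the first module named target, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return "/".join(here)
        found = find_path(children, target, here)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print("\n".join(render(HIERARCHY)))
    print(find_path(HIERARCHY, "DSP_GW"))
```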

Platform Details