The GeMM DSP RTL design can be divided into two main parts. The first is the core matrix multiplication functionality; gemm_top is the top-level module that implements it. The second is the data mover logic, which lets the host application write Matrix A and Matrix B data and read the matrix output. This is implemented in the ps_slave module.
In this design, the core DSP logic operates at 750 MHz while the rest of the logic operates at 375 MHz. A synchronizer module handles synchronization of signals crossing these two clock domains.
gemm_large_ocm
|-gemm_top
|-ps_slave
|-synchronizer
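The clock-domain crossing mentioned above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the source does not say how the synchronizer module is built, so the two-flop structure, the class name TwoFlopSynchronizer, and the helper sync_slow_to_fast are hypothetical. The only facts taken from the design are the two clock rates, which give a 2:1 ratio.

```python
# Behavioral sketch of a single-bit two-flop synchronizer.
# ASSUMPTION: the real synchronizer module may use a different scheme
# (handshake, FIFO, etc.); this only illustrates the general idea.

FAST_MHZ = 750   # core DSP clock, from the design description
SLOW_MHZ = 375   # clock for the rest of the logic, from the design description
RATIO = FAST_MHZ // SLOW_MHZ  # fast-clock edges per slow-clock cycle


class TwoFlopSynchronizer:
    """Two back-to-back flops clocked in the destination domain."""

    def __init__(self):
        self.stage1 = 0
        self.stage2 = 0

    def clock(self, async_in):
        # Both flops update on the same edge, so stage2 captures the
        # value stage1 held before this edge.
        self.stage2 = self.stage1
        self.stage1 = async_in
        return self.stage2


def sync_slow_to_fast(slow_samples):
    """Hold each slow-domain sample for RATIO fast-clock edges and
    return the synchronizer output observed after each fast edge."""
    sync = TwoFlopSynchronizer()
    out = []
    for bit in slow_samples:
        for _ in range(RATIO):
            out.append(sync.clock(bit))
    return out
```

In this model, a change on the slow-domain signal becomes visible at the output only after the second fast-clock edge that follows it, which is the latency cost usually accepted in exchange for a lower risk of metastability.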
The following modules are instantiated underneath the gemm_top module:
FIXGEMM_WRAPPER - This module implements the systolic array of 1K DSP58 engines.
row_uram - These are the URAMs that store Matrix A data. The entire 1Kx1K Matrix A is stored in URAMs.
col_uram - These are the URAMs that store Matrix B data. The entire 1Kx1K Matrix B is stored in URAMs.
partial_sum_bram - There are 64 partial sum BRAMs (512 x 64) that store the partial sums.
op_uram - These URAMs store the final output of the matrix multiplication.
DSP_data_controller - This module controls the input data to the DSP58 array and the output data from the DSP58 array.
control_logic - This module controls writes to and reads from the URAMs.
Underneath FIXGEMM_WRAPPER, the FIXGEMM entity is instantiated, which in turn contains the DSP_GW instantiations.
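The role of the partial sum storage can be sketched in software with a blocked matrix multiplication. This is a minimal illustration, not the RTL dataflow: the source does not describe the tiling, loop order, or data widths, so the function name tiled_matmul, the tile parameter, and the loop structure are all assumptions. The variables only loosely mirror the memories listed above (a for row_uram, b for col_uram, psum for partial_sum_bram, c for op_uram).

```python
# Blocked (tiled) matrix multiply with an explicit partial-sum buffer.
# ASSUMPTION: tile size and loop order are illustrative only and are
# not taken from the gemm_top implementation.

def tiled_matmul(a, b, tile=4):
    """Multiply square matrices a and b (lists of lists) tile by tile."""
    n = len(a)
    assert n % tile == 0, "matrix size must be a multiple of the tile size"
    c = [[0] * n for _ in range(n)]  # final output (analogous to op_uram)

    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # Partial sums for one output tile (analogous to partial_sum_bram)
            psum = [[0] * tile for _ in range(tile)]
            for k0 in range(0, n, tile):
                # Each pass over k0 adds one tile-sized slice of the dot
                # products into the partial sums.
                for i in range(tile):
                    for j in range(tile):
                        acc = psum[i][j]
                        for k in range(tile):
                            acc += a[i0 + i][k0 + k] * b[k0 + k][j0 + j]
                        psum[i][j] = acc
            # The tile is complete only after all k0 slices are accumulated,
            # so it is written to the output once per (i0, j0) pair.
            for i in range(tile):
                for j in range(tile):
                    c[i0 + i][j0 + j] = psum[i][j]
    return c
```

The point of the sketch is that an output element is not final until every slice along the shared dimension has been accumulated, which is why intermediate results need their own storage separate from the final output memory.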