The PL-based data mover consists of the
dma_hls kernel, which generates constant inputs for matrices A and B and checks the output of the GeMM graph against the expected constant pattern.
It internally comprises four loops (including out_C), all of which are scheduled concurrently.
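
The constant-pattern check described above can be modelled in plain software. The sketch below is not the dma_hls source; the matrix sizes, the constant values, and the function names (`gemm`, `count_errors`, `check_constant_gemm`) are hypothetical and chosen only for illustration. It relies on one fact: if every element of A equals a and every element of B equals b, then every element of C = A x B equals a * b * K, where K is the shared inner dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes and constants, for illustration only.
constexpr int M = 4, K = 8, N = 4;
constexpr std::int32_t A_VAL = 2, B_VAL = 3;

// Reference GeMM on row-major data: C (M x N) = A (M x K) * B (K x N).
std::vector<std::int32_t> gemm(const std::vector<std::int32_t>& a,
                               const std::vector<std::int32_t>& b) {
    assert(a.size() == static_cast<std::size_t>(M * K));
    assert(b.size() == static_cast<std::size_t>(K * N));
    std::vector<std::int32_t> c(M * N, 0);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                c[i * N + j] += a[i * K + k] * b[k * N + j];
    return c;
}

// Count output elements that differ from the expected constant.
int count_errors(const std::vector<std::int32_t>& c, std::int32_t expected) {
    int errors = 0;
    for (std::int32_t v : c)
        if (v != expected) ++errors;
    return errors;
}

// Generate constant inputs, run the reference GeMM, and check the result.
int check_constant_gemm() {
    std::vector<std::int32_t> a(M * K, A_VAL);
    std::vector<std::int32_t> b(K * N, B_VAL);
    return count_errors(gemm(a, b), A_VAL * B_VAL * K);
}
```

Calling `check_constant_gemm()` returns the number of mismatching output elements, so a return value of zero means the output matches the expected constant pattern.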
The data width is 128 bits on both the AXI4-Stream input and output sides, and the kernel runs at 312.5 MHz.
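
As a rough worked figure, the stated width and clock imply a peak rate per stream. The sketch below assumes one 128-bit transfer on every clock cycle with no stalls, which is an idealization rather than a measured result; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the text above.
constexpr std::uint64_t kDataWidthBits = 128;          // AXI4-Stream data width
constexpr std::uint64_t kClockHz       = 312500000ULL; // 312.5 MHz

// Peak rate of one stream, assuming one transfer per clock cycle (no stalls).
constexpr std::uint64_t peak_bits_per_second() {
    return kDataWidthBits * kClockHz;
}

constexpr std::uint64_t peak_bytes_per_second() {
    return peak_bits_per_second() / 8;
}

// 128 bits * 312.5e6 cycles/s = 40e9 bits/s = 5e9 bytes/s per stream.
static_assert(peak_bits_per_second() == 40000000000ULL, "expected 40 Gb/s");
static_assert(peak_bytes_per_second() == 5000000000ULL, "expected 5 GB/s");
```

Under that assumption, each 128-bit stream can carry at most 5 GB/s; sustained throughput is lower whenever either side of the stream applies backpressure.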