Internally comprises four loops (mm2s0, s2mm0, mm2s1, and s2mm1). s2mm0 and mm2s1 run in sequence, one after the other, and are wrapped into dmaHls_rowsToCols. dmaHls_rowsToCols and s2mm1 are scheduled concurrently.
The data width is 128 bits on both the input and output AXI4-Stream interfaces.
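The sequencing of s2mm0 and mm2s1 inside the wrapper can be illustrated with a minimal, host-side C++ sketch. This is not the kernel source. It assumes, based only on the name dmaHls_rowsToCols, that data is written to a buffer row by row and read back column by column. The `Stream` alias, the `rowsToCols` function name, the `rows`/`cols` parameters, and the use of `int` tokens in place of 128-bit words are all hypothetical stand-ins. The concurrent scheduling with s2mm1 is not modelled here.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for an AXI4-Stream channel (int tokens, not 128-bit words).
using Stream = std::queue<int>;

// s2mm0 (stream to memory): drain the input stream into the buffer in row-major order.
void s2mm0(Stream& in, std::vector<int>& buf, std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows * cols; ++i) {
        buf[i] = in.front();
        in.pop();
    }
}

// mm2s1 (memory to stream): read the buffer column by column (assumed access pattern).
void mm2s1(const std::vector<int>& buf, Stream& out, std::size_t rows, std::size_t cols) {
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            out.push(buf[r * cols + c]);
        }
    }
}

// Wrapper: the two loops run strictly one after the other, so mm2s1
// starts only once s2mm0 has filled the whole buffer.
void rowsToCols(Stream& in, Stream& out, std::size_t rows, std::size_t cols) {
    std::vector<int> buf(rows * cols);
    s2mm0(in, buf, rows, cols);
    mm2s1(buf, out, rows, cols);
}
```

Under these assumptions, pushing the values 0 through 5 into the input stream and calling `rowsToCols(in, out, 2, 3)` leaves the output stream holding 0, 3, 1, 4, 2, 5. Wrapping the two loops in one function makes the write-then-read dependency on the shared buffer explicit, which is presumably why they are sequenced rather than overlapped.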