1. Matrix partitioning and device memory layout - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

As illustrated in the figure below, the matrix partitioning steps implemented in the software are:

cscmv Diagram
  • Partition the entire matrix into blocks according to the on-chip row and column buffer sizes, shown as “on-chip row buffer size” and “on-chip col buffer size” in the figure. The “on-chip col buffer size” and the “on-chip row buffer size” are set at hardware compile time by the macros SPARSE_maxColMemBlocks and SPARSE_maxRowBlocks, respectively. For the Alveo U280 card, the following formulas give the number of rows and columns in each on-chip matrix block.
number of columns in each block = SPARSE_maxColMemBlocks * 16
number of rows in each block = SPARSE_maxRowBlocks * 4
  • Partition each block evenly into chunks along the column. The number of chunks is set at hardware compile time by the macro SPARSE_hbmChannels. In this design, 16 HBM channels are used.
  • According to their HBM channel ID, these data chunks are assembled into different host memory regions, which are migrated to different HBM channels on the device at runtime. For example, as shown in the figure above, the red data chunks in each block are assembled into one memory block and migrated to HBM channel 0 on the device.
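
The arithmetic behind these steps can be sketched in a few lines of C++. This is only an illustration, not the library's partitioning code: the constants stand in for the macros SPARSE_maxColMemBlocks, SPARSE_maxRowBlocks, and SPARSE_hbmChannels with made-up values (except the 16 channels stated above), and the names blockGrid and chunkRange are hypothetical. The sketch also assumes that an "even" split means near-equal contiguous index ranges along one axis of a block; the actual splitting rule and axis are defined by the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the hardware compile-time macros.
constexpr uint32_t kMaxColMemBlocks = 1024; // SPARSE_maxColMemBlocks (illustrative value)
constexpr uint32_t kMaxRowBlocks = 1024;    // SPARSE_maxRowBlocks (illustrative value)
constexpr uint32_t kHbmChannels = 16;       // SPARSE_hbmChannels (16 in this design)

// Formulas given above for the Alveo U280 card.
constexpr uint32_t kColsPerBlock = kMaxColMemBlocks * 16;
constexpr uint32_t kRowsPerBlock = kMaxRowBlocks * 4;

// Integer division rounded up.
inline uint32_t ceilDiv(uint32_t a, uint32_t b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

struct BlockGrid {
    uint32_t rowBlocks; // number of blocks along the row dimension
    uint32_t colBlocks; // number of blocks along the column dimension
};

// Number of on-chip blocks needed to cover a rows x cols matrix.
inline BlockGrid blockGrid(uint32_t rows, uint32_t cols) {
    return BlockGrid{ceilDiv(rows, kRowsPerBlock), ceilDiv(cols, kColsPerBlock)};
}

struct Range {
    uint32_t begin; // inclusive
    uint32_t end;   // exclusive
};

// Assumption: split [0, extent) into kHbmChannels near-equal contiguous
// pieces; the piece with index channelId is the chunk sent to that channel.
inline Range chunkRange(uint32_t extent, uint32_t channelId) {
    assert(channelId < kHbmChannels);
    const uint32_t base = extent / kHbmChannels;
    const uint32_t rem = extent % kHbmChannels;
    const uint32_t begin = channelId * base + (channelId < rem ? channelId : rem);
    const uint32_t len = base + (channelId < rem ? 1u : 0u);
    return Range{begin, begin + len};
}
```

With these illustrative values, each block holds 4096 rows and 16384 columns, and chunkRange(kRowsPerBlock, 0) covers indices 0 to 255, the share that would be assembled for HBM channel 0 under the stated assumption.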

The matrix block partition information is stored in the DDR and HBM channels. The loadCol and readWriteHbm CUs decode this information and retrieve the data accordingly. As shown in the figure above, each device memory contains the following three sections.

  • Parameter summary section. This section stores the number of parameter descriptions. Its size in bytes is defined by the macro SPARSE_paramOffset, which is 1024 in the figure above.

  • Parameter section. This section stores the parameter descriptions of the data blocks. Each parameter description normally includes the address offset, the number of matrix/vector entries processed in parallel, the min/max indices in the block, and so on.

  • Data section. This section stores the matrix and vector data. The data held in the DDR and HBM device memories is described below.

    • DDR0: dense input vector data. Each DDR access produces 16 FP32 data entries.
    • DDR1: column pointers of the NNZs in a sparse matrix. Each DDR access produces 16 column pointer values for 16 NNZs.
    • HBM channels: row indices and values of the NNZs in a sparse matrix. Each access to a single HBM channel produces 4 values and 4 row indices for 4 NNZs.
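
As a rough illustration of the three-section layout and the access widths listed above, the following C++ sketch computes section offsets and bytes per access. Several things here are assumptions rather than facts from the library: indices and column pointers are taken to be 32 bits wide (only the FP32 vector entries are stated above), the parameter section is assumed to start immediately at SPARSE_paramOffset, the data section is assumed to start at the next access-width boundary after the parameter descriptions, and the size of one parameter description is left as a caller-supplied value because it is not given here. The names SectionOffsets and sectionOffsets are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the macro SPARSE_paramOffset (1024 in the figure).
constexpr std::size_t kParamOffset = 1024;

// Assumption: FP32 values and 32-bit indices / column pointers.
constexpr std::size_t kBytesPerEntry = 4;

// DDR0 and DDR1: 16 entries per access.
constexpr std::size_t kDdrBytesPerAccess = 16 * kBytesPerEntry;
// One HBM channel: 4 values plus 4 row indices per access.
constexpr std::size_t kHbmBytesPerAccess = 4 * 2 * kBytesPerEntry;

struct SectionOffsets {
    std::size_t summary; // parameter summary section
    std::size_t params;  // parameter section
    std::size_t data;    // data section
};

// Hypothetical layout: summary at byte 0, parameter descriptions starting at
// kParamOffset, data section at the next multiple of accessBytes after them.
inline SectionOffsets sectionOffsets(std::size_t numParams,
                                     std::size_t paramBytes,
                                     std::size_t accessBytes) {
    assert(accessBytes > 0);
    const std::size_t paramsEnd = kParamOffset + numParams * paramBytes;
    const std::size_t data = (paramsEnd + accessBytes - 1) / accessBytes * accessBytes;
    return SectionOffsets{0, kParamOffset, data};
}
```

Under the 32-bit assumption, one DDR access moves 64 bytes and one HBM channel access moves 32 bytes. The real offsets are whatever the parameter descriptions encode, which the loadCol and readWriteHbm CUs decode at runtime.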