2. The functionality of the CUs - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English
  • The loadCol CU reads the input dense column vector and the NNZ column pointer entries from two physically separate DDR device memories, DDR0 and DDR1, as shown in the figure above. It sends them to the bufTransColVec and bufTransNnzCol CUs, which buffer and select the entries for the computation path connected to each HBM channel.
  • The bufTransColVec CU reads the input dense vector entries that belong to each block, splits them into chunks, one per HBM channel, buffers all of those chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
  • The bufTransNnzCol CU reads the column pointer entries that belong to each block, splits them into chunks, one per HBM channel, buffers all of those chunks (16 in total in this design), and transmits each chunk to its corresponding xBarCol CU.
  • The xBarCol CUs, one for each HBM channel, select the input dense vector entries according to the NNZs’ column pointer entries and send the selected entries to the cscRow CUs for computation.
  • Each cscRow CU obtains the values and row indices of the NNZs stored in one HBM channel through a readWriteHbm CU, multiplies each value by its corresponding column-vector entry received from the connected xBarCol CU, and accumulates the products by row index.
  • Each readWriteHbm CU connects to 8 HBM channels, reads the NNZs’ values and row indices from those channels, and sends them to the corresponding cscRow CUs. It also collects the results from its 8 cscRow CUs and writes them back to the corresponding HBM channels. In total, 2 readWriteHbm CUs are used to connect to the 16 HBM channels.
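The CU descriptions above amount to a CSC sparse matrix-vector product split across HBM channels. The following is a minimal C++ software sketch of that computation, not the Vitis Libraries implementation or API. All names (`CscPartition`, `partitionByRow`, `selectColEntries`, `cscRowAccumulate`, `spmvCsc`) are hypothetical. The sketch assumes that NNZs are assigned to channels by row (`row % numChannels`), because the text above does not state the partitioning scheme. It also omits column blocking, the DDR/HBM transfers, and the readWriteHbm CUs. `selectColEntries` plays the xBarCol role (expanding x through per-channel column pointers), and `cscRowAccumulate` plays the cscRow role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model of the CU dataflow; not the Vitis Libraries API.
// One CSC sub-matrix per HBM channel (the data one cscRow CU would process).
struct CscPartition {
    std::vector<uint32_t> colPtr;  // numCols + 1 per-channel column pointers
    std::vector<uint32_t> rowIdx;  // global row index of each NNZ
    std::vector<float> val;        // value of each NNZ
};

// Assumption: NNZ k goes to channel rowIdx[k] % numChannels, so every
// channel owns a disjoint set of rows.
std::vector<CscPartition> partitionByRow(const std::vector<uint32_t>& colPtr,
                                         const std::vector<uint32_t>& rowIdx,
                                         const std::vector<float>& val,
                                         std::size_t numChannels) {
    const std::size_t numCols = colPtr.size() - 1;
    std::vector<CscPartition> parts(numChannels);
    for (auto& p : parts) p.colPtr.assign(numCols + 1, 0);
    for (std::size_t c = 0; c < numCols; ++c) {
        for (uint32_t k = colPtr[c]; k < colPtr[c + 1]; ++k) {
            CscPartition& p = parts[rowIdx[k] % numChannels];
            p.rowIdx.push_back(rowIdx[k]);
            p.val.push_back(val[k]);
        }
        // Close column c in every channel's column pointer array.
        for (auto& p : parts) p.colPtr[c + 1] = static_cast<uint32_t>(p.val.size());
    }
    return parts;
}

// xBarCol-like step: use the channel's column pointers to emit x[c]
// once for every NNZ that lies in column c.
std::vector<float> selectColEntries(const std::vector<float>& x,
                                    const std::vector<uint32_t>& colPtr) {
    std::vector<float> selected;
    for (std::size_t c = 0; c + 1 < colPtr.size(); ++c)
        for (uint32_t k = colPtr[c]; k < colPtr[c + 1]; ++k) selected.push_back(x[c]);
    return selected;
}

// cscRow-like step: multiply each NNZ value by its selected x entry and
// accumulate the product into the output entry given by its row index.
void cscRowAccumulate(const CscPartition& p, const std::vector<float>& selected,
                      std::vector<float>& y) {
    assert(selected.size() == p.val.size());
    for (std::size_t k = 0; k < p.val.size(); ++k) y[p.rowIdx[k]] += p.val[k] * selected[k];
}

// y = A * x for a CSC matrix A, processed one channel at a time.
std::vector<float> spmvCsc(const std::vector<uint32_t>& colPtr,
                           const std::vector<uint32_t>& rowIdx,
                           const std::vector<float>& val,
                           const std::vector<float>& x,
                           std::size_t numRows, std::size_t numChannels) {
    assert(x.size() + 1 == colPtr.size());
    std::vector<float> y(numRows, 0.0f);
    // Without column blocking, every channel sees the whole x plus its own
    // column pointers (standing in for bufTransColVec / bufTransNnzCol).
    for (const CscPartition& p : partitionByRow(colPtr, rowIdx, val, numChannels))
        cscRowAccumulate(p, selectColEntries(x, p.colPtr), y);
    return y;
}
```

For example, take the 3×3 matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5), stored as `colPtr = {0, 2, 3, 5}`, `rowIdx = {0, 2, 1, 0, 2}`, `val = {1, 4, 3, 2, 5}`, and let `x = {1, 2, 3}`. Then `spmvCsc(colPtr, rowIdx, val, x, 3, 2)` returns `{7, 6, 19}`. Under the assumed row-based assignment, each channel accumulates into its own rows, so partial results from different channels never have to be merged.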