Aggregate Kernel - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

The Aggregate kernel is another key kernel of the General Query Engine (GQE). It supports both grouping and non-grouping aggregate operations.

GQE Aggregate Kernel

The internal structure of this kernel is shown in the figure above. As with the join kernel, the kernel input consists of 8 column data buffers, 1 kernel config buffer, and 1 meta info buffer. Because the output data types vary (for example aggregate max, aggregate min, or raw data), 16 output column buffers are used for the output. As the figure shows, each element in each row is evaluated and filtered before it enters the hash group aggregate module, so new elements can be generated and some rows can be discarded. Two cascaded evaluation modules are included to support more complex expressions.
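The evaluate-then-filter stage described above can be pictured with a small software model. This is only a sketch under stated assumptions, not the kernel's HLS code: the names `Row`, `EvalFn`, `FilterFn`, and `evalAndFilter` are hypothetical, and the choice to store each evaluation result in a caller-selected column is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Software model only; every name here is hypothetical, not the kernel's API.
constexpr std::size_t kInputCols = 8;  // the kernel reads 8 input columns

using Row = std::array<std::int64_t, kInputCols>;
using EvalFn = std::function<std::int64_t(const Row&)>;
using FilterFn = std::function<bool(const Row&)>;

// Runs eval0, then eval1 (which sees eval0's result), then the filter.
// Assumption: each evaluation stores its result in a caller-chosen column.
std::vector<Row> evalAndFilter(const std::vector<Row>& input,
                               const EvalFn& eval0, std::size_t dst0,
                               const EvalFn& eval1, std::size_t dst1,
                               const FilterFn& keep) {
    assert(dst0 < kInputCols && dst1 < kInputCols);
    std::vector<Row> out;
    for (Row row : input) {      // work on a copy of each row
        row[dst0] = eval0(row);  // first evaluation generates a new element
        row[dst1] = eval1(row);  // cascaded evaluation can build on it
        if (keep(row)) {         // rows failing the filter are discarded
            out.push_back(row);
        }
    }
    return out;
}
```

Cascading the two evaluations lets the second expression reuse the first result, which is one way to read "more complex expression" in the text; only rows that pass the filter would go on to aggregation.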

The core module of the aggregate kernel is hash group aggregate, a multi-PU implementation shown in the following diagram. Each PU requires 2 HBM banks and some URAM memory blocks to buffer distinct keys together with their payloads after the aggregate operations. An internal loop iterates until all input rows have been consumed. All PUs work in parallel to achieve higher performance.

Detailed Diagram of Hash Group Aggregate
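As a rough illustration of the multi-PU idea, the following software model routes each row to one PU by hashing its key and keeps a running aggregate per distinct key. It is a sketch under assumptions rather than the hardware design: the PU count `kNumPU`, the types `InputRow`, `AggState`, and `PuTable`, and both functions are hypothetical names, and the model ignores the URAM and HBM capacity limits entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Software model only; names and the PU count are hypothetical.
constexpr std::size_t kNumPU = 4;

struct InputRow {
    std::uint64_t key;
    std::int64_t payload;
};

struct AggState {
    std::int64_t sum;
    std::int64_t min;
    std::int64_t max;
    std::uint64_t count;
};

using PuTable = std::unordered_map<std::uint64_t, AggState>;

// Route each row to one PU by key hash, then update that PU's table.
std::vector<PuTable> hashGroupAggregate(const std::vector<InputRow>& rows) {
    std::vector<PuTable> pus(kNumPU);
    for (const InputRow& r : rows) {
        PuTable& table = pus[std::hash<std::uint64_t>()(r.key) % kNumPU];
        PuTable::iterator it = table.find(r.key);
        if (it == table.end()) {
            // First time this key is seen: start a new group.
            table.emplace(r.key, AggState{r.payload, r.payload, r.payload, 1});
        } else {
            AggState& s = it->second;
            s.sum += r.payload;
            if (r.payload < s.min) s.min = r.payload;
            if (r.payload > s.max) s.max = r.payload;
            ++s.count;
        }
    }
    return pus;
}

// A key always hashes to the same PU, so the per-PU tables are disjoint.
PuTable collectGroups(const std::vector<PuTable>& pus) {
    PuTable all;
    for (const PuTable& t : pus) {
        all.insert(t.begin(), t.end());
    }
    return all;
}
```

Because a given key always lands in the same PU, the per-PU tables never share a group, so in this model the PUs could run independently and their results be concatenated without a merge step.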

The data structures of the input and output meta information and raw data are the same as in the join kernel. The configuration buffer is composed of 128 32-bit slots. The layout of the configuration buffer is listed in the table:

Module        Module Config Width  Position
Scan          64 bit               config[0]~config[1]
Eval0         289 bit              config[2]~config[11]
Eval1         289 bit              config[12]~config[21]
Filter        45*32 bit            config[22]~config[66]
Shuffle0      64 bit               config[67]~config[68]
Shuffle1      64 bit               config[69]~config[70]
Shuffle2      64 bit               config[71]~config[72]
Shuffle3      64 bit               config[73]~config[74]
Group Aggr    4*32 bit             config[75]~config[78]
Column Merge  64 bit               config[79]~config[80]
Aggregate     1 bit                config[81]
Write         16 bit               config[82]
Reserved      -                    config[83]~config[127]
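The slot positions in the table can be written down as constants so that host code does not hard-code indices. The sketch below is illustrative only: `ConfigField`, the `k...` constants, and the helper functions are hypothetical names, not part of the GQE API, and the bit layout inside each field is not modeled. The slot ranges themselves are copied from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names; the slot positions are copied from the table.
struct ConfigField {
    std::size_t first;  // index of the first 32-bit slot
    std::size_t last;   // index of the last 32-bit slot (inclusive)
};

constexpr std::size_t kConfigSlots = 128;  // 128 x 32-bit slots

constexpr ConfigField kScan = {0, 1};
constexpr ConfigField kEval0 = {2, 11};
constexpr ConfigField kEval1 = {12, 21};
constexpr ConfigField kFilter = {22, 66};
constexpr ConfigField kShuffle0 = {67, 68};
constexpr ConfigField kShuffle1 = {69, 70};
constexpr ConfigField kShuffle2 = {71, 72};
constexpr ConfigField kShuffle3 = {73, 74};
constexpr ConfigField kGroupAggr = {75, 78};
constexpr ConfigField kColumnMerge = {79, 80};
constexpr ConfigField kAggregate = {81, 81};
constexpr ConfigField kWrite = {82, 82};
constexpr ConfigField kReserved = {83, 127};

// Number of 32-bit slots a field occupies.
constexpr std::size_t slotCount(ConfigField f) { return f.last - f.first + 1; }

// Number of 32-bit slots needed to hold a given bit width.
constexpr std::size_t slotsForBits(std::size_t bits) { return (bits + 31) / 32; }

// Each field must be wide enough for the config width given in the table.
static_assert(slotsForBits(64) == slotCount(kScan), "Scan uses 2 slots");
static_assert(slotsForBits(289) <= slotCount(kEval0), "Eval0 fits in 10 slots");
static_assert(slotsForBits(289) <= slotCount(kEval1), "Eval1 fits in 10 slots");
static_assert(slotCount(kFilter) == 45, "Filter uses 45 slots");
static_assert(slotCount(kGroupAggr) == 4, "Group Aggr uses 4 slots");

const ConfigField kLayout[] = {kScan,     kEval0,     kEval1,       kFilter,
                               kShuffle0, kShuffle1,  kShuffle2,    kShuffle3,
                               kGroupAggr, kColumnMerge, kAggregate, kWrite,
                               kReserved};

// Checks that the fields tile slots 0..127 with no gap and no overlap.
bool layoutCoversAllSlots() {
    std::size_t next = 0;
    for (const ConfigField& f : kLayout) {
        if (f.first != next || f.last < f.first) return false;
        next = f.last + 1;
    }
    return next == kConfigSlots;
}
```

The 289-bit Eval configurations occupy 10 slots (320 bits), so each leaves some unused bits, and the whole buffer is 128 × 4 = 512 bytes.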

The hardware resource utilization of hash group aggregate is shown in the table below (running at 180 MHz).

Primitive    Quantity  LUT     LUT as memory  LUT as logic  Register  BRAM36  URAM  DSP
Scan         1         12209   4758           7451          18974     0       0     2
Eval         8         2153    426            1727          2042      4       0     21
Filter       4         2168    13             2155          1764      0.5     0     0
Group Aggr   1         162202  27819          134383        210926    62      256   0
Direct Aggr  1         4349    0              4349          6611      0       0     0
Write        1         30938   9490           21448         43579     0       0     0
AXI DDR      1         4586    1313           3273          78855     18      0     0
AXI HBM      1         20528   4456           16072         45416     124     0     0
Total        -         298470  60402          238068        399737    255     256   2