Joiner Design - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

Join has two phases: build and probe. When the build table is fixed, the probe efficiency is linear to probe table size. However, when the build table gets larger, the performance of the probe decreases. In real scenarios, the size of both build and probe table is unknown. Therefore, to support different sized two tables’ join with efficient performance, two solutions are provided in L3:

  • solution 0/1: Direct Join (1x Build + Nx Probe), which is used when the build table is not too large.
  • solution 2: Hash Partition + Hash Join, partitioning left table and right table to M partitions. Thereafter, Perform Build + Nx Probe for each partition pair.

In solution 1, the size of the build table is relatively small, so the probe phase is efficient. To minimize the overhead of data transfer and CPU data movement, the probe table is splitted into multi-section horizontally.

When the build table is large, partitioning the input tables into multi-partitions may keep the high-performance execution of build+probe. Solution 2 includes three sequential phases: Partitioning build table O, partitioning probe table L, and do multiple times of solution 1. For each phase, pipelined task scheduling is employed.

Note

Comparing these two type of solutions, although partitioning table O and L introduces extra overhead of partitioning, the build+probe time is decreased due to the dramatically reduced unique key radio.

Caution

With TPC-H Q5s query as the example, the solution 1 and 2 switching data size for build table is ~ SF8 (confirm?). Which is to say, when the scale factor is smaller than SF8, solution 1 is faster. For datasets larger than SF8, solution 2 would be faster.