|
Multi-core Hash Joins
Main-memory hash join implementations for multi-core CPUs
|
Modules | |
| Optimized Partitioning Using SW-buffers | |
Functions | |
| int64_t | bucket_chaining_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output) |
| uint32_t | get_hist_size (uint32_t relSize) __attribute__((always_inline)) |
| int64_t | histogram_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output) |
| void | prefetch (void *addr) __attribute__((always_inline)) |
| int64_t | histogram_optimized_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output) |
| void | radix_cluster (relation_t *restrict outRel, relation_t *restrict inRel, int32_t *restrict hist, int R, int D) |
| void | radix_cluster_nopadding (relation_t *outRel, relation_t *inRel, int R, int D) |
| void | serial_radix_partition (task_t *const task, task_queue_t *join_queue, const int R, const int D) |
| void | parallel_radix_partition (part_t *const part) |
| void * | prj_thread (void *param) |
| result_t * | join_init_run (relation_t *relR, relation_t *relS, JoinFunction jf, int nthreads) |
| result_t * | PRO (relation_t *relR, relation_t *relS, int nthreads) |
| result_t * | PRH (relation_t *relR, relation_t *relS, int nthreads) |
| result_t * | PRHO (relation_t *relR, relation_t *relS, int nthreads) |
| result_t * | RJ (relation_t *relR, relation_t *relS, int nthreads) |
| int64_t bucket_chaining_join | ( | const relation_t *const | R, |
| const relation_t *const | S, | ||
| relation_t *const | tmpR, | ||
| void * | output | ||
| ) |
This algorithm builds the hashtable using the bucket chaining idea and used in PRO implementation. Join between given two relations is evaluated using the "bucket chaining" algorithm proposed by Manegold et al. It is used after the partitioning phase, which is common for all algorithms. Moreover, R and S typically fit into L2 or at least R and |R|*sizeof(int) fits into L2 cache.
| R | input relation R |
| S | input relation S |
| output | join results, if JOIN_RESULT_MATERIALIZE defined. |
Definition at line 243 of file parallel_radix_join.c.
|
inline |
computes and returns the histogram size for join
Definition at line 316 of file parallel_radix_join.c.
| int64_t histogram_join | ( | const relation_t *const | R, |
| const relation_t *const | S, | ||
| relation_t *const | tmpR, | ||
| void * | output | ||
| ) |
Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version is not optimized with SIMD and prefetching. The parallel radix join implementation using this function is PRH.
Definition at line 331 of file parallel_radix_join.c.
| int64_t histogram_optimized_join | ( | const relation_t *const | R, |
| const relation_t *const | S, | ||
| relation_t *const | tmpR, | ||
| void * | output | ||
| ) |
Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version includes SIMD and prefetching optimizations as described by Kim et al. The parallel radix join implementation using this function is PRHO. Note: works only for 32-bit keys.
Definition at line 419 of file parallel_radix_join.c.
| result_t * join_init_run | ( | relation_t * | relR, |
| relation_t * | relS, | ||
| JoinFunction | jf, | ||
| int | nthreads | ||
| ) |
The template function for different joins: Basically each parallel radix join has a initialization step, partitioning step and build-probe steps. All our parallel radix implementations have exactly the same initialization and partitioning steps. Difference is only in the build-probe step. Here are all the parallel radix join implemetations and their Join (build-probe) functions:
Not an elegant way of passing whether we will numa-localize, but this feature is experimental anyway.
Definition at line 1424 of file parallel_radix_join.c.
| void parallel_radix_partition | ( | part_t *const | part | ) |
This function implements the parallel radix partitioning of a given input relation. Parallel partitioning is done by histogram-based relation re-ordering as described by Kim et al. Parallel partitioning method is commonly used by all parallel radix join algorithms.
| part | description of the relation to be partitioned |
Definition at line 715 of file parallel_radix_join.c.
|
inline |
software prefetching function
Definition at line 403 of file parallel_radix_join.c.
| result_t * PRH | ( | relation_t * | relR, |
| relation_t * | relS, | ||
| int | nthreads | ||
| ) |
PRH: Parallel Radix Join Histogram-based.
The "Parallel Radix Join Histogram-based" implementation denoted as PRH implements the parallel radix join idea by Kim et al. without SIMD and prefetching optimizations. The difference from PRO is that the build phase is based on the histogram-based relation re-ordering idea.
| relR | input relation R - inner relation |
| relS | input relation S - inner relation |
Definition at line 1620 of file parallel_radix_join.c.
| result_t * PRHO | ( | relation_t * | relR, |
| relation_t * | relS, | ||
| int | nthreads | ||
| ) |
PRHO: Parallel Radix Join Histogram-based Optimized.
The "Parallel Radix Join Histogram-based Optimized" implementation denoted as PRHO implements the parallel radix join idea by Kim et al. with SIMD and prefetching optimizations. The difference from PRH is that the probe phase uses SIMD and software prefetching optimizations.
| relR | input relation R - inner relation |
| relS | input relation S - inner relation |
Definition at line 1627 of file parallel_radix_join.c.
| void * prj_thread | ( | void * | param | ) |
The main thread of parallel radix join. It does partitioning in parallel with other threads and during the join phase, picks up join tasks from the task queue and calls appropriate JoinFunction to compute the join task.
| param |
Definition at line 964 of file parallel_radix_join.c.
| result_t * PRO | ( | relation_t * | relR, |
| relation_t * | relS, | ||
| int | nthreads | ||
| ) |
PRO: Parallel Radix Join Optimized.
The "Parallel Radix Join Optimized" implementation denoted as PRO implements the parallel radix join idea by Kim et al. with further optimizations. Mainly it uses the bucket chaining for the build phase instead of the histogram-based relation re-ordering and performs better than other variations such as PRHO, which applies SIMD and prefetching optimizations.
| relR | input relation R - inner relation |
| relS | input relation S - inner relation |
Definition at line 1613 of file parallel_radix_join.c.
| void radix_cluster | ( | relation_t *restrict | outRel, |
| relation_t *restrict | inRel, | ||
| int32_t *restrict | hist, | ||
| int | R, | ||
| int | D | ||
| ) |
Radix clustering algorithm (originally described by Manegold et al) The algorithm mimics the 2-pass radix clustering algorithm from Kim et al. The difference is that it does not compute prefix-sum, instead the sum (offset in the code) is computed iteratively.
| outRel | [out] result of the partitioning |
| inRel | [in] input relation |
| hist | [out] number of tuples in each partition |
| R | cluster bits |
| D | radix bits per pass |
Definition at line 545 of file parallel_radix_join.c.
| void radix_cluster_nopadding | ( | relation_t * | outRel, |
| relation_t * | inRel, | ||
| int | R, | ||
| int | D | ||
| ) |
Radix clustering algorithm which does not put padding in between clusters. This is used only by single threaded radix join implementation RJ.
| outRel | |
| inRel | |
| hist | |
| R | |
| D |
Definition at line 595 of file parallel_radix_join.c.
| result_t * RJ | ( | relation_t * | relR, |
| relation_t * | relS, | ||
| int | nthreads | ||
| ) |
RJ: Radix Join.
The "Radix Join" implementation denoted as RJ implements the single-threaded original multipass radix cluster join idea by Manegold et al.
| relR | input relation R - inner relation |
| relS | input relation S - inner relation |
Definition at line 1634 of file parallel_radix_join.c.
| void serial_radix_partition | ( | task_t *const | task, |
| task_queue_t * | join_queue, | ||
| const int | R, | ||
| const int | D | ||
| ) |
This function implements the radix clustering of a given input relations. The relations to be clustered are defined in task_t and after clustering, each partition pair is added to the join_queue to be joined.
| task | description of the relation to be partitioned |
| join_queue | task queue to add join tasks after clustering |
Definition at line 658 of file parallel_radix_join.c.