Multi-core Hash Joins
Main-memory hash join implementations for multi-core CPUs
Modules | Functions
Radix Join Implementation Variants

Modules

 Optimized Partitioning Using SW-buffers
 

Functions

int64_t bucket_chaining_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output)
 
uint32_t get_hist_size (uint32_t relSize) __attribute__((always_inline))
 
int64_t histogram_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output)
 
void prefetch (void *addr) __attribute__((always_inline))
 
int64_t histogram_optimized_join (const relation_t *const R, const relation_t *const S, relation_t *const tmpR, void *output)
 
void radix_cluster (relation_t *restrict outRel, relation_t *restrict inRel, int32_t *restrict hist, int R, int D)
 
void radix_cluster_nopadding (relation_t *outRel, relation_t *inRel, int R, int D)
 
void serial_radix_partition (task_t *const task, task_queue_t *join_queue, const int R, const int D)
 
void parallel_radix_partition (part_t *const part)
 
void * prj_thread (void *param)
 
result_tjoin_init_run (relation_t *relR, relation_t *relS, JoinFunction jf, int nthreads)
 
result_tPRO (relation_t *relR, relation_t *relS, int nthreads)
 
result_tPRH (relation_t *relR, relation_t *relS, int nthreads)
 
result_tPRHO (relation_t *relR, relation_t *relS, int nthreads)
 
result_tRJ (relation_t *relR, relation_t *relS, int nthreads)
 

Detailed Description

Function Documentation

◆ bucket_chaining_join()

int64_t bucket_chaining_join ( const relation_t *const  R,
const relation_t *const  S,
relation_t *const  tmpR,
void *  output 
)

This algorithm builds the hashtable using the bucket chaining idea and used in PRO implementation. Join between given two relations is evaluated using the "bucket chaining" algorithm proposed by Manegold et al. It is used after the partitioning phase, which is common for all algorithms. Moreover, R and S typically fit into L2 or at least R and |R|*sizeof(int) fits into L2 cache.

Parameters
Rinput relation R
Sinput relation S
outputjoin results, if JOIN_RESULT_MATERIALIZE defined.
Returns
number of result tuples

Definition at line 243 of file parallel_radix_join.c.

◆ get_hist_size()

uint32_t get_hist_size ( uint32_t  relSize)
inline

computes and returns the histogram size for join

Definition at line 316 of file parallel_radix_join.c.

◆ histogram_join()

int64_t histogram_join ( const relation_t *const  R,
const relation_t *const  S,
relation_t *const  tmpR,
void *  output 
)

Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version is not optimized with SIMD and prefetching. The parallel radix join implementation using this function is PRH.

Definition at line 331 of file parallel_radix_join.c.

◆ histogram_optimized_join()

int64_t histogram_optimized_join ( const relation_t *const  R,
const relation_t *const  S,
relation_t *const  tmpR,
void *  output 
)

Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version includes SIMD and prefetching optimizations as described by Kim et al. The parallel radix join implementation using this function is PRHO. Note: works only for 32-bit keys.

Definition at line 419 of file parallel_radix_join.c.

◆ join_init_run()

result_t * join_init_run ( relation_t relR,
relation_t relS,
JoinFunction  jf,
int  nthreads 
)

The template function for different joins: Basically each parallel radix join has a initialization step, partitioning step and build-probe steps. All our parallel radix implementations have exactly the same initialization and partitioning steps. Difference is only in the build-probe step. Here are all the parallel radix join implemetations and their Join (build-probe) functions:

Not an elegant way of passing whether we will numa-localize, but this feature is experimental anyway.

Definition at line 1424 of file parallel_radix_join.c.

◆ parallel_radix_partition()

void parallel_radix_partition ( part_t *const  part)

This function implements the parallel radix partitioning of a given input relation. Parallel partitioning is done by histogram-based relation re-ordering as described by Kim et al. Parallel partitioning method is commonly used by all parallel radix join algorithms.

Parameters
partdescription of the relation to be partitioned

Definition at line 715 of file parallel_radix_join.c.

◆ prefetch()

void prefetch ( void *  addr)
inline

software prefetching function

Definition at line 403 of file parallel_radix_join.c.

◆ PRH()

result_t * PRH ( relation_t relR,
relation_t relS,
int  nthreads 
)

PRH: Parallel Radix Join Histogram-based.

The "Parallel Radix Join Histogram-based" implementation denoted as PRH implements the parallel radix join idea by Kim et al. without SIMD and prefetching optimizations. The difference from PRO is that the build phase is based on the histogram-based relation re-ordering idea.

Parameters
relRinput relation R - inner relation
relSinput relation S - inner relation
Returns
number of result tuples

Definition at line 1620 of file parallel_radix_join.c.

◆ PRHO()

result_t * PRHO ( relation_t relR,
relation_t relS,
int  nthreads 
)

PRHO: Parallel Radix Join Histogram-based Optimized.

The "Parallel Radix Join Histogram-based Optimized" implementation denoted as PRHO implements the parallel radix join idea by Kim et al. with SIMD and prefetching optimizations. The difference from PRH is that the probe phase uses SIMD and software prefetching optimizations.

Parameters
relRinput relation R - inner relation
relSinput relation S - inner relation
Returns
number of result tuples

Definition at line 1627 of file parallel_radix_join.c.

◆ prj_thread()

void * prj_thread ( void *  param)

The main thread of parallel radix join. It does partitioning in parallel with other threads and during the join phase, picks up join tasks from the task queue and calls appropriate JoinFunction to compute the join task.

Parameters
param
Returns

Definition at line 964 of file parallel_radix_join.c.

◆ PRO()

result_t * PRO ( relation_t relR,
relation_t relS,
int  nthreads 
)

PRO: Parallel Radix Join Optimized.

The "Parallel Radix Join Optimized" implementation denoted as PRO implements the parallel radix join idea by Kim et al. with further optimizations. Mainly it uses the bucket chaining for the build phase instead of the histogram-based relation re-ordering and performs better than other variations such as PRHO, which applies SIMD and prefetching optimizations.

Parameters
relRinput relation R - inner relation
relSinput relation S - inner relation
Returns
number of result tuples

Definition at line 1613 of file parallel_radix_join.c.

◆ radix_cluster()

void radix_cluster ( relation_t *restrict  outRel,
relation_t *restrict  inRel,
int32_t *restrict  hist,
int  R,
int  D 
)

Radix clustering algorithm (originally described by Manegold et al) The algorithm mimics the 2-pass radix clustering algorithm from Kim et al. The difference is that it does not compute prefix-sum, instead the sum (offset in the code) is computed iteratively.

Warning
This method puts padding between clusters, see radix_cluster_nopadding for the one without padding.
Parameters
outRel[out] result of the partitioning
inRel[in] input relation
hist[out] number of tuples in each partition
Rcluster bits
Dradix bits per pass
Returns
tuples per partition.

Definition at line 545 of file parallel_radix_join.c.

◆ radix_cluster_nopadding()

void radix_cluster_nopadding ( relation_t outRel,
relation_t inRel,
int  R,
int  D 
)

Radix clustering algorithm which does not put padding in between clusters. This is used only by single threaded radix join implementation RJ.

Parameters
outRel
inRel
hist
R
D

Definition at line 595 of file parallel_radix_join.c.

◆ RJ()

result_t * RJ ( relation_t relR,
relation_t relS,
int  nthreads 
)

RJ: Radix Join.

The "Radix Join" implementation denoted as RJ implements the single-threaded original multipass radix cluster join idea by Manegold et al.

Parameters
relRinput relation R - inner relation
relSinput relation S - inner relation
Warning
nthreads parameter does not have any effect for this algorithm.
Returns
number of result tuples

Definition at line 1634 of file parallel_radix_join.c.

◆ serial_radix_partition()

void serial_radix_partition ( task_t *const  task,
task_queue_t join_queue,
const int  R,
const int  D 
)

This function implements the radix clustering of a given input relations. The relations to be clustered are defined in task_t and after clustering, each partition pair is added to the join_queue to be joined.

Parameters
taskdescription of the relation to be partitioned
join_queuetask queue to add join tasks after clustering

Definition at line 658 of file parallel_radix_join.c.