Modules
	Optimized Partitioning Using SW-buffers

Functions
int64_t	bucket_chaining_join (const relation_t const R, const relation_t const S, relation_t const tmpR, void output)

uint32_t	get_hist_size (uint32_t relSize) __attribute__((always_inline))

int64_t	histogram_join (const relation_t const R, const relation_t const S, relation_t const tmpR, void output)

void	prefetch (void *addr) __attribute__((always_inline))

int64_t	histogram_optimized_join (const relation_t const R, const relation_t const S, relation_t const tmpR, void output)

void	radix_cluster (relation_t restrict outRel, relation_t restrict inRel, int32_t *restrict hist, int R, int D)

void	radix_cluster_nopadding (relation_t outRel, relation_t inRel, int R, int D)

void	serial_radix_partition (task_t const task, task_queue_t join_queue, const int R, const int D)

void	parallel_radix_partition (part_t *const part)

void *	prj_thread (void *param)

result_t *	join_init_run (relation_t relR, relation_t relS, JoinFunction jf, int nthreads)

result_t *	PRO (relation_t relR, relation_t relS, int nthreads)

result_t *	PRH (relation_t relR, relation_t relS, int nthreads)

result_t *	PRHO (relation_t relR, relation_t relS, int nthreads)

result_t *	RJ (relation_t relR, relation_t relS, int nthreads)

Detailed Description

Function Documentation

◆ bucket_chaining_join()

int64_t bucket_chaining_join	(	const relation_t *const	R,
		const relation_t *const	S,
		relation_t *const	tmpR,
		void *	output
	)

This algorithm builds the hashtable using the bucket chaining idea and used in PRO implementation. Join between given two relations is evaluated using the "bucket chaining" algorithm proposed by Manegold et al. It is used after the partitioning phase, which is common for all algorithms. Moreover, R and S typically fit into L2 or at least R and |R|*sizeof(int) fits into L2 cache.

Parameters

R	input relation R
S	input relation S
output	join results, if JOIN_RESULT_MATERIALIZE defined.

Returns: number of result tuples

Definition at line 243 of file parallel_radix_join.c.

◆ get_hist_size()

uint32_t get_hist_size ( uint32_t relSize )

inline

computes and returns the histogram size for join

Definition at line 316 of file parallel_radix_join.c.

◆ histogram_join()

int64_t histogram_join	(	const relation_t *const	R,
		const relation_t *const	S,
		relation_t *const	tmpR,
		void *	output
	)

Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version is not optimized with SIMD and prefetching. The parallel radix join implementation using this function is PRH.

Definition at line 331 of file parallel_radix_join.c.

◆ histogram_optimized_join()

int64_t histogram_optimized_join	(	const relation_t *const	R,
		const relation_t *const	S,
		relation_t *const	tmpR,
		void *	output
	)

Histogram-based hash table build method together with relation re-ordering as described by Kim et al. It joins partitions Ri, Si of relations R & S. This is version includes SIMD and prefetching optimizations as described by Kim et al. The parallel radix join implementation using this function is PRHO. Note: works only for 32-bit keys.

Definition at line 419 of file parallel_radix_join.c.

◆ join_init_run()

result_t * join_init_run	(	relation_t *	relR,
		relation_t *	relS,
		JoinFunction	jf,
		int	nthreads
	)

The template function for different joins: Basically each parallel radix join has a initialization step, partitioning step and build-probe steps. All our parallel radix implementations have exactly the same initialization and partitioning steps. Difference is only in the build-probe step. Here are all the parallel radix join implemetations and their Join (build-probe) functions:

PRO, Parallel Radix Join Optimized --> bucket_chaining_join()
PRH, Parallel Radix Join Histogram-based --> histogram_join()
PRHO, Parallel Radix Histogram-based Optimized -> histogram_optimized_join()

Not an elegant way of passing whether we will numa-localize, but this feature is experimental anyway.

Definition at line 1424 of file parallel_radix_join.c.

◆ parallel_radix_partition()

void parallel_radix_partition ( part_t *const part )

This function implements the parallel radix partitioning of a given input relation. Parallel partitioning is done by histogram-based relation re-ordering as described by Kim et al. Parallel partitioning method is commonly used by all parallel radix join algorithms.

Parameters

part	description of the relation to be partitioned

Definition at line 715 of file parallel_radix_join.c.

◆ prefetch()

void prefetch ( void * addr )

inline

software prefetching function

Definition at line 403 of file parallel_radix_join.c.

◆ PRH()

result_t * PRH	(	relation_t *	relR,
		relation_t *	relS,
		int	nthreads
	)

PRH: Parallel Radix Join Histogram-based.

The "Parallel Radix Join Histogram-based" implementation denoted as PRH implements the parallel radix join idea by Kim et al. without SIMD and prefetching optimizations. The difference from PRO is that the build phase is based on the histogram-based relation re-ordering idea.

Parameters

relR	input relation R - inner relation
relS	input relation S - inner relation

Returns: number of result tuples

Definition at line 1620 of file parallel_radix_join.c.

◆ PRHO()

result_t * PRHO	(	relation_t *	relR,
		relation_t *	relS,
		int	nthreads
	)

PRHO: Parallel Radix Join Histogram-based Optimized.

The "Parallel Radix Join Histogram-based Optimized" implementation denoted as PRHO implements the parallel radix join idea by Kim et al. with SIMD and prefetching optimizations. The difference from PRH is that the probe phase uses SIMD and software prefetching optimizations.

Parameters

relR	input relation R - inner relation
relS	input relation S - inner relation

Returns: number of result tuples

Definition at line 1627 of file parallel_radix_join.c.

◆ prj_thread()

void * prj_thread ( void * param )

The main thread of parallel radix join. It does partitioning in parallel with other threads and during the join phase, picks up join tasks from the task queue and calls appropriate JoinFunction to compute the join task.

Parameters

param

Returns

Definition at line 964 of file parallel_radix_join.c.

◆ PRO()

result_t * PRO	(	relation_t *	relR,
		relation_t *	relS,
		int	nthreads
	)

PRO: Parallel Radix Join Optimized.

The "Parallel Radix Join Optimized" implementation denoted as PRO implements the parallel radix join idea by Kim et al. with further optimizations. Mainly it uses the bucket chaining for the build phase instead of the histogram-based relation re-ordering and performs better than other variations such as PRHO, which applies SIMD and prefetching optimizations.

Parameters

relR	input relation R - inner relation
relS	input relation S - inner relation

Returns: number of result tuples

Definition at line 1613 of file parallel_radix_join.c.

◆ radix_cluster()

void radix_cluster	(	relation_t *restrict	outRel,
		relation_t *restrict	inRel,
		int32_t *restrict	hist,
		int	R,
		int	D
	)

Radix clustering algorithm (originally described by Manegold et al) The algorithm mimics the 2-pass radix clustering algorithm from Kim et al. The difference is that it does not compute prefix-sum, instead the sum (offset in the code) is computed iteratively.

Warning: This method puts padding between clusters, see radix_cluster_nopadding for the one without padding.

Parameters

outRel	[out] result of the partitioning
inRel	[in] input relation
hist	[out] number of tuples in each partition
R	cluster bits
D	radix bits per pass

Returns: tuples per partition.

Definition at line 545 of file parallel_radix_join.c.

◆ radix_cluster_nopadding()

void radix_cluster_nopadding	(	relation_t *	outRel,
		relation_t *	inRel,
		int	R,
		int	D
	)

Radix clustering algorithm which does not put padding in between clusters. This is used only by single threaded radix join implementation RJ.

Parameters

outRel
inRel
hist
R
D

Definition at line 595 of file parallel_radix_join.c.

◆ RJ()

result_t * RJ	(	relation_t *	relR,
		relation_t *	relS,
		int	nthreads
	)

RJ: Radix Join.

The "Radix Join" implementation denoted as RJ implements the single-threaded original multipass radix cluster join idea by Manegold et al.

Parameters

relR	input relation R - inner relation
relS	input relation S - inner relation

Warning: nthreads parameter does not have any effect for this algorithm.

Returns: number of result tuples

Definition at line 1634 of file parallel_radix_join.c.

◆ serial_radix_partition()

void serial_radix_partition	(	task_t *const	task,
		task_queue_t *	join_queue,
		const int	R,
		const int	D
	)

This function implements the radix clustering of a given input relations. The relations to be clustered are defined in task_t and after clustering, each partition pair is added to the join_queue to be joined.

Parameters

task	description of the relation to be partitioned
join_queue	task queue to add join tasks after clustering

Definition at line 658 of file parallel_radix_join.c.

Modules

Functions

Detailed Description

Function Documentation

◆ bucket_chaining_join()

◆ get_hist_size()

◆ histogram_join()

◆ histogram_optimized_join()

◆ join_init_run()

◆ parallel_radix_partition()

◆ prefetch()

◆ PRH()

◆ PRHO()

◆ prj_thread()

◆ PRO()

◆ radix_cluster()

◆ radix_cluster_nopadding()

◆ RJ()

◆ serial_radix_partition()