XJoin: Portable, parallel hash join across diverse XPU architectures with oneAPI

Marinelli, Eugenio; Appuswamy, Raja

Modern server hardware is increasingly heterogeneous with a diverse mix of XPU architectures deployed across CPU, GPU, and FPGAs. However, till date, database developers have had to rely on either proprietary, architecture-specific solutions (like CUDA), or lowlevel, cross-architecture solutions that complicate development (like
OpenCL). The lack of portable parallelism caused by the absence of a common high-level programming framework is one of the main reasons preventing a wider adoption of XPUs by database systems. In this paper, we take the first steps towards solving this problem using oneAPI–a cross-industry effort for developing an open, standards-based unified programming model that extends standard C++ to provide portable parallelism across diverse processor architectures. In particular, we port a recently-proposed, highly-optimized, GPU-based hash join algorithm from CUDA to Data Parallel C++ (DPC++). We then execute the hash join on multicore CPUs, integrated GPUs (Intel GEN9), and discrete GPUs (Intel DG1 and NVIDIA GeForce) without changing a single line of kernel code to demonstrate that DPC++ enables portable parallelism. We compare the performance of DPC++ kernels with hand-optimized CUDA kernels and model-based theoretical performance bounds to demonstrate the performance–portability trade off in using DPC++.

DOI
Type:
Conférence
Date:
2021-06-21
Department:
Data Science
Eurecom Ref:
6567
Copyright:
© ACM, 2021. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in https://doi.org/10.1145/3465998.3466012

PERMALINK : https://www.eurecom.fr/publication/6567