1.
Maltenberger, T., Tolovski, I., Rabl, T.: Efficiently Joining Large Relations on Multi-GPU Systems. Proceedings of the VLDB Endowment. 18, 4653–4667 (2025).
Growing data volumes present a mounting challenge to relational joins. GPUs have gained widespread adoption as database accelerators for operators such as joins due to their high instruction throughput and memory bandwidth. Most published GPU-accelerated joins are single-GPU algorithms that do not leverage modern multi-GPU platforms effectively. The few proposed multi-GPU algorithms either fail to exploit the high-speed P2P interconnects between the GPUs or to handle large out-of-core data natively. In this paper, we present a heterogeneous multi-GPU sort-merge join that overcomes both limitations. It is composed of a merge- or radix partitioning-based P2P-enabled multi-GPU sort phase, a parallel CPU-based multiway merge phase, and a hybrid join phase that combines a CPU merge path partition with a binary search-based multi-GPU join strategy. We evaluate our novel multi-GPU join on two platforms with fast NVLink- and NVSwitch-based P2P interconnects. We show that our join outperforms state-of-the-art CPU and GPU baselines regardless of the workload. It outperforms parallel CPU sort-merge and radix-hash joins up to 15.2× and 5.5×, respectively. Compared to non-P2P-enabled multi-GPU joins, it achieves speedups of 8.7× (sort-merge) and 2.5× (hybrid-radix). We measure that our join’s hybrid join phase with overlapped copy and compute operations contributes as little as 22% to its end-to-end runtime. If the input relations are pre-sorted, it is up to 14.4× faster than the hybrid-radix join. Our join scales well with the number of GPUs and benefits from data skew with as much as 12% shorter join durations.
2.
Bodner, T., Radig, T., Justen, D., Ritter, D., Rabl, T.: An Empirical Evaluation of Serverless Cloud Infrastructure for Large-Scale Data Processing. International Conference on Extending Database Technology (EDBT). pp. 935–948 (2025).
3.
Boissier, M., Weisgut, M., Rabl, T.: Compression in Main Memory Database Systems: Cost and Performance Trade-Offs of Workload-Driven Data Encoding. Datenbanksysteme für Business, Technologie und Web (BTW) (2025).
4.
Wang, Y., Boissier, M., Luthra, M., Rabl, T.: Dema: Efficient Decentralized Aggregation for Non-Decomposable Quantile Functions. International Conference on Extending Database Technology (EDBT) (2025).
The growing number of Internet of Things (IoT) devices has led to the widespread adoption of decentralized networks to handle unbounded data streams in a variety of applications. Traditional stream processing engines rely on centralized window aggregation, resulting in high network overhead and processing bottlenecks. Current decentralized solutions mitigate these issues by offloading partial aggregations to edge devices, but they only support decomposable functions like sum and count. Non-decomposable functions, such as median and quantile, remain a challenge as partial results cannot be merged without accessing the complete dataset. To address this, we propose Dema, a decentralized window aggregation technique for non-decomposable functions. Dema reduces network traffic and computational load by performing localized sorting and transmitting statistical summaries rather than raw data. Our approach efficiently calculates median and quantile values, achieving up to a 99% reduction in network traffic compared to state-of-the-art methods. Our evaluation results show that Dema significantly outperforms existing approaches in terms of throughput and scalability, while ensuring accurate results.
5.
Bodner, T., Boissier, M., Rabl, T., Salazar-Díaz, R., Schmeller, F., Strassenburg, N., Tolovski, I., Weisgut, M., Yue, W.: A Case for Ecological Efficiency in Database Server Lifecycles. Conference on Innovative Data Systems Research (CIDR). www.cidrdb.org (2025).
Like other software systems, database systems benefit from hardware performance improvements. For the longest time, acquiring new hardware resulted in significant software efficiency gains due to exponential improvements of hardware capabilities. Physical limits in hardware manufacturing have brought former niche designs into standard components, such as multiple cores and specialized circuits. Even with these new designs, hardware improvements are decreasing, while software and applications are still becoming increasingly complex and resource demanding. Given the resource consumption of hardware manufacturing, the ideal lifecycle of hardware naturally has to extend from an efficiency aspect. In this paper, we try to estimate efficiency of lifecycle duration of database hardware. We calculate the reduction in performance improvements of hardware using publicly available performance numbers, as well as our own benchmarks, and relate them to the specified thermal design power to get the power efficiency. Incorporating estimations on hardware and power production carbon intensity, we challenge current wisdom on hardware replacement frequencies and try to establish new rules of thumb on the ideal hardware lifecycles for database deployments. We present opportunities for future research trends.