1.
Schmeller, F., Nugroho, D.P.A., Zeuch, S., Rabl, T.: Towards A GPU-Accelerated Stream Processing Engine Through Query Compilation. Lernen, Wissen, Daten, Analysen. (LWDA ’24) (2024).
Over the last decade, data stream processing has emerged to provide real-time insights into large, unbounded volumes of data. At the same time, graphics processing units (GPU) have become an important accelerator for improving the performance of compute-bound applications. Nevertheless, state-of-the-art data streaming systems opt to scale-out and typically do not make efficient use of the underlying hardware. Recent work has shown that query compilation is a viable technique to support hardware advancements in query processing engines. However, it often comes with high development and maintenance costs. In particular, when the process involves hardware accelerators such as GPUs. In this paper, we propose a framework for compiling data stream queries to efficient GPU code in a developer-friendly manner. We demonstrate the feasibility of our framework by integrating it into the data management system NebulaStream. Our experiments show that frequent memory transfers between CPU and GPU impact the query processing throughput.
2.
Wang, Y.: Efficient Stream Processing in Decentralized Networks. PhD@ VLDB. 2024 (2024).
Internet-of-things (IoT) devices are widely used in industry as well as in research and are deployed in many applications. These massive amounts of devices are connected in large decentralized networks and produce unbounded data streams with continuous data. To process these data streams timely, current stream processing engines (SPEs) collect all data in a centralized data center. This approach leads to high network utilization and can create a bottleneck in the data center, as all data is transmitted via the network and results are computed centrally. State-of-the-art solutions push down partial window aggregations to machines that are near data streams. However, these solutions are limited to a single simple query. In this paper, we present our work on three solutions for different decentralized aggregations: Desis, Deco, and Dema, which significantly improve the performance of stream processing in decentralized networks. Our solutions reduce network traffic by up to 99.9%.
3.
Wang, Y., Boissier, M., Rabl, T.: A Survey of Stream Processing System Benchmark. 16th TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC) @ VLDB ’24 (2024).
Stream processing systems are a fundamental component of modern data processing, enabling timely and efficient handling of streaming data. To assess and compare the capabilities of stream processing systems, various benchmarks have been proposed over the past years. Examples span a wide range of use cases, ranging from benchmarks for enterprise computing to social network analyses and IoT networks. These benchmarks are designed with different focuses and exhibit different characteristics during execution. In this paper, we review existing stream processing benchmarks and analyze them across five dimensions: benchmark type, included workloads, data ingestion, supported systems under test (SUT), and tracked metrics. We compare their similarities and differences, providing a comprehensive overview of existing benchmarks. Finally, we discuss aspects that have been overlooked and highlight those that should be addressed when benchmarking future generations of streaming systems.
4.
Tolovski, I., Rabl, T.: Addressing Data Management Challenges for Interoperable Data Science. 1st International Workshop on Data-driven AI (DATAI) @ VLDB ’24 (2024).
The development of data science pipelines (DSPs) has been steadily growing in popularity. While the increasing number of applications can also be attributed to novel algorithms and analytics libraries, the interoperability of new DSPs has been limited. To investigate this, we curated a corpus of over 494k GitHub Python repositories. We find that only 20% of the data science pipelines provide access to their input data and only 14% use a data backend. These findings highlight the key pain points in the development of interoperable DSPs. We identify five open data management challenges related to pipeline analysis, data access, and storage. We introduce Stork, a system for automated pipeline analysis, transformation, and data migration. Stork provides open data access while removing the hu- man in the loop when reproducing results and migrating projects to different storage and execution environments. We analyze terabytes of DSPs with Stork and successfully process 72% of the pipelines, transforming 75% of the accessible datasets.
5.
Hendrik, M., Del Monte, B., Rabl, T.: Ghostwriter: a Distributed Message Broker on RDMA and NVM. 15th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (2024).
Modern stream processing setups heavily rely on message bro- kers such as Apache Kafka or Apache Pulsar. These systems act as buffers and re-readable sources for downstream systems or applica- tions. They are typically deployed on separate servers, requiring extra resources, and achieve persistence through disk-based storage, limiting achievable throughput. In this paper, we present Ghost- writer, a message broker that utilizes remote direct memory access (RDMA) and non-volatile memory (NVM) for highly efficient mes- sage transfer and storage. Utilizing the hardware characteristics of RDMA and NVM, we achieve data throughput that is only limited by the underlying hardware, while reducing computation and dis- aggregating storage and data transfer coordination. Ghostwriter achieves performance improvements of up to an order of magnitude in throughput and latency over state-of-the-art solutions.
6.
Benson, L., Binnig, C., Bodensohn, J.-M., Lorenzi, F., Luo, J., Porobic, D., Rabl, T., Sanghi, A., Sears, R., Tözün, P., Ziegler, T.: Surprise Benchmarking: The Why, What, and How. Proceedings of the Tenth International Workshop on Testing Database Systems (DBTest). pp. 1–8 (2024).
7.
Riekenbrauck, N., Weisgut, M., Lindner, D., Rabl, T.: A Three-Tier Buffer Manager Integrating CXL Device Memory for Database Systems. Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems @ ICDE 2024 (2024).
8.
Salazar-Díaz, R., Glavic, B., Rabl, T.: InferDB: In-Database Machine Learning Inference Using Indexes. Proceedings of the VLDB Endowment. 17, 1830–1842 (2024).
The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing workflow followed by prediction with an ML model. Current approaches for in-database inference implement preprocessing operators and ML algorithms in the database either natively, by transpiling code to SQL, or by executing user-defined functions in guest languages such as Python. In this work, we present a radically different approach that approximates an end-to-end inference pipeline (preprocessing plus prediction) using a light-weight embedding that discretizes a carefully selected subset of the input features and an index that maps data points in the embedding space to aggregated predictions of an ML model. We replace a complex preprocessing workflow and model-based inference with a simple feature transformation and an index lookup. Our framework improves inference latency by several orders of magnitude while maintaining similar prediction accuracy compared to the pipeline it approximates.
9.
Wang, Y., Moczalla, R., Luthra, M., Rabl, T.: Deco: Fast and Accurate Decentralized Aggregation of Count-Based Windows in Large-Scale IoT Applications. 27th International Conference on Extending Database Technology (EDBT ’24). (2024).
In the realm of large-scale Internet-of-Things applications, aggregating data using count-based windows is a formidable challenge. Current methods, either centralized and slow or decentralized with potential inaccuracies, fail to strike a balance. This paper introduces Deco, a novel approach tailored for swift and precise aggregation in distributed stream processing systems. Accomplishing this balance is complex due to the dynamic nature of event distribution: events arrive at varying rates, unordered, and at diverse times, making accurate window computation a challenge. To overcome this, we propose a lightweight prediction method that derives local window sizes based on the previously observed event rates and performs corrections when necessary to ensure accurate and fast query results. These windows are processed in a decentralized manner on local nodes, verified for correctness, and then aggregated on a root node. Our evaluation showcases Deco’s superiority over centralized methods, outperforming others significantly. Deco reduces network traffic by up to 99% and exhibits linear scalability with node count.