1. Tolovski, I., Rabl, T.: Addressing Data Management Challenges for Interoperable Data Science. 1st International Workshop on Data-driven AI (DATAI) @ VLDB ’24 (2024).
2. Benson, L., Binnig, C., Bodensohn, J.-M., Lorenzi, F., Luo, J., Porobic, D., Rabl, T., Sanghi, A., Sears, R., Tözün, P., Ziegler, T.: Surprise Benchmarking: The Why, What, and How. (2024).
3. Riekenbrauck, N., Weisgut, M., Lindner, D., Rabl, T.: A Three-Tier Buffer Manager Integrating CXL Device Memory for Database Systems. Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems @ ICDE 2024 (2024).
4. Salazar-Díaz, R., Glavic, B., Rabl, T.: InferDB: In-Database Machine Learning Inference Using Indexes. Proceedings of the VLDB Endowment. 17, 1830–1842 (2024).
The performance of inference with machine learning (ML) models and its integration with analytical query processing have become critical bottlenecks for data analysis in many organizations. An ML inference pipeline typically consists of a preprocessing workflow followed by prediction with an ML model. Current approaches for in-database inference implement preprocessing operators and ML algorithms in the database either natively, by transpiling code to SQL, or by executing user-defined functions in guest languages such as Python. In this work, we present a radically different approach that approximates an end-to-end inference pipeline (preprocessing plus prediction) using a lightweight embedding that discretizes a carefully selected subset of the input features and an index that maps data points in the embedding space to aggregated predictions of an ML model. We replace a complex preprocessing workflow and model-based inference with a simple feature transformation and an index lookup. Our framework improves inference latency by several orders of magnitude while maintaining prediction accuracy similar to that of the pipeline it approximates.
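The idea of replacing a preprocessing-plus-model pipeline with a feature transform and an index lookup can be illustrated with a minimal sketch. This is not the paper's implementation: the quantile-based binning, the dictionary index, and all function names here are illustrative assumptions; InferDB's actual feature selection and index structure differ.

```python
# Hypothetical sketch of index-based approximate inference: discretize a
# subset of features into bins, map each cell of the discretized space to
# the mean prediction of a trained model over the training points that
# fall into it, and answer inference requests with a single lookup.
import numpy as np

def build_index(X, model_predict, n_bins=10):
    """Derive per-feature bin edges and an index from bin cells to mean predictions."""
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
             for j in range(X.shape[1])]
    preds = model_predict(X)
    sums, counts = {}, {}
    for x, p in zip(X, preds):
        key = tuple(int(np.searchsorted(edges[j], x[j])) for j in range(len(x)))
        sums[key] = sums.get(key, 0.0) + p
        counts[key] = counts.get(key, 0) + 1
    index = {k: sums[k] / counts[k] for k in sums}
    return edges, index

def lookup(x, edges, index, default=0.0):
    """Approximate inference: discretize x and look up the aggregated prediction."""
    key = tuple(int(np.searchsorted(edges[j], x[j])) for j in range(len(x)))
    return index.get(key, default)
```

At inference time the trained model is never invoked; the cost is one feature discretization and one hash lookup, which is the source of the latency improvement the abstract describes.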
5. Wang, Y., Moczalla, R., Luthra, M., Rabl, T.: Deco: Fast and Accurate Decentralized Aggregation of Count-Based Windows in Large-Scale IoT Applications. 27th International Conference on Extending Database Technology (EDBT ’24) (2024).
In the realm of large-scale Internet-of-Things applications, aggregating data using count-based windows is a formidable challenge. Current methods, either centralized and slow or decentralized with potential inaccuracies, fail to strike a balance. This paper introduces Deco, a novel approach tailored for swift and precise aggregation in distributed stream processing systems. Accomplishing this balance is complex due to the dynamic nature of event distribution: events arrive at varying rates, unordered, and at diverse times, making accurate window computation a challenge. To overcome this, we propose a lightweight prediction method that derives local window sizes based on previously observed event rates and performs corrections when necessary to ensure accurate and fast query results. These windows are processed in a decentralized manner on local nodes, verified for correctness, and then aggregated on a root node. Our evaluation shows that Deco significantly outperforms centralized methods, reducing network traffic by up to 99% and scaling linearly with node count.
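The rate-based derivation of local window sizes can be sketched as follows. This is a simplified illustration under assumed semantics, not Deco's algorithm: the proportional split rule and all names are hypothetical, and the paper's verification and correction mechanism is only hinted at by the leftover events returned from each node.

```python
# Hypothetical sketch: each node receives a share of a count-based window of
# total size W, proportional to the event rate it observed in the previous
# window. Nodes aggregate their share locally; the root combines the partials.

def split_window(total_size, observed_rates):
    """Divide a count-based window among nodes proportionally to past event rates."""
    total_rate = sum(observed_rates)
    shares = [total_size * r // total_rate for r in observed_rates]
    shares[-1] += total_size - sum(shares)  # assign the rounding remainder
    return shares

def local_aggregate(events, size):
    """Aggregate (sum) the first `size` events locally; leftovers feed correction."""
    window, leftover = events[:size], events[size:]
    return sum(window), leftover

def root_aggregate(partials):
    """Combine the nodes' partial sums at the root."""
    return sum(partials)
```

Because each node ships only one partial aggregate per window instead of every event, network traffic scales with the number of nodes rather than the event volume, consistent with the traffic reduction the abstract reports.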