1.
Bodner, T., Pietz, T., Bollmeier, L.J., Ritter, D.: Doppler: Understanding Serverless Query Execution. Proceedings of the SIGMOD Workshop on Big Data in Emergent Distributed Environments (2022)
Analyzing and understanding query processing in distributed cloud databases is difficult and requires laborious system designs. However, serverless query execution, with its massive number of small, ad-hoc, short-lived, and ephemeral query processors, is even more challenging. To meet this challenge and materialize the economic benefits of serverless computing, systems have to be serverless themselves. We demonstrate Doppler, a serverless system designed to trace serverless data processing systems with minimal cost and performance overhead and provide a deep understanding of their query execution. We highlight Doppler’s features and capabilities through a proof-of-concept implementation with the serverless data processing system Skyrise.
2.
Bodner, T.: Elastic Query Processing on Function as a Service Platforms. Proceedings of the VLDB PhD Workshop (2020)
Modern analytics workloads are no longer predictable and require database systems to adapt to their performance demands and cost constraints in an instant. Existing database architectures, however, are incapable of meeting this degree of elastic scalability. For this reason, we propose a novel architecture based on function-as-a-service platforms. This architecture and the concepts surrounding it are being implemented in our research prototype Skyrise.
3.
Goel, A., Pound, J., Auch, N., Bumbulis, P., MacLean, S., Färber, F., Gropengiesser, F., Mathis, C., Bodner, T., Lehner, W.: Towards Scalable Real-time Analytics: An Architecture for Scale-out of OLXP Workloads. Proceedings of the VLDB Endowment (2015)
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large-scale analytics over real-time data. This platform permits high-performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine-grained user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large-scale data management platform for real-time analytics, driven by the needs of modern enterprise customers.
4.
Alexandrov, A., Schiefer, B., Poelman, J., Ewen, S., Bodner, T., Markl, V.: Myriad: Parallel Data Generation on Shared-nothing Architectures. Proceedings of the PACT Workshop on Architectures and Systems for Big Data (2011)
The need for efficient data generation for the purposes of testing and benchmarking newly developed massively-parallel data processing systems has increased with the emergence of Big Data problems. As synthetic data model specifications evolve over time, the data generator programs implementing these models have to be adapted continuously, a task that often becomes more tedious as the set of model constraints grows. In this paper we present Myriad, a new parallel data generation toolkit. Data generators created with the toolkit can quickly produce very large datasets in a shared-nothing parallel execution environment, while at the same time preserving cross-partition dependencies, correlations, and distributions in the generated data. In addition, we report on our efforts towards a benchmark suite for large-scale parallel analysis systems that uses Myriad for the generation of OLAP-style relational datasets.