Analytics workloads are often both interactive, serving user-facing applications, and bursty, leaving pre-provisioned resources idle much of the time. Such workloads require elastically scalable database systems to achieve both sufficient performance and cost efficiency. Serverless cloud infrastructure, such as Function-as-a-Service platforms or object storage services, promises ultimate elasticity through fine-grained resource allocation and billing.
The EPIC research group is building the Skyrise cloud-based database system on serverless infrastructure components. By the beginning of the summer term, we expect to have an early version of its query engine with basic execution operators in place. The query operators are implemented as cloud functions that run on function services such as AWS Lambda.
In this project, we aim to build out the Skyrise query engine to better cover the capabilities needed to run the widely used TPC-H benchmark for comparing analytical database systems. The project goals include:
Extend our benchmark framework with operator microbenchmarks.
Improve the memory management and concurrency in operator implementations.
Analyze the TPC-H benchmark and identify missing operator capabilities.
Extend the exchange operator to more topologies and partitioning schemes.
Implement further operator capabilities, such as grouping, sorting, and joins.
The above goals can be addressed largely independently. We will select which goals to pursue based on the number of participants, their interests, and our progress during the project.
To facilitate development, we provide a toolchain for both local execution on your laptop and remote execution in the AWS public cloud. The Skyrise development team will also support you throughout the project.
After the project, there will be research opportunities to dive deeper into the identified issues in the form of Master’s theses.