Actor Database Systems (Sommersemester 2018)

Lecturer: Dr. Thorsten Papenbrock (Information Systems)

General Information

Weekly Hours: 4
Credits: 6
Graded: yes
Enrolment Deadline: 20.04.2018
Teaching Form: Seminar
Enrolment Type: Compulsory Elective Module
Maximum number of participants: 6

Programs, Module Groups & Modules

IT-Systems Engineering MA

IT-Systems Engineering
- HPI-ITSE-A Analyse
IT-Systems Engineering
- HPI-ITSE-E Entwurf
IT-Systems Engineering
- HPI-ITSE-K Konstruktion
IT-Systems Engineering
- HPI-ITSE-M Maintenance
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Konzepte und Methoden
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Spezialisierung
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniken und Werkzeuge

Description

(official website here)

Data-intensive applications often utilize scalable data tier solutions to handle their load. Sometimes, these data tier solutions are simple key-value store services that partition and distribute the data across several nodes; although this approach offers scalable reads and writes, it still leaves the data processing complexity, e.g., joins, transactions, and constraint checking, to the middle layer of the application. Other data tier solutions are full-featured database management systems with built-in support for declarative querying, transactions, stored procedures, integrity constraints, and so forth; they usually offer data sharding and replication features for scale-out, but their monolithic architecture makes it hard to implement and maintain such features.

The Actor Model is a design pattern for distributed applications, i.e., applications that are supposed to run distributed on several nodes in a computer network. It has gained a lot of popularity in the past years, because workload parallelization and distribution became the driving forces to make modern applications faster. Code written in the actor model can, however, not only be scaled-out to different machines more easily, it is also easier to maintain and enforces a more natural system architecture. For this reason, recent research projects revisited the architecture of modern database management systems and proposed to re-implement them in the Actor Model.

In this seminar, we build our own database on top of the Actor Model. The challenge for this project is that all data – whether on disk or in main memory – must be stored as private state of some actor and each actor can only access its own state. Assume, for instance, that there is one actor for each table in the database. All requests to that table are handled by this actor (or its child actors). We then encounter the following (and further) challenges:

Multi-table operations: Queries that use operations, such as joins and intersections, require actors to combine their data views. The actors basically need to find the solution for such query operations via messaging and without exchanging their entire data.
Partitioning: The actor model demands that the data is split into partitions that are each owned by a different actor. Once partitioned, the data can easily be distributed with their respective actors, but the partitioning strategy can be arbitrarily complex, e.g., from “each table is owned by one actor” to “each block of 64 MB is owned by one actor”.
Replication: For fault tolerance and further performance improvements, one could consider replication of partitions. In this case, the actor database system requires additional replication strategies (may they be leader-less or leader-based) as well as consensus mechanisms.
Constraints: Integrity constraints, such as foreign-keys, span across different tables, which are – in our case – hold by different actors. Ensuring that these constraints are met with every write operation to each of the tables is a challenge and might require additional actor communication.
Transactions: A transaction is a sequence of data query and/or manipulation operations that should be consistently answered and/or applied. Since this usually involves different, independent actors, actor databases need their own strategies to successfully process transactions.
Asynchronous actor behavior: Actors communicate asynchronously but the outside world might work synchronously. Hence, at some stage, the actor database system needs to adapt to this outside behavior.

The goal of this seminar is to develop an actor database system prototype that offers a simple SQL-like API for basic CRUD functions on relational tables and their tuples. We will explore the different challenges for this new architecture and propose prototypical solutions. In the end, we evaluate our actor database system against some monolithic database management system.

Requirements

For this seminar, participants require the following prerequisites (or must catch up on these at the beginning of the seminar):

Database knowledge (ideally Database System I and Database Systems II)
Akka knowledge (ideally Distributed Data Analytics)
Java or Scala knowledge

Zurück