Our team is giving a series of lectures and seminars with a focus on enterprise systems design and in-memory data management. Strong links to the industry ensure a close connection between theory and its implementation in the real world.

If you have questions regarding one of our publications, please contact the authors.

Master Project 2014/15

Motivation

In-memory databases are on the rise. All major database vendors are working on in-memory database solutions. Due to decreasing main memory prices and new hardware developments, even very large systems can be stored entirely in main memory. This allows not only transactional queries but also analytical ones to be processed in real time, forming a so-called mixed workload that runs on a single database system.

However, main memory is still a comparatively scarce resource whose utilization needs to be optimized. An inherent problem in today's in-memory databases is that data that is rarely or never accessed is handled the same way as frequently requested data. As of today, mixed-workload databases have no approach for data eviction or caching. Storing irrelevant (cold) data with the same priority as highly relevant (hot) data does not fully leverage the potential of modern server systems. If irrelevant data could be evicted to other storage based on its relevance, memory utilization and query performance could be improved.

To quantify data relevance in a real-world database system, you will have access to the workload of a large enterprise system. This production system contains more than 100,000 tables, has a compressed size of ~2.5 TB, and handles ~1.5 billion queries each day. The goal of this project is to use workload characteristics to classify data into hot and cold partitions using so-called dynamic aging rules. By implementing a mechanism for identifying appropriate rules for a given workload, we can avoid access to cold partitions for the majority of queries. As a result, queries are answered on a fraction of the system's data, improving query performance, system resource consumption, and overall memory utilization.

Project Goals

Become familiar with in-memory data management and gain hands-on experience with SAP HANA
Analyze data-selection characteristics of production database workload traces
Implement a concept for dynamic aging rules
Evaluate and apply aging rules on a production workload
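
To make the idea of an aging rule more concrete, the following is a minimal Python sketch and not the project's actual method. It assumes the simplest possible rule, a single date cutoff below which records count as cold, and a hypothetical trace format in which each query is reduced to the oldest record date it selects. The names Query, is_cold, hot_only_fraction, and pick_cutoff are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Query:
    """One traced query, reduced to the oldest record date it selects.

    Hypothetical trace format, assumed for illustration only.
    """
    oldest_date_accessed: date


def is_cold(record_date: date, cutoff: date) -> bool:
    """Assumed aging rule: a record is cold if it is older than the cutoff."""
    return record_date < cutoff


def hot_only_fraction(queries: list[Query], cutoff: date) -> float:
    """Fraction of queries that never touch the cold partition."""
    if not queries:
        return 0.0
    hot_only = sum(
        1 for q in queries if not is_cold(q.oldest_date_accessed, cutoff)
    )
    return hot_only / len(queries)


def pick_cutoff(
    queries: list[Query],
    candidates: list[date],
    min_hot_only: float = 0.9,
) -> Optional[date]:
    """Pick the latest candidate cutoff (smallest hot partition) for which
    at least min_hot_only of the queries avoid the cold partition.

    Returns None if no candidate meets the threshold.
    """
    feasible = [
        c for c in candidates
        if hot_only_fraction(queries, c) >= min_hot_only
    ]
    return max(feasible) if feasible else None
```

Choosing the latest feasible cutoff keeps the hot partition as small as possible while most queries still avoid cold data. Rules derived from a real workload would likely combine several attributes and would need to be re-evaluated as the workload changes.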

Technology & Skills

Technologies and languages for the project are intentionally not restricted and will be determined during requirements engineering. This openness with regard to technologies requires a broad set of skills. Hence, participants should have an interest in any of the following:

Python (Django, pandas, SciKit, …)
Unix Bash/Shell Scripting
Database Technologies (In-Memory Data Management, SQL, Stored-Procedures, ...)
Big Data Analysis (R, SAP Predictive Analytics Library, SAP Lumira, …)

Group Structure and Project Start

The team will consist of 3-6 students. The project starts on October 13, 2014. On October 8, 2014, at 4 PM, there will be an initial meeting with all participants to get to know your supervisors, your colleagues, and the general project setup.

Literature

"A Course in In-Memory Data Management" by Prof. Dr. h.c. Hasso Plattner. This book is the culmination of six years of in-memory research. As such, it provides the technical foundation for combined transactional and analytical workloads inside a single database, as well as examples of new applications that are now possible given the availability of the new technology. The book is available from Springer.

Contact

Dr. Michael Perscheid

Chair Representative

Tel.: +49 (331) 5509-566

E-Mail: michael.perscheid(at)hpi.de

Office:

Room: V-2.12

Tel.: +49 (331) 5509-560

Fax: +49 (331) 5509-579

E-Mail: office-epic(at)hpi.de
