Hasso-Plattner-Institut
  
Prof. Dr. h.c. Hasso Plattner
  
 

In-Memory Data Management Research


 General Regulations

  • Type of course: Master Seminar, Winter Term 2013/14
  • Offered by: Dr. Matthias Uflacker, Jens Krueger
  • Location: Haus D, E-9/10, Hasso-Plattner-High-Tech-Park, August-Bebel-Str. 88 (tbc)
  • Time: The seminar will mostly be held in individual meetings. Only the following obligatory meetings take place in the seminar's assigned slot (Tuesday and Thursday, 11:00-12:30 s.t.):

    • Oct 15: Organization and Topics
    • Oct 17: In-Memory Data Management Introduction and Topic Deep Dive
    • Dec 3 + 5 + 10: Midterm Presentations
    • Jan 23 + 28 + 30: Final Presentations

  • Final Prototype due: 28th of February (originally 21st of February)
  • Final Reports due: 7th of March (originally 28th of February)
  • 4 semester hours per week (Semesterwochenstunden)
  • 6 credit points (graded)
  • Area of specialisation: BPET, OSIS, SAMT

Short Description

The goal of the research seminar is to teach students the basics of scientific research and the inner mechanics of in-memory databases. The seminar focuses on implementation concepts for columnar in-memory database systems on modern hardware. Each student will work individually on a topic, resulting in a final paper (10-12 pages, IEEE format) in addition to a midterm and a final presentation. The topics range from basic data structure concepts to algorithms optimised for in-memory processing. Each topic has an implementation component, which is to be implemented and closely evaluated in the resulting paper.

Seminar Topics

HYRISE: In-Memory Database Operator Performance Optimization
The students will optimize a concrete operator and contribute to the open source project HYRISE. The goal is to apply state-of-the-art algorithms of in-memory computing and improve the performance of the current system. The topic can be chosen by multiple students.

HYRISE: Custom Operator Implementation in JavaScript
Based on an existing browser-based UI for HYRISE, the student will implement support for custom plan operations written in JavaScript.

HYRISE: Implementation of Indices
The students will implement one or more concrete indices and measure their performance impact as well as their maintenance costs. The goal is to implement indices which benefit from the unique main-delta structure of HYRISE. The topic can be chosen by multiple students.

Natural Language Software Development
Special-purpose programming languages like SQL introduce barriers for users of information systems who lack technical background knowledge. Especially in an enterprise context, the people in charge of decisions that impact business strategy and development know exactly which data is needed, but they have to communicate their needs to a technician to access it. This mismatch introduces potential for misconceptions, delays and deficiencies. Based on a tool that lets a user specify data layout and queries in natural language by parsing the textual input and generating corresponding database objects, the students will work on including complex operations based on textual and mathematical descriptions.

Pricing
Current data models for different application domains are based on the characteristics and assumptions of row-oriented database technology. This project investigates alternative data model designs which leverage column-oriented database technology. Based on an existing data model for price calculation, we develop different data models and compare them with each other.

Next Generation Enterprise Application Languages
Enterprise applications are historically based on a three-tier architecture that allowed and increased scalability. Within this architecture, a database server was used as a central instance responsible only for loading and storing data, while application servers executed all runtime logic. With the current trend to move data-intensive business logic to the database, the need for more specialized programming languages arises on the one side, while the requirement for maintainability and easy understanding of application logic increases on the other.
The students working on this topic will evaluate a programming language specialized for enterprise applications that differentiates between design time and runtime. At design time, only one programming language is used, which is then compiled to different target platforms for runtime.

Ruby on Rails: Stored Procedure Management
Based on the existing ActiveRecord HANA adapter, the students will implement a stored procedure management system for Ruby on Rails that lets developers easily call, create and manage stored procedures. In a second step, the students will develop an automatic transformation of Ruby code to SQLScript and analyse which code should be extracted to stored procedures.

Soccer Analytics
Sensor networks are used to improve modern training sessions in different kinds of sports. The resulting data sets can reach up to 60 million records for one hour of practice. In this project, we investigate the possibility of identifying different types of patterns based on spatial data of a soccer game.

Event Stream Preprocessing Using Co-Processors
Today's companies increasingly leverage sensor data for predictive analytics, e.g. strategic maintenance planning. Given the amount of streaming data, pre-processing steps like event filtering, aggregation and clustering are required in order to handle incoming events efficiently. In today's reference architecture, those tasks are handled, for example, by batch systems like Hadoop, which introduces a delay between event creation and clustered event evaluation. As part of the project, it will be evaluated how co-processors could be integrated efficiently into an architecture for enterprise applications.

Performance Evaluation of SCM for basic DB Operations
Storage Class Memory (SCM) blurs the distinction between memory (considered fast, expensive and volatile) and storage (considered slow, cheap and non-volatile). Connecting Flash/NVM to main memory using PCIe or even the memory bus itself allows new storage concepts. As part of this seminar, EMC provides their latest PCIe-connected flash memory modules as well as a new high-performance memory map subsystem (mmap) that maps capacity from SCM into addressable memory. The seminar evaluates this technology using micro-benchmarks for typical database operations. The target is to evaluate this technology as a memory extension for deprecated data.

Intelligent Memory Management in HYRISE using SCM
HYRISE, a columnar in-memory DB system, allocates and reads all data in main memory when it is accessed for the first time. With the emergence of Storage Class Memory, data is supposed to be distributed across different storage systems (NVM, Flash, DRAM) depending on its usage characteristics. Within this seminar, the memory management layer of HYRISE is revisited to evaluate how data can be classified and stored in different storage tiers. An implementation (concept) should evaluate the usability of a new high-performance memory map subsystem (using modified mmap) provided by our project partner EMC as well as other techniques of data tiering. The performance evaluation will be conducted using EMC's latest PCIe-connected flash memory modules.

Leveraging Business Semantics to Optimize Joins for Aggregate Cache
Enterprise applications work with (business) objects as an abstraction from reality. Within enterprise applications, patterns can be found in how those objects are created, stored and accessed. Database operations like joins can use this explicit knowledge to simplify the execution of cached aggregate queries.
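
The two SCM topics above describe mapping storage capacity into addressable memory via mmap and evaluating it with micro-benchmarks for typical database operations. As a rough illustration only, the following sketch memory-maps a column stored in an ordinary file and runs a predicate scan over it. It uses Python's standard `mmap` module, not the modified mmap subsystem provided by EMC, and the names `write_column` and `scan_column` are hypothetical helpers chosen for this example.

```python
import array
import mmap
import os
import tempfile
import time


def write_column(path, values):
    """Store a column of native signed integers as a flat binary file."""
    with open(path, "wb") as f:
        array.array("i", values).tofile(f)


def scan_column(path, predicate):
    """Memory-map the file and count the values that satisfy the predicate.

    The file must be non-empty; mmap cannot map a zero-length file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # View the mapped bytes as native signed integers without copying.
            with memoryview(mapped) as raw, raw.cast("i") as column:
                return sum(1 for value in column if predicate(value))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "column.bin")
        write_column(path, list(range(100_000)))

        start = time.perf_counter()
        matches = scan_column(path, lambda value: value % 2 == 0)
        elapsed = time.perf_counter() - start

        print(f"{matches} matching values scanned in {elapsed:.4f} s")
```

A real micro-benchmark for this topic would place the file on the device under test and repeat the scan several times, since the first access to a mapped region includes the cost of faulting pages in.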

Grading (Leistungserfassungsprozess)

The following components determine the final mark:

Part                                   Weight in %     Type
Presentations (mid-term / final)       30 (10 / 20)    Personal grade
Results                                30              Personal grade
Article                                30              Personal grade
General participation in the seminar   10              Personal grade

All of the components must be passed in order to pass the seminar.
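
The weighting above can be sketched as a small calculation. The percentages come from the grading table. Everything else is an assumption made for illustration: the grading scale (1.0 is best, 4.0 is the worst passing grade), the rule that the presentation component is the weighted combination of the mid-term and final presentation, and the function name `final_mark`.

```python
# Weights in percent, taken from the grading table.
WEIGHTS = {
    "midterm_presentation": 10,
    "final_presentation": 20,
    "results": 30,
    "article": 30,
    "participation": 10,
}

# Assumption: grades run from 1.0 (best) to 5.0, and 4.0 is the worst pass.
PASS_THRESHOLD = 4.0


def final_mark(grades):
    """Return the weighted final mark, or None if any component is failed."""
    if set(grades) != set(WEIGHTS):
        raise ValueError("grades must contain exactly the graded parts")

    # Assumption: the two presentations form one component (10 + 20 = 30 %).
    presentations = (
        WEIGHTS["midterm_presentation"] * grades["midterm_presentation"]
        + WEIGHTS["final_presentation"] * grades["final_presentation"]
    ) / 30
    components = [
        presentations,
        grades["results"],
        grades["article"],
        grades["participation"],
    ]
    if any(grade > PASS_THRESHOLD for grade in components):
        return None  # All components must be passed.

    return sum(WEIGHTS[part] * grades[part] for part in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "midterm_presentation": 2.0,
        "final_presentation": 1.0,
        "results": 1.7,
        "article": 2.3,
        "participation": 1.0,
    }
    print(final_mark(example))
```

With the hypothetical grades in the example, the weighted sum is (20 + 20 + 51 + 69 + 10) / 100, which is a final mark of about 1.7.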

Literature

We will provide relevant literature to the project teams according to §52a UrhG. The provided literature is meant as an introduction to the topic and does not cover it completely.