In-Memory Data Management Research
- Type of course: Master Seminar, Winter Term 2013/14
- Lecturers: Dr. Matthias Uflacker, Jens Krueger
- Location: Haus D, E-9/10, Hasso-Plattner-High-Tech-Park, August-Bebel-Str. 88 (tbc)
- Time: The seminar will mostly be held in individual meetings. Only the following obligatory meetings take place in the seminar's assigned slot (Tuesday and Thursday, 11:00-12:30 s.t.):
- Oct 15: Organization and Topics
- Oct 17: In-Memory Data Management Introduction and Topic Deep Dive
- Dec 3 + 5 + 10: Midterm Presentations
- Jan 23 + 28 + 30: Final Presentations
- Final Prototype due: 28th of February (moved from 21st of February)
- Final Reports due: 7th of March (moved from 28th of February)
- 4 semester hours per week (Semesterwochenstunden, SWS)
- 6 credit points (graded)
- Area of specialisation: BPET, OSIS, SAMT
The goal of the research seminar is to teach students the basics of scientific research and to convey a basic understanding of the inner workings of in-memory databases. The seminar focuses on implementation concepts for columnar in-memory database systems on modern hardware. Each student works individually on a topic, resulting in a final paper (10-12 pages, IEEE format) in addition to a midterm and a final presentation. The topics range from basic data structure concepts to algorithms optimised for in-memory processing. Each topic has an implementation component, which is to be implemented and thoroughly evaluated in the resulting paper.
The students working on this topic will evaluate a programming language specialized for enterprise applications that differentiates between design time and runtime. During design time, only one programming language is used, while it gets compiled to different target platforms during runtime.

Ruby on Rails: Stored Procedure Management
Based on the existing ActiveRecord HANA adapter, the students will implement a stored procedure management system for Ruby on Rails that lets developers easily call, create and manage stored procedures. In a second step, the students will develop an automatic transformation of Ruby code to SQLScript and analyse which code should be extracted into stored procedures.

Soccer Analytics
Sensor networks are used to improve modern training sessions in different kinds of sports. Such data sets can reach up to 60 million records for one hour of practice. In this project, we investigate whether different types of patterns can be identified from the spatial data of a soccer game.

Event Stream Preprocessing Using Co-Processors
Companies increasingly leverage sensor data for predictive analytics, e.g. strategic maintenance planning. Given the amount of streaming data, pre-processing steps such as event filtering, aggregation and clustering are required in order to handle incoming events efficiently. In today's reference architecture, those tasks are handled, for example, by batch systems like Hadoop, which introduces a delay between event creation and the evaluation of clustered events. As part of the project, it will be evaluated how co-processors could be integrated efficiently into an architecture for enterprise applications.

Performance Evaluation of SCM for Basic DB Operations
Storage Class Memory (SCM) blurs the distinction between memory (considered fast, expensive and volatile) and storage (considered slow, cheap and non-volatile). Connecting Flash/NVM to main memory via PCIe or even the memory bus itself enables new storage concepts. As part of this seminar, EMC provides their latest PCIe-connected flash memory modules as well as a new high-performance memory map subsystem (mmap) that maps capacity from SCM into addressable memory. The seminar evaluates this technology using micro-benchmarks for typical database operations. The goal is to evaluate this technology as a memory extension for deprecated data.

Intelligent Memory Management in HYRISE Using SCM
HYRISE, a columnar in-memory database system, allocates and loads all data into main memory when it is first accessed. With the emergence of Storage Class Memory, data is supposed to be distributed across different storage systems (NVM, Flash, DRAM) depending on its usage characteristics. Within this seminar, the memory management layer of HYRISE is revisited to evaluate how data can be classified and stored in different storage tiers. An implementation (concept) should evaluate the usability of a new high-performance memory map subsystem (using a modified mmap) provided by our project partner EMC as well as other data tiering techniques. The performance evaluation will be conducted using EMC's latest PCIe-connected flash memory modules.

Leveraging Business Semantics to Optimize Joins for Aggregate Cache
Enterprise applications work with (business) objects as an abstraction of reality. Within enterprise applications, patterns can be found in how those objects are created, stored and accessed. Database operations like joins can use this explicit knowledge to simplify the execution of cached aggregate queries.
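
As a rough illustration of the memory-mapped access mentioned in the two SCM topics above, the following minimal sketch maps a file holding one integer column into the address space and runs a full scan over it. It uses Python's standard `mmap` module on an ordinary temporary file as a stand-in. EMC's modified mmap subsystem and the flash modules are not shown, and nothing about their interface is assumed. The function names `write_column` and `scan_sum` are hypothetical and chosen only for this example.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Store a column of native unsigned ints contiguously in a file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}I", *values))


def scan_sum(path):
    """Map the file into the address space and sum every value (a full column scan)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # View the mapped bytes as native unsigned ints without copying them.
            with memoryview(mm) as raw, raw.cast("I") as column:
                return sum(column)


if __name__ == "__main__":
    # Stand-in for an SCM-backed file: an ordinary file in a temporary directory.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "column.bin")
        write_column(path, list(range(1_000_000)))
        print(scan_sum(path))
```

A micro-benchmark in the sense of the topic description would time such a scan (and other operations such as lookups or inserts) on different storage tiers. The sketch only shows the access pattern, not a measurement setup.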
The following components determine the final mark:
| Part | Weight in % | Type |
| Presentations (Mid-term / Final) | 30 (10 / 20) | Personal grade |
| Results | 30 | Personal grade |
| Article | 30 | Personal grade |
| General participation in the seminar | 10 | Personal grade |
All of the components must be passed in order to pass the seminar.
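
To make the weighting concrete, here is a minimal sketch of how the components could be combined. The weights come from the table above. Everything else is an assumption that the seminar description does not state: marks on the German scale (1.0 best, 4.0 still a pass), a plain weighted mean as the combination rule, and the two presentations being checked for passing separately. The names `WEIGHTS`, `PASS_LIMIT` and `final_mark` are hypothetical.

```python
# Weights in percent, taken from the table above.
WEIGHTS = {
    "midterm_presentation": 10,
    "final_presentation": 20,
    "results": 30,
    "article": 30,
    "participation": 10,
}

# Assumption (not stated in the seminar description):
# German grading scale, 1.0 is best and 4.0 is the worst passing mark.
PASS_LIMIT = 4.0


def final_mark(marks):
    """Return the weighted mean of the component marks, or None if any component is failed."""
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must contain exactly the weighted components")
    # Every component has to be passed on its own.
    if any(mark > PASS_LIMIT for mark in marks.values()):
        return None
    total = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)
    return total / 100
```

Under these assumptions, a student with 1.0 in the midterm presentation, results and participation, 2.0 in the final presentation and 3.0 in the article would end up with (10 + 40 + 30 + 90 + 10) / 100 = 1.8.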
We will provide relevant literature to the project teams according to §52a UrhG. The provided literature is meant as an introduction to the topic. It does not cover the topic completely.