Dr. Jan Kossmann

	Phone:	+49 (331) 5509-1323
	Email:	jan.kossmann(at)hpi.de
	Address:	August-Bebel-Str. 88, 14482 Potsdam
	Room:	V-2.02 (Campus II)
	Links:	dblp, Google Scholar, LinkedIn

Research Area: Autonomous Data Management

Research

Unsupervised Database Optimization: Efficient Index Selection & Data Dependency-driven Query Optimization

My research focuses on Unsupervised Database Optimization.
I investigate how automated index selection and data dependencies can improve workload processing
so that database systems run more efficiently without costly manual optimizations.

Research Abstract

The performance of a database system depends on its configuration. Modern database systems offer many interdependent configuration options to support variable workloads from different domains running on heterogeneous hardware. The number of possible configurations grows exponentially with the number of available options. Thus, the configuration process, which is already expensive, surpasses the capabilities of human database administrators. To tackle this issue, self-managing database systems use workload-driven optimization and machine learning techniques to configure database systems.

We focus our work on three specific challenges of self-managing database systems: (i) system integration, (ii) index selection, and (iii) cost estimation. (i) System integration: DBMSs were not designed with self-managing capabilities in mind. We propose a generalized framework that enables self-managing DBMSs by providing components for workload monitoring, forecasting, and tuning. (ii) Index selection: Diverse and volatile workloads from different applications complicate the selection of performance-enhancing indexes. We developed an efficient and scalable index selection approach that accounts for index interaction and reconfiguration costs while achieving shorter runtimes than state-of-the-art algorithms. (iii) Cost estimation: Knowledge of query costs is crucial for determining efficient query execution plans. Self-managing systems must assess and quantify the cost impact of the options available to them in order to select the most beneficial one. We generate highly accurate cost estimates by continuously training estimation models on actual runtime observations.

Our contributions pave the way for self-managing database systems by providing solutions for core challenges in this field. The aforementioned techniques are implemented in the research database system Hyrise.
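
To make the index selection challenge more concrete, here is a minimal sketch of a budget-constrained greedy selection. It is an illustrative baseline only, not the approach from the publications listed on this page and not Hyrise code. The function names, the workload representation, and the toy "what-if" cost function are all assumptions made for this example. Index interaction shows up because the benefit of a candidate is always measured against the indexes already selected.

```python
def workload_cost(workload, indexes):
    """Toy what-if cost model (hypothetical, for illustration only).

    Each query is a tuple (frequency, base_cost, {index_name: cost_with_index}).
    A query uses the cheapest access path among the selected indexes,
    falling back to base_cost if none of its useful indexes is selected.
    """
    total = 0.0
    for frequency, base_cost, indexed_costs in workload:
        options = [base_cost]
        options += [cost for name, cost in indexed_costs.items() if name in indexes]
        total += frequency * min(options)
    return total


def greedy_index_selection(workload, candidates, budget):
    """Greedily pick indexes by cost reduction per unit of storage.

    candidates maps index_name -> size; budget is the total size allowed.
    Returns the selected index set and the resulting workload cost.
    """
    selected = set()
    used = 0
    current = workload_cost(workload, selected)
    while True:
        best_index, best_ratio, best_cost = None, 0.0, current
        for index, size in candidates.items():
            if index in selected or used + size > budget:
                continue
            # Benefit is evaluated relative to the current selection,
            # so overlapping indexes lose value once a similar one is chosen.
            cost = workload_cost(workload, selected | {index})
            ratio = (current - cost) / size
            if ratio > best_ratio:
                best_index, best_ratio, best_cost = index, ratio, cost
        if best_index is None:
            return selected, current
        selected.add(best_index)
        used += candidates[best_index]
        current = best_cost


if __name__ == "__main__":
    # Hypothetical workload: three queries with frequencies and per-index costs.
    workload = [
        (10, 100.0, {"a": 20.0, "ab": 10.0}),
        (5, 80.0, {"b": 30.0}),
        (1, 50.0, {"ab": 25.0}),
    ]
    candidates = {"a": 2, "b": 2, "ab": 5}
    print(greedy_index_selection(workload, candidates, budget=5))
```

In this toy setup the multi-attribute index "ab" gives the largest absolute reduction, but the two smaller indexes together fit the same budget and yield a lower total cost. Greedy heuristics of this kind can miss better combinations in general, which is one reason index selection remains a research topic.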

Teaching

Develop your own Database (Summer 2021)
Research and Implementation of Database Concepts (Winter 2020/21)
Trends and Concepts in the Software Industry I (Summer 2020)
Bachelor's Project: "Autopilot ON: A Cockpit for Self-Driving Databases" (Summer 2020)
Develop your own Database (Winter 2019/20)
Bachelor's Project: "Autopilot ON: A Cockpit for Self-Driving Databases" (Winter 2019/20)
Trends and Concepts in the Software Industry I (Summer 2019)
Master's Project: Parallelization and Query Plan Optimizations for the TPC-DS benchmark (Summer 2019)
Develop your own Database (Winter 2018/19)
Trends and Concepts in the Software Industry I (Summer 2018)
Master’s Project: C++ Low-Level Performance Engineering for Database Systems (Summer 2018)
Build your own Database (Winter 2017/18)
Master's Project: Query Plan Optimizations for In-Memory Databases (Summer 2017)
Trends and Concepts in the Software Industry I (Summer 2017)
Bachelor's Project HP: Machine Learning for the Intelligent Enterprise (Summer 2017)
Build your own Database (Winter 2016/17)
Bachelor's Project HP: Machine Learning for the Intelligent Enterprise (Winter 2016/17)

Selected Talks & Presentations

Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization, CIDR 2022, January 2022, Santa Cruz, USA
Data Dependencies for Query Optimization: a Survey, VLDB 2021 (VLDB Journal Poster Session), August 2021, Online
A Cockpit for the Development and Evaluation of Autonomous Database Systems, ICDE 2021, April 2021, Online
Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms, VLDB 2020, September 2020, Online
Learned Operator Cost Models, AIDB @ VLDB 2019, August 2019, Los Angeles, USA
Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies, ICDE 2019, April 2019, Macao SAR, China
Self-Driving: From General Purpose to Specialized DBMSs, VLDB 2018, August 2018, Rio de Janeiro, Brazil

Supervised Master's Theses

Current

Evaluating data dependency-based query optimization techniques

Completed

Partial Indexes in Horizontally Partitioned In-Memory Databases
Automatic Clustering in Hyrise
Utilizing Segment and Chunk Access Metrics for Data Placement
Evaluation of Index Selection Algorithms
Learned Cost Models for Query Optimization
Cardinality Estimation and Access Avoidance in Horizontally Partitioned IMDBs
Adaptive Query Optimization for In-Memory Databases
Probabilistic Data Structures for In-Memory Databases
Just-in-Time Compilation for Efficient Query Plan Execution of OLAP Workloads in Column Stores
Heterogenous Index Distribution in Multi-Node In-Memory Database Systems
Building an SQL Interface and Leveraging Query Plan Caching for a Relational Database

Publications

2024

Halfpap, S., Kossmann, J., Schlosser, R., Markl, V.: Looking Deeply into the Magic Mirror: An Interactive Analysis of Database Index Selection Approaches. VLDB 2024, PVLDB 17 (12), accepted (2024).

2022

Kossmann, J., Lindner, D., Naumann, F., Papenbrock, T.: Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization. Proceedings of the Conference on Innovative Data Systems Research (CIDR) (2022).

Kossmann, J., Kastius, A., Schlosser, R.: SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning. 25th International Conference on Extending Database Technology (EDBT 2022). pp. 155–168 (2022).

2021

Kossmann, J., Papenbrock, T., Naumann, F.: Data dependencies for query optimization: a survey. VLDB Journal (2021).

Lindner, D., Loeser, A., Kossmann, J.: Learned What-If Cost Models for Autonomous Clustering. New Trends in Database and Information Systems - ADBIS 2021 Short Papers, Doctoral Consortium and Workshops, Tartu, Estonia. pp. 3–13 (2021).

Kossmann, J., Boissier, M., Dubrawski, A., Heseding, F., Mandel, C., Pigorsch, U., Schneider, M., Schniese, T., Sobhani, M., Tsayun, P., Wille, K., Perscheid, M., Uflacker, M., Plattner, H.: A Cockpit for the Development and Evaluation of Autonomous Database Systems. 37th IEEE International Conference on Data Engineering, ICDE. pp. 2685–2688 (2021).

@inproceedings{kossmann2021cockpit,
  abstract = {Databases are highly optimized complex systems with a multitude of configuration options. Especially in cloud scenarios with thousands of database deployments, determining optimized database configurations in an automated fashion is of increasing importance for database providers. At the same time, due to increased system complexity, it becomes more challenging to identify well-performing configurations. Therefore, research interest in autonomous or self-driving database systems has increased enormously in recent years. Such systems promise both performance improvements and cost reductions. In the literature, various fully or partially autonomous optimization mechanisms exist that optimize single aspects, e.g., index selection. However, database administrators and developers often distrust autonomous approaches, and there is a lack of practical experimentation opportunities that could create a better understanding. Moreover, the interplay of different autonomous mechanisms under complex workloads remains an open question. The presented cockpit enables an interactive assessment of the impact of autonomous components for database systems by comparing (autonomous) systems with different configurations side by side. Thereby, the cockpit enables users to build trust in autonomous solutions by experimenting with such technologies and observing their effects in practice.},
  author = {Kossmann, Jan and Boissier, Martin and Dubrawski, Alexander and Heseding, Fabian and Mandel, Caterina and Pigorsch, Udo and Schneider, Max and Schniese, Til and Sobhani, Mona and Tsayun, Petr and Wille, Katharina and Perscheid, Michael and Uflacker, Matthias and Plattner, Hasso},
  booktitle = {37th IEEE International Conference on Data Engineering, ICDE},
  keywords = {in-memory_database myown database self-managing autonomous mboissierselected self-driving adm hyrise},
  pages = {2685-2688},
  title = {A Cockpit for the Development and Evaluation of Autonomous Database Systems},
  year = 2021
}

2020

Kossmann, J., Halfpap, S., Jankrift, M., Schlosser, R.: Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proceedings of the VLDB Endowment. pp. 2382–2395 (2020).

Kossmann, J., Schlosser, R.: Self-driving database systems: a conceptual approach. Distributed and Parallel Databases. 38 (4), 795–817 (2020).

2019

Dreseler, M., Kossmann, J., Boissier, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. 22nd International Conference on Extending Database Technology (EDBT). pp. 313–324 (2019).

@inproceedings{dreseler2018,
  abstract = {Research in data management profits when the performance evaluation is based not only on individual components in isolation, but uses an actual DBMS end-to-end. Facilitating the integration and benchmarking of new concepts within a DBMS requires a simple setup process, well-documented code, and the possibility to execute both standard and custom benchmarks without tedious preparation. Fulfilling these requirements also makes it easy to reproduce the results later on. The relational open-source database Hyrise (VLDB, 2010) was presented to make the case for hybrid row- and column-format data storage. Since then, it has evolved from being a single-purpose research DBMS towards becoming a platform for various projects, including research in the areas of indexing, data partitioning, and non-volatile memory. With a growing diversity of topics, we have found that the original code base grew to a point where new experimentation became unnecessarily difficult. Over the last two years, we have re-written Hyrise from scratch and built an extensible multi-purpose research DBMS that can serve as an easy-to-extend platform for a variety of experiments and prototyping in database research. In this paper, we discuss how our learnings from the previous version of Hyrise have influenced our re-write. We describe the new architecture of Hyrise and highlight the main components. Afterwards, we show how our extensible plugin architecture facilitates research on diverse DBMS-related aspects without compromising the architectural tidiness of the code. In a first performance evaluation, we show that the execution time of most TPC-H queries is competitive to that of other research databases.},
  author = {Dreseler, Markus and Kossmann, Jan and Boissier, Martin and Klauck, Stefan and Uflacker, Matthias and Plattner, Hasso},
  booktitle = {22nd International Conference on Extending Database Technology (EDBT)},
  keywords = {myown mboissierselected adm hyrise},
  month = 3,
  pages = {313-324},
  title = {Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management},
  year = 2019
}

Kossmann, J., Schlosser, R.: A Framework for Self-Managing Database Systems. 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW). pp. 100–106 (2019).

Schlosser, R., Kossmann, J., Boissier, M.: Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies. 35th IEEE International Conference on Data Engineering, ICDE. pp. 1238–1249. IEEE (2019).

2018

Kossmann, J.: Self-Driving: From General Purpose to Specialized DBMSs. Proceedings of the VLDB 2018 PhD Workshop co-located with the 44th International Conference on Very Large Databases (VLDB 2018), Rio de Janeiro, Brasil, Aug 27-31, 2018 (2018).

Dreseler, M., Kossmann, J., Frohnhofen, J., Uflacker, M., Plattner, H.: Fused Table Scans: Combining AVX-512 and JIT to Double the Performance of Multi-Predicate Scans. Joint Workshop of HardBD (International Workshop on Big Data Management on Emerging Hardware) and Active (Workshop on Data Management on Virtualized Active Systems), in conjunction with ICDE (2018).

Kossmann, J., Dreseler, M., Gasda, T., Uflacker, M., Plattner, H.: Visual Evaluation of SQL Plan Cache Algorithms. Australasian Database Conference (ADC) (2018).

Dreseler, M., Gasda, T., Kossmann, J., Uflacker, M., Plattner, H.: Adaptive Access Path Selection for Hardware-Accelerated DRAM Loads. Australasian Database Conference (ADC) (2018).

2015

Mueller, S., Fritzsche, M., Kossmann, J., Schneider, M., Striebel, J., Baudisch, P.: Scotty: Relocating Physical Objects Across Distances Using Destructive Scanning, Encryption, and 3D Printing. TEI ’15 Proceedings of the Ninth International Conference on Tangible, Embedded, and Embodied Interaction. pp. 233–240 (2015).

Schwalb, D., Kossmann, J., Faust, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications. Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics (IMDM), in conjunction with VLDB 2015, Kohala Coast, Hawaii (2015).

News

22.09.2023 | Trends and Concepts in the Software Industry Seminar offered in winter semester 2023/2024

22.05.2023 | Christopher Hagedorn Successfully Defended His PhD Thesis

03.03.2023 | Last Trends and Concepts course of Prof. Hasso Plattner

01.03.2023 | Jan Kossmann Successfully Defended His PhD Thesis

26.02.2023 | Paper on Data Tiering in Hyrise Published in BTW Proceedings

24.02.2023 | Paper on EPIC Research Group Published in SIGMOD Record

30.11.2022 | Paper on Database Optimizations for Spatio-Temporal Data published in PVLDB

04.10.2022 | Günter Hesse Successfully Defended His PhD Thesis

08.07.2022 | Successful PhD Defense by Markus Dreseler
