Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

Dr. Jan Kossmann

 Phone: +49 (331) 5509-1323
 Email: jan.kossmann(at)hpi.de
 Address: August-Bebel-Str. 88, 14482 Potsdam
 Room: V-2.02 (Campus II)
 Links: dblp, Google Scholar, LinkedIn

 

Research Area: Autonomous Data Management

Research

Unsupervised Database Optimization: Efficient Index Selection & Data Dependency-driven Query Optimization

My research focuses on Unsupervised Database Optimization.
I investigate how automated index selection and data dependencies can improve workload processing
so that database systems run more efficiently without costly manual optimizations.

Research Abstract

The performance of a database system depends on its configuration. Modern database systems offer many interdependent configuration options so that they can process varying workloads from different domains on heterogeneous hardware. The number of possible configurations increases exponentially with the number of available options. Thus, the configuration process, which is already expensive, exceeds the capabilities of human database administrators. To tackle this issue, self-managing database systems use workload-driven optimization and machine learning techniques to configure themselves.

We focus our work on three specific challenges of self-managing database systems: (i) system integration, (ii) index selection, and (iii) cost estimation.

  • (i) System integration: DBMSs were not designed with self-managing capabilities in mind. We propose a generalized framework that enables self-managing DBMSs by providing components for workload monitoring, forecasting, and tuning.
  • (ii) Index selection: Diverse and volatile workloads from different applications complicate the selection of performance-enhancing indexes. We developed an efficient and scalable index selection approach that accounts for index interaction and reconfiguration costs while running faster than state-of-the-art algorithms.
  • (iii) Cost estimation: Knowledge of query costs is crucial for determining efficient query execution plans. Self-managing systems must assess and quantify the cost impact of the options available to them in order to select the most beneficial one. We generate highly accurate cost estimates by continuously training estimation models on actual runtime observations.

Our contributions pave the way for self-managing database systems by providing solutions for core challenges in this field. The aforementioned techniques are implemented in the research database system Hyrise.

Selected Talks & Presentations

  • Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization, CIDR 2022, January 2022, Santa Cruz, USA
  • Data Dependencies for Query Optimization: a Survey, VLDB 2021 (VLDB Journal Poster Session), August 2021, Online
  • A Cockpit for the Development and Evaluation of Autonomous Database Systems, ICDE 2021, April 2021, Online
  • Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms, VLDB 2020, September 2020, Online 
  • Learned Operator Cost Models, AIDB @ VLDB 2019, August 2019, Los Angeles, USA
  • Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies, ICDE 2019, April 2019, Macao SAR, China
  • Self-Driving: From General Purpose to Specialized DBMSs, VLDB 2018, August 2018, Rio de Janeiro, Brazil

Supervised Master's Theses

Current

  • Evaluating data dependency-based query optimization techniques

Completed

  • Partial Indexes in Horizontally Partitioned In-Memory Databases
  • Automatic Clustering in Hyrise
  • Utilizing Segment and Chunk Access Metrics for Data Placement
  • Evaluation of Index Selection Algorithms
  • Learned Cost Models for Query Optimization
  • Cardinality Estimation and Access Avoidance in Horizontally Partitioned IMDBs
  • Adaptive Query Optimization for In-Memory Databases
  • Probabilistic Data Structures for In-Memory Databases
  • Just-in-Time Compilation for Efficient Query Plan Execution of OLAP Workloads in Column Stores
  • Heterogenous Index Distribution in Multi-Node In-Memory Database Systems
  • Building an SQL Interface and Leveraging Query Plan Caching for a Relational Database

Publications

2022

  • 1.
    Kossmann, J., Kastius, A., Schlosser, R.: SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning. 25th International Conference on Extending Database Technology (EDBT 2022). pp. 155–168 (2022).
     
  • 2.
    Kossmann, J., Lindner, D., Naumann, F., Papenbrock, T.: Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization. Proceedings of the Conference on Innovative Data Systems Research (CIDR) (2022).
     

2021

  • 1.
    Kossmann, J., Papenbrock, T., Naumann, F.: Data dependencies for query optimization: a survey. VLDB Journal. (2021).
     
  • 2.
    Lindner, D., Loeser, A., Kossmann, J.: Learned What-If Cost Models for Autonomous Clustering. New Trends in Database and Information Systems - ADBIS 2021 Short Papers, Doctoral Consortium and Workshops, Tartu, Estonia. pp. 3–13 (2021).
     
  • 3.
    Kossmann, J., Boissier, M., Dubrawski, A., Heseding, F., Mandel, C., Pigorsch, U., Schneider, M., Schniese, T., Sobhani, M., Tsayun, P., Wille, K., Perscheid, M., Uflacker, M., Plattner, H.: A Cockpit for the Development and Evaluation of Autonomous Database Systems. 37th IEEE International Conference on Data Engineering, ICDE. pp. 2685–2688 (2021).
     

2020

  • 1.
    Kossmann, J., Halfpap, S., Jankrift, M., Schlosser, R.: Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proceedings of the VLDB Endowment. pp. 2382–2395 (2020).
     
  • 2.
    Kossmann, J., Schlosser, R.: Self-driving database systems: a conceptual approach. Distributed and Parallel Databases. 38 (4), 795–817 (2020).
     

2019

  • 1.
    Kossmann, J., Schlosser, R.: A Framework for Self-Managing Database Systems. 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW). pp. 100–106 (2019).
     
  • 2.
    Schlosser, R., Kossmann, J., Boissier, M.: Efficient Scalable Multi-Attribute Index Selection Using Recursive Strategies. IEEE 35th International Conference on Data Engineering (ICDE 2019). pp. 1238–1249. IEEE (2019).
     
  • 3.
    Dreseler, M., Kossmann, J., Boissier, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management. 22nd International Conference on Extending Database Technology (EDBT). pp. 313–324 (2019).
     

2018

  • 1.
    Kossmann, J.: Self-Driving: From General Purpose to Specialized DBMSs. Proceedings of the VLDB 2018 PhD Workshop co-located with the 44th International Conference on Very Large Databases (VLDB 2018), Rio de Janeiro, Brasil, Aug 27-31, 2018 (2018).
     
  • 2.
    Dreseler, M., Kossmann, J., Frohnhofen, J., Uflacker, M., Plattner, H.: Fused Table Scans: Combining AVX-512 and JIT to Double the Performance of Multi-Predicate Scans. Joint Workshop of HardBD (International Workshop on Big Data Management on Emerging Hardware) and Active (Workshop on Data Management on Virtualized Active Systems), in conjunction with ICDE (2018).
     
  • 3.
    Kossmann, J., Dreseler, M., Gasda, T., Uflacker, M., Plattner, H.: Visual Evaluation of SQL Plan Cache Algorithms. Australasian Database Conference (ADC) (2018).
     
  • 4.
    Dreseler, M., Gasda, T., Kossmann, J., Uflacker, M., Plattner, H.: Adaptive Access Path Selection for Hardware-Accelerated DRAM Loads. Australasian Database Conference (ADC) (2018).
     

2015

  • 1.
    Mueller, S., Fritzsche, M., Kossmann, J., Schneider, M., Striebel, J., Baudisch, P.: Scotty: Relocating Physical Objects Across Distances Using Destructive Scanning, Encryption, and 3D Printing. TEI ’15 Proceedings of the Ninth International Conference on Tangible, Embedded, and Embodied Interaction. pp. 233–240 (2015).
     
  • 2.
    Schwalb, D., Kossmann, J., Faust, M., Klauck, S., Uflacker, M., Plattner, H.: Hyrise-R: Scale-out and Hot-Standby through Lazy Master Replication for Enterprise Applications. Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics (IMDM), in conjunction with VLDB 2015, Kohala Coast, Hawaii (2015).