Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

Database Systems Seminar (with TU Darmstadt)

The Database Systems Seminar is a joint research seminar with the Data Management Lab at TU Darmstadt, led by Prof. Carsten Binnig. In our meetings we share our current research and discuss novel ideas in the following areas:

  • Data management on modern hardware
  • Stream processing
  • Interactive data exploration & ML
  • End-to-end machine learning
  • Natural language interfaces for databases
  • Benchmarking data processing systems
  • Trusted data management

During the seminar we also host invited talks by distinguished speakers from both academia and industry. On this page you can find the short abstracts and the recorded presentations of the talks.

Timeline

16.11.2020 Lawrence Benson (HPI) Designing Hybrid PMem-DRAM Systems
30.11.2020 Lasse Thostrup (TU Darmstadt) DPI: The Data Processing Interface for High-Speed Networks
14.12.2020 Tianzheng Wang (Simon Fraser, CA) Hiding Data Stalls with Coroutine-Oriented Transaction Execution
25.01.2021 Ilin Tolovski (HPI) Evaluating Parameter Server Efficiency in ML Systems
08.02.2021 Benjamin Hilprecht (TU Darmstadt) SynData - Data Completion for Relational Databases
22.02.2021 Arun Kumar (UCSD) Multi-Query Optimizations for Deep Learning Systems

14.12.2020 - Guest Talk: Tianzheng Wang (Simon Fraser, CA)

Hiding Data Stalls with Coroutine-Oriented Transaction Execution

Abstract: As the speed gap between memory and CPU continues to widen, memory accesses are becoming a major overhead in pointer-rich data structures, such as B-trees, hash tables, and linked lists, which are important building blocks of database systems. Software prefetching techniques have been proposed as an effective way to hide stalls through careful scheduling and interleaving that overlap data fetching with computation. Yet they require a vastly different multi-key interface, breaking backward compatibility, and it was unclear how these techniques could be applied in a full database engine.
In this talk, we will share our experience adopting software prefetching via C++20 coroutines in a full database engine. The crux is an asynchronous “coroutine-to-transaction” paradigm that departs from the traditional, synchronous “thread-to-transaction” execution model. Coroutine-to-transaction reduces database kernel changes and maintains backward compatibility, while retaining the performance benefits of software prefetching. With coroutine-to-transaction, we build CoroBase, a coroutine-oriented database engine. In the context of CoroBase, we further discuss the new execution model's impact on database engine design (such as concurrency control and resource management) and highlight interesting future work.

Bio: Tianzheng Wang is an assistant professor in the School of Computing Science at Simon Fraser University (SFU) in Vancouver, Canada. He works on the boundary between software and modern hardware (in particular persistent memory, manycore processors, and next-generation networks). His current research focuses on database systems and related areas that impact the design of data-intensive systems, such as operating systems, distributed systems, and synchronization. He received his Ph.D. and M.Sc. degrees in Computer Science from the University of Toronto in 2017 and 2014, respectively (advised by Ryan Johnson and Angela Demke Brown), and his B.Sc. in Computing (First Class Honours) from Hong Kong Polytechnic University in 2012. Prior to joining SFU, he spent one year (2017-2018) at Huawei Canada Research Centre (Toronto) as a research engineer. He received the IEEE TCSC Award for Excellence in Scalable Computing (Early Career Researchers) in 2019 for contributions to scalable data processing.

Research webpage: https://www.cs.sfu.ca/~tzwang/index.html

22.02.2021 - Guest Talk: Arun Kumar (UCSD, CA)

Multi-Query Optimizations for Deep Learning Systems

Abstract: Deep learning (DL) is growing in popularity for many advanced data analytics applications in enterprise, Web, scientific, and other domains. Naturally, the resource efficiency of DL systems and the productivity of their users are pressing challenges in democratizing DL. In this talk, I present a new technical direction from my research that tackles such challenges with a database-inspired lens: higher-level specification and multi-query optimization (MQO). By exploiting higher-level abstractions of DL usage already inherent in practice, I show how we can automatically restructure the underlying execution to improve resource efficiency, reduce runtimes and costs, and in turn improve user productivity.

Our approach benefits both DL inference and training, as I illustrate with three recent systems: Vista, with MQO for CNN transfer learning; Krypton, with MQO for CNN inference; and Cerebro, with MQO for parallel DL model selection. All of our techniques integrate easily with existing DL systems (e.g., TensorFlow and PyTorch) without affecting their internal code, making practical adoption easier. I will conclude by highlighting some of our ongoing and upcoming work on generalizing Cerebro to more high-level DL tasks and to more execution environments, such as cloud-native settings.

Bio: Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering and the Halicioglu Data Science Institute at the University of California, San Diego. He is a member of the Database Lab and the Center for Networked Systems and an affiliate member of the AI Group. His primary research interests are in data management and systems for machine learning/artificial intelligence-based data analytics. Systems and ideas based on his research have been released as part of the Apache MADlib open-source library, shipped in products from Cloudera, IBM, Oracle, and Pivotal, and used internally by Facebook, Google, LogicBlox, Microsoft, and other companies. He is a recipient of two SIGMOD research paper awards, a SIGMOD Research Highlight Award, and three distinguished reviewer awards from SIGMOD/VLDB.

Research webpage: https://adalabucsd.github.io/