Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
Dr. Thorsten Papenbrock

Senior Researcher
Head of the Distributed Computing group

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Office: F-2.04, Campus II
Phone: +49 331 5509 294
Email: thorsten.papenbrock(a)hpi.de
Profiles: Xing, LinkedIn
Research: ORCID, GoogleScholar, DBLP, ResearchGate

Dissertation: Data Profiling - Efficient Discovery of Dependencies


Projects

Metanome

Research Interests

Technology Interests

  • Data flow engines

  • Message passing systems

  • Parallel hardware toolkits

Teaching

Lectures:

  • Distributed Data Management (2018, 2019)
  • Distributed Data Analytics (2017)
  • Data Profiling (2017)
  • Information Integration (2015)
  • Data Profiling and Data Cleansing (2014)
  • Database Systems I (2013, 2014, 2015, 2016, 2017)
  • Database Systems II (2013)

Seminars:

  • Reliable Distributed Systems Engineering (2019)
  • Mining Streaming Data (2019)
  • Actor Database Systems (2018)
  • Proseminar Information Systems (2014)
  • Advanced Data Profiling (2013, 2017)

Bachelor Projects:

  • Data Refinery - Scalable Offer Processing with Apache Spark (2015/2016)

Master Projects:

  • Profiling Dynamic Data - Maintaining Metadata under Inserts, Updates, and Deletes (2016)
  • Approximate Data Profiling - Efficient Discovery of Approximate INDs and FDs (2015)
  • Metadata Trawling - Interpreting Data Profiling Results (2014)
  • Joint Data Profiling - Holistic Discovery of INDs, FDs, and UCCs (2013)

Master Theses:

  • Distributed Unique Column Combination Discovery (Benjamin Feldmann, 2019)
  • Reactive Inclusion Dependency Discovery (Frederic Schneider, 2019)
  • Inclusion Dependency Discovery on Streaming Data (Alexander Preuss, 2019)
  • Generating Data for Functional Dependency Profiling (Jennifer Stamm, 2018)
  • Efficient Detection of Genuine Approximate Functional Dependencies (Moritz Finke, 2018)
  • Efficient Discovery of Matching Dependencies (Philipp Schirmer, 2017)
  • Discovering Interesting Conditional Functional Dependencies (Maximilian Grundke, 2017)
  • Multivalued Dependency Detection (Tim Draeger, 2016)
  • Spinning a Web of Tables through Inclusion Dependencies (Fabian Tschirschnitz, 2014)
  • Discovery of Conditional Unique Column Combination (Jens Ehrlich, 2014)
  • Discovering Matching Dependencies (Andrina Mascher, 2013)

Online Courses:

  • Datenmanagement mit SQL (openHPI, 2013)

Publications

DynFD: Functional Dependency Discovery in Dynamic Datasets

Schirmer, Philipp; Papenbrock, Thorsten; Kruse, Sebastian; Naumann, Felix; Hempfing, Dennis; Mayer, Torben; Neuschäfer-Rube, Daniel. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 253-264, 2019.

Functional dependencies (FDs) support various tasks in the management of relational data, such as schema normalization, data cleaning, and query optimization. However, while existing FD discovery algorithms consider only static datasets, many real-world datasets change constantly, and their FDs change with them. Unfortunately, the computational hardness of FD discovery prohibits re-executing these existing algorithms after every change to the data. To this end, we propose DynFD, the first algorithm to discover and maintain functional dependencies in dynamic datasets. Whenever the inspected dataset changes, DynFD evolves its FDs rather than recalculating them. For this to work efficiently, we propose indexed data structures along with novel and efficient update operations. Our experiments compare DynFD's incremental mode of operation to the repeated re-execution of existing, static algorithms. They show that DynFD can maintain the FDs of dynamic datasets more than an order of magnitude faster than its static counterparts.
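The abstract contrasts re-running a static check with evolving FDs incrementally. The following is a minimal Python sketch of that contrast for a single, already-known candidate FD; it is not DynFD's algorithm, which discovers and maintains all minimal FDs with its own indexed data structures. The names `fd_holds` and `FDMonitor` and the zip-code sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def fd_holds(rows: Sequence[Row], lhs: Sequence[int], rhs: int) -> bool:
    """Static check: scan all rows to test whether lhs -> rhs holds."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        # Two rows agreeing on the LHS but not on the RHS violate the FD.
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


class FDMonitor:
    """Hypothetical incremental checker for one candidate FD lhs -> rhs.

    Assumes deletes only remove rows that were previously inserted.
    """

    def __init__(self, lhs: Sequence[int], rhs: int) -> None:
        self.lhs = tuple(lhs)
        self.rhs = rhs
        # Index: LHS value combination -> multiset of RHS values in that group.
        self.groups: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)
        # Number of groups that currently hold more than one distinct RHS value.
        self.violating_groups = 0

    def _key(self, row: Row) -> Tuple[Hashable, ...]:
        return tuple(row[i] for i in self.lhs)

    def insert(self, row: Row) -> None:
        counts = self.groups[self._key(row)]
        before = len(counts)
        counts[row[self.rhs]] += 1
        # A group becomes a violation when it gains its second distinct RHS value.
        if before == 1 and len(counts) == 2:
            self.violating_groups += 1

    def delete(self, row: Row) -> None:
        key = self._key(row)
        counts = self.groups[key]
        before = len(counts)
        counts[row[self.rhs]] -= 1
        if counts[row[self.rhs]] == 0:
            del counts[row[self.rhs]]
        # A group stops violating when only one distinct RHS value remains.
        if before == 2 and len(counts) == 1:
            self.violating_groups -= 1
        if not counts:
            del self.groups[key]

    def holds(self) -> bool:
        return self.violating_groups == 0


if __name__ == "__main__":
    rows = [("10115", "Berlin"), ("14482", "Potsdam")]  # (zip, city)
    monitor = FDMonitor(lhs=[0], rhs=1)  # candidate FD: zip -> city
    for r in rows:
        monitor.insert(r)
    print(monitor.holds())  # True
    monitor.insert(("14482", "Berlin"))  # conflicting row
    print(monitor.holds())  # False
    monitor.delete(("14482", "Berlin"))
    print(monitor.holds(), fd_holds(rows, [0], 1))  # True True
```

Each insert or delete touches only the group of the affected row, whereas `fd_holds` rescans the whole relation. The sketch deliberately leaves out the harder part that DynFD addresses: deciding which FDs are minimal and valid after each change.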
Tags: dynamic, functional_dependencies, hpi, isg, profiling