Prof. Dr. Felix Naumann

Dr. Thorsten Papenbrock

Professor (at the University of Marburg)
Head of the Distributed Computing group

Hasso-Plattner-Institut für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam


Current affiliation: University of Marburg

Email:  thorsten.papenbrock(a)hpi.de
Profiles: Xing, LinkedIn
Research: ORCID, GoogleScholar, DBLP, ResearchGate

Dissertation: Data Profiling - Efficient Discovery of Dependencies


Research Interests

  • Complex data engineering problems
  • Parallel and distributed computing challenges
    • e.g. robustness, efficiency, and elasticity

Technology Interests

  • Data flow engines
  • Message passing systems
  • Parallel hardware toolkits




  • Sustainable Machine Learning on Edge Device Clusters (2020)
  • Machine Learning for Data Streams (2019)
  • Reliable Distributed Systems Engineering (2019)
  • Mining Streaming Data (2019)
  • Actor Database Systems (2018)
  • Proseminar Information Systems (2014)
  • Advanced Data Profiling (2013, 2017)

Bachelor Projects:

  • UltraMine - Scalable Analytics on Time Series Data (2020/2021)
  • DataRefinery - Scalable Offer Processing with Apache Spark (2015/2016)

Master Projects:

  • Profiling Dynamic Data - Maintaining Metadata under Inserts, Updates, and Deletes (2016)
  • Approximate Data Profiling - Efficient Discovery of approximate INDs and FDs (2015)
  • Metadata Trawling - Interpreting Data Profiling Results (2014)
  • Joint Data Profiling - Holistic Discovery of INDs, FDs, and UCCs (2013)

Master Theses:

  • Distributed Duplicate Detection on Streaming Data (Jakob Köhler, 2021)
  • Distributed Graph Based Approximate Nearest Neighbor Search (Juliane Waack, 2020)
  • A2DB: A Reactive Database for Theta-Joins (Julian Weise, 2020)
  • Distributed Detection of Sequential Anomalies in Time Related Sequences (Johannes Schneider, 2020)
  • Efficient Distributed Discovery of Bidirectional Order Dependencies (Sebastian Schmidl, 2020)
  • Distributed Unique Column Combination Discovery (Benjamin Feldmann, 2019)
  • Reactive Inclusion Dependency Discovery (Frederic Schneider, 2019)
  • Inclusion Dependency Discovery on Streaming Data (Alexander Preuss, 2019)
  • Generating Data for Functional Dependency Profiling (Jennifer Stamm, 2018)
  • Efficient Detection of Genuine Approximate Functional Dependencies (Moritz Finke, 2018)
  • Efficient Discovery of Matching Dependencies (Philipp Schirmer, 2017)
  • Discovering Interesting Conditional Functional Dependencies (Maximilian Grundke, 2017)
  • Multivalued Dependency Detection (Tim Draeger, 2016)
  • Spinning a Web of Tables through Inclusion Dependencies (Fabian Tschirschnitz, 2014)
  • Discovery of Conditional Unique Column Combination (Jens Ehrlich, 2014)
  • Discovering Matching Dependencies (Andrina Mascher, 2013)

Online Courses:

  • Datenmanagement mit SQL (openHPI, 2013)

