Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Dr. Thorsten Papenbrock

Senior Researcher
Head of the Distributed Computing group

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Office: F-2.04, Campus II

 

Phone: +49 331 5509 294
Email:  thorsten.papenbrock(a)hpi.de
Profiles: Xing, LinkedIn
Research: ORCID, GoogleScholar, DBLP, ResearchGate

Dissertation: Data Profiling - Efficient Discovery of Dependencies


Projects

Metanome

Research Interests

Technology Interests

  • Data flow engines

  • Message passing systems

  • Parallel hardware toolkits

Teaching

Lectures:

  • Distributed Data Management (2018)
  • Distributed Data Analytics (2017)
  • Data Profiling (2017)
  • Information Integration (2015)
  • Data Profiling and Data Cleansing (2014)
  • Database Systems I (2013, 2014, 2015, 2016, 2017)
  • Database Systems II (2013)

Seminars:

  • Reliable Distributed Systems Engineering (2019)
  • Mining Streaming Data (2019)
  • Actor Database Systems (2018)
  • Proseminar Information Systems (2014)
  • Advanced Data Profiling (2013, 2017)

Bachelor Projects:

  • Data Refinery - Scalable Offer Processing with Apache Spark (2015/2016)

Master Projects:

  • Profiling Dynamic Data - Maintaining Matadata under Inserts, Updates, and Deletes (2016)
  • Approximate Data Profiling - Efficient Discovery of approximate INDs and FDs (2015)
  • Metadata Trawling - Interpreting Data Profiling Results (2014)
  • Joint Data Profiling - Holistic Discovery of INDs, FDs, and UCCs (2013)

Master Thesis:

  • Distributed Unique Column Combination Discovery (Benjamin Feldmann, 2019)
  • Reactive Inclusion Dependency Discovery (Frederic Schneider, 2019)
  • Inclusion Dependency Discovery on Streaming Data (Alexander Preuss, 2019)
  • Generating Data for Functional Dependency Profiling (Jennifer Stamm, 2018)
  • Efficient Detection of Genuine Approximate Functional Dependencies (Moritz Finke, 2018)
  • Efficient Discovery of Matching Dependencies (Philipp Schirmer, 2017)
  • Discovering Interesting Conditional Functional Dependencies (Maximilian Grundke, 2017)
  • Multivalued Dependency Detection (Tim Draeger, 2016)
  • Spinning a Web of Tables through Inclusion Dependencies (Fabian Tschirschnitz, 2014)
  • Discovery of Conditional Unique Column Combination (Jens Ehrlich, 2014)
  • Discovering Matching Dependencies (Andrina Mascher, 2013)

Online Courses:

  • Datenmanagement mit SQL (openHPI, 2013)

Publications

  • DynFD: Functional Depende... - Download
    Schirmer, P., Papenbrock, T., Kruse, S., Naumann, F., Hempfing, D., Mayer, T., Neuschäfer-Rube, D.: DynFD: Functional Dependency Discovery in Dynamic Datasets. Proceedings of the International Conference on Extending Database Technology (EDBT). p. 253--264 (2019).
     
  • Abedjan, Z., Golab, L., Naumann, F., Papenbrock, T.: Data Profiling. Morgan & Claypool Publishers (2018).
     
  • Detecting Inclusion Depen... - Download
    Tschirschnitz, F., Papenbrock, T., Naumann, F.: Detecting Inclusion Dependencies on Very Many Tables. ACM Transactions on Database Systems (TODS). 42, 18:1-18:29 (2017).
     
  • A Hybrid Approach for Eff... - Download
    Papenbrock, T., Naumann, F.: A Hybrid Approach for Efficient Unique Column Combination Discovery. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 195-204 (2017).
     
  • Data-driven Schema Normal... - Download
    Papenbrock, T., Naumann, F.: Data-driven Schema Normalization. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 342-353 (2017).
     
  • Fast Approximate Discover... - Download
    Kruse, S., Papenbrock, T., Dullweber, C., Finke, M., Hegner, M., Zabel, M., Zöllner, C., Naumann, F.: Fast Approximate Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 207-226 (2017).
     
  • Holistic Data Profiling: ... - Download
    Ehrlich, J., Roick, M., Schulze, L., Zwiener, J., Papenbrock, T., Naumann, F.: Holistic Data Profiling: Simultaneous Discovery of Various Metadata. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 305-316. OpenProceedings.org (2016).
     
  • Approximate Discovery of ... - Download
    Bleifuß, T., Bülow, S., Frohnhofen, J., Risch, J., Wiese, G., Kruse, S., Papenbrock, T., Naumann, F.: Approximate Discovery of Functional Dependencies for Large Datasets. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1803-1812. ACM, New York, NY, USA (2016).
     
  • A Hybrid Approach to Func... - Download
    Papenbrock, T., Naumann, F.: A Hybrid Approach to Functional Dependency Discovery. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 821-833. ACM, New York, NY, USA (2016).
     
  • Data Anamnesis: Admitting... - Download
    Kruse, S., Papenbrock, T., Harmouch, H., Naumann, F.: Data Anamnesis: Admitting Raw Data into an Organization. IEEE Data Engineering Bulletin. 39, 8-20 (2016).
     
  • RDFind: Scalable Conditio... - Download
    Kruse, S., Jentzsch, A., Papenbrock, T., Kaoudi, Z., Quiane-Ruiz, J.-A., Naumann, F.: RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 953-967. ACM, New York, NY, USA (2016).
     
  • Scaling Out the Discovery... - Download
    Kruse, S., Papenbrock, T., Naumann, F.: Scaling Out the Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 445-454 (2015).
     
  • Functional Dependency Dis... - Download
    Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., Naumann, F.: Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. Proceedings of the VLDB Endowment. 8, 1082-1093 (2015).
     
  • Data Profiling with Metan... - Download
    Papenbrock, T., Bergmann, T., Finke, M., Zwiener, J., Naumann, F.: Data Profiling with Metanome. Proceedings of the VLDB Endowment. 8, 1860-1871 (2015).
     
  • Divide & Conquer-based In... - Download
    Papenbrock, T., Kruse, S., Quiane-Ruiz, J.-A., Naumann, F.: Divide & Conquer-based Inclusion Dependency Discovery. Proceedings of the VLDB Endowmen. 8, 774-785 (2015).
     
  • Progressive Duplicate Det... - Download
    Papenbrock, T., Heise, A., Naumann, F.: Progressive Duplicate Detection. IEEE Transactions on Knowledge and Data Engineering (TKDE). 27, 1316-1329 (2015).
     
  • Naumann, F., Jenders, M., Papenbrock, T.: Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform. Informatik-Spektrum. 37, 333-340 (2013).
     
  • Duplicate Detection on GP... - Download
    Forchhammer, B., Papenbrock, T., Stening, T., Viehmeier, S., Draisbach, U., Naumann, F.: Duplicate Detection on GPUs. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 165-184 (2013).
     
  • Black Swan: Augmenting St... - Download
    Lorey, J., Naumann, F., Forchhammer, B., Mascher, A., Retzlaff, P., ZamaniFarahani, A., Discher, S., Faehnrich, C., Lemme, S., Papenbrock, T., Peschel, R.C., Richter, S., Stening, T., Viehmeier, S.: Black Swan: Augmenting Statistics with Event Data. Proceedings of the 20th Conference on Information and Knowledge Management (CIKM). pp. 2517-2520. , Glasgow, UK (2011).