Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

Dr. Thorsten Papenbrock

Professor (at the University of Marburg)
Head of the Distributed Computing group

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam

 

Current affiliation: University of Marburg

Email:  thorsten.papenbrock(a)hpi.de
Profiles: Xing, LinkedIn
Research: ORCID, GoogleScholar, DBLP, ResearchGate

Dissertation: Data Profiling - Efficient Discovery of Dependencies


Research Interests

  • Complex data engineering problems
  • Parallel and distributed computing challenges
    • e.g. robustness, efficiency, and elasticity

Technology Interests

  • Data flow engines
  • Message passing systems
  • Parallel hardware toolkits

Teaching

Seminars:

  • Sustainable Machine Learning on Edge Device Clusters (2020)
  • Machine Learning for Data Streams (2019)
  • Reliable Distributed Systems Engineering (2019)
  • Mining Streaming Data (2019)
  • Actor Database Systems (2018)
  • Proseminar Information Systems (2014)
  • Advanced Data Profiling (2013, 2017)

Bachelor Projects:

  • UltraMine - Scalable Analytics on Time Series Data (2020/2021)
  • DataRefinery - Scalable Offer Processing with Apache Spark (2015/2016)

Master Projects:

  • Profiling Dynamic Data - Maintaining Metadata under Inserts, Updates, and Deletes (2016)
  • Approximate Data Profiling - Efficient Discovery of approximate INDs and FDs (2015)
  • Metadata Trawling - Interpreting Data Profiling Results (2014)
  • Joint Data Profiling - Holistic Discovery of INDs, FDs, and UCCs (2013)

Master Theses:

  • Distributed Duplicate Detection on Streaming Data (Jakob Köhler, 2021)
  • Distributed Graph Based Approximate Nearest Neighbor Search (Juliane Waack, 2020)
  • A2DB: A Reactive Database for Theta-Joins (Julian Weise, 2020)
  • Distributed Detection of Sequential Anomalies in Time Related Sequences (Johannes Schneider, 2020)
  • Efficient Distributed Discovery of Bidirectional Order Dependencies (Sebastian Schmidl, 2020)
  • Distributed Unique Column Combination Discovery (Benjamin Feldmann, 2019)
  • Reactive Inclusion Dependency Discovery (Frederic Schneider, 2019)
  • Inclusion Dependency Discovery on Streaming Data (Alexander Preuss, 2019)
  • Generating Data for Functional Dependency Profiling (Jennifer Stamm, 2018)
  • Efficient Detection of Genuine Approximate Functional Dependencies (Moritz Finke, 2018)
  • Efficient Discovery of Matching Dependencies (Philipp Schirmer, 2017)
  • Discovering Interesting Conditional Functional Dependencies (Maximilian Grundke, 2017)
  • Multivalued Dependency Detection (Tim Draeger, 2016)
  • Spinning a Web of Tables through Inclusion Dependencies (Fabian Tschirschnitz, 2014)
  • Discovery of Conditional Unique Column Combination (Jens Ehrlich, 2014)
  • Discovering Matching Dependencies (Andrina Mascher, 2013)

Online Courses:

  • Datenmanagement mit SQL (openHPI, 2013)

Publications

  • 1.
    Schmidl, S., Papenbrock, T.: Efficient Distributed Discovery of Bidirectional Order Dependencies. The VLDB Journal. (2021).
     
  • 2.
    Schneider, J., Wenig, P., Papenbrock, T.: Distributed detection of sequential anomalies in univariate time series. The VLDB Journal. (2021).
     
  • 3.
    Weise, J., Schmidl, S., Papenbrock, T.: Optimized Theta-Join Processing. In: Sattler, K.-U., Herschel, M., and Lehner, W. (eds.) Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW). pp. 59–78. Gesellschaft für Informatik, Bonn (2021).
     
  • 4.
    Harmouch, H., Papenbrock, T., Naumann, F.: Relational Header Discovery using Similarity Search in a Table Corpus. IEEE International Conference on Data Engineering (ICDE). 444–455 (2021).
     
  • 5.
    Kossmann, J., Papenbrock, T., Naumann, F.: Data dependencies for query optimization: a survey. The VLDB Journal. (2021).
     
  • 6.
    Schirmer, P., Papenbrock, T., Koumarelas, I., Naumann, F.: Efficient Discovery of Matching Dependencies. ACM Transactions on Database Systems (TODS). 45, 1–33 (2020).
     
  • 7.
    Koumarelas, I., Papenbrock, T., Naumann, F.: MDedup: Duplicate Detection with Matching Dependencies. Proceedings of the VLDB Endowment (PVLDB). 13, 712–725 (2020).
     
  • 8.
    Birnick, J., Bläsius, T., Friedrich, T., Naumann, F., Papenbrock, T., Schirneck, M.: Hitting Set Enumeration with Partial Information for Unique Column Combination Discovery. Proceedings of the VLDB Endowment. 13, 2270–2283 (2020).
     
  • 9.
    Schmidl, S., Schneider, F., Papenbrock, T.: An Actor Database System for Akka. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW) - Workshopband. pp. 225–234 (2019).
     
  • 10.
    Dürsch, F., Stebner, A., Windheuser, F., Fischer, M., Friedrich, T., Strelow, N., Bleifuß, T., Harmouch, H., Jiang, L., Papenbrock, T., Naumann, F.: Inclusion Dependency Discovery: An Experimental Evaluation of Thirteen Algorithms. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 219–228 (2019).
     
  • 11.
    Schirmer, P., Papenbrock, T., Kruse, S., Naumann, F., Hempfing, D., Mayer, T., Neuschäfer-Rube, D.: DynFD: Functional Dependency Discovery in Dynamic Datasets. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 253–264 (2019).
     
  • 12.
    Abedjan, Z., Golab, L., Naumann, F., Papenbrock, T.: Data Profiling. Morgan & Claypool Publishers (2018).
     
  • 13.
    Tschirschnitz, F., Papenbrock, T., Naumann, F.: Detecting Inclusion Dependencies on Very Many Tables. ACM Transactions on Database Systems (TODS). 42, 18:1–18:29 (2017).
     
  • 14.
    Papenbrock, T., Naumann, F.: A Hybrid Approach for Efficient Unique Column Combination Discovery. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 195–204 (2017).
     
  • 15.
    Papenbrock, T., Naumann, F.: Data-driven Schema Normalization. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 342–353 (2017).
     
  • 16.
    Kruse, S., Papenbrock, T., Dullweber, C., Finke, M., Hegner, M., Zabel, M., Zöllner, C., Naumann, F.: Fast Approximate Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 207–226 (2017).
     
  • 17.
    Bleifuß, T., Bülow, S., Frohnhofen, J., Risch, J., Wiese, G., Kruse, S., Papenbrock, T., Naumann, F.: Approximate Discovery of Functional Dependencies for Large Datasets. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1803–1812. ACM, New York, NY, USA (2016).
     
  • 18.
    Ehrlich, J., Roick, M., Schulze, L., Zwiener, J., Papenbrock, T., Naumann, F.: Holistic Data Profiling: Simultaneous Discovery of Various Metadata. Proceedings of the International Conference on Extending Database Technology (EDBT). pp. 305–316. OpenProceedings.org (2016).
     
  • 19.
    Papenbrock, T., Naumann, F.: A Hybrid Approach to Functional Dependency Discovery. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 821–833. ACM, New York, NY, USA (2016).
     
  • 20.
    Kruse, S., Papenbrock, T., Harmouch, H., Naumann, F.: Data Anamnesis: Admitting Raw Data into an Organization. IEEE Data Engineering Bulletin. 39, 8–20 (2016).
     
  • 21.
    Kruse, S., Jentzsch, A., Papenbrock, T., Kaoudi, Z., Quiane-Ruiz, J.-A., Naumann, F.: RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets. Proceedings of the International Conference on Management of Data (SIGMOD). pp. 953–967. ACM, New York, NY, USA (2016).
     
  • 22.
    Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., Naumann, F.: Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. Proceedings of the VLDB Endowment. 8, 1082–1093 (2015).
     
  • 23.
    Papenbrock, T., Heise, A., Naumann, F.: Progressive Duplicate Detection. IEEE Transactions on Knowledge and Data Engineering (TKDE). 27, 1316–1329 (2015).
     
  • 24.
    Kruse, S., Papenbrock, T., Naumann, F.: Scaling Out the Discovery of Inclusion Dependencies. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 445–454 (2015).
     
  • 25.
    Papenbrock, T., Kruse, S., Quiane-Ruiz, J.-A., Naumann, F.: Divide & Conquer-based Inclusion Dependency Discovery. Proceedings of the VLDB Endowment. 8, 774–785 (2015).
     
  • 26.
    Papenbrock, T., Bergmann, T., Finke, M., Zwiener, J., Naumann, F.: Data Profiling with Metanome. Proceedings of the VLDB Endowment. 8, 1860–1871 (2015).
     
  • 27.
    Naumann, F., Jenders, M., Papenbrock, T.: Ein Datenbankkurs mit 6000 Teilnehmern - Erfahrungen auf der openHPI MOOC Plattform. Informatik-Spektrum. 37, 333–340 (2013).
     
  • 28.
    Forchhammer, B., Papenbrock, T., Stening, T., Viehmeier, S., Draisbach, U., Naumann, F.: Duplicate Detection on GPUs. Proceedings of the conference on Database Systems for Business, Technology, and Web (BTW). pp. 165–184 (2013).
     
  • 29.
    Lorey, J., Naumann, F., Forchhammer, B., Mascher, A., Retzlaff, P., ZamaniFarahani, A., Discher, S., Faehnrich, C., Lemme, S., Papenbrock, T., Peschel, R.C., Richter, S., Stening, T., Viehmeier, S.: Black Swan: Augmenting Statistics with Event Data. Proceedings of the 20th Conference on Information and Knowledge Management (CIKM). pp. 2517–2520. Glasgow, UK (2011).