Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Thorsten Papenbrock

Research Assistant, PhD Candidate

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam
Room: E-2-01.2

 

Phone: +49 331 5509 294
Email: thorsten.papenbrock(a)hpi.de
Profiles: Xing
Research: Google Scholar, DBLP, ResearchGate


Projects

Metanome

Research Interests

Data Profiling:

Data profiling is primarily concerned with discovering metadata in datasets that are often many gigabytes in size. Many of its central tasks are computationally complex, which is why the algorithms developed for them need to be efficient and robust. Because data profiling offers so many challenging yet unsolved tasks, I have chosen it as my primary research area. I am particularly interested in the discovery of data dependencies, such as inclusion dependencies, unique column combinations, functional dependencies, order dependencies, and matching dependencies, among many others.
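To make three of the dependency types named above concrete, here is a minimal, naive sketch that checks whether a single candidate unique column combination (UCC), functional dependency (FD), or unary inclusion dependency (IND) holds on a small in-memory relation. The relations, column names, and helper functions are hypothetical and chosen only for illustration. Actual discovery algorithms must find all valid dependencies in a very large candidate space and do so far more efficiently; this sketch only validates one given candidate at a time.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def holds_ucc(rows: List[Row], cols: Sequence[str]) -> bool:
    """A UCC holds if no two rows share the same value combination on cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False  # duplicate combination found: not unique
        seen.add(key)
    return True


def holds_fd(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """An FD lhs -> rhs holds if equal lhs values always imply equal rhs values."""
    mapping = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in mapping and mapping[key] != row[rhs]:
            return False  # same lhs value, different rhs value: FD violated
        mapping.setdefault(key, row[rhs])
    return True


def holds_ind(dep_rows: List[Row], dep_col: str,
              ref_rows: List[Row], ref_col: str) -> bool:
    """A unary IND holds if every value of dep_col also appears in ref_col."""
    referenced = {row[ref_col] for row in ref_rows}
    return all(row[dep_col] in referenced for row in dep_rows)


# Hypothetical example relations
employees = [
    {"id": 1, "name": "Ada", "dept": "DB", "zip": "14482", "city": "Potsdam"},
    {"id": 2, "name": "Bob", "dept": "DB", "zip": "14482", "city": "Potsdam"},
    {"id": 3, "name": "Cy", "dept": "OS", "zip": "10115", "city": "Berlin"},
]
departments = [{"code": "DB"}, {"code": "OS"}, {"code": "ML"}]

print(holds_ucc(employees, ["id"]))                       # True
print(holds_ucc(employees, ["dept"]))                     # False
print(holds_fd(employees, ["zip"], "city"))               # True
print(holds_fd(employees, ["dept"], "name"))              # False
print(holds_ind(employees, "dept", departments, "code"))  # True
print(holds_ind(departments, "code", employees, "dept"))  # False
```

Note that the IND check is directional: every department referenced by an employee exists, but not every department has an employee.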

Data Cleansing:

Data is one of the most important assets of any company, so ensuring its quality and reliability is crucial. Data cleansing and data profiling are two essential tasks that, if performed correctly and regularly, help to ensure that data is fit for use. In this area, I am particularly interested in (semi-)automatic duplicate detection methods and normalization techniques, as well as their efficient implementation.

Parallel and Distributed Systems:

Because many tasks in IT are highly complex, a clever algorithm alone often cannot deliver a solution in time. In these cases, parallel and distributed systems are needed. Especially for ever larger datasets, i.e., big data, we need to consider technologies such as MapReduce-style dataflow engines (e.g., Spark and Flink), actor frameworks (e.g., Akka), and GPU programming (e.g., CUDA and OpenCL) to make our solutions scale.

Teaching

Lectures:

Seminars:

  • Advanced Data Profiling (2013)
  • Proseminar Information Systems (2014)

Bachelor Projects:

  • Data Refinery - Scalable Offer Processing with Apache Spark (2015/2016)

Master Projects:

  • Joint Data Profiling - Holistic Discovery of INDs, FDs, and UCCs (2013)
  • Metadata Trawling - Interpreting Data Profiling Results (2014)
  • Approximate Data Profiling - Efficient Discovery of approximate INDs and FDs (2015)
  • Profiling Dynamic Data - Maintaining Metadata under Inserts, Updates, and Deletes (2016)

Master's Theses:

  • Discovering Matching Dependencies (Andrina Mascher, 2013)
  • Discovery of Conditional Unique Column Combination (Jens Ehrlich, 2014)
  • Spinning a Web of Tables through Inclusion Dependencies (Fabian Tschirschnitz, 2014)
  • Multivalued Dependency Detection (Tim Dräger, 2016)

Online Courses:

  • Datenmanagement mit SQL (openHPI, 2013)

Publications

  1. Sebastian Kruse, Thorsten Papenbrock, Christian Dullweber, Moritz Finke, Manuel Hegner, Martin Zabel, Christian Zöllner, Felix Naumann
     In Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2017 (accepted)
  2. Thorsten Papenbrock, Felix Naumann
     In Proceedings of the International Conference on Management of Data (SIGMOD), 2016
  3. Sebastian Kruse, Thorsten Papenbrock, Hazar Harmouch, Felix Naumann
     IEEE Data Engineering Bulletin, vol. 39(2):8–20, June 2016
  4. Jens Ehrlich, Mandy Roick, Lukas Schulze, Jakob Zwiener, Thorsten Papenbrock, Felix Naumann
     In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 305–316, 2016
  5. Sebastian Kruse, Anja Jentzsch, Thorsten Papenbrock, Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Felix Naumann
     In Proceedings of the International Conference on Management of Data (SIGMOD), 2016
  6. Tobias Bleifuß, Susanne Bülow, Johannes Frohnhofen, Julian Risch, Georg Wiese, Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
     In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pages 1803–1812, 2016
  7. Thorsten Papenbrock, Arvid Heise, Felix Naumann
     IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27(5):1316–1329, 2015
  8. Thorsten Papenbrock, Sebastian Kruse, Jorge-Arnulfo Quiane-Ruiz, Felix Naumann
     Proceedings of the VLDB Endowment (PVLDB), vol. 8(7):774–785, 2015
  9. Thorsten Papenbrock, Jens Ehrlich, Jannik Marten, Tommy Neubert, Jan-Peer Rudolph, Martin Schönberg, Jakob Zwiener, Felix Naumann
     Proceedings of the VLDB Endowment (PVLDB), vol. 8(10):1082–1093, 2015
  10. Thorsten Papenbrock, Tanja Bergmann, Moritz Finke, Jakob Zwiener, Felix Naumann
      Proceedings of the VLDB Endowment (PVLDB), vol. 8(12):1860–1871, 2015
  11. Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
      In Proceedings of the Conference on Database Systems for Business, Technology, and Web (BTW), 2015
  12. Felix Naumann, Maximilian Jenders, Thorsten Papenbrock
      Informatik-Spektrum, (12), 2013
  13. Benedikt Forchhammer, Thorsten Papenbrock, Thomas Stening, Sven Viehmeier, Uwe Draisbach, Felix Naumann
      In Proceedings of the 15th Conference on Database Systems for Business, Technology, and Web (BTW), pages 165–184, Magdeburg, Germany, 2013. Runner-up for the Best Paper Award
  14. Johannes Lorey, Felix Naumann, Benedikt Forchhammer, Andrina Mascher, Peter Retzlaff, Armin ZamaniFarahani, Soeren Discher, Cindy Faehnrich, Stefan Lemme, Thorsten Papenbrock, Robert Christoph Peschel, Stephan Richter, Thomas Stening, Sven Viehmeier
      In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM), pages 2517–2520, Glasgow, UK, 2011