Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Sebastian Kruse

Research Assistant at Information Systems Group

Contact

Hasso-Plattner-Institut für Softwaresystemtechnik
Prof.-Dr.-Helmert-Straße 2-3
D-14482 Potsdam, Germany

Phone: ++49 331 5509 240
Fax: ++49 331 5509 287
Room: G-3.1.13, Building G, Campus III
Email: Sebastian Kruse

Research Interests

  • Data profiling
  • Distributed systems
  • Map/Reduce frameworks
  • Query optimization
  • Cross-platform/polyglot data processing

Projects

Teaching

Master's Theses

  • Estimating Metadata of Query Results using Histograms (Cathleen Ramson, 2014)
  • Quicker Ways of Doing Fewer Things: Improved Index Structures and Algorithms for Data Profiling (Jakob Zwiener, 2015)
  • Methods of Denial Constraint Discovery (Tobias Bleifuß, 2016)
  • Optimizing Cross-Platform Iterations on 
    the Rheem Platform (Jonas Kemper, ongoing)

Seminars

Master Projects

Bachelor Projects

Guest Lectures

Professional Activities

Talks

Publications

Divide & Conquer-based Inclusion Dependency Discovery

Papenbrock, Thorsten; Kruse, Sebastian; Quiane-Ruiz, Jorge-Arnulfo; Naumann, Felix in Proceedings of the VLDB Endowment 2015 .

The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help to perform data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs gets harder as datasets become larger in terms of number of tuples as well as attributes. To this end, we propose BINDER, an IND detection system that is capable of detecting both unary and n-ary INDs. It is based on a divide & conquer approach, which allows to handle very large datasets – an important property on the face of the ever increasing size of today’s data. In contrast to most related works, we do not rely on existing database functionality nor assume that inspected datasets fit into main memory. This renders BINDER an efficient and scalable competitor. Our exhaustive experimental evaluation shows the high superiority of BINDER over the state-of-the-art in both unary (SPIDER) and n-ary (MIND) IND discovery. BINDER is up to 26x faster than SPIDER and more than 2500x faster than MIND.
p559-papenbrock.pdf
Further Information
Tags binder hpi inclusion_dependencies isg profiling
BibTeX