Given a relational dataset and its metadata, our objective is to monitor insert, update, and delete operations on the dataset in order to update the metadata accordingly. These metadata updates must be fast enough to keep up with potentially high change rates in the data. While incremental metadata maintenance is an algorithmic challenge for every type of metadata, we focus on functional dependencies (FDs). The project consists of the following subgoals:
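To illustrate why incremental FD maintenance calls for suitable index structures, here is a minimal Python sketch. It is not the algorithm this project sets out to develop. The hypothetical class `FDMonitor` tracks only a single, fixed candidate FD X → A under inserts, deletes, and updates, using a hash index from each left-hand-side value combination to the multiset of right-hand-side values seen with it. The FD holds exactly when no group contains more than one distinct right-hand-side value. The sketch assumes that deletes and updates only reference rows currently in the dataset, and it does not discover new minimal FDs, which is the harder problem the project targets.

```python
from collections import Counter, defaultdict


class FDMonitor:
    """Hypothetical sketch: incrementally checks whether one candidate FD lhs -> rhs holds."""

    def __init__(self, lhs, rhs):
        self.lhs = tuple(lhs)  # column names of the left-hand side X
        self.rhs = rhs         # column name of the right-hand side A
        # Index: LHS value combination -> multiset of RHS values occurring with it.
        self.index = defaultdict(Counter)
        # Number of LHS groups that currently have more than one distinct RHS value.
        self.violating_groups = 0

    def _key(self, row):
        return tuple(row[c] for c in self.lhs)

    def insert(self, row):
        group = self.index[self._key(row)]
        before = len(group)
        group[row[self.rhs]] += 1
        if before == 1 and len(group) == 2:
            self.violating_groups += 1  # the group just became inconsistent

    def delete(self, row):
        # Assumption: the row is currently present in the dataset.
        key = self._key(row)
        group = self.index[key]
        before = len(group)
        value = row[self.rhs]
        group[value] -= 1
        if group[value] == 0:
            del group[value]
        if before == 2 and len(group) == 1:
            self.violating_groups -= 1  # the group is consistent again
        if not group:
            del self.index[key]

    def update(self, old_row, new_row):
        # An update is modeled as a delete followed by an insert.
        self.delete(old_row)
        self.insert(new_row)

    def holds(self):
        return self.violating_groups == 0


# Usage with hypothetical toy data for the candidate FD zip -> city.
m = FDMonitor(["zip"], "city")
m.insert({"zip": "Z1", "city": "C1"})
m.insert({"zip": "Z1", "city": "C1"})
print(m.holds())  # True
m.insert({"zip": "Z1", "city": "C2"})
print(m.holds())  # False
m.delete({"zip": "Z1", "city": "C2"})
print(m.holds())  # True
```

Inserts can only invalidate an FD and deletes can only make it valid again, so each change touches just one index group. Maintaining the full set of minimal FDs over all attribute combinations is far more involved, since a single change can affect many candidate dependencies at once.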
- Literature research: Review different profiling algorithms from previous research and assess how well each lends itself to an incremental variant.
- Algorithm development: Develop a novel incremental FD profiling algorithm, including appropriate index structures and efficient lookup strategies.
- Evaluation: Evaluate the correctness and efficiency, i.e., throughput, of our solution on different real-world datasets in the incremental setting.
- Presentations: In addition to regular project meetings, we will have a midterm and a final presentation to gather feedback from the research community.
- Paper preparation: We conclude our work with a submission-ready 12-page paper describing incremental data profiling, our algorithm, and our experimental results.
With the HPI Metanome data profiling framework (www.metanome.de), we have access to many existing profiling algorithms and can probably reuse previous work for our new task. Ultimately, we aim to publish our results at a major scientific conference.