The algorithms that we will look at in this seminar are the following:
Bell & Brockhausen: Bell, Siegfried, and Peter Brockhausen. 1995. “Discovery of Data Dependencies in Relational Databases.” Statistics, Machine Learning and Knowledge Discovery in Databases, ML-Net Familiarization Workshop, 53–58.
Zigzag: De Marchi, Fabien, and Jean-Marc Petit. 2003. “Zigzag: A New Algorithm for Mining Large Inclusion Dependencies in Databases.” In Proceedings of the International Conference on Data Mining (ICDM), 27–34.
FIND2: Koeller, Andreas, and Elke A. Rundensteiner. 2003. “Discovery of High-Dimensional Inclusion Dependencies.” In Proceedings of the International Conference on Data Engineering (ICDE), 683–685.
SPIDER: Bauckmann, Jana, Ulf Leser, Felix Naumann, and Veronique Tietz. 2007. “Efficiently Detecting Inclusion Dependencies.” In Proceedings of the International Conference on Data Engineering (ICDE), 1448–1450.
deMarchi/MIND: De Marchi, Fabien, Stéphane Lopes, and Jean-Marc Petit. 2009. “Unary and N-Ary Inclusion Dependency Discovery in Relational Databases.” Journal of Intelligent Information Systems 32 (1): 53–73.
BINDER: Papenbrock, Thorsten, Sebastian Kruse, Jorge-Arnulfo Quiané-Ruiz, and Felix Naumann. 2015. “Divide & Conquer-Based Inclusion Dependency Discovery.” Proceedings of the VLDB Endowment 8: 774–785.
S-INDD: Shaabani, Nuhad, and Christoph Meinel. 2015. “Scalable Inclusion Dependency Discovery.” In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), 425–440.
SINDY: Kruse, Sebastian, Thorsten Papenbrock, and Felix Naumann. 2015. “Scaling Out the Discovery of Inclusion Dependencies.” In Proceedings of the Conference on Database Systems for Business, Technology and Web (BTW), 445–454.
MIND2: Shaabani, Nuhad, and Christoph Meinel. 2016. “Detecting Maximum Inclusion Dependencies without Candidate Generation.” In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), 118–133.
MANY: Tschirschnitz, Fabian, Thorsten Papenbrock, and Felix Naumann. 2017. “Detecting Inclusion Dependencies on Very Many Tables.” ACM Transactions on Database Systems (TODS) 1 (1): 1–30.
Approximate, incremental, and partial discovery algorithms:
- Dasu, Tamraparni, Theodore Johnson, S. Muthukrishnan, and Vladislav Shkapenyuk. 2002. “Mining Database Structure; Or, How to Build a Data Quality Browser.” In Proceedings of the International Conference on Management of Data (SIGMOD), 240–251.
- Zhang, Meihui, Marios Hadjieleftheriou, Beng Chin Ooi, Cecilia M. Procopiuc, and Divesh Srivastava. 2010. “On Multi-Column Foreign Key Discovery.” Proceedings of the VLDB Endowment 3 (1–2): 805–814.
- Shaabani, Nuhad, and Christoph Meinel. 2017. “Incremental Discovery of Inclusion Dependencies.” In Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM), 1–12.
- Papenbrock, Thorsten, Christian Dullweber, Moritz Finke, Sebastian Kruse, Manuel Hegner, Martin Zabel, Christian Zöllner, and Felix Naumann. 2017. “Fast Approximate Discovery of Inclusion Dependencies.” In Proceedings of the Conference on Database Systems for Business, Technology and Web (BTW), 207–226.
Other IND-related publications:
- Lopes, Stéphane, Jean-Marc Petit, and Farouk Toumani. 2002. “Discovering Interesting Inclusion Dependencies: Application to Logical Database Tuning.” Information Systems 27 (1): 1–19.
- Brown, Paul G., and Peter J. Haas. 2003. “BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data.” In Proceedings of the International Conference on Very Large Data Bases (VLDB), 668–679.
- De Marchi, Fabien. 2011. “CLIM: Closed Inclusion Dependency Mining in Databases.” In Proceedings of the International Conference on Data Mining Workshops (ICDMW), 1098–1103.
- Bauckmann, Jana, Ziawasch Abedjan, Ulf Leser, Heiko Müller, and Felix Naumann. 2012. “Discovering Conditional Inclusion Dependencies.” In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2094–2098.
See the following two articles for an overview of data profiling: