COLT
What is COLT?
Rules, as created by systems such as AMIE or RUDIK, are useful for detecting and curating errors in knowledge bases. However, many knowledge bases are created automatically or semi-automatically and often contain incorrect entries. When such knowledge bases are used to derive logical rules automatically, their data quality also affects the quality of the generated rules. This raises the following question:
How can we be confident that a rule derived from an imperfect knowledge base is actually good?
COLT aims to answer this very question. It leverages deep kernel learning to estimate both the confidence of a rule and its quality in terms of its impact on the facts contained in a knowledge base. To estimate the true confidence of a rule, COLT requires only a few user interactions.
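To illustrate why a confidence value derived from the knowledge base itself can be misleading, the following minimal sketch compares it with an estimate based on a handful of user-validated predictions. It is a naive baseline and not COLT's deep kernel learning model; the rule, the facts, and the function names `kb_confidence` and `validated_confidence` are hypothetical and chosen only for illustration.

```python
def kb_confidence(predictions, kb):
    """Share of the rule's predictions that already appear in the knowledge base."""
    return sum(1 for fact in predictions if fact in kb) / len(predictions)


def validated_confidence(labels):
    """Share of user-validated predictions that the user judged correct."""
    return sum(labels.values()) / len(labels)


# Hypothetical rule: worksIn(x, y) => livesIn(x, y)
predictions = [
    ("alice", "livesIn", "Berlin"),
    ("bob", "livesIn", "Paris"),
    ("carol", "livesIn", "Rome"),
    ("dave", "livesIn", "Oslo"),
]

# Assumed imperfect knowledge base: the entry for bob is wrong,
# and the prediction for dave is missing.
kb = {
    ("alice", "livesIn", "Berlin"),
    ("bob", "livesIn", "Paris"),
    ("carol", "livesIn", "Rome"),
}

# Assumed user feedback on the same four predictions.
user_labels = {
    ("alice", "livesIn", "Berlin"): True,
    ("bob", "livesIn", "Paris"): False,
    ("carol", "livesIn", "Rome"): True,
    ("dave", "livesIn", "Oslo"): False,
}

print(kb_confidence(predictions, kb))     # 0.75
print(validated_confidence(user_labels))  # 0.5
```

In this toy example the incorrect knowledge base entry inflates the rule's confidence, while the validated sample exposes the error.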
Key Contributions
- We propose COLT, a framework to assess the quality and confidence of a rule by using expert-validated facts
- We enable the conditional application of rules and compute their confidence
- We establish a connection between our problem, the weighted-coverage problem, and quality-preserving Gaussian processes
- We show that our interactive learning approach, with only 20 user interactions, halves the error in confidence obtained with rule learning systems
- We publish our dataset consisting of 26 rules with more than 23,000 annotated facts
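The idea behind applying a rule conditionally can be sketched as follows: instead of one global confidence, we compute a confidence per subset of predictions that share a condition. This is only an illustration under simplifying assumptions. How COLT itself selects conditions is not described on this page, and the attribute `subject_type`, the sample annotations, and the function `conditional_confidence` are hypothetical.

```python
from collections import defaultdict


def conditional_confidence(annotated, condition):
    """Group annotated predictions by condition(fact) and return the confidence per group."""
    counts = defaultdict(lambda: [0, 0])  # condition value -> [correct, total]
    for fact, is_correct in annotated:
        key = condition(fact)
        counts[key][0] += int(is_correct)
        counts[key][1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


# Hypothetical expert-validated predictions of a single rule.
annotated = [
    ({"subject": "s1", "subject_type": "company"}, True),
    ({"subject": "s2", "subject_type": "company"}, True),
    ({"subject": "s3", "subject_type": "company"}, True),
    ({"subject": "s4", "subject_type": "person"}, True),
    ({"subject": "s5", "subject_type": "person"}, False),
    ({"subject": "s6", "subject_type": "person"}, False),
    ({"subject": "s7", "subject_type": "person"}, False),
]

per_type = conditional_confidence(annotated, lambda fact: fact["subject_type"])
print(per_type)  # {'company': 1.0, 'person': 0.25}
```

In this toy data the rule would be applied only to subjects of type `company`, where its confidence is high, even though its overall confidence across all seven predictions is only 4/7.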
Talk at WWW 2021
Publication
- Michael Loster, Davide Mottin, Paolo Papotti, Felix Naumann, Jan Ehmueller, Benjamin Feldmann. "Few-Shot Knowledge Validation using Rules". In Proceedings of the Web Conference (WWW), 3314-3324, 2021
Dataset
As part of this project, we provide what was, at the time of publication, the largest dataset of manually annotated rules.
Team
Active:
- Michael Loster (HPI)
- Davide Mottin (Aarhus University)
- Paolo Papotti (EURECOM)
- Felix Naumann (HPI)
- Jan Ehmüller (HPI)
- Benjamin Feldmann (HPI)
Contact
For questions or feedback, please contact Michael Loster.