Our group includes postdocs, PhD students, and student assistants, and is headed by Prof. Felix Naumann. If you are interested in joining our team, please contact him.

For bachelor students we offer German-language lectures on database systems in addition to paper- or project-oriented seminars. Within a one-year bachelor project, students finalize their studies in cooperation with external partners. For master students we offer courses on information integration, data profiling, and information retrieval, complemented by specialized seminars and master projects, and we advise master theses.

Most of our research is conducted in the context of larger research projects, in collaboration across students, groups, and universities. We strive to make most of our datasets and source code available.

Please do not hesitate to reach out to us directly if you cannot find a paper, slides, or other research artifacts.

Description

This dataset contains the annotated facts used to run the experiments presented in our paper "Few-Shot Knowledge Validation Using Rules" (WWW'21). It includes 26 annotated rules (22 positive, 4 negative) covering a total of 23,324 triples (instances). Both the rule and instance data are in JSON format and are contained in the files rules.json and instances.json, respectively. A brief description of the most important data fields is given below.

Rules (rules.json)

_id - a unique ObjectID for the rule (as automatically provided by MongoDB)
rule_type - indicates whether the rule is positive or negative (true corresponds to positive, false to negative)
premise - the premise of the rule
conclusion - the conclusion of the rule
query_pattern - the query pattern used to execute the rule
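
As a rough illustration of how the rule fields above might be read, the following Python sketch parses rule records and counts positive and negative rules via rule_type. It rests on assumptions that the description does not confirm: that rules.json is either one JSON array or one JSON object per line (both are common for MongoDB exports), and the helper names parse_records and count_rule_types are made up for this example.

```python
import json
from collections import Counter


def parse_records(text):
    """Parse records from a JSON array or from newline-delimited JSON objects.

    Assumption: the file uses one of these two layouts; the dataset
    description does not say which.
    """
    text = text.strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def count_rule_types(rules):
    """Count rules by polarity: rule_type true is positive, false is negative."""
    return Counter("positive" if rule["rule_type"] else "negative" for rule in rules)
```

With the downloaded file, passing the contents of rules.json (read as UTF-8 text) to parse_records should yield the list that count_rule_types expects. If the layout assumption holds, the counts should match the 22 positive and 4 negative rules stated in the description.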

Instances / Triples (instances.json)

_id - a unique ObjectID for the instance (as automatically provided by MongoDB)
rule - the unique ObjectID of the rule that generated this instance
subj - the subject of the instance/triple
pred - the predicate of the instance/triple
obj - the object of the instance/triple
correct - a boolean value indicating whether the fact is correct or incorrect (true corresponds to correct, false to incorrect)
label - an integer value indicating whether the fact is correct or incorrect (1 corresponds to correct, 0 to incorrect)
score - was intended for future experiments and can be safely ignored.
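
The instance fields above can be combined in a small sketch like the following, which groups already-loaded instance records by their generating rule and checks that correct and label agree. It assumes each record is a Python dict with the fields listed above, and that an ObjectID appears either as a plain string or in MongoDB's Extended JSON form {"$oid": "..."}; which form the files use is not stated in the description. The function names are hypothetical.

```python
from collections import defaultdict


def oid(value):
    """Return an ObjectID as a plain string.

    Assumption: IDs are stored either as strings or as {"$oid": "..."}.
    """
    if isinstance(value, dict):
        return value.get("$oid")
    return value


def summarize_by_rule(instances):
    """Map each rule ID to (number of correct instances, total instances)."""
    totals = defaultdict(lambda: [0, 0])  # rule ID -> [correct, total]
    for inst in instances:
        entry = totals[oid(inst["rule"])]
        if inst["correct"]:
            entry[0] += 1
        entry[1] += 1
    return {rule: (correct, total) for rule, (correct, total) in totals.items()}


def inconsistent_labels(instances):
    """Return instances whose boolean 'correct' disagrees with integer 'label'."""
    return [i for i in instances if bool(i["correct"]) != (i["label"] == 1)]
```

Since correct and label encode the same annotation, inconsistent_labels would be expected to return an empty list on the full dataset; a non-empty result would more likely point to a parsing problem than to the data itself.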

Download

Annotated Facts (507kB)

Chair

Prof. Dr. Felix Naumann

Information Systems

E-Mail: felix.naumann(at)hpi.de

Assistant: Diana Stephan

Office: Campus II, House F, F-2.01
Tel.: +49 (0)331 5509-280
Fax: +49 (0)331 5509-287
E-Mail: office-naumann(at)hpi.de

To visit us, please see these directions.

Project highlights

Metanome: Big Data Profiling

Data Preparation

Janus: Change exploration

KITQAR: AI and Data Quality

News

06.10.2024 | Paper accepted at EDBT 2025

06.09.2024 | Congratulations Dr. Phillip Wenig!

06.09.2024 | Congratulations Dr. Mazhar Hameed!

16.07.2024 | Congratulations Dr. Leon Bornemann-Paulus!

23.05.2024 | Paper accepted at NLDB 2024
