Almost all extraction tasks suffer from the same issue: the trade-off between precision and recall. Do you want a model that finds all possible items (e.g. named entities in a text) but also returns many spurious ones that are in fact not entities (or includes additional tokens/characters), or should the model be very precise, so that the identified entities really are entities, at the cost of missing many that could have been found?
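To make the trade-off concrete, here is a minimal sketch of how precision and recall are computed for extracted entity spans; the gold and predicted sets are hypothetical examples and assume exact (start, end, label) span matching:

```python
# Minimal sketch: precision/recall for extracted entity spans.
# Gold/predicted sets are hypothetical, assuming exact span matching.
gold = {(0, 5, "PER"), (12, 20, "ORG"), (30, 36, "LOC")}
predicted = {(0, 5, "PER"), (12, 22, "ORG"), (40, 44, "PER")}  # one boundary error, one spurious entity

true_positives = len(gold & predicted)
precision = true_positives / len(predicted) if predicted else 0.0  # share of predictions that are correct
recall = true_positives / len(gold) if gold else 0.0               # share of gold entities that were found

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.33 recall=0.33
```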
In OpenIE tasks, the model is usually tuned for precision. In this thesis, we want to investigate active-learning models that identify structural/systematic errors and clean the extracted data. The interface for the human in the loop should be as simple as possible: imagine a Tinder-like endpoint that lets you label data on the fly, wherever you are. Evaluating extraction tasks usually requires human annotators to sit at a desktop computer and label the data precisely (e.g. highlight the text span that contains an entity), which is a boring and painstaking task. If, instead, the machine suggests entities that a user can accept or reject while waiting in line or when bored on the train, that could significantly improve the annotation experience and therefore increase the amount of labelled data (a sketch of such an endpoint follows below).
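A minimal sketch of what such a Tinder-like labelling endpoint could look like; the routes, data, and the choice of FastAPI are illustrative assumptions, not a prescribed architecture:

```python
# Hypothetical sketch of a swipe-style labelling endpoint: the server serves
# one suggested entity at a time, the client accepts (swipe right) or
# rejects (swipe left). FastAPI is used here only as an example framework.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# In a real system these suggestions would come from the extraction model.
suggestions = [
    {"id": 1, "sentence": "Ada Lovelace worked with Charles Babbage.", "span": "Ada Lovelace", "label": "PER"},
    {"id": 2, "sentence": "The meeting took place in Berlin in 2019.", "span": "in 2019", "label": "LOC"},
]
feedback_store = {}  # suggestion id -> True (accepted) / False (rejected)

class Feedback(BaseModel):
    suggestion_id: int
    accepted: bool

@app.get("/next")
def next_suggestion():
    """Return the next unlabelled suggestion, or a done marker."""
    for s in suggestions:
        if s["id"] not in feedback_store:
            return s
    return {"done": True}

@app.post("/feedback")
def post_feedback(fb: Feedback):
    """Record a swipe decision; the cleanup model can later retrain on it."""
    feedback_store[fb.suggestion_id] = fb.accepted
    return {"stored": True}
```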
In this thesis, a student has to develop a model that enables this level of annotation simplicity. The machine would have to learn from the feedback it receives and improve the extraction model. The hypothesis (based on experience) is that extraction models make common, recurring errors. The "cleanup" model would have to identify rules or the structure of these errors and thus clean the extracted data.
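One possible way to operationalise this hypothesis is a cleanup step that learns which surface patterns of the extractor's suggestions tend to be rejected by users; the features, data, and classifier below are illustrative assumptions, not a fixed design:

```python
# Illustrative sketch: learn which suggestion patterns tend to be rejected,
# using simple surface features of the extracted span. The features and the
# classifier choice are assumptions for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def span_features(span: str) -> dict:
    """Cheap structural features that often correlate with extraction errors."""
    return {
        "starts_lower": span[:1].islower(),      # e.g. leading preposition kept in the span
        "ends_with_punct": span[-1:] in ".,;:",  # trailing punctuation included
        "num_tokens": len(span.split()),
        "contains_digit": any(c.isdigit() for c in span),
    }

# Feedback collected via the swipe interface: (extracted span, accepted?)
feedback = [
    ("Ada Lovelace", True),
    ("in Berlin", False),          # preposition wrongly included
    ("Charles Babbage.", False),   # trailing period included
    ("Berlin", True),
]

X = [span_features(span) for span, _ in feedback]
y = [int(accepted) for _, accepted in feedback]

vec = DictVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(X), y)

# The cleanup model can then flag (or auto-reject) new extractions that
# match the learned error structure.
new_spans = ["in Paris", "Grace Hopper"]
probs = clf.predict_proba(vec.transform([span_features(s) for s in new_spans]))[:, 1]
for span, p in zip(new_spans, probs):
    print(f"{span!r}: P(correct)={p:.2f}")
```

Such a rule- or pattern-oriented cleanup model could either post-filter the extractor's output directly or feed the corrected examples back into retraining, which is exactly the feedback loop the thesis should investigate.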