13.12.2017 - Toni Grütze (external)
Adding Value to Text with User-Generated Content
In recent years, the number of documents on the Web, as well as in closed systems for private or business contexts, has grown significantly. The research field of text mining comprises various application areas that share the goal of extracting high-quality information from textual data. Harvesting entity knowledge from these large text collections is one of its major challenges.
In this talk, we will present CohEEL, a method for linking textual mentions of entities in documents to their representations in user-generated knowledge bases such as Wikipedia and YAGO. Solutions to this entity linking problem have typically aimed at balancing linking correctness (precision) against linking coverage (recall). Entity links in texts can be used to improve various information retrieval tasks, such as text summarization, document classification, or topic-based clustering; for these applications, however, linking precision is the decisive factor.
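To make the two measures concrete, the following is a minimal Python sketch of how precision and recall could be computed for a set of entity links. It assumes that a link is represented as a (mention, entity) pair; the function name `precision_recall`, the example mentions, and the entity identifiers are hypothetical and serve only as an illustration. They are not part of CohEEL.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (mention, entity) links.

    Assumption for this sketch: a predicted link counts as correct only if
    the identical (mention, entity) pair appears in the gold standard.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Convention chosen here: an empty prediction or gold set yields 0.0.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard links for one document.
gold = {
    ("Merkel", "Angela_Merkel"),
    ("Berlin", "Berlin"),
    ("SPD", "Social_Democratic_Party_of_Germany"),
    ("Potsdam", "Potsdam"),
}

# Hypothetical output of a linker: two correct links, one wrong link,
# and two gold mentions left unlinked.
predicted = {
    ("Merkel", "Angela_Merkel"),
    ("Berlin", "Berlin_(band)"),
    ("Potsdam", "Potsdam"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example, two of the three predicted links are correct and two of the four gold links are found. A precision-oriented linker would rather leave a mention such as "Berlin" unlinked than risk a wrong link, which raises precision at the cost of recall.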
CohEEL is an efficient linking method that uses a random walk strategy to combine a precision-oriented and a recall-oriented classifier in such a way that high precision is maintained while recall is raised to the maximum possible level. CohEEL and further algorithms that solve different text mining problems based on user-generated content are discussed in my PhD thesis.