An Architecture for Finding Entities on the Web
Abstract—Recent progress in research fields such as Information
Extraction and Information Retrieval enables the
creation of systems providing better search experiences to web
users. For example, systems that retrieve entities instead of
just documents have been built. In this paper we present an
approach for large-scale Entity Retrieval using web collections
as the underlying corpus. We propose an architecture for entity
extraction and entity ranking starting from web documents.
This is achieved by (1) using an existing web document index and
(2) creating an entity-centric index. We describe the advantages
and feasibility of our approach using state-of-the-art tools.
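
As a rough illustration of the two-index idea summarized above, the following minimal sketch builds a toy document index and an entity-centric index, then ranks entities for a query. Everything in it is an assumption made for illustration: the tiny corpus, the function names, the capitalized-token "extractor" (a stand-in for a real Information Extraction tool), and the overlap-count scoring are not taken from the paper.

```python
import re
from collections import Counter, defaultdict

# Toy corpus standing in for a web collection (hypothetical data).
DOCS = {
    "d1": "Rome is the capital of Italy",
    "d2": "Paris is the capital of France",
    "d3": "Italy and France share a border",
}


def tokenize(text):
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)


def build_document_index(docs):
    """Inverted index: lowercased term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term.lower()].add(doc_id)
    return index


def extract_entities(text):
    """Placeholder extractor (assumption): capitalized tokens are entities."""
    return {tok for tok in tokenize(text) if tok[0].isupper()}


def build_entity_index(docs):
    """Entity-centric index: entity -> set of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index


def rank_entities(query, doc_index, entity_index):
    """Score each entity by the number of query-matching documents mentioning it."""
    matching = set()
    for term in tokenize(query):
        matching |= doc_index.get(term.lower(), set())
    scores = Counter()
    for entity, doc_ids in entity_index.items():
        overlap = len(doc_ids & matching)
        if overlap:
            scores[entity] = overlap
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    doc_index = build_document_index(DOCS)
    entity_index = build_entity_index(DOCS)
    print(rank_entities("Italy", doc_index, entity_index))
```

In this sketch the document index answers the keyword query, and the entity-centric index maps the matching documents back to the entities they mention. A realistic system would replace both the extractor and the scoring function with far stronger components.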