Nitisha Jain, Ralf Krestel
Our submission "Who is Mona L.? Identifying Mentions of Artworks in Historical Archives" has been accepted as a short paper at the International Conference on Theory and Practice of Digital Libraries (TPDL), to be held September 9-12, 2019, in Oslo, Norway.
Who is Mona L.? Identifying Mentions of Artworks in Historical Archives
Named entity recognition (NER) plays an important role in many natural language processing tasks, including automatic knowledge graph construction and ontology generation. Most NER systems are limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources such as art historical archives, recognizing the titles of artworks as named entities is of high importance. In this work, we focus on identifying mentions of artworks, e.g., paintings and sculptures, in digitized versions of art historical archives. Current state-of-the-art NER tools are unable to adequately identify artwork titles due to the particular difficulties presented by this domain. The scarcity of NER training data for cultural heritage poses a further obstacle. To mitigate this, we propose a semi-supervised approach that creates high-quality training data by leveraging existing cultural heritage resources from knowledge bases such as Wikidata. Our experimental evaluation shows a significant improvement in NER performance for artwork titles compared to baseline approaches.
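To make the general idea of using existing knowledge-base entries as a labeling source more concrete, here is a minimal Python sketch of dictionary-based labeling: a list of known artwork titles is matched against tokenized text, and the matches are tagged in BIO format. This illustrates the general technique under simple assumptions; it is not the method described in the paper. The function `bio_label`, the tag name `ART`, whitespace tokenization, and the hard-coded title list (standing in for titles that might be retrieved from a knowledge base such as Wikidata) are all hypothetical.

```python
from typing import List


def bio_label(tokens: List[str], titles: List[str]) -> List[str]:
    """Tag tokens with B-ART/I-ART where they exactly match a known title.

    Hypothetical helper for illustration only. Titles are split on
    whitespace, and longer titles are tried first so that the longest
    match at each position wins. Matching is exact and case-sensitive.
    """
    # Tokenize each title and sort by length, longest first.
    title_tokens = sorted((t.split() for t in titles), key=len, reverse=True)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for tt in title_tokens:
            n = len(tt)
            if n and tokens[i:i + n] == tt:
                # First token of the title gets B-, the rest get I-.
                labels[i] = "B-ART"
                for j in range(i + 1, i + n):
                    labels[j] = "I-ART"
                i += n  # skip past the matched span
                matched = True
                break
        if not matched:
            i += 1
    return labels


if __name__ == "__main__":
    sentence = "The Mona Lisa was painted by Leonardo".split()
    known_titles = ["Mona Lisa", "The Last Supper"]  # hypothetical title list
    print(list(zip(sentence, bio_label(sentence, known_titles))))
```

Exact string matching of this kind yields noisy labels (for example, titles that are also ordinary phrases, or titles that appear in a variant form), so on its own it would at best be a starting point that needs further filtering before the output could serve as high-quality training data.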