Art archives are a rich source of information that serves several purposes: establishing the provenance of art pieces, facilitating research on art history, and understanding an artist in the context of their work. These archives typically comprise many kinds of heterogeneous documents: auction catalogs, exhibition catalogs, personal correspondence, books, bills, certificates, studies, theses, etc. Many of these archives are not easily accessible because they have not yet been digitized. Even those that are available in digitized form are hard to explore with general-purpose text mining tools.
In this project, we aim to facilitate access to a large collection of art-related documents. To this end, we need to adapt standard NLP tools to the unique challenges of the art domain. The ultimate goal is to generate a knowledge graph that art historians can easily explore. The knowledge graph would also serve as a backbone for semantic search and for new ways of representing art entities, e.g., as embeddings in a high-dimensional space. We will develop modern deep learning methods to manage and visualize large collections of art-historical and scholarly documents.
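As a rough illustration of what exploring such a knowledge graph could look like, the following minimal sketch stores facts as (subject, predicate, object) triples and looks up everything connected to an entity. It is an assumption made purely for illustration and not the project's actual implementation: the `KnowledgeGraph` class, the predicate names, and the placeholder entities ("Artist A", "Painting X", "Auction Catalog 1") are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical in-memory triple store, for illustration only."""

    def __init__(self):
        self.triples = set()
        # Index triples by subject and by object for fast entity lookup.
        self.by_subject = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one (subject, predicate, object) fact."""
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_object[obj].add(triple)

    def about(self, entity):
        """Return all triples that mention the entity, in sorted order."""
        found = self.by_subject.get(entity, set()) | self.by_object.get(entity, set())
        return sorted(found)


# Placeholder facts (invented, not taken from any archive).
kg = KnowledgeGraph()
kg.add("Artist A", "created", "Painting X")
kg.add("Painting X", "listed_in", "Auction Catalog 1")

for triple in kg.about("Painting X"):
    print(triple)
```

In a setting like the one described above, triples of this kind would presumably be extracted from the documents by the adapted NLP tools rather than entered by hand, and a dedicated graph database would likely replace the in-memory sets.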
This project is being carried out in close collaboration with the Wildenstein Plattner Institute, which has generously shared its data with us to facilitate this research and provides regular feedback and suggestions.