Jain, N., Bartz, C., Bredow, T., Metzenthin, E., Otholt, J., Krestel, R.: Semantic Analysis of Cultural Heritage Data: Aligning Paintings and Descriptions in Art-Historic Collections. (2020).
Art-historic documents often contain multimodal data in terms of images of artworks and metadata, descriptions, or interpretations thereof. Most research efforts have focused either on image analysis or text analysis independently since the associations between the two modes are usually lost during digitization. In this work, we focus on the task of alignment of images and textual descriptions in art-historic digital collections. To this end, we reproduce an existing approach that learns alignments in a semi-supervised fashion. We identify several challenges while automatically aligning images and texts, specifically for the cultural heritage domain, which limit the scalability of previous works. To improve the performance of alignment, we introduce various enhancements to extend the existing approach that show promising results.
Jain, N., Krestel, R.: Learning Fine-Grained Semantics for Multi-Relational Data. International Semantic Web Conference, 2020 Posters and Demos (2020).
The semantics of relations play a central role in the understanding and analysis of multi-relational data. Real-world relational datasets represented by knowledge graphs often contain polysemous relations between different types of entities, that represent multiple semantics. In this work, we present a data-driven method that can automatically discover the distinct semantics associated with high-level relations and derive an optimal number of sub-relations having fine-grained meaning. To this end, we perform clustering over vector representations of entities and relations obtained from knowledge graph embedding models.
Jain, N.: Domain-Specific Knowledge Graph Construction for Semantic Analysis. Extended Semantic Web Conference (ESWC 2020) Ph.D. Symposium (2020).
Knowledge graphs are widely used for systematic representation of real-world data. They serve as a backbone for a number of applications such as search, question answering, and recommendations. Large-scale, general-purpose knowledge graphs, having millions of facts, have been constructed through automated techniques from publicly available datasets such as Wikipedia. However, these knowledge graphs are typically incomplete and often fail to correctly capture the semantics of the data. This holds true particularly for domain-specific data, where the generic techniques for automated knowledge graph creation often fail due to novel challenges, such as lack of training data, semantic ambiguities, and absence of representative ontologies. In this thesis, we focus on automated knowledge graph construction for the cultural heritage domain. We investigate the research challenges encountered during the creation of an ontology and a knowledge graph from digitized collections of cultural heritage data based on machine learning approaches. We identify the specific research problems for this task and present our methodology and approach for a solution along with preliminary results.
Jain, N., Bartz, C., Krestel, R.: Automatic Matching of Paintings and Descriptions in Art-Historic Archives using Multimodal Analysis. 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access (AI4HI-2020), co-located with LREC 2020 conference (2020).
Cultural heritage data plays a pivotal role in the understanding of human history and culture. A wealth of information is buried in art-historic archives which can be extracted via their digitization and analysis. This information can facilitate search and browsing, help art historians to track the provenance of artworks and enable wider semantic text exploration for digital cultural resources. However, this information is contained in images of artworks as well as in textual descriptions or annotations accompanying the images. During the digitization of such resources, the valuable associations between the images and texts are frequently lost. In this project description, we propose an approach to retrieve the associations between images and texts for artworks from art-historic archives. To this end, we use machine learning to generate text descriptions for the extracted images on the one hand, and to detect descriptive phrases and titles of images from the text on the other hand. Finally, we use embeddings to align the descriptions with the images.
Razniewski, S., Jain, N., Mirza, P., Weikum, G.: Coverage of Information Extraction from Sentences and Paragraphs. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (2019).
Scalar implicatures are language features that imply the negation of stronger statements, e.g., “She was married twice” typically implicates that she was not married thrice. In this paper we discuss the importance of scalar implicatures in the context of textual information extraction. We investigate how textual features can be used to predict whether a given text segment mentions all objects standing in a certain relationship with a certain subject. Preliminary results on Wikipedia indicate that this prediction is feasible, and yields informative assessments.
Jain, N., Krestel, R.: Who is Mona L.? Identifying Mentions of Artworks in Historical Archives. International Conference on Theory and Practice of Digital Libraries (TPDL 2019). pp. 115–122. Springer (2019).
Named entity recognition (NER) plays an important role in many information retrieval tasks, including automatic knowledge graph construction. Most NER systems are typically limited to a few common named entity types, such as person, location, and organization. However, for cultural heritage resources, such as art historical archives, the recognition of titles of artworks as named entities is of high importance. In this work, we focus on identifying mentions of artworks, e.g., paintings and sculptures, from historical archives. Current state-of-the-art NER tools are unable to adequately identify artwork titles due to the particular difficulties presented by this domain. The scarcity of training data for NER for cultural heritage poses further hindrances. To mitigate this, we propose a semi-supervised approach to create high-quality training data by leveraging existing cultural heritage resources. Our experimental evaluation shows significant improvement in NER performance for artwork titles compared to the baseline approach.