Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

20.09.2021

Paper Accepted at WI-IAT 2021

Lasse Kohlmeyer, Tim Repke, Ralf Krestel

We are excited to announce that our paper "Novel Views on Novels: Embedding Multiple Facets of Long Texts" has been accepted for presentation at the International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2021).

Abstract

Novels are one of the longest document types and thus one of the most complex types of texts. Many NLP tasks utilize document embeddings as machine-understandable semantic representations of documents. However, such document embeddings are optimized for short texts, such as sentences or paragraphs. When faced with longer texts, these models either truncate the long text or split it sequentially into smaller chunks. We show that when applied to a fictional novel, these traditional document embeddings fail to capture all its facets. Complex information, such as time, place, atmosphere, style, and plot, is typically not represented adequately.

To this end, we propose lib2vec, which computes and combines multiple embedding vectors based on various facets. Instead of splitting the text sequentially, lib2vec splits the text semantically based on domain-specific facets. We evaluate the semantic expressiveness using human-assessed book comparisons as well as content-based information retrieval tasks. The results show that our approach outperforms state-of-the-art document embeddings for long texts.
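The idea of splitting a text by facet rather than by position, and then combining one vector per facet, can be illustrated with a small toy sketch. This is not the lib2vec implementation: the facet lexicons, the hashed bag-of-words embedding, the concatenation step, and all names (`FACET_LEXICONS`, `hash_embed`, `facet_embedding`, `cosine`) are assumptions made purely for illustration.

```python
import hashlib
import math
import re

# Hypothetical facet lexicons, invented for this sketch. How lib2vec
# actually assigns text to facets is not described in the abstract.
FACET_LEXICONS = {
    "atmosphere": {"dark", "gloomy", "cheerful", "cold", "quiet"},
    "place": {"london", "castle", "forest", "street", "sea"},
    "time": {"morning", "night", "winter", "year", "century"},
}

DIM = 16  # dimensionality of each per-facet vector (arbitrary choice)


def hash_embed(tokens, dim=DIM):
    """Toy stand-in for a learned embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def facet_embedding(text):
    """Split the text by facet (not by position) and concatenate the facet vectors."""
    tokens = re.findall(r"[a-z]+", text.lower())
    combined = []
    for facet in sorted(FACET_LEXICONS):
        facet_tokens = [t for t in tokens if t in FACET_LEXICONS[facet]]
        combined.extend(hash_embed(facet_tokens))
    return combined


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


if __name__ == "__main__":
    book_a = facet_embedding("A dark night in London.")
    book_b = facet_embedding("A cheerful morning by the sea.")
    print(len(book_a), cosine(book_a, book_b))
```

Because every facet keeps its own slice of the combined vector, two books can be compared facet by facet or as a whole. Concatenation is only one possible way to combine the vectors and was chosen here for simplicity.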

Reference

  • Lasse Kohlmeyer, Tim Repke, Ralf Krestel: Novel Views on Novels: Embedding Multiple Facets of Long Texts. Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2021 (to appear)