Prof. Dr. Felix Naumann

Email Corpus Exploration

Every day, companies accumulate large amounts of heterogeneous data in the form of emails, contracts, letters, and other documents. From these text documents and their metadata, graphs can be extracted whose nodes and edges are enriched with additional information. The resulting semi-structured data can be analysed by combining methods from the research areas of Text Mining and Graph Analysis.

Using metadata from a collection of emails, such as sender, recipients, and timestamps, a social network that evolves over time can be constructed. Analysing the content of the corresponding messages can additionally help answer the question: “who knew what when”. Other methods estimate how information diffuses through the social network.
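As a rough illustration of the first idea, the following minimal sketch groups email metadata into one communication graph per month. It is not taken from the project's pipeline: the record layout, the example addresses, and the function name build_monthly_networks are assumptions made purely for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical metadata records: (sender, recipients, ISO timestamp).
EMAILS = [
    ("alice@example.com", ["bob@example.com"], "2001-01-15T09:30:00"),
    ("alice@example.com", ["bob@example.com", "carol@example.com"], "2001-01-20T14:00:00"),
    ("bob@example.com", ["carol@example.com"], "2001-02-03T11:15:00"),
]


def build_monthly_networks(emails):
    """Return a dict mapping "YYYY-MM" to a Counter of directed edges.

    Each edge is a (sender, recipient) pair; its count is the number of
    messages sent along that edge within the month.
    """
    networks = defaultdict(Counter)
    for sender, recipients, timestamp in emails:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        for recipient in recipients:
            networks[month][(sender, recipient)] += 1
    return dict(networks)


networks = build_monthly_networks(EMAILS)
for month in sorted(networks):
    for (sender, recipient), count in sorted(networks[month].items()):
        print(month, sender, "->", recipient, count)
```

Comparing the edge sets of consecutive months then shows which contacts appear, intensify, or vanish over time. A real corpus would additionally need address normalisation and deduplication, which this sketch leaves out.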

This information and other inherent structures extracted from heterogeneous graphs can support the work of journalists, auditors, or special investigators. Instead of having to read thousands of documents to get an initial sense of a dataset, they can work with extracted information that is aggregated into a simplified overview, enabling more focused investigations.

Beacon in the Dark

Developed as part of the student project "Leuchtturm im Datennebel"/"Beacon in the Dark" by Jakob Edding, Moritz Hartmann, Jonas Hering, Dennis Kipping, Hendrik Schmidt, Nico Scordialo, and others.

Together, we published a demo paper at CIKM 2018, which was named runner-up for the best demo award. Below you'll find additional resources related to this project, including the source code on GitHub. The system has two parts: a processing pipeline for data ingestion and mining, and a frontend. They work independently of each other. Since the databases total 12 GB, we do not publish them directly on the website. For a demonstration of Beacon in the Dark, we refer to the posters and demo videos. If you need help running the pipeline or are interested in the databases, please contact Tim Repke.




  1. Repke, T., Krestel, R., Edding, J., Hartmann, M., Hering, J., Kipping, D., Schmidt, H., Scordialo, N., Zenner, A.: Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1–4. ACM (2018).