Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks

Project Description

Email communication is an integral part of everyday life. For business emails in particular, extracting and analysing communication networks can reveal patterns in a company's processes and decision making. Fraud detection is another application area in which precise detection of communication networks is essential. In this paper we present an approach based on recurrent neural networks to untangle email threads that result from forward and reply behaviour. We further classify the parts of each email into 2 or 5 zones to capture not only header and body information but also greetings and signatures.

QuaggaLib uses the model presented in our ECIR paper. The library parses the raw email body into separate blocks and extracts metadata from inline headers. We recommend this kind of pre-processing for any application that works with email data. The library provides the actual written text content as well as the metadata that would otherwise remain hidden in the unstructured email body.
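To make the idea of blocks and inline headers concrete, here is a minimal sketch of what such pre-processing produces. It is not QuaggaLib's API and does not use the recurrent neural network from the paper; it swaps in a simple regular-expression heuristic purely for illustration. The function name `split_email`, the separator patterns, and the sample message are all assumptions made for this example.

```python
import re

# Assumed separator patterns; real email clients produce many more variants.
SEPARATOR = re.compile(
    r"^\s*-{2,}\s*(Original Message|Forwarded by .*?)\s*-{2,}\s*$",
    re.IGNORECASE,
)
# Inline header fields that typically follow a separator line.
HEADER_FIELD = re.compile(
    r"^(From|To|Cc|Sent|Date|Subject):\s*(.*)$", re.IGNORECASE
)


def split_email(raw_body):
    """Split a raw email body into blocks of inline metadata and written text.

    Hypothetical helper for illustration only (regex heuristic, not an RNN).
    """
    blocks = [{"meta": {}, "lines": []}]
    in_header = False
    for line in raw_body.splitlines():
        if SEPARATOR.match(line):
            # A separator starts a new quoted or forwarded block.
            blocks.append({"meta": {}, "lines": []})
            in_header = True
            continue
        if in_header:
            field = HEADER_FIELD.match(line)
            if field:
                key = field.group(1).lower()
                blocks[-1]["meta"][key] = field.group(2).strip()
                continue
            if not line.strip() and not blocks[-1]["meta"]:
                continue  # tolerate blank lines between separator and fields
            in_header = False  # first non-field line ends the inline header
        blocks[-1]["lines"].append(line)
    return [
        {"meta": b["meta"], "text": "\n".join(b["lines"]).strip()}
        for b in blocks
    ]


example = """Sounds good, see you then.

Thanks,
Alice

-----Original Message-----
From: Bob Example
Sent: Monday, May 14, 2001 9:12 AM
To: Alice Example
Subject: Meeting

Can we meet on Tuesday?
"""

for block in split_email(example):
    print(block["meta"], repr(block["text"]))
```

A fixed pattern list like this breaks as soon as a client formats its quoted headers differently, which is the kind of variation a learned model is meant to absorb.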

Reference

If you use our data or find this work related to yours, please cite our ECIR 2018 paper from the list below.

Repke, T., Krestel, R.: Extraction and Representation of Financial Entities from Text. In: Consoli, S., Reforgiato Recupero, D., and Saisana, M. (eds.) Data Science for Economics and Finance. pp. 241–263. Springer, Cham (2021).

Schwanhold, R., Repke, T., Krestel, R.: Modeling the Evolution of Word Senses with Force-Directed Layouts of Co-occurrence Networks. Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change (LChange@ACL 2021). 1–6 (2021).

Risch, J., Repke, T., Kohlmeyer, L., Krestel, R.: ComEx: Comment Exploration on Online News Platforms. Joint Proceedings of the ACM IUI 2021 Workshops co-located with the 26th ACM Conference on Intelligent User Interfaces (IUI). pp. 1–7. CEUR-WS.org (2021).

Repke, T., Krestel, R.: Visualising Large Document Collections by Jointly Modeling Text and Network Structure. Proceedings of the Joint Conference on Digital Libraries (JCDL). (2020).

Repke, T., Krestel, R.: Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks. 40th European Conference on Information Retrieval (ECIR 2018). Springer, Grenoble, France (2018).

Implementations

Production-ready email parsing:
- https://github.com/HPI-Information-Systems/QuaggaLib
Reference implementation as used in the paper including competitor approaches and data:
- https://github.com/HPI-Information-Systems/Quagga

Datasets

On this page we provide the datasets used in our ECIR 2018 paper and a fully parsed Enron corpus. The data was manually annotated using our Enno tool.

- Newly collected ASF email corpus, annotated by email zones only
- Selection of the Enron corpus, annotated by email zones only
- Selection of the Enron corpus with detailed annotation (including names, aliases, metadata)
- Automatically split, normalised, and cleaned Enron corpus as a graph

Apache Software Foundation Emails (ASF)

Annotated Enron Emails

Fully Parsed Enron Graph

Related Work

Original code for Jangada (Carvalho, 2004)
More information and data for Jangada (600+ annotated mails in the 20 Newsgroups dataset)
MinorThird library used by Jangada
400 annotated emails by Lampert et al. (Enron data)
Zebra system for email zoning
Another implementation of Zebra
Talon, a general-purpose tool for everything related to email structure

