Email communication is an integral part of everyday life. For business emails in particular, extracting and analysing communication networks can reveal patterns in a company's processes and decision making. Fraud detection is another application area in which precise detection of communication networks is essential. In this paper we present an approach based on recurrent neural networks that untangles email threads arising from forward and reply behaviour. We further classify the parts of each email into two or five zones, capturing not only header and body information but also greetings and signatures.
QuaggaLib uses the model presented in our ECIR paper. The library parses the raw email body into separate blocks and extracts meta-data from inline headers. We recommend this kind of pre-processing for any application that works with email data, because it exposes both the actual written text and the meta-data that would otherwise stay hidden in the unstructured email body.
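To make the idea of blocks and inline headers concrete, the following is a minimal rule-based sketch in Python. It is not QuaggaLib's API and not the recurrent neural network model from the paper: the function name `split_blocks`, the regular expressions, and the output layout are all assumptions chosen for illustration. It only handles the common "Original Message" and "Forwarded by" separator lines, which is exactly the kind of brittle heuristic a learned model is meant to replace.

```python
import re

# Hypothetical patterns for illustration; real emails vary far more than this.
# Separator lines that commonly introduce a quoted or forwarded message.
SEPARATOR = re.compile(
    r"^-+\s*(Original Message|Forwarded by .*?)\s*-+\s*$", re.IGNORECASE
)
# Inline header lines such as "From: Alice" or "Subject: Report".
HEADER = re.compile(r"^(From|To|Cc|Sent|Date|Subject):\s*(.*)$", re.IGNORECASE)


def split_blocks(raw_body):
    """Split a raw email body into blocks of inline-header meta-data and text."""
    blocks = [{"meta": {}, "lines": []}]
    in_header = False
    for line in raw_body.splitlines():
        stripped = line.strip()
        if SEPARATOR.match(stripped):
            # A separator starts a new block and opens its inline header.
            blocks.append({"meta": {}, "lines": []})
            in_header = True
            continue
        if in_header:
            match = HEADER.match(stripped)
            if match:
                key = match.group(1).lower()
                blocks[-1]["meta"][key] = match.group(2).strip()
                continue
            if not stripped:
                # A blank line after at least one header field ends the header.
                in_header = not blocks[-1]["meta"]
                continue
            # Any other line means the header is over and the text begins.
            in_header = False
        blocks[-1]["lines"].append(line)
    return [
        {"meta": b["meta"], "text": "\n".join(b["lines"]).strip()}
        for b in blocks
    ]
```

Calling `split_blocks` on a body that contains one quoted message returns two blocks: the first holds the newly written text with empty meta-data, and the second holds the quoted text together with the sender, date, and subject recovered from its inline header.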