Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
  
 

Email Corpus Exploration

Every day, companies accumulate large amounts of heterogeneous data in the form of emails, documents such as contracts and letters, and other records. Graphs can be extracted from these text documents and their metadata, with nodes and edges enriched with additional information. The resulting semi-structured data can be analysed by combining methods from the research areas of Text Mining and Graph Analysis.
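As an illustration of what such an enriched graph can look like, here is a minimal sketch in plain Python. It is not the project's implementation. The `Email` record, its fields, and the chosen node and edge attributes (sent/received counts, message count, first/last contact, subjects) are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    # Hypothetical minimal record: only the metadata fields this sketch needs.
    sender: str
    recipients: tuple[str, ...]
    sent_at: datetime
    subject: str


def build_graph(emails):
    """Build a directed, attributed communication graph from email metadata.

    Nodes are addresses annotated with sent/received message counts; each
    directed edge (sender, recipient) carries a message count, the time span
    of the correspondence and the subjects exchanged along that edge.
    """
    nodes = defaultdict(lambda: {"sent": 0, "received": 0})
    edges = {}
    for mail in emails:
        nodes[mail.sender]["sent"] += 1
        for rcpt in mail.recipients:
            nodes[rcpt]["received"] += 1
            edge = edges.setdefault(
                (mail.sender, rcpt),
                {"count": 0, "first": mail.sent_at, "last": mail.sent_at, "subjects": []},
            )
            edge["count"] += 1
            edge["first"] = min(edge["first"], mail.sent_at)
            edge["last"] = max(edge["last"], mail.sent_at)
            edge["subjects"].append(mail.subject)
    return dict(nodes), edges


# Usage with made-up example data.
emails = [
    Email("alice@example.com", ("bob@example.com", "carol@example.com"),
          datetime(2001, 5, 1, 9, 0), "Q2 forecast"),
    Email("bob@example.com", ("alice@example.com",),
          datetime(2001, 5, 1, 10, 30), "Re: Q2 forecast"),
    Email("alice@example.com", ("bob@example.com",),
          datetime(2001, 5, 2, 8, 15), "Meeting"),
]
nodes, edges = build_graph(emails)
print(nodes["alice@example.com"])                                # {'sent': 2, 'received': 1}
print(edges[("alice@example.com", "bob@example.com")]["count"])  # 2
```

Plain dictionaries keep the sketch dependency-free. A real system would more likely load the same attributes into a graph library or graph database, and would add content-derived attributes (topics, entities) from the text-mining side.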

Using the metadata of an email collection, a social network that evolves over time can be constructed. Additionally analysing the content of the communication can help answer the question “who knew what when”. Other methods estimate how information diffuses through the social network.
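The two ideas in this paragraph can be sketched as follows. This is an illustration only, not the method used in the project. It assumes emails are given as hypothetical `(sender, recipients, sent_at, text)` tuples. `who_knew_what_when` uses simple keyword matching as a stand-in for real content analysis, and `diffusion` uses time-respecting reachability (information can only travel along emails sent after the sender was informed) as one simple model of information diffusion.

```python
from datetime import datetime


def who_knew_what_when(emails, keyword):
    """Earliest time each participant sent or received an email whose text
    mentions `keyword` (case-insensitive substring match)."""
    first_seen = {}
    for sender, recipients, sent_at, text in sorted(emails, key=lambda m: m[2]):
        if keyword.lower() not in text.lower():
            continue
        for person in (sender, *recipients):
            first_seen.setdefault(person, sent_at)  # keep the earliest time only
    return first_seen


def diffusion(emails, seed, start):
    """Who could have learned what `seed` knew at `start`, and when, by
    following emails forward in time (time-respecting paths).

    Emails are processed chronologically; emails with equal timestamps are
    handled in input order because sorted() is stable.
    """
    informed = {seed: start}
    for sender, recipients, sent_at, _ in sorted(emails, key=lambda m: m[2]):
        # An email can carry the information only if its sender already had it.
        if sender in informed and informed[sender] <= sent_at:
            for person in recipients:
                informed.setdefault(person, sent_at)  # first contact is earliest
    return informed


# Usage with made-up example data.
emails = [
    ("alice", ("bob",), datetime(2001, 5, 1, 9), "Merger talks start Monday"),
    ("bob", ("carol", "dave"), datetime(2001, 5, 2, 14), "FYI: merger"),
    ("erin", ("alice",), datetime(2001, 5, 3, 11), "Lunch?"),
    ("carol", ("erin",), datetime(2001, 5, 4, 16), "Re: FYI"),
]
print(sorted(who_knew_what_when(emails, "merger")))  # ['alice', 'bob', 'carol', 'dave']
print(sorted(diffusion(emails, "alice", datetime(2001, 5, 1, 8))))
# ['alice', 'bob', 'carol', 'dave', 'erin']
```

The example shows why both views are useful: erin never received an email mentioning the keyword, yet a time-respecting path (alice → bob → carol → erin) means she could have learned about it.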

This information, together with other structures extracted from heterogeneous graphs, can support the work of journalists, auditors, or special investigators. Instead of having to read thousands of documents to get an initial sense of a dataset, they can rely on extracted information aggregated into a simplified overview, enabling more focused investigations.

Beacon in the Dark

Developed as part of the student project "Leuchtturm im Datennebel"/"Beacon in the Dark".

coming soon:

  • Demo video
  • Posters
  • link to source code
  • (link to processed data?)

Published at CIKM 2018 (Demo)
