Every day, companies accumulate large amounts of heterogeneous data such as emails, contracts, letters, and other documents. Graphs can be extracted from these text documents and their metadata, with the nodes and edges enriched with additional information. The resulting semi-structured data can be analysed by combining methods from the research areas of Text Mining and Graph Analysis.
For example, a social network that evolves over time can be constructed from the metadata of a collection of emails. Additional analysis of the content of the respective communication can help answer the question: “who knew what when”. Other methods quantify how information diffuses through the social network.
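As an illustration of the first step, the following minimal sketch builds a time-stamped communication graph from email metadata and takes a snapshot of it at a given date. The record fields (`sender`, `recipients`, `sent`), the sample data, and the function names are hypothetical and chosen only for this example; a real pipeline would parse these values from message headers.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical email metadata records; the field names are assumptions.
EMAILS = [
    {"sender": "alice", "recipients": ["bob"], "sent": datetime(2001, 1, 5)},
    {"sender": "bob", "recipients": ["carol", "alice"], "sent": datetime(2001, 2, 1)},
    {"sender": "alice", "recipients": ["bob"], "sent": datetime(2001, 3, 10)},
]


def build_temporal_edges(emails):
    """Map each directed (sender, recipient) pair to its sorted send times."""
    edges = defaultdict(list)
    for mail in emails:
        for recipient in mail["recipients"]:
            edges[(mail["sender"], recipient)].append(mail["sent"])
    for timestamps in edges.values():
        timestamps.sort()
    return dict(edges)


def snapshot(edges, until):
    """Return edge weights (message counts) for all mails sent up to `until`."""
    weights = {}
    for pair, timestamps in edges.items():
        count = sum(1 for t in timestamps if t <= until)
        if count:
            weights[pair] = count
    return weights


edges = build_temporal_edges(EMAILS)
early = snapshot(edges, datetime(2001, 1, 31))
late = snapshot(edges, datetime(2001, 12, 31))
```

Comparing snapshots at different dates shows how the network grows: in this toy data, only the edge from `alice` to `bob` exists at the end of January, while the later snapshot also contains the edges originating from `bob`.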
This information and other inherent structures extracted from heterogeneous graphs can support the work of journalists, auditors, or special investigators. Instead of having to read thousands of documents to get an initial sense of a dataset, they can work from a simplified overview of the aggregated information, which enables more focused investigations.