Hasso-Plattner-Institut
Prof. Dr. Felix Naumann
 

07.08.2018

Two Demo Papers Accepted at CIKM 2018

Our two demo papers

  • "CurEx – A System for Extracting, Curating, and Exploring Domain-Specific Knowledge Graphs from Text" by Michael Loster, Felix Naumann, Jan Ehmueller, and Benjamin Feldmann
  • "Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora" by Tim Repke and Ralf Krestel together with Jakob Edding, Moritz Hartmann, Jonas Hering, Dennis Kipping, Hendrik Schmidt, Nico Scordialo, and Alexander Zenner

were accepted at CIKM 2018. The conference will take place from 22nd to 26th October 2018 in Turin, Italy.

Abstract for CurEx Paper

The integration of diverse structured and unstructured information sources into a unified, domain-specific knowledge base is an important task in many areas. 
A well-maintained knowledge base enables data analysis in complex scenarios, for example risk analysis in the financial sector or the investigation of large data leaks such as the Paradise Papers or the Panama Papers.
Both the creation of such knowledge bases and their continuous maintenance and curation involve many complex tasks and considerable manual effort.
Because the integration process can introduce errors, users must manually correct the erroneous information contained in the knowledge base.

With CurEx, we present a modular system that allows structured and unstructured data sources to be integrated into a domain-specific knowledge base.
In particular, we (i) enable the incremental improvement of each individual integration component;
(ii) enable the selective generation of multiple knowledge graphs from the information contained in the knowledge base;
and (iii) provide two distinct user interfaces tailored to the needs of data engineers and end users, respectively.
The former has curation capabilities and controls the integration process, whereas the latter focuses on the exploration of the generated knowledge graph.

Abstract for Beacon Paper

Emails play a major role in today's business communication, documenting not only work but also decision-making processes.
The large amount of heterogeneous data in these email corpora renders manual investigations by experts infeasible. 
Auditors or journalists, for example, who are looking for irregular or inappropriate content or suspicious patterns, are in desperate need of computer-aided exploration tools to support their investigations.

We present our Beacon system for the exploration of such corpora at different levels of detail. 
A distributed processing pipeline combines text mining methods and social network analysis to augment the already semi-structured nature of emails. 
The user interface ties into the resulting cleaned and enriched dataset. 
For the interface design, we identify three objectives of expert users: gain an initial overview of the data to identify leads to investigate, understand the context of the information at hand, and have meaningful filters to iteratively focus on a subset of emails.
To this end, we use interactive visualisations that rearrange and aggregate the extracted information to reveal salient patterns.