Hasso-Plattner-Institut
Prof. Dr. Felix Naumann

Mímir: Corpus Exploration and Knowledge Management

Today's business communication is almost unimaginable without emails. They document discussions and decisions or summarise face-to-face meetings, in the form of unstructured text or attachments, and thus hold a significant amount of information about a business. In exceptional cases, for example when investigating a known case of fraud, specialists examine the inboxes and attached files of the personnel involved to determine the extent of the situation. However, the sheer quantity of documents is unmanageable without guidance from an exploration tool, as journalists working on the Panama Papers leak experienced.

In this project, we develop and evaluate information extraction and linking methods and combine them in an exploration tool. This work touches the fields of text mining, text summarisation, document classification, topic modelling, named entity extraction, entity linking and relationship extraction, as well as social network and graph analysis. We work together with our industry partner from the financial sector to put our prototypes in the hands of auditors for real-world feedback.
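The social network analysis mentioned above can be illustrated with a minimal sketch. It assumes that emails have already been parsed into (sender, recipients) pairs, for example from the From/To headers. The sketch builds a weighted who-emails-whom graph and ranks people by weighted degree, one of the simplest centrality measures. The names `EMAILS`, `build_communication_graph` and `weighted_degree`, and the example addresses, are hypothetical and do not come from the project's own tools.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (sender, [recipients]) pairs, e.g. parsed from the
# From/To headers of an email corpus. Illustrative only, not real data.
EMAILS = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com", ["alice@example.com"]),
    ("carol@example.com", ["alice@example.com", "dave@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
]


def build_communication_graph(emails):
    """Return a weighted, directed adjacency map: sender -> {recipient: count}."""
    graph = defaultdict(Counter)
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed copies
                graph[sender][recipient] += 1
    return graph


def weighted_degree(graph):
    """Sum of outgoing and incoming edge weights for each person."""
    degree = Counter()
    for sender, targets in graph.items():
        for recipient, weight in targets.items():
            degree[sender] += weight
            degree[recipient] += weight
    return degree


if __name__ == "__main__":
    graph = build_communication_graph(EMAILS)
    # Print the three most active correspondents by weighted degree.
    for person, score in weighted_degree(graph).most_common(3):
        print(person, score)
```

Weighted degree only captures how much a person communicates. A real exploration tool would probably combine this kind of graph with the text of the messages, for example topics or extracted entities, rather than rely on counts alone.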

Project-Related Publications

  • Repke, T., Krestel, R.: Visualising Large Document Collections by Jointly Modeling Text and Network Structure. Proceedings of the Joint Conference on Digital Libraries (JCDL) (2020).
     
  • Repke, T., Krestel, R.: Exploration Interface for Jointly Visualised Text and Graph Data. International Conference on Intelligent User Interfaces Companion (IUI '20) (2020).
     
  • Kellermeier, T., Repke, T., Krestel, R.: Mining Business Relationships from Stocks and News. MIDAS@ECML-PKDD (2019).
     
  • Loster, M., Repke, T., Krestel, R., Naumann, F., Ehmueller, J., Feldmann, B., Maspfuhl, O.: The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities. Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM 2018). ACM (2018).
     
  • Repke, T., Krestel, R., Edding, J., Hartmann, M., Hering, J., Kipping, D., Schmidt, H., Scordialo, N., Zenner, A.: Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora. Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp. 1–4. ACM (2018).
     
  • Repke, T., Krestel, R.: Topic-aware Network Visualisation to Explore Large Email Corpora. International Workshop on Big Data Visual Exploration and Analytics (BigVis) (2018).
     
  • Repke, T., Krestel, R.: Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks. 40th European Conference on Information Retrieval (ECIR 2018). Springer, Grenoble, France (2018).
     
  • Zuo, Z., Loster, M., Krestel, R., Naumann, F.: Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types. Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA) (2017).
     
  • Repke, T., Loster, M., Krestel, R.: Comparing Features for Ranking Relationships Between Financial Entities Based on Text. Proceedings of the 3rd International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets, pp. 12:1–12:2. ACM, New York, NY, USA (2017).