Mímir: Corpus Exploration and Knowledge Management

Today's business communication is almost unimaginable without emails. They document discussions and decisions or summarise face-to-face meetings in the form of unstructured text or attachments and thus hold a significant amount of information about a business. In very exceptional cases, for example when investigating a known case of fraud, specialists examine inboxes and attached files of involved personnel to determine the extent of the situation. However, the sheer quantity of documents is unmanageable without some guidance by an exploration tool, as journalists working with the Panama Papers leak experienced.

In this project, we develop and evaluate information extraction and linking methods and combine them in an exploration tool. This work touches on the fields of text mining, text summarisation, document classification, topic modelling, named entity extraction, entity linking, relationship extraction, as well as social network and graph analysis. We work together with our industry partner from the financial sector to put our prototypes in the hands of auditors for real-world feedback.

Subprojects

Joint Visualisation of Network and Text Data (MODiR)
Comment Exploration
Mapping and Understanding the Evolution of Language
Extracting hidden information from email structure
Email Corpus Exploration (Bachelor's Project, 2017/2018)
What's in a life (Master's Project, Summer 2019)
Your Master's Thesis?

Project-Related Publications

Risch, J., Repke, T., Kohlmeyer, L., Krestel, R.: ComEx: Comment Exploration on Online News Platforms. Joint Proceedings of the ACM IUI 2021 Workshops co-located with the 26th ACM Conference on Intelligent User Interfaces (IUI). pp. 1–7. CEUR-WS.org (2021).


Repke, T., Krestel, R.: Visualising Large Document Collections by Jointly Modeling Text and Network Structure. Proceedings of the Joint Conference on Digital Libraries (JCDL). (2020).


Repke, T., Krestel, R.: Exploration Interface for Jointly Visualised Text and Graph Data. International Conference on Intelligent User Interfaces Companion (IUI ’20). (2020).


Kellermeier, T., Repke, T., Krestel, R.: Mining Business Relationships from Stocks and News. MIDAS@ECML-PKDD. (2019).


Loster, M., Repke, T., Krestel, R., Naumann, F., Ehmueller, J., Feldmann, B., Maspfuhl, O.: The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities. Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM 2018). ACM (2018).


The integration of a wide range of structured and unstructured information sources into a uniformly integrated knowledge base is an important task in the financial sector. As an example, modern risk analysis methods can benefit greatly from an integrated knowledge base, building in particular a dedicated, domain-specific knowledge graph. Knowledge graphs can be used to gain a holistic view of the current economic situation so that systemic risks can be identified early enough to react appropriately. The use of this graphical structure thus allows the investigation of many financial scenarios, such as the impact of corporate bankruptcy on other market participants within the network. In this particular scenario, the links between the individual market participants can be used to determine which companies are affected by a bankruptcy and to what extent. We took these considerations as a motivation to start the development of a system capable of constructing and maintaining a knowledge graph of financial entities and their relationships. The envisioned system generates this particular graph by extracting and combining information from both structured data sources such as Wikidata and DBpedia, as well as from unstructured data sources such as newspaper articles and financial filings. In addition, the system should incorporate proprietary data sources, such as financial transactions (structured) and credit reports (unstructured). The ultimate goal is to create a system that recognizes financial entities in structured and unstructured sources, links them with the information of a knowledge base, and then extracts the relations expressed in the text between the identified entities. The constructed knowledge base can be used to construct the desired knowledge graph. Our system design consists of several components, each of which addresses a specific subproblem. To this end, Figure 1 gives a general overview of our system and its subcomponents.

@inproceedings{loster2018challenges,
  author = {Loster, Michael and Repke, Tim and Krestel, Ralf and Naumann, Felix and Ehmueller, Jan and Feldmann, Benjamin and Maspfuhl, Oliver},
  booktitle = {Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM 2018)},
  keywords = {hpi business_communication entities explore web_science financial maintain isg graph create},
  month = {June},
  publisher = {ACM},
  title = {The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities},
  year = 2018
}

Repke, T., Krestel, R., Edding, J., Hartmann, M., Hering, J., Kipping, D., Schmidt, H., Scordialo, N., Zenner, A.: Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1–4. ACM (2018).


Repke, T., Krestel, R.: Topic-aware Network Visualisation to Explore Large Email Corpora. International Workshop on Big Data Visual Exploration and Analytics (BigVis). (2018).


Repke, T., Krestel, R.: Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks. 40th European Conference on Information Retrieval (ECIR 2018). Springer, Grenoble, France (2018).


Zuo, Z., Loster, M., Krestel, R., Naumann, F.: Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types. Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA) (2017).



Repke, T., Loster, M., Krestel, R.: Comparing Features for Ranking Relationships Between Financial Entities Based on Text. Proceedings of the 3rd International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets. pp. 12:1–12:2. ACM, New York, NY, USA (2017).

