Risch, J., Repke, T., Kohlmeyer, L., Krestel, R.: ComEx: Comment Exploration on Online News Platforms. Joint Proceedings of the ACM IUI 2021 Workshops co-located with the 26th ACM Conference on Intelligent User Interfaces (IUI). pp. 1–7. CEUR-WS.org (2021).
The comment sections of online news platforms have shaped the way in which people express their opinions online. However, due to the overwhelming number of comments, no in-depth discussions emerge. To foster more interactive and engaging discussions, we propose ComEx, an interface for exploring reader comments on online news platforms. Potential discussion participants can get a quick overview and are not discouraged by an abundance of comments. Our goal is to represent the discussion as a graph of comments that can be explored in an interactive user interface. To this end, a processing pipeline fetches comments from several different platforms, adds edges to the graph based on topical similarity or metadata, and ranks nodes by metrics such as controversy or toxicity. By interacting with the graph, users can explore and react to individual comments or entire threads they are interested in.
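The graph construction described above can be illustrated with a minimal sketch. It is not the ComEx pipeline. It assumes a hypothetical input format (comment id mapped to text and an optional `reply_to` parent), uses bag-of-words cosine similarity as a stand-in for topical similarity, and treats ranking scores such as toxicity as values supplied by some external model. The names `build_comment_graph`, `rank_nodes`, and the threshold value are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_comment_graph(comments: dict, threshold: float = 0.3) -> set:
    """Return undirected edges (frozensets of two comment ids).

    Hypothetical input: comment id -> {"text": str, "reply_to": id or None}.
    """
    bows = {cid: Counter(c["text"].lower().split()) for cid, c in comments.items()}
    edges = set()
    # Metadata edges: connect a reply to the comment it answers.
    for cid, c in comments.items():
        parent = c.get("reply_to")
        if parent in comments:
            edges.add(frozenset((cid, parent)))
    # Topical edges: connect comments whose word overlap reaches the threshold.
    for a, b in combinations(comments, 2):
        if cosine(bows[a], bows[b]) >= threshold:
            edges.add(frozenset((a, b)))
    return edges


def rank_nodes(scores: dict) -> list:
    """Order comment ids by a precomputed score, highest first.

    The scores (e.g. controversy or toxicity) are assumed to come from an
    external model; this sketch does not compute them.
    """
    return sorted(scores, key=scores.get, reverse=True)
```

For example, two comments that both discuss taxes end up linked by a topical edge, while an off-topic reply is still linked to its parent through the metadata edge.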
Repke, T., Krestel, R.: Visualising Large Document Collections by Jointly Modeling Text and Network Structure. Proceedings of the Joint Conference on Digital Libraries (JCDL). (2020).
Many large text collections exhibit graph structures, either inherent to the content itself or encoded in the metadata of the individual documents. Examples of graphs extracted from document collections include co-author networks, citation networks, and named-entity co-occurrence networks. Furthermore, social networks can be extracted from email corpora, tweets, or social media. When it comes to visualising these large corpora, either the textual content or the network graph is used. In this paper, we propose to incorporate both text and graph, to visualise not only the semantic information encoded in the documents' content but also the relationships expressed by the inherent network structure. To this end, we introduce a novel algorithm based on multi-objective optimisation to jointly position embedded documents and graph nodes in a two-dimensional landscape. We illustrate the effectiveness of our approach with real-world datasets and show that we can capture the semantics of large document collections better than other visualisations based on either the content or the network information alone.
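To make the idea of jointly positioning documents and nodes more concrete, the following minimal sketch scalarises two objectives with a weighted sum and minimises it by gradient descent: one term keeps each node near a 2D position derived from its text embedding, the other pulls nodes connected in the network together. This weighted-sum formulation is a swapped-in simplification, not the paper's multi-objective algorithm; the names `objective`, `joint_layout`, `alpha`, `lr`, and `steps`, and the assumption that 2D text positions are already given, are all hypothetical.

```python
def objective(pos, text_pos, edges, alpha):
    """Weighted sum of a text-fidelity term and an edge-length term (lower is better)."""
    text_term = sum((pos[k][d] - text_pos[k][d]) ** 2 for k in pos for d in range(2))
    graph_term = sum((pos[a][d] - pos[b][d]) ** 2 for a, b in edges for d in range(2))
    return alpha * text_term + (1 - alpha) * graph_term


def joint_layout(text_pos, edges, alpha=0.5, lr=0.05, steps=200):
    """Gradient descent on the weighted-sum objective.

    text_pos: node -> (x, y), assumed to be a 2D projection of document embeddings.
    edges: list of (node, node) pairs from the network.
    """
    pos = {k: list(v) for k, v in text_pos.items()}
    for _ in range(steps):
        grad = {k: [0.0, 0.0] for k in pos}
        for k in pos:
            for d in range(2):
                # Gradient of the text term: pulls each node towards its text position.
                grad[k][d] += 2 * alpha * (pos[k][d] - text_pos[k][d])
        for a, b in edges:
            for d in range(2):
                # Gradient of the graph term: pulls connected nodes towards each other.
                diff = pos[a][d] - pos[b][d]
                grad[a][d] += 2 * (1 - alpha) * diff
                grad[b][d] -= 2 * (1 - alpha) * diff
        for k in pos:
            for d in range(2):
                pos[k][d] -= lr * grad[k][d]
    return pos
```

The weight `alpha` trades off the two objectives: with `alpha=1` the text layout is kept unchanged, and smaller values let the network structure pull linked nodes closer at the expense of textual fidelity.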
Repke, T., Krestel, R.: Exploration Interface for Jointly Visualised Text and Graph Data. International Conference on Intelligent User Interfaces Companion (IUI ’20). (2020).
Many large text collections exhibit graph structures, either inherent to the content itself or encoded in the metadata of the individual documents. Examples of graphs extracted from document collections include co-author networks, citation networks, and named-entity co-occurrence networks. Furthermore, social networks can be extracted from email corpora, tweets, or social media. When it comes to visualising these large corpora, traditionally either the textual content or the network graph is used. We propose to incorporate both text and graph, to visualise not only the semantic information encoded in the documents’ content but also the relationships expressed by the inherent network structure in a two-dimensional landscape. We illustrate the effectiveness of our approach with an exploration interface for different real-world datasets.
Kellermeier, T., Repke, T., Krestel, R.: Mining Business Relationships from Stocks and News. MIDAS@ECML-PKDD. (2019).
In today’s society and global economy, decision-making processes are increasingly supported by data. Especially in the financial sector, it is essential to know how the players in global and national markets are connected. In this work, we compare different approaches for creating company relationship graphs. Our evaluation shows similarities between relationships extracted from Bloomberg and Reuters business news and those derived from correlations in historical stock market data.
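One way to derive a company relationship graph from stock market data is to link companies whose daily returns are strongly correlated. The sketch below does this with Pearson correlation over consecutive closing prices; it is an illustrative assumption, not necessarily the approach evaluated in the paper. The function names, the input format (company mapped to closing prices on the same trading days), and the `threshold` value are hypothetical.

```python
import math
from itertools import combinations


def daily_returns(prices):
    """Relative price change between consecutive closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_graph(prices, threshold=0.8):
    """Link two companies if their return series are strongly correlated.

    prices: company -> list of closing prices over the same trading days.
    Returns {(company_a, company_b): correlation} for all pairs with |r| >= threshold.
    """
    returns = {k: daily_returns(v) for k, v in prices.items()}
    edges = {}
    for a, b in combinations(sorted(returns), 2):
        r = pearson(returns[a], returns[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges
```

Using returns rather than raw prices avoids spurious correlations caused by shared long-term trends; edges obtained this way could then be compared with relationships extracted from news text.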
Loster, M., Repke, T., Krestel, R., Naumann, F., Ehmueller, J., Feldmann, B., Maspfuhl, O.: The Challenges of Creating, Maintaining and Exploring Graphs of Financial Entities. Proceedings of the Fourth International Workshop on Data Science for Macro-Modeling (DSMM 2018). ACM (2018).
The integration of a wide range of structured and unstructured information sources into a uniformly integrated knowledge base is an important task in the financial sector. For example, modern risk analysis methods can benefit greatly from an integrated knowledge base, in particular from a dedicated, domain-specific knowledge graph built on top of it. Knowledge graphs can be used to gain a holistic view of the current economic situation so that systemic risks can be identified early enough to react appropriately. This graph structure thus allows the investigation of many financial scenarios, such as the impact of a corporate bankruptcy on other market participants within the network. In this particular scenario, the links between the individual market participants can be used to determine which companies are affected by a bankruptcy and to what extent. We took these considerations as motivation to start developing a system capable of constructing and maintaining a knowledge graph of financial entities and their relationships. The envisioned system generates this graph by extracting and combining information from structured data sources, such as Wikidata and DBpedia, as well as from unstructured data sources, such as newspaper articles and financial filings. In addition, the system should incorporate proprietary data sources, such as financial transactions (structured) and credit reports (unstructured). The ultimate goal is a system that recognizes financial entities in structured and unstructured sources, links them to the information in a knowledge base, and then extracts the relations expressed in the text between the identified entities. The resulting knowledge base can then be used to construct the desired knowledge graph. Our system design consists of several components, each of which addresses a specific subproblem. Figure 1 gives a general overview of our system and its subcomponents.
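The bankruptcy scenario above, using links between market participants to determine which companies are affected and to what extent, can be sketched as a propagation over a weighted dependency graph. The model here is a hypothetical simplification: each edge weight in [0, 1] is assumed to express how strongly a failure of the source company affects the target, impacts multiply along a path, and each company keeps the strongest impact reaching it. The function name `bankruptcy_impact`, the edge semantics, and the `min_impact` cut-off are illustrative assumptions, not part of the described system.

```python
from collections import deque


def bankruptcy_impact(graph, failed, min_impact=0.01):
    """Propagate the impact of one company's failure through the graph.

    graph: company -> list of (dependent_company, weight), with weights in [0, 1]
           (a hypothetical measure of exposure).
    Returns {company: impact} for every affected company except the failed one.
    """
    impact = {failed: 1.0}
    queue = deque([failed])
    while queue:
        src = queue.popleft()
        for dst, weight in graph.get(src, []):
            # Impact decays multiplicatively along each dependency link.
            value = impact[src] * weight
            # Keep only the strongest path; ignore negligible effects.
            if value >= min_impact and value > impact.get(dst, 0.0):
                impact[dst] = value
                queue.append(dst)
    del impact[failed]
    return impact
```

Because weights are at most 1, following a cycle can never increase an impact value, so the propagation terminates even on cyclic graphs.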
Repke, T., Krestel, R., Edding, J., Hartmann, M., Hering, J., Kipping, D., Schmidt, H., Scordialo, N., Zenner, A.: Beacon in the Dark: A System for Interactive Exploration of Large Email Corpora. Proceedings of the International Conference on Information and Knowledge Management (CIKM). pp. 1–4. ACM (2018).
Emails play a major role in today's business communication, documenting not only work but also decision-making processes. The large amount of heterogeneous data in these email corpora renders manual investigation by experts infeasible. Auditors or journalists, for example, who are looking for irregular or inappropriate content or suspicious patterns, are in urgent need of computer-aided exploration tools to support their investigations. We present Beacon, a system for exploring such corpora at different levels of detail. A distributed processing pipeline combines text mining methods and social network analysis to augment the already semi-structured nature of emails. The user interface builds on the resulting cleaned and enriched dataset. For the interface design, we identify three objectives of expert users: gaining an initial overview of the data to identify leads to investigate, understanding the context of the information at hand, and having meaningful filters to iteratively focus on a subset of emails. To this end, we use interactive visualisations that rearrange and aggregate the extracted information to reveal salient patterns.
Repke, T., Krestel, R.: Topic-aware Network Visualisation to Explore Large Email Corpora. International Workshop on Big Data Visual Exploration and Analytics (BigVis). (2018).
Nowadays, more and more large datasets exhibit an intrinsic graph structure. While special graph databases exist to handle ever-increasing numbers of nodes and edges, visualising this data quickly becomes infeasible as it grows. In addition, looking at the graph structure alone is not sufficient to get an overview of a graph dataset; visualising additional information about nodes or edges without cluttering the screen is essential. In this paper, we propose an interactive visualisation for social networks that positions individuals (nodes) on a two-dimensional canvas such that communities defined by social links (edges) are easily recognisable. Furthermore, we visualise topical relatedness between individuals by analysing information about social links, in our case email communication. To this end, we utilise document embeddings, which project the content of an email message into a high-dimensional semantic space, and graph embeddings, which project nodes in a network graph into a latent space reflecting their relatedness.
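A simple way to turn email-level document embeddings into topical relatedness between individuals is to average the embeddings of the messages each person sent and compare these profiles with cosine similarity. The sketch below shows this, plus a weighted concatenation with a graph embedding; both the averaging and the concatenation are assumptions for illustration, not the method of the paper. The embeddings are assumed to be precomputed, and the names `person_profiles`, `topical_relatedness`, `joint_embedding`, and `weight` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def person_profiles(sent_emails, doc_vec):
    """sent_emails: person -> list of email ids; doc_vec: email id -> embedding (assumed given)."""
    return {p: mean_vector([doc_vec[e] for e in ids]) for p, ids in sent_emails.items() if ids}


def topical_relatedness(profiles, a, b):
    """Cosine similarity between the topic profiles of two individuals."""
    return cosine(profiles[a], profiles[b])


def joint_embedding(text_vec, graph_vec, weight=0.5):
    """Hypothetical combination: weighted concatenation of text and graph embeddings."""
    return [weight * x for x in text_vec] + [(1 - weight) * x for x in graph_vec]
```

Averaging keeps each profile in the same space as the document embeddings, so any two individuals can be compared directly regardless of how many emails they sent.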
Repke, T., Krestel, R.: Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks. 40th European Conference on Information Retrieval (ECIR 2018). Springer, Grenoble, France (2018).
Email communication is an integral part of everybody's life nowadays. Especially for business emails, extracting and analysing these communication networks can reveal interesting patterns of processes and decision making within a company. Fraud detection is another application area where precise detection of communication networks is essential. In this paper, we present an approach based on recurrent neural networks to untangle email threads originating from forward and reply behaviour. We further classify parts of emails into two or five zones to capture not only header and body information but also greetings and signatures. We show that our deep learning approach outperforms state-of-the-art systems based on traditional machine learning and hand-crafted rules. Besides using the well-known Enron email corpus for our experiments, we additionally created a new annotated email benchmark corpus from Apache mailing lists.
Zuo, Z., Loster, M., Krestel, R., Naumann, F.: Uncovering Business Relationships: Context-sensitive Relationship Extraction for Difficult Relationship Types. Proceedings of the Conference "Lernen, Wissen, Daten, Analysen" (LWDA) (2017).
This paper establishes a semi-supervised strategy for extracting various types of complex business relationships from textual data by using only a few manually provided company seed pairs that exemplify the target relationship. Additionally, we offer a solution for determining the direction of asymmetric relationships, such as “ownership of”. We improve the reliability of the extraction process by using a holistic pattern identification method that classifies the generated extraction patterns. Our experiments show that we can accurately and reliably extract new entity pairs occurring in the target relationship by using as few as five labeled seed pairs.
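The seed-based strategy can be illustrated with a minimal bootstrapping sketch in the spirit of pattern-based relation extraction. It assumes sentences have already been reduced to (left entity, text between the mentions, right entity) triples by some upstream entity recognition, learns the in-between strings that co-occur with seed pairs, and records whether a pattern expresses the relation in reversed order to handle direction. This is a simplified stand-in, not the paper's holistic pattern classification; the names `learn_patterns`, `extract_pairs`, and `min_support` are hypothetical.

```python
from collections import Counter


def learn_patterns(sentences, seeds, min_support=2):
    """Learn (middle_text, reversed) patterns from seed pairs.

    sentences: list of (left_entity, middle_text, right_entity) triples (assumed pre-extracted).
    seeds: set of directed (source, target) pairs exemplifying the relationship.
    """
    counts = Counter()
    for left, middle, right in sentences:
        if (left, right) in seeds:
            counts[(middle, False)] += 1  # text order matches relation direction
        elif (right, left) in seeds:
            counts[(middle, True)] += 1  # text order is reversed, e.g. "is a subsidiary of"
    # Keep only patterns supported by several seed occurrences.
    return {p for p, c in counts.items() if c >= min_support}


def extract_pairs(sentences, patterns):
    """Apply learned patterns to find directed (source, target) pairs."""
    pairs = set()
    for left, middle, right in sentences:
        if (middle, False) in patterns:
            pairs.add((left, right))
        if (middle, True) in patterns:
            pairs.add((right, left))
    return pairs
```

With an ownership relation, a sentence like "Tau is a subsidiary of Omega" yields the pair (Omega, Tau), so the reversed flag recovers the direction of an asymmetric relationship.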
Repke, T., Loster, M., Krestel, R.: Comparing Features for Ranking Relationships Between Financial Entities Based on Text. Proceedings of the 3rd International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets. pp. 12:1–12:2. ACM, New York, NY, USA (2017).
Evaluating the credibility of a company is an important and complex task for financial experts. When estimating the risk associated with a potential asset, analysts rely on large amounts of data from a variety of sources, such as newspapers, stock market trends, and bank statements. Finding relevant information, such as relationships between financial entities, in mostly unstructured data is tedious, and examining all sources by hand quickly becomes infeasible. In this paper, we propose an approach to rank extracted relationships based on text snippets, such that important information can be displayed more prominently. Our experiments with different numerical representations of text show that an ensemble of methods performs best on the labelled data provided for the FEIII Challenge 2017.
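As an illustration of how an ensemble can combine several rankers, the sketch below sums each snippet's rank position across the individual rankings (a Borda-style rank aggregation) and sorts by the total. This aggregation scheme is an assumption chosen for illustration; the abstract does not state how the ensemble combines its members. The score dictionaries stand in for outputs of rankers built on different text representations, and the names `rank_positions` and `ensemble_rank` are hypothetical.

```python
def rank_positions(scores):
    """Map each item to its position (0 = best) when sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: pos for pos, item in enumerate(ordered)}


def ensemble_rank(score_dicts):
    """Combine several rankings by summing rank positions (lower total is better).

    score_dicts: list of {snippet_id: score}, one per ranker, all covering the same items.
    Ties in the total are broken by item id to keep the output deterministic.
    """
    totals = {}
    for scores in score_dicts:
        for item, pos in rank_positions(scores).items():
            totals[item] = totals.get(item, 0) + pos
    return sorted(totals, key=lambda item: (totals[item], item))
```

Aggregating rank positions instead of raw scores sidesteps the problem that different text representations produce scores on incomparable scales.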