In our Business Communication Analysis project we aim to develop novel tools for the exploration of massive unstructured document corpora, for example emails, attachments, and news. Journalists, auditors, and special investigators are overwhelmed by the sheer amount of data they have to analyse in order to gain insights from such datasets. There are many interesting thesis topics in this area. They focus on different aspects of that work and touch the research areas of text mining, text summarisation, document classification, topic modelling, named entity extraction, entity linking, relationship extraction, as well as social network and graph analysis. We work together with our industry partner from the financial sector to put our prototypes in the hands of auditors for real-world feedback.
Please note that the topics listed below are only rough ideas. Their intention is to give you a broader sense of what could be done. See them as seed ideas for a thesis: some may need more substance, others may be too complex. Feel free to read between the lines, extend ideas, go into depth, or combine them where applicable, and contact Tim with your questions or proposals.
Most of the following ideas are part of our Mímir project.