Hasso-Plattner-Institut
 
Prof. Dr. Gerard de Melo
 

Open Research Topics

The following is a list of projects we are currently working on. Feel free to reach out to us for possible collaborations or thesis proposals. Of course, we are always open to hearing additional ideas or topics.

 

NLP and Large Language Models


Multilingual Reference Retrieval for Wikipedia

Description: 

References on Wikipedia are crucial to ensuring the quality of an article, so it is important to support editors in retrieving references in their own language. To date, the literature has focused on the English Wikipedia. For other language editions, we have less insight into which references are currently used and how to retrieve accurate references for a piece of information in the respective language. It is therefore necessary to understand the current state of multilingual references, which references are trustworthy across languages, and how to automatically retrieve and score new references.
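As a minimal illustration of the retrieve-and-score step, the toy sketch below ranks candidate reference snippets against a claim by simple lexical (TF/cosine) overlap; a real system would use multilingual dense retrievers or cross-lingual rerankers instead, and the tokenizer here is a deliberately naive whitespace split.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term frequencies for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_references(claim, candidates):
    """Rank candidate reference snippets by lexical similarity to the claim."""
    claim_vec = tf_vector(claim)
    scored = [(cosine(claim_vec, tf_vector(c)), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

Scoring trustworthiness across languages would add further signals (source reputation, citation structure) on top of such a relevance ranking.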

Literature:

Contacts: 


 

Enabling Long Contexts with Smart Prompting in Large Language Models

Description: 

Transformer-based Large Language Models (LLMs) are limited in their context (the range over which the model is able to meaningfully connect pieces of information) due to the underlying attention mechanism. While many approaches have tackled the attention mechanism itself, this project aims to investigate context encoding schemes that can be added to a prompt, e.g., by adding a generated summary of previous interactions or by adding encoded context as soft prompts.
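The summary-based variant can be sketched in a few lines: keep the most recent turns verbatim and compress everything older into a summary that is prepended to the prompt. The `summarize` callable below is a stand-in for any summarization model; the default placeholder simply keeps each turn's first sentence.

```python
def compress_history(turns, max_keep=2, summarize=None):
    """
    Fit a long dialogue into a bounded prompt: keep the last `max_keep`
    turns verbatim and replace everything older with a generated summary.
    `summarize` is a placeholder for a real summarization model.
    """
    if summarize is None:
        # naive placeholder: first sentence of each older turn
        summarize = lambda texts: " ".join(t.split(".")[0] + "." for t in texts)
    older, recent = turns[:-max_keep], turns[-max_keep:]
    parts = []
    if older:
        parts.append("Summary of earlier conversation: " + summarize(older))
    parts.extend(recent)
    return "\n".join(parts)
```

Soft-prompt encoding would replace the textual summary with learned embedding vectors prepended to the model input rather than to the prompt string.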

Literature:

Contacts: 


 

Anything Related to Modeling Long Context/Long Documents

Description: 

Whether you want to dive deep into the inner workings of the attention mechanism, explore external memory, have an idea for a hierarchical model, or want to include context through prompts, contact me and we will work out your idea (or find one together).

Contacts: 


 

What changes when finetuning LLMs?

Description: 

Large language models like ChatGPT work well for various NLP tasks and languages. Still, for a particular domain or language, finetuning LLMs with domain- or language-specific data can further improve their performance. As LLMs have a large number of parameters, probing how they change during finetuning is not a straightforward task. There is a body of related work (listed below), and we aim to investigate this matter further and develop more effective and efficient finetuning methods for LLMs.
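One simple probing signal is per-layer parameter drift: how far each layer's weights moved from the base model, relative to their original norm. A toy sketch, assuming weights are given as plain `{layer_name: list of floats}` (with a real model you would iterate over its state dict instead):

```python
import math

def layer_drift(base, finetuned):
    """
    Relative L2 drift per layer between a base model and a finetuned copy.
    Large values hint at which layers the finetuning actually changed.
    """
    drift = {}
    for name, w0 in base.items():
        w1 = finetuned[name]
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(w0, w1)))
        norm = math.sqrt(sum(a * a for a in w0)) or 1.0
        drift[name] = diff / norm
    return drift
```

Parameter-efficient methods such as adapters or LoRA can be seen as constraining where such drift is allowed to occur.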

Literature:

Contacts: 


 

Zero-shot/few-shot NLP

Description: 

Natural language processing (NLP) encompasses various tasks such as text classification, summarization, and translation. Many of these tasks have very little or no training data, and creating training data manually is expensive. We therefore study zero-shot/few-shot methods that solve NLP tasks with little or no training data.
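The core zero-shot idea can be illustrated with a toy classifier that needs no task-specific training: each label is described in natural language, and the input is matched against the descriptions. The word-overlap scorer below is a deliberately crude stand-in for the embedding- or NLI-based matching used in practice.

```python
from collections import Counter

def zero_shot_classify(text, label_descriptions):
    """
    Pick the label whose natural-language description best matches `text`.
    No labeled training examples are needed, only a description per label.
    """
    tokens = Counter(text.lower().split())
    def overlap(desc):
        # count how many description words occur in the input
        return sum(tokens[w] for w in desc.lower().split())
    return max(label_descriptions, key=lambda lbl: overlap(label_descriptions[lbl]))
```

Few-shot variants would additionally include a handful of labeled examples, e.g. as in-context demonstrations in an LLM prompt.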

Literature:

Contacts: 

 

Vision-and-Language Models and Multi-modal Learning


Visually-Grounded Reasoning

Description: 

Pure language models still struggle with some real-world reasoning tasks. Humans often tackle these tasks by imagining a scene/situation and reasoning over these visualizations. In this topic, we want to use image/scene generation models as an imagination module and fuse their information into a language model. We hypothesize that the better interpolation capabilities of image generation models will coherently fill in missing/implicit information in the visual domain that can then also be accessed by the language model and lead to better reasoning performance.
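The intended pipeline can be sketched as an interface, with all four model components left as placeholders (a text encoder, an image generation model acting as the imagination module, an image encoder, and an answer head); the concatenation fusion shown here is just the simplest possible choice.

```python
def reason_with_imagination(question, language_encode, imagine, image_encode, answer_head):
    """
    Hypothetical visually-grounded reasoning pipeline: the question is
    'imagined' as a scene by a generation model, both modalities are
    encoded, and the fused representation is scored by an answer head.
    All four callables are placeholders for real models.
    """
    text_vec = language_encode(question)
    scene = imagine(question)           # imagination module: text -> image
    image_vec = image_encode(scene)     # visual features of the imagined scene
    fused = text_vec + image_vec        # simplest fusion: concatenation
    return answer_head(fused)
```

The research questions then concern which fusion mechanism and which generation model make the imagined visual information actually usable by the language model.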

Some work has already been done that can be built on. There is still a lot to explore!

Literature:

Contacts: 


 

Image Generation for SVG

Description: 

In recent years, generative models (e.g., GANs, diffusion models, …) have become very popular and achieved impressive results. However, those models operate on raster images; only a few works have addressed image generation for vector graphics (and such works are often accepted at top conferences like CVPR and NeurIPS). Since SVG is the standard format for graphic designers, enhancing vector image generation techniques can greatly benefit design workflows. We are exploring this field and have many directions for talented students interested in text-to-image generation, image blending, style transfer, sketch generation, and more.

Literature:

 

 

Contacts: 


 

Recognition and classification of dynamic communicative body movements (DYCLASSIFIED 1.0)

Description: 

Human communication is inherently multimodal, involving facial expressions, gestures, postures, speech, and more, all dynamically coordinated. While traditional cognitive science relies on expert annotators for labeling communicative units, machine recognition and classification of dynamic body movements remain relatively unexplored. This project, DYCLASSIFIED 1.0, will investigate the application of existing multimodality-oriented pretrained models (such as BERT, GPT-3, CLIP, and GATO) in recognizing and classifying structure and temporality of dynamic communicative body movements. Collaborating with cognitive scientists and computer scientists from the Max Planck Institute of Psycholinguistics, the Donders Institute, and the Hasso-Plattner Institute, the research will adopt a data-driven perspective to operationalize unsupervised body movement classification. By fine-tuning existing models with sample labels, the project aims to extract implicit patterns in specific gestures. The outcomes may revolutionize manual gesture classification, opening new avenues in areas like surveillance, autonomous vehicles, medical data, and commercial applications.

Literature:

Contacts: 


 

Visualizations augmenting automatic analyses of multimodal behavior (VIZBOXER 1.0)

Description: 


Literature:

Contacts: 


 

Improving/studying VL benchmarks

Description: 

Given the impressive growth of Vision-and-Language models, several works have defined benchmarks with new visual and textual challenges to investigate and understand the limits of these models. We are interested in developing strategies and approaches to overcome the shortcomings these benchmarks reveal.

Literature:

Contacts: 


 

Visual Question Answering

Description: 

Visual question answering is the multi-modal task of answering natural-language questions about given visual data. Our team works on different aspects, such as improving question/image understanding, associating the individual modalities, and improving the answering of open-ended questions by generative models in both the general and medical domains.

Literature:

Contacts: 

 

Computer Vision


Artificial Intelligence for Gorilla Conservation

Description: 

Artificial intelligence can have a positive impact on wildlife conservation, ranging from helping wildlife researchers by automatically detecting animal behaviors, to predicting the locations of animals, to detecting poachers.

Contacts: 

 

Graph-Based Text Representation


Graph-based Document Representation for Downstream NLP tasks

Description: 

Most current deep models in NLP focus on defining complex architectures with ever larger numbers of parameters to improve performance on several tasks. However, they are not effective enough to ensure that these tasks are truly solved, suggesting that they are not learning the expected patterns and text features. Because of this, different graph-based text representation models were recently proposed, going beyond processing text as plain token sequences. However, the range of tasks studied is usually limited to text classification. Validating these strategies and identifying their limitations on other machine reading comprehension tasks, such as summarization and question answering, remains an area to explore.
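As a concrete example of going beyond token sequences, the sketch below builds one common graph-based representation, a word co-occurrence graph: nodes are word types and edge weights count how often two words appear within a small window. Downstream models (e.g. graph neural networks) would consume this structure instead of the raw sequence; real systems typically add syntactic or semantic edges as well.

```python
from collections import defaultdict

def cooccurrence_graph(text, window=2):
    """
    Build a simple graph-based text representation: nodes are word types,
    edges connect words appearing within `window` tokens of each other,
    weighted by co-occurrence count.
    """
    tokens = text.lower().split()
    edges = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if w != tokens[j]:
                edges[tuple(sorted((w, tokens[j])))] += 1
    return dict(edges)
```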

Literature:

Contacts: