Hasso-Plattner-Institut
 
Prof. Dr. Gerard de Melo
 

Open Research Topics

The following is a list of projects we are currently working on. Feel free to reach out to us about possible collaborations or thesis proposals. Of course, we are always open to hearing additional ideas or topics.

 

VQA

Description: 

Visual question answering (VQA) is the multimodal task of answering natural-language questions about given visual data. Our team works on several aspects of it, such as improving question and image understanding, associating the individual modalities, and improving the answering of open-ended questions with generative models in both general and medical domains.

Literature:

Contacts: 


 

Multi-lingual VL models

Description: 

Vision-and-Language models have achieved impressive success in learning multimodal representations, which has noticeably improved their performance on tasks such as VQA, image captioning, and retrieval. We aim to explore new strategies to extend this success to low-resource languages other than English.

Literature:

Contacts: 


 

Improving/studying VL benchmarks

Description: 

Given the impressive growth of Vision-and-Language models, several works have defined benchmarks with new visual and textual challenges to investigate and understand the limits of these models. We are interested in developing strategies and approaches to overcome the shortcomings these benchmarks reveal.

Literature:

Contacts: 


 

Analysis of Novel Social Media Datasets (Reddit)

Description: 

Recently, Reddit communities have come to prominence through coverage in mainstream media, including the hype around so-called “meme stocks” like GameStop caused by the r/WallStreetBets community. There are thousands of communities on Reddit that can be explored from various perspectives. In contrast to other social networks like Twitter, there are still research gaps regarding Reddit on topics like knowledge extraction, language analysis and modeling, psychological profiles, etc. Other unexplored web datasets could be interesting as well.

Literature:

Contacts: 


 

Artificial Intelligence on Source Code

Description: 

Scientists have researched the automatic understanding and generation of source code since the 1980s. With the recent advances in deep learning, it has finally become practical to apply these techniques in real-world settings, for example to generate source code or source-code-related artifacts.

Literature:

Contacts: 


 

Artificial Intelligence for Wildlife Conservation

Description: 

Artificial intelligence can have a positive impact on wildlife conservation. Applications range from helping wildlife researchers by automatically detecting animal behaviors and predicting the locations of animals to detecting poachers.

Contacts: 


 

Prompt Engineering for Diffusion Models

Description: 

Diffusion models are a recent hot topic in deep learning. We want to explore prompt engineering for diffusion models in order to make it easy to create sustainable prompts.

Literature:

Contacts: 


 

Natural Language Processing in German (or any other language)

Description: 

Thanks to HuggingFace, a wide variety of models for different tasks are readily available and easy to use. Unfortunately, for many interesting tasks in a professional environment, no German models are available. This often hinders the use and adoption of AI in small and medium-sized German companies.

Contacts: 


 

Multimodal Communication Projects

Description: 

There are several joint projects involving the Hasso Plattner Institute, the Max Planck Institute (Nijmegen), and Radboud University. The list of available projects can be found here.

Contacts: 


 

Zero-shot/few-shot NLP

Description: 

Natural language processing (NLP) includes various tasks such as text classification, summarization, and translation. Most of these tasks have very little training data or none at all, and creating training data manually is expensive. Therefore, we study zero-shot and few-shot methods for NLP tasks with little or no training data.

Literature:

Contacts: 


 

Language Models for Special Languages

Description: 

We aim to answer the following questions:

  • Can we transfer transformer-based language models to special languages (e.g. synthetic or historical languages)?
  • Can we initialize them in a meaningful way to combat data shortage?

Contacts: 


 

Graph-based Text representation for MRC

Description: 

Most current deep models in NLP rely on complex architectures with a large number of hyper-parameters to improve performance on various tasks. However, they are not effective enough for these tasks to be considered solved, which suggests that they are not learning the expected patterns and text features. For this reason, several graph-based text representation models that go beyond processing text as token sequences have recently been proposed. However, the range of tasks studied is usually limited to text classification. Validating these strategies and identifying their limitations on other machine reading comprehension (MRC) tasks remains an area to explore.

Literature:

Contacts: 

  • Margarita Bugueño: contact

 

Visually-Grounded Reasoning

Description: 

We aim to answer the following question: Can we use scene generation models (e.g. DALL-E 2) to help language models solve reasoning tasks?

Contacts: