What if AI admitted its weaknesses?

"I don't know." Three words that we humans should actually use often. Nothing is more unpleasant than a counterpart with an inflated sense of self who cannot name the limits of their own knowledge.

If today's large language models were human, they would be pretty unpleasant. Their biggest weakness: LLMs always give an answer, even when they lack the necessary knowledge. Do Konstantin Dobler and Roi Cohen have a solution? Konstantin, a doctoral student in the field of AI and language models at HPI, says:

Blind faith in systems like ChatGPT is not a good idea. Unfortunately, these answers are often taken at face value. This makes it all the more important to recognize when requests exceed the capabilities or knowledge of the current system.

Together with Roi Cohen, Konstantin is working on responsible AI: how disinformation can be reduced and how so-called hallucinations (answers with no basis in knowledge) can be avoided. Both are PhD students at the chair of Professor Dr. Gerard de Melo.

The two HPI doctoral students Konstantin Dobler (left) and Roi Cohen

Hasso Plattner Institute (HPI): How does the model work?

Roi Cohen: Normal language models generate text broken up into word parts, so-called "tokens". In addition to regular tokens, our IDK model also has a special "I Don't Know" token (IDK token), which is generated instead of a normal prediction when that prediction is associated with great uncertainty. If a normal model is asked a question for which it has not learned the answer, it typically still gives an (uncertain) answer, a so-called hallucination. The IDK token allows this uncertainty to be expressed explicitly instead.
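To make the idea concrete, here is a minimal sketch of a single generation step with a vocabulary that contains one extra uncertainty token. The token name `[IDK]`, the tiny vocabulary, the scores and the function names are assumptions for illustration only; they are not taken from the researchers' implementation.

```python
import math

IDK = "[IDK]"  # hypothetical name for the special "I don't know" token


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(vocab, logits):
    """Return the most likely next token and its probability."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]


# Toy vocabulary: three regular tokens plus the IDK token.
vocab = ["Paris", "Lyon", "Rome", IDK]

# One regular token clearly dominates, so it is generated.
confident = decode_step(vocab, [5.0, 1.0, 0.5, 0.2])

# No regular token stands out and the IDK token scores highest,
# so the uncertainty is expressed explicitly.
uncertain = decode_step(vocab, [1.1, 1.0, 0.9, 2.5])

print(confident[0], uncertain[0])
```

In this sketch the IDK token competes with the regular tokens like any other vocabulary entry, so expressing uncertainty needs no separate mechanism at generation time.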

HPI: To what extent is what you are researching new?

Konstantin Dobler: On the one hand, our method is new because we don't need any special data to train our IDK model. We can use any text, with no special format required, to learn a representation of uncertainty. On the other hand, the approach of representing uncertainty as a new token is also new.

HPI: How does a model acquire the ability to guess it could be wrong?

Roi: We use a pre-trained model that has already learned knowledge and language comprehension. We give this model text as input and look at the tokens for which it makes an incorrect prediction. Our method trains the model to retain the correct predictions but to select our IDK token instead of the incorrect ones. To solve this task well, the model must learn to represent the (un)certainty in its answers internally. This learned representation is then used to produce the IDK token instead of an incorrect answer when the uncertainty is too great.

It is essential that we do not just learn whether the underlying model can give the correct answer for specific question-answer pairs. The learned internal representation of uncertainty is more general and can also be used for content not seen in the training.
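The training signal described here can be sketched as a relabeling step: wherever the pre-trained model already predicts the right token, the target stays unchanged, and wherever it predicts the wrong one, the target becomes the IDK token. This is a simplified illustration under assumptions, not the authors' actual training code; the function name, the `[IDK]` string and the example sentence are hypothetical.

```python
IDK = "[IDK]"  # hypothetical name for the special "I don't know" token


def build_idk_targets(gold_tokens, predicted_tokens):
    """Keep the gold token where the base model was right, use IDK where it was wrong."""
    if len(gold_tokens) != len(predicted_tokens):
        raise ValueError("gold and predicted sequences must have the same length")
    return [
        gold if predicted == gold else IDK
        for gold, predicted in zip(gold_tokens, predicted_tokens)
    ]


# Plain text serves as the gold sequence; no special data format is needed.
gold = ["The", "capital", "of", "Australia", "is", "Canberra"]
# Made-up predictions of a pre-trained model for the same positions.
predicted = ["The", "capital", "of", "France", "is", "Sydney"]

targets = build_idk_targets(gold, predicted)
print(targets)
```

Training on targets like these would ask the model to keep what it already gets right and to choose the IDK token at the positions where it previously made mistakes.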

HPI: Why doesn't something like this already exist everywhere?

Konstantin: The problem of hallucinations is very present in the research community and is also being actively addressed. In fact, we are dealing with a complicated issue with many facets, and our method is not perfect, but only a step in the right direction. Calibration is particularly important: a model that always returns "I don't know" doesn't give me any wrong answers, but it's not particularly useful either. Sometimes, there are several correct answers; in other cases (e.g., writing stories or other creative tasks), there is no defined right or wrong.

So, before the method can be used in commercial systems like ChatGPT, many application contexts must be covered.
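The calibration trade-off mentioned above can be illustrated with a small scoring sketch that separates correct answers, wrong answers and abstentions. The function, the `[IDK]` string and the toy question set are assumptions made for illustration; they are not an evaluation protocol from the researchers.

```python
IDK = "[IDK]"  # hypothetical marker for an "I don't know" answer


def score(answers, gold):
    """Return the fractions of correct, wrong and abstained answers."""
    if not gold or len(answers) != len(gold):
        raise ValueError("need two non-empty lists of equal length")
    n = len(gold)
    abstained = sum(a == IDK for a in answers)
    correct = sum(a == g for a, g in zip(answers, gold))
    wrong = n - correct - abstained
    return {"correct": correct / n, "wrong": wrong / n, "abstained": abstained / n}


gold = ["Canberra", "Paris", "Rome", "Bern"]

always_idk = score([IDK, IDK, IDK, IDK], gold)
overconfident = score(["Sydney", "Paris", "Rome", "Zurich"], gold)
calibrated = score([IDK, "Paris", "Rome", IDK], gold)

print(always_idk)
print(overconfident)
print(calibrated)
```

The model that always abstains makes no mistakes but also answers nothing, while the overconfident one answers everything and is wrong half the time. A well-calibrated model keeps the correct answers and abstains only where it would otherwise have been wrong.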

HPI: Why is this “admitting mistakes” so important?

Roi: A major weakness of current language models is that they always answer, even if the underlying knowledge is missing. When I use ChatGPT, I always have to ask myself whether the answer is hallucinated, especially with more complex topics and questions. Currently, blind faith in systems like ChatGPT is not a good idea. Unfortunately, these answers are often taken at face value. At the same time, we no longer want to do without the many great features of language models. The more ChatGPT & Co. are used, the more critical it becomes to recognize when requests exceed the capabilities or knowledge of the current system.

Roi and Konstantin will continue their research to improve their method. In particular, they also want to discover how to help models give correct answers when uncertainty is detected. This can be done by using more computing time or relevant contexts from other sources, for example.


Thanks to Konstantin Dobler and Roi Cohen for their time!


Last change: 11/06/2026, Patrick Lenz