Hasso-Plattner-Institut: 25 Years of HPI

07.08.2024

25 years of HPI: The Identity Leak Checker promotes cybersecurity awareness

One of the most popular HPI projects is the Identity Leak Checker (ILC), which can be used to check whether one's own identity data is circulating on the Internet. David Jaeger helped develop the ILC back in 2014. In this interview, he tells us about his experience working on the project.

Interview with HPI Alumnus David Jaeger

The Identity Leak Checker can be used to determine whether your own email address was part of a leak and whether your identity data is freely available online. It all began in 2013, after a major data leak caused a public stir. Prof. Meinel and Dr. Feng Cheng's team started by extracting the affected HPI email addresses and checking them against other leaks. This eventually led to the idea of making the service available to external users. Since its launch in May 2014, more than 18.6 million users have used the service.

David Jaeger completed his bachelor's, master's and doctoral degrees at HPI and now works as a Cyber Security Architect at Airbus Protect. From the beginning, he led the ILC project in the “Internet Technologies and Systems” department, supported by Hendrik Graupner and Chris Pelchen. Today, the project is part of Prof. Christian Dörr's “Cybersecurity - Enterprise Security” department.

On the occasion of our 25th anniversary, we spoke with HPI alumnus David Jaeger about the beginnings of the ILC and how his work on the project prepared him for his current job. 

Hasso Plattner Institute: What is the Identity Leak Checker?

David Jaeger: The Identity Leak Checker (ILC) is a free internet service provided by the Hasso Plattner Institute. With this service, users can check whether their personal data has been stolen and published as part of known data leaks. To do this, you enter your email address on the ILC website (https://ilc.hpi.de/). You will then receive an email listing every known data leak in which the address you entered was found, including the suspected source and the type of data stolen. If there is a match, users also receive recommendations on how to protect themselves against further misuse of their data.
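The lookup-and-report flow described above can be sketched roughly as follows. This is a minimal illustration and not the ILC's implementation. The in-memory `index`, the `Leak` record and its fields, and the recommendation text are all hypothetical stand-ins. The sketch also only builds the body of the result email and does not send it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Leak:
    source: str                  # suspected origin of the leak (hypothetical field)
    data_types: Tuple[str, ...]  # kinds of data exposed, e.g. ("email", "password")


def normalize_email(address: str) -> str:
    """Trim whitespace and lowercase the address (simplifying assumption:
    strictly, the local part of an address may be case-sensitive)."""
    return address.strip().lower()


def build_report(address: str, index: Dict[str, List[Leak]]) -> str:
    """Return the body of a result email for `address`.

    `index` maps normalized addresses to the leaks they appear in; it is a
    hypothetical stand-in for the ILC's real database.
    """
    normalized = normalize_email(address)
    leaks = index.get(normalized, [])
    if not leaks:
        return f"No known leaks found for {normalized}."
    lines = [f"{normalized} was found in {len(leaks)} known leak(s):"]
    for leak in leaks:
        lines.append(f"- source: {leak.source}; data: {', '.join(leak.data_types)}")
    # Generic advice only; the ILC's actual recommendations may differ.
    if any("password" in leak.data_types for leak in leaks):
        lines.append("Recommendation: change this password wherever you have used it.")
    return "\n".join(lines)


# Example: one hypothetical leak entry, queried with messy input casing.
index = {"alice@example.org": [Leak("ExampleForum 2013", ("email", "password"))]}
print(build_report(" Alice@Example.org ", index))
```

Normalizing the address before the lookup means that the same mailbox matches regardless of how the user typed it.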

HPI: What were the biggest challenges during development and support of the project?

Jaeger: The biggest challenge in developing the ILC was processing requests quickly, especially when many users accessed the service at the same time. Because data leaks often contain millions or even billions of records, storing and retrieving this data efficiently was a major difficulty. The ILC now manages over 13 billion records. Data volumes of this size were a huge challenge for the database systems available when the ILC was developed in 2014. Thanks to SAP, however, we had the unique opportunity to use the SAP HANA in-memory database, which keeps data in main memory and can answer queries in milliseconds, even across billions of records.
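The core idea behind fast lookups can be sketched in a few lines: keep an index in main memory so that a query becomes a hash probe rather than a disk scan. The sketch does not model SAP HANA, which is a full in-memory database. The class name, the use of SHA-256 digests as compact fixed-size keys, and the (email, leak ID) record layout are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def email_key(address: str) -> bytes:
    """Fixed-size 32-byte key (assumption: SHA-256 of the normalized address)."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).digest()


class InMemoryLeakIndex:
    """Hypothetical index mapping hashed addresses to the IDs of leaks they appear in."""

    def __init__(self) -> None:
        self._index: DefaultDict[bytes, Set[str]] = defaultdict(set)

    def add_records(self, records: Iterable[Tuple[str, str]]) -> None:
        """Bulk-load (email, leak_id) pairs, e.g. after a new leak has been normalized."""
        for address, leak_id in records:
            self._index[email_key(address)].add(leak_id)

    def lookup(self, address: str) -> Set[str]:
        """Average O(1) hash probe, independent of how many records are stored."""
        # .get() avoids creating empty entries for addresses that were never added.
        return set(self._index.get(email_key(address), ()))

    def __len__(self) -> int:
        """Number of distinct addresses in the index."""
        return len(self._index)


idx = InMemoryLeakIndex()
idx.add_records([("a@example.org", "leak-1"), ("A@example.org", "leak-2"), ("b@example.org", "leak-1")])
print(sorted(idx.lookup("a@example.org")))
```

A plain Python dictionary would not hold 13 billion entries on ordinary hardware. At that scale, a dedicated system such as the in-memory database mentioned above takes over the job of storing and indexing the data in memory.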

Another challenge was the heavy user load following reports in mass media such as Stern TV, Spiegel, ARD Tagesschau and the international technology magazine Wired. For data protection reasons, we did not want to move the sensitive data to a cloud, so we had to provide our own server capacity. Initially, we underestimated the response and planned only for hardware with moderate computing power. After a particularly large rush following the Spiegel report in 2017, however, we added server capacity and optimized query processing in both the web server and the service itself, so that we could better handle future surges.

Increased media coverage was usually accompanied by a flood of messages from users. On some days we received several hundred inquiries, and the phone lines at the HPI reception were running hot. This pushed our small team, which looked after the ILC only on the side, to its limits. We still wanted to be there for users' questions, so we tried to answer as many messages as possible, albeit with some delay. We were particularly pleased to receive so much positive feedback from users.

HPI: You now work as a Cyber Security Architect at Airbus Protect. How did your work on the ILC prepare you for your current role?

Jaeger: In my current role as Cyber Security Architect at Airbus, I am responsible for the design and implementation of attack detection systems in the networks of Airbus, its subsidiaries and external customers. A large part of the knowledge I use in my daily work came from my work and research at HPI during my doctorate. My PhD topic, the detection of complex multi-stage attacks in corporate networks, overlaps to a large extent with my current work.

However, my work on the ILC has taught me two important lessons in particular:

Firstly, dealing with huge data leaks gave me an understanding of the complex topic of big data. Big data and data science are hot topics in the IT industry today, so I can say that this work prepared me well for my career in the industry. For example, we often had to extract millions of records from the leaks and convert them into a standardized format to make them available in the ILC. Similarly, when detecting attacks, billions of log entries from computers and network devices have to be imported and stored in a standardized format in a central system, the Security Information and Event Management (SIEM) system. Pre-processing the data, for example with regular expressions, and loading it efficiently into a large data store work very similarly in both areas.
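Converting heterogeneous leak dumps into one standardized format with regular expressions might look like the following minimal sketch. The two line layouts, the `Record` fields and the `normalize` function are hypothetical. Real leaks come in far more formats, and this sketch simply skips unmatched lines instead of logging them for review.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    email: str   # normalized (lowercased) address
    secret: str  # password or hash exactly as found in the dump
    source: str  # which leak the line came from


# Hypothetical dump layouts; real leaks vary far more widely.
PATTERNS = [
    # "user@example.org:hunter2" or "user@example.org;hunter2"
    re.compile(r"^(?P<email>[^\s:;|@]+@[^\s:;|]+)[:;](?P<secret>.+)$"),
    # "12345 | user@example.org | 5f4dcc3b..." (numeric id, email, hash)
    re.compile(r"^\d+\s*\|\s*(?P<email>[^\s|@]+@[^\s|]+)\s*\|\s*(?P<secret>\S+)$"),
]


def normalize(lines: Iterable[str], source: str) -> Iterator[Record]:
    """Yield a standardized Record for each line matching a known layout; skip the rest."""
    for line in lines:
        line = line.strip()
        for pattern in PATTERNS:
            m = pattern.match(line)
            if m:
                yield Record(m["email"].lower(), m["secret"], source)
                break  # first matching layout wins


lines = [
    "Alice@Example.org:hunter2",
    "12345 | bob@example.org | 5f4dcc3b5aa765d61d8327deb882cf99",
    "not a recognized layout",
]
for record in normalize(lines, source="demo-leak"):
    print(record)
```

In a SIEM, the same pattern applies to log entries: each device's log format gets its own expression, and every match is mapped onto one shared record type.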

Secondly, working with large amounts of login data showed me how often users reuse the same password across different services and how weak many of the chosen passwords are. Attackers exploit this in so-called credential stuffing and password spraying attacks. In credential stuffing, attackers use username/password combinations found in a leak to log in to other services or networks. If the victim reused the leaked password there, the attacker gains access. Password spraying involves trying frequently used passwords, such as password123, against many accounts. These methods are still among the most common causes of successful attacks. Knowing how these attacks work enables me to detect them, for example by correlating information about leaked users with their login behavior.
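One way to correlate leaked accounts with login behavior is a simple per-source heuristic. Both attacks touch many accounts with only a few attempts each, and credential stuffing stands out because most of the targeted accounts appear in known leaks. The sketch below illustrates the idea under stated assumptions and is not a production detection rule. The `LoginEvent` fields, the function name and all thresholds are hypothetical. Login logs normally do not contain passwords, so the sketch classifies sources by behavior only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class LoginEvent(NamedTuple):
    source_ip: str
    username: str
    success: bool


def classify_sources(
    events: Iterable[LoginEvent],
    leaked_users: Set[str],
    min_accounts: int = 20,          # illustrative threshold, not a tuned value
    max_tries_per_account: int = 3,  # illustrative threshold, not a tuned value
    leaked_ratio: float = 0.5,       # share of leaked targets that suggests stuffing
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Label suspicious sources and list successful logins from them."""
    per_source: Dict[str, Counter] = defaultdict(Counter)  # ip -> attempts per username
    successes: Dict[str, Set[str]] = defaultdict(set)      # ip -> usernames that logged in
    for ev in events:
        per_source[ev.source_ip][ev.username] += 1
        if ev.success:
            successes[ev.source_ip].add(ev.username)

    labels: Dict[str, str] = {}
    for ip, counts in per_source.items():
        # Both attacks spread a few attempts over many accounts.
        if len(counts) < min_accounts or max(counts.values()) > max_tries_per_account:
            continue
        share_leaked = sum(user in leaked_users for user in counts) / len(counts)
        labels[ip] = "credential stuffing" if share_leaked >= leaked_ratio else "password spraying"

    # Successful logins from a flagged source are the most urgent to investigate.
    hits = sorted((ip, user) for ip in labels for user in successes[ip])
    return labels, hits


leaked = {f"user{i}" for i in range(30)}
# 10.0.0.1 tries leaked accounts once each and gets into one of them.
events = [LoginEvent("10.0.0.1", f"user{i}", i == 5) for i in range(25)]
# 10.0.0.2 tries many accounts that are not in any leak.
events += [LoginEvent("10.0.0.2", f"staff{i}", False) for i in range(25)]
# 10.0.0.3 hammers a single account (classic brute force, not flagged by this rule).
events += [LoginEvent("10.0.0.3", "user1", False)] * 10
labels, hits = classify_sources(events, leaked)
print(labels, hits)
```

The split by leaked share reflects the distinction drawn above. Stuffing reuses credentials from a leak, while spraying tries common passwords against arbitrary accounts.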

HPI: Cybersecurity is an important topic, not only in teaching and research at HPI. What is your most striking memory on the subject?

Jaeger: The most striking thing for me is that many successful attacks today target not software or computer systems but people, as the weakest link in IT security. This type of attack is known as social engineering: the deliberate exploitation of human weaknesses, such as unquestioning trust in authorities or loved ones. An impressive example is the hacker group Lapsus$. In 2022, it infiltrated even large companies such as Microsoft, Nvidia and Samsung by manipulating individual employees, despite those companies' professional cyber defense teams.

We also saw how effective this attack method is in our “Capture the Flag” (CTF) seminar at HPI, in which teams attempt to penetrate a network environment prepared by another team. In one case, students used spear phishing and typosquatting to obtain internal information about the next challenge from the opposing team and even from teaching staff. In this setting, we as seminar leaders had explicitly invited social engineering, so the damage was limited. In the real world, however, this type of attack could have been fatal for a company. It shows how important it is to educate users and make them aware of the dangers of manipulation by cyber criminals. During my time at HPI, we tried to contribute to this education through our teaching, seminars and openHPI courses on digital identities and, of course, through the Identity Leak Checker.