Hasso-Plattner-InstitutSDG am HPI

Dark Web Monitoring and Analysis of Leak Data (Winter Semester 2014/2015)

Lecturers: Prof. Dr. Christoph Meinel (Internet Technologies and Systems), Hendrik Graupner (Internet Technologies and Systems)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: October 24, 2014
  • Teaching form: Seminar
  • Enrollment type: Compulsory elective module
  • Maximum number of participants: 16

Degree Programs, Module Groups & Modules

IT-Systems Engineering BA
IT-Systems Engineering MA
  • IT-Systems Engineering A
  • IT-Systems Engineering B
  • IT-Systems Engineering C
  • IT-Systems Engineering D


The dark web is the invisible part of the Internet that is not accessible through search engines such as Google or Yahoo. Its size is estimated at 500 to 5,000 times that of the visible Internet, the so-called surface web. Because its content is so hard to see and monitor, it has become a popular place for cybercrime. One manifestation of this crime is the theft and publication of identity data.

In this seminar, students gain first insights into the dark web. We focus on gathering and analyzing publicly accessible identity data (e.g. passwords, credit card data). This topic involves the following challenges:

  • Finding and monitoring the numerous platforms that regularly publish stolen identity data
  • Automated normalization of identity data
  • Handling of huge amounts of identities extracted from leaked databases
  • Estimation of data quality, i.e. detection of fake identities and leaks
  • More complex analysis of identity data
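
As an illustration of the normalization challenge listed above, the following is a minimal sketch in Python. It assumes that a dump consists of text lines pairing an email address with a secret, separated by one of a few common delimiters. The record layout (`Identity`), the delimiter set, and the loose email pattern are assumptions made for this example and are not prescribed by the seminar. The sample input is synthetic.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical record layout; the seminar does not define a schema.
@dataclass(frozen=True)
class Identity:
    email: str
    secret: str  # password or hash, kept exactly as it appeared in the dump


# Assumed line shape: <email> <delimiter> <secret>.
# The email pattern is deliberately loose and excludes the delimiter
# characters so that the first delimiter after the address ends it.
_LINE_RE = re.compile(
    r"^\s*(?P<email>[^@\s:;|,]+@[^@\s:;|,]+\.[^@\s:;|,]+)"
    r"\s*[:;|,\t]\s*"
    r"(?P<secret>\S.*?)\s*$"
)


def normalize(lines: Iterable[str]) -> Tuple[List[Identity], int]:
    """Return the unique normalized records and the number of rejected lines."""
    seen = set()
    records: List[Identity] = []
    rejected = 0
    for line in lines:
        match = _LINE_RE.match(line)
        if match is None:
            rejected += 1  # line does not look like an identity record
            continue
        # Lowercase the address so differently cased duplicates collapse.
        record = Identity(match["email"].lower(), match["secret"])
        if record not in seen:
            seen.add(record)
            records.append(record)
    return records, rejected


if __name__ == "__main__":
    sample = [
        " Alice@Example.com : hunter2 ",
        "alice@example.com:hunter2",
        "bob@example.org;secret:1",
        "not a record",
    ]
    unique, bad = normalize(sample)
    print(unique, bad)
```

The share of rejected or duplicated lines can serve as a first, rough indicator of data quality. Detecting fabricated leaks would need further signals that this sketch does not cover.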

The seminar is organized as follows. In the first two weeks, participants explore the dark web and the places where identity leaks are published. They then present the results of this exploration to the seminar group and exchange experiences. In the next step, participants use the experience gained to automate the workflow from data gathering through analysis. Finally, the results are presented to the group and summarized in a final report.

The best implementation results will be integrated into the HPI Identity Leak Checker software.


Good programming skills are required. Prior participation in the course "Internet Security" is helpful but not required.


  • Data Breach QuickView: An Executive's Guide to 2013 Data Breach Trends. Presentation. Risk Based Security, Feb. 2014.

Learning and Teaching Methods

The seminar is a practical, hands-on introduction to the underground of the Internet. Students are expected to implement methods for collecting and analyzing leaked identity data. Some seminar sessions are used to introduce the topics and to present results. Many others are reserved for students to work on their tasks. In addition, the seminar is accompanied by regular consultations.


The seminar is graded as follows:

  • First Experience with Manual Monitoring (20%)
    • Collection of Dark Web Data
    • Presentation of Results and Methods
  • Automated Monitoring (80%)
    • Idea and Concept (10%)
    • Implementation (50%)
      • Effectiveness
      • Engineering (Architecture, Style, Patterns, ...)
    • Final report (20%)
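
The weighting above can be read as a weighted sum of the four graded components. The sketch below shows that arithmetic in Python. The component keys and the function name are hypothetical. How individual component grades are scaled or rounded is not stated here, so the sketch only assumes that all components use the same scale.

```python
# Weights taken from the grading scheme above; the keys are hypothetical names.
WEIGHTS = {
    "manual_monitoring": 0.20,  # First Experience with Manual Monitoring
    "idea_and_concept": 0.10,   # Automated Monitoring: Idea and Concept
    "implementation": 0.50,     # Automated Monitoring: Implementation
    "final_report": 0.20,       # Automated Monitoring: Final report
}


def overall_score(scores: dict) -> float:
    """Combine component scores (all on one common scale) by their weights."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

The three Automated Monitoring components add up to the stated 80%, and all four weights add up to 100%.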


The first seminar session will be on October 16th. Sessions take place in the following slot:

  • Thursdays, 1:30 PM in H-E.52