Hasso-Plattner-Institut25 Jahre HPI

Practical Applications of Deep Learning (Summer Semester 2021)

Lecturers: Dr. Haojin Yang (Internet-Technologien und -Systeme), Joseph Bethge (Internet-Technologien und -Systeme), Hendrik Rätz (Data Analytics and Computational Statistics)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 18.03.2021 - 09.04.2021
  • Course type: Seminar
  • Enrollment type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 9

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • IT-Systems Engineering
    • HPI-ITSE-E Design
  • IT-Systems Engineering
    • HPI-ITSE-K Construction
  • ISAE: Internet, Security & Algorithm Engineering
    • HPI-ISAE-K Concepts and Methods
  • ISAE: Internet, Security & Algorithm Engineering
    • HPI-ISAE-T Techniques and Tools
  • ISAE: Internet, Security & Algorithm Engineering
    • HPI-ISAE-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
Data Engineering MA
Digital Health MA
Cybersecurity MA


Artificial intelligence (AI) is intelligence exhibited by computers. The term is applied when a machine mimics "cognitive" functions that humans associate with human minds, such as "learning" and "problem solving". Researchers and developers in this field are currently working on AI and machine learning algorithms that train computers to mimic human skills such as "reading", "listening", "writing", and "making inferences". Since 2006, "Deep Learning" (DL) has attracted growing attention in both academia and industry. Deep learning is a branch of machine learning based on a set of algorithms that attempt to learn representations of data and to model its high-level abstractions. In a deep neural network, there are multiple so-called "neural layers" between the input and the output. The algorithm uses these layers, each composed of linear and non-linear transformations, to learn increasingly abstract representations. Recently, DL has achieved record-breaking results in many new areas, e.g., beating humans in real-time strategy games (StarCraft), powering self-driving cars, and achieving dermatologist-level classification of skin cancer. In our current research, we focus on video analysis and multimedia information retrieval (MIR) using deep learning techniques.
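To make the idea of stacked "neural layers" concrete, the following is a minimal sketch of a forward pass through a tiny network in plain Python. It is an illustration only: the function names (`linear`, `relu`, `forward`) and the hand-picked weights in `LAYERS` are assumptions made for this example, and a real network would learn its weights from data using a deep learning framework.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Linear transformation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Non-linear transformation: element-wise max(0, x)."""
    return [max(0.0, v) for v in x]


def forward(layers: List[Tuple[Matrix, Vector]], x: Vector) -> Vector:
    """Apply each (weights, bias) layer in turn, with ReLU after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(weights, bias, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical, hand-picked weights (not learned, for illustration only).
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer: 2 inputs -> 2 units
    ([[2.0, 1.0]], [0.5]),                      # output layer: 2 units -> 1 output
]

output = forward(LAYERS, [3.0, 1.0])
print(output)
```

Each additional `(weights, bias)` pair appended to `LAYERS` adds one more layer between input and output, which is what makes a network "deep".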

Course language: German and English

Topics in this seminar:

  • Creating 'Fake' Lecture Videos: Methods based on deep learning are very powerful in the area of computer vision. Since the introduction of Generative Adversarial Networks (GANs) by Goodfellow et al., image generation has seen a large boost in popularity, and today strong methods are available that can generate very realistic, high-resolution images. The power of generative networks also comes with several downsides. One of them is the possibility of generating fake data that is (for a human) nearly indistinguishable from real data. These so-called deep fakes raise ethical discussions and also motivate more research into detecting whether image or video material is fake. But being able to fake data does not necessarily have to be a bad thing. Think about a video platform like OpenHPI, where lecture videos have to be recorded. These videos have to be post-processed to remove mistakes made by the speakers and to give the video a professional touch. This post-processing is highly time-consuming and could benefit from automatic methods that help to seamlessly remove mistakes. Another possibility would be to generate a realistic-looking video by supplying only a written transcript and letting the computer do the rest. In this seminar topic, we want to look at automatic methods for editing videos with only one speaker (which is the typical setting of lecture videos). If everything works as planned, we will implement a method where the user can modify the recorded video sequence by modifying an automatically extracted transcript of the words spoken by the lecturer. (Introduction video can be found here.)
  • Optimizing Inference of Binary Neural Networks: Convolutional neural networks have achieved astonishing results in different application areas. Various methods have been proposed that allow us to use these models on mobile and embedded devices. Binary Neural Networks (BNNs) in particular seem to be a promising approach for devices with low computational power or applications with real-time requirements. In this topic, you are going to optimize the inference of BNNs with BMXNet 2, based on advances in other frameworks. The goal at the end is to run a real-time machine learning demo application on a Raspberry Pi (provided by us) without relying on a network connection. (Introduction video can be found here.)
  • Handwriting Synthesis for Improved Optical Character Recognition (OCR): Handwriting OCR remains an interesting field of study because it is still challenging to correctly handle the variation that is naturally contained in handwriting, which differs from person to person. This can be problematic because a trained OCR model might not be able to recognize the handwriting of authors whose writing is too different from the training samples. A possible solution to this problem is to use Generative Adversarial Networks (GANs) to synthesize handwritten samples that have the same style as the author's writing. These generated samples can then be used to fine-tune the original OCR model and hopefully increase the accuracy on the samples of the newly introduced authors. (Introduction video can be found here.)
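As background for the Binary Neural Networks topic, the sketch below illustrates one commonly described reason why BNNs suit low-power devices: when weights and activations are restricted to +1 and -1, a dot product can be computed with XNOR and a bit count instead of floating-point multiplications. This is a generic illustration in plain Python and is not BMXNet 2's API; the helper names (`pack_bits`, `binary_dot`, `reference_dot`) and the example vectors are assumptions made for this example.

```python
from typing import List


def pack_bits(values: List[int]) -> int:
    """Pack a vector of +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    word = 0
    for v in values:
        if v not in (1, -1):
            raise ValueError("binary vectors may only contain +1 or -1")
        word = (word << 1) | (1 if v == 1 else 0)
    return word


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # each match adds +1, each mismatch adds -1


def reference_dot(a: List[int], b: List[int]) -> int:
    """Ordinary dot product, used only to check the bitwise version."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical example vectors (illustration only).
activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(result, reference_dot(activations, weights))
```

Both values printed agree, since the bitwise form is just a rearrangement of the same sum. On real hardware the packed words would typically be fixed-width machine words processed with native XNOR and popcount instructions.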

We are currently preparing more detailed video introductions to these topics:

  • This video (10 min) contains the introduction to the topic Creating 'Fake' Lecture Videos.
  • This video (12 min) contains a brief introduction to the topic Optimizing Inference of Binary Neural Networks.
  • This video (4 min) contains a brief introduction to the topic Handwriting Synthesis for OCR.


  • Strong interest in video/image processing, machine learning (Deep Learning), and/or computer vision
  • Software development experience in C/C++ or Python
  • Experience with OpenCV and machine learning applications is a plus



Online courses:

  • cs231n tutorials: Convolutional Neural Networks for Visual Recognition
  • Deep Learning courses at Coursera

Deep Learning frameworks:


The final evaluation will be based on:

  • Initial implementation / idea presentation, 10%
  • Final presentation, 20%
  • Report/Documentation, 12-18 pages, 30%
  • Implementation, 40%
  • Participation in the seminar (bonus points)


Monday, 15:15 - 16:45

(apart from the presentations, there will be no regular meetings in our seminar room!)

Virtual room

09.04.2021 14:00-15:00 QA session for seminar topics. (join us here)

until 09.04.2021

Enroll in the seminar with the Office of Student Affairs (Studienreferat(at)hpi.uni-potsdam.de)

(Send your preferred and secondary topic to: haojin.yang@hpi.de)

13.04.2021 (TUESDAY) 15:15 - 16:45

Meet other interested students, find a team, and ask questions. (join us here)

(Send your preferred and secondary topic to: haojin.yang@hpi.de)

until 20.04.2021

Topics and Teams finalized


Meetings with your tutor

08.06.2021 (13:30)

Mid-Term presentation (15+5min)

20.07.2021 (13:30)

Final presentation (15+5min)

until end of August 2021

Hand in code and paper (LaTeX template)

until end of September 2021