
Machine Intelligence with Deep Learning (Virtual Seminar) (Winter Semester 2020/2021)

Lecturers: Dr. Haojin Yang (Internet-Technologien und -Systeme), Christian Bartz (Internet-Technologien und -Systeme), Joseph Bethge (Internet-Technologien und -Systeme), Ting Hu (Internet-Technologien und -Systeme)

General Information

  • Weekly hours (SWS): 4
  • ECTS: 6
  • Graded: yes
  • Enrolment period: 01.10.-20.11.2020
  • Course type: Seminar / Project
  • Module type: Compulsory elective module
  • Teaching language: English
  • Maximum number of participants: 9

Degree Programs & Modules

IT-Systems Engineering MA
  • ISAE-Konzepte und Methoden
  • ISAE-Techniken und Werkzeuge
  • ISAE-Spezialisierung
  • OSIS-Konzepte und Methoden
  • OSIS-Techniken und Werkzeuge
  • OSIS-Spezialisierung
  • ITSE-Entwurf
  • ITSE-Konstruktion
Data Engineering MA
Digital Health MA
Cybersecurity MA


(News: Time slot changed to Tuesday 13:30 - 15:00! See below for all times/dates.)

Artificial intelligence (AI) is intelligence exhibited by computers. The term applies when a machine mimics "cognitive" functions that humans associate with human minds, such as "learning" and "problem solving". Researchers and developers in this field are currently working on AI and machine learning algorithms that train computers to mimic human skills such as "reading", "listening", "writing", and "making inferences". Since 2006, "Deep Learning" (DL) has attracted increasing attention in both academia and industry. Deep learning, or deep neural networks, is a branch of machine learning based on a set of algorithms that attempt to learn representations of data and model their high-level abstractions. In a deep network there are multiple so-called "neural layers" between the input and output, which the algorithm uses to learn higher abstractions composed of multiple linear and non-linear transformations. Recently, DL has delivered record-breaking results in many novel areas, e.g., beating humans in strategy games such as Go (Google's AlphaGo), enabling self-driving cars, and achieving dermatologist-level classification of skin cancer. In our current research we focus on video analysis and multimedia information retrieval (MIR) using deep learning techniques.

Course language: German and English

Topics in this seminar:

Similar Sentence Generator

Controlled sentence generation is essential for many NLP tasks. For example, writing assistants are widely used to adjust the grammar and formality of our writing through rewriting or rewording. In this scenario, they are supposed to preserve a sentence's semantics while introducing more diversity. In this topic, we employ back-translation to generate sentences with similar semantics. Back-translation is the technique of translating a sentence from the source language into another language and then back into the source language. During this process the sentence's semantics are preserved, while the other language's syntactic structure and phrasing introduce variation in expression. We will study different pre-trained multilingual machine translation models, experiment with different back-translation strategies, utilize NLP tools to filter the generated sentences, and eventually use evaluation metrics to measure the generated results. You can find a video introducing the topic in more detail here. We also provide the slides for download here.
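As a sketch of the control flow, the snippet below back-translates a sentence through a pivot language and then filters the candidates. The translators here are toy word-substitution tables standing in for real pretrained multilingual MT models, and the token-overlap filter is only a crude placeholder for proper semantic-similarity metrics; all names and thresholds are illustrative assumptions, not part of the seminar code.

```python
def back_translate(sentence, to_pivot, from_pivot):
    """Translate into a pivot language and back into the source language."""
    pivot = to_pivot(sentence)
    return from_pivot(pivot)

def filter_candidates(source, candidates):
    """Keep paraphrases that differ from the source but share most of its tokens."""
    src_tokens = set(source.lower().split())
    keep = []
    for cand in candidates:
        if cand == source:
            continue  # identical output gains no diversity
        overlap = len(src_tokens & set(cand.lower().split())) / len(src_tokens)
        if overlap >= 0.5:  # crude stand-in for a semantic-preservation check
            keep.append(cand)
    return keep

if __name__ == "__main__":
    # Toy "translators": word-level substitution, purely illustrative.
    en_de = {"the": "der", "cat": "katze", "sleeps": "schlaeft"}
    de_en = {"der": "the", "katze": "cat", "schlaeft": "slumbers"}  # note the variation
    to_pivot = lambda s: " ".join(en_de.get(w, w) for w in s.split())
    from_pivot = lambda s: " ".join(de_en.get(w, w) for w in s.split())

    source = "the cat sleeps"
    paraphrase = back_translate(source, to_pivot, from_pivot)
    print(paraphrase)  # "the cat slumbers"
    print(filter_candidates(source, [paraphrase, source]))
```

In the seminar, the two lambdas would be replaced by a pretrained multilingual MT model, and the filter by NLP tooling and evaluation metrics as described above.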

Creating 'Fake' Lecture Videos

Methods based on deep learning are very powerful in the area of computer vision. Since the introduction of Generative Adversarial Networks (GANs) by Goodfellow et al., the area of image generation has seen a large boost in popularity, and today very strong methods are available that can generate highly realistic, high-resolution images. The power of generative networks also comes with several downsides. One of them is the possibility to generate fake data that is (for a human) nearly indistinguishable from real data. These so-called deep fakes raise ethical discussions and also motivate further research into detecting whether image/video material is fake or not. However, being able to synthesize data does not necessarily have to be a bad thing. Think about a video platform like openHPI, where lecture videos have to be recorded. These videos have to be post-processed in order to correct mistakes made by the speakers and also to give the video a professional touch. This post-processing is highly time-consuming and could benefit from automatic methods that help to seamlessly remove mistakes. In the last semester, we already tackled this problem and began to implement a paper that proposed a pipeline for automatic editing of videos based on changes made to a transcript of the video. This semester, we will take a closer look at the pipeline and improve the results of the last seminar group. Specifically, we will look at image-to-image translation, 3D face model prediction, phoneme alignment, and further parts of the pipeline.
In the end, you will help us to build the prototype of a video processing pipeline that might be implemented in openHPI later! So, if you are interested in (automated) video editing, love to play with different libraries, are not afraid of Docker, and know your way around Python, this project is perfect for you. For further information, please have a look at the introduction video (slides can be found here).
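The transcript-driven editing pipeline described above can be sketched as a chain of stages passing an editing job along. Every stage below is a stub that only illustrates the data flow; the real components (forced phoneme alignment, 3D face model prediction, image-to-image translation for rendering) are exactly what the project is about, and all names and timings here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class EditJob:
    transcript_edit: str                     # the corrected text the speaker should say
    phonemes: list = field(default_factory=list)
    face_params: list = field(default_factory=list)
    frames: list = field(default_factory=list)

def align_phonemes(job):
    # Stub: assign each edited word a timestamp (real: forced phoneme alignment).
    job.phonemes = [(w, i * 0.2) for i, w in enumerate(job.transcript_edit.split())]
    return job

def predict_face_params(job):
    # Stub: one 3D face-model parameter set per aligned phoneme.
    job.face_params = [{"phoneme": w, "t": t} for w, t in job.phonemes]
    return job

def render_frames(job):
    # Stub: image-to-image translation from face parameters to video frames.
    job.frames = [f"frame@{fp['t']:.1f}s" for fp in job.face_params]
    return job

def run_pipeline(transcript_edit):
    """Run the editing job through all stages in order."""
    job = EditJob(transcript_edit)
    for stage in (align_phonemes, predict_face_params, render_frames):
        job = stage(job)
    return job
```

Keeping the stages as independent, composable functions mirrors how the seminar work is split: each team member can replace one stub with a real model without touching the others.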

"Light" Neural Architecture Search

Convolutional neural networks have achieved astonishing results in different application areas. Various methods have been proposed that allow us to use these models on mobile and embedded devices. Binary Neural Networks (BNNs) in particular seem to be a promising approach for devices with low computational power or applications with real-time requirements. Previous work in the area of neural architecture search has shown that machine-optimized networks can perform better than those designed by humans. In this topic, you are going to implement neural architecture search specifically for BNNs, with a focus on keeping the number of operations low and the networks fast. Starting from the two networks BinaryDenseNet and MeliusNet, you are going to optimize these architectures with neural architecture search. We suggest an implementation in BMXNet 2 with the autogluon package, based on the work of a previous seminar team, but porting the code to Larq could be an alternative (Larq is based on Keras and TensorFlow). Please have a look at the introduction video for more details on Binary Neural Networks and the seminar topic (slides can be found here).
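To make the idea of operation-aware architecture search concrete, here is a toy sketch. The search space, the operation-count estimate, and the accuracy proxy are all invented for illustration: in the seminar you would train and evaluate real BinaryDenseNet/MeliusNet variants (e.g. in BMXNet 2 with autogluon, or in Larq), and a real search would sample the space rather than enumerate it.

```python
from itertools import product

# Hypothetical two-dimensional search space over network depth and width.
SEARCH_SPACE = {
    "num_blocks": [4, 6, 8],
    "channels": [32, 64, 128],
}

def estimate_ops(arch):
    # Rough cost proxy: operations grow linearly with depth and
    # quadratically with width.
    return arch["num_blocks"] * arch["channels"] ** 2

def proxy_accuracy(arch):
    # Stand-in for actually training and evaluating the binary network:
    # larger architectures score higher, with diminishing returns.
    return 1.0 - 1.0 / (1.0 + 0.001 * arch["num_blocks"] * arch["channels"])

def search(op_budget=60_000):
    """Return the best-scoring architecture whose estimated cost fits the budget."""
    best, best_score = None, float("-inf")
    for nb, ch in product(SEARCH_SPACE["num_blocks"], SEARCH_SPACE["channels"]):
        arch = {"num_blocks": nb, "channels": ch}
        if estimate_ops(arch) > op_budget:
            continue  # too expensive for the target device
        score = proxy_accuracy(arch)
        if score > best_score:
            best, best_score = arch, score
    return best
```

The budget constraint is what distinguishes this "light" setting from vanilla architecture search: candidates that would be more accurate but exceed the operation budget are simply rejected.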


Prerequisites:

  • Strong interest in video/image processing, machine learning (deep learning), and/or computer vision

  • Software development in C/C++ or Python

  • Experience with OpenCV and machine learning applications is a plus



Online courses:

  • cs231n tutorials: Convolutional Neural Networks for Visual Recognition
  • Deep Learning courses at Coursera

Deep Learning frameworks:

Teaching and Learning Methods

This semester the seminar will be held virtually. All meetings, presentations, etc. will take place on BigBlueButton.


The final evaluation will be based on:

  • Initial implementation / idea presentation, 10%

  • Final presentation, 20%

  • Report (research paper), 12-18 pages, 30%

  • Implementation, 40%

  • Participation in the seminar (bonus points)


Tuesday, 13:30 - 15:00

Room: Our Virtual Seminar Room (here)
Apart from the presentations, there will be no regular meetings in our seminar room!

from 26.10.2020

Presentation of Seminar Topics through Videos (see further up on this page)

03.11.2020 (13:30) Meet other interested students, find a team, and ask us questions about the topics. (join us in our Virtual Seminar Room here)

until 09.11.2020

Topic Choice (Inform us of your preferred and secondary topic by email: Christian.Bartz[ött]hpi.de)

until 10.11.2020

Topics and Teams finalized


Meetings with your tutor


Intermediate Presentations and Discussion (15+5 min per team)


Final Presentation (15+5 min per team)

until the end of February 2021

Hand in report and code (Latex template)

until the end of March 2021

Grading of projects