Hasso-Plattner-Institut: 25 Years of HPI

Practical Video Analyses (Summer Semester 2017)

Lecturer: Dr. Haojin Yang (Internet Technologies and Systems)

General Information

  • Weekly hours per semester: 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: 28.04.2017
  • Teaching form: Seminar
  • Enrollment type: Compulsory elective module
  • Maximum number of participants: 12

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • ISAE: Internet, Security & Algorithm Engineering
    • HPI-ISAE-S Specialization
  • ISAE: Internet, Security & Algorithm Engineering
    • HPI-ISAE-T Techniques and Tools
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-T Techniques and Tools
  • IT-Systems Engineering
    • HPI-ITSE-A Analysis
  • IT-Systems Engineering
    • HPI-ITSE-E Design
  • IT-Systems Engineering
    • HPI-ITSE-K Construction
  • IT-Systems Engineering
    • HPI-ITSE-M Maintenance


In the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to the official statistics of the popular video portal YouTube, more than 400 hours of video are uploaded every minute. Retrieving video data efficiently on the web or within large video archives has therefore become an important and challenging task.

In our current research we focus on video analysis and multimedia information retrieval (MIR) using Deep Learning techniques. Deep Learning (DL), a new area of machine learning (since 2006), has already had an impact on a wide range of multimedia information processing tasks. Recently, techniques based on DL have achieved substantial progress in fields including Computer Vision, Speech Recognition, Image Classification and Natural Language Processing (NLP).

Topics in this seminar:

  • A general LSTM (Long Short-Term Memory) framework for NLP applications: In this topic we plan to develop a general framework that can work with various NLP (Natural Language Processing) applications, such as machine translation, sentiment analysis and word sense disambiguation. The technical core is "LSTM network + Word Vectors", which has proven to be highly effective in many cases, including Google Translate. You are expected to create a sequence-to-sequence LSTM model and adapt it to a sequence-to-point structure in order to handle different NLP tasks. We believe that by investing time and effort in this topic, you will not only learn the theory and improve your programming skills on specific tasks, but also come to understand the current trends in the development of NLP and DL.
  • Adversarial training for medical image segmentation applications: Medical imaging is an important step in diagnosis and in planning surgical or chemical treatment. Magnetic resonance imaging (MRI) provides rich information before and during treatment for evaluating the treatment and the progress of lesions. In medical image analysis, automated lesion segmentation is an important and very challenging clinical diagnostic task. Inspired by the promising results achieved by deep learning in many application fields, an automated application based on adversarial training is a very practical and interesting topic. We currently have two datasets, one for brain tumor segmentation and one for liver tumor segmentation, which will be used in this topic. [6,7]
  • PlaceRecognizer: If you like to travel, you have most certainly stood somewhere in an unknown city and asked yourself: What kind of building is this? What is it for? Who was its architect? Well, fear no more! Thanks to modern computer vision technology, we might be able to answer these questions for you right on your smartphone!

    In this seminar topic we want to look at how to create a robust deep learning model that can recognize buildings in a given image. To do this, we will need to gather training data (e.g. from street view images) and think about a good network architecture and training method for such a model. So if you are interested in data gathering, training deep neural networks and maybe also Android development, this topic is perfectly suited for you!
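For the LSTM topic, the difference between a sequence-to-sequence and a sequence-to-point structure can be sketched roughly as follows. This is only an illustration under several assumptions, not the seminar's framework: the class name `TinyLSTM`, all dimensions, and the random, untrained weights are hypothetical, and the sequence-to-sequence variant here simply emits one output per input step, which is simpler than an encoder-decoder model as used for translation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked matrix for the four gates: input, forget, candidate, output.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear read-out from a hidden state to output scores.
        self.W_out = rng.normal(0.0, 0.1, (output_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def hidden_states(self, word_vectors):
        """Run over a (seq_len, input_dim) array; return (seq_len, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        states = []
        for x in word_vectors:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
            states.append(h)
        return np.stack(states)

    def sequence_to_sequence(self, word_vectors):
        """One output vector per input step (e.g. per-word tagging)."""
        return self.hidden_states(word_vectors) @ self.W_out.T

    def sequence_to_point(self, word_vectors):
        """One output vector per sequence, read from the last hidden state."""
        return self.W_out @ self.hidden_states(word_vectors)[-1]


# Five hypothetical word vectors of size 8 stand in for an embedded sentence.
sentence = np.random.default_rng(1).normal(size=(5, 8))
model = TinyLSTM(input_dim=8, hidden_dim=16, output_dim=3)
per_step = model.sequence_to_sequence(sentence)  # shape (5, 3)
single = model.sequence_to_point(sentence)       # shape (3,)
```

Both variants share the same recurrent core; only the read-out changes, which is why one framework could in principle serve tasks such as tagging (one label per word) and sentiment analysis (one label per sentence).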


Requirements

  • Strong interest in video/image processing, machine learning (Deep Learning) and/or computer vision

  • Software development in C/C++ or Python

  • Experience with OpenCV and machine learning applications is a plus



The final evaluation will be based on:

  • Initial implementation / idea presentation, 10%

  • Final presentation, 20%

  • Report/Documentation, 12-18 pages, 30%

  • Implementation, 40%

  • Participation in the seminar (bonus points)
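As a rough illustration of how the weights above combine, the following sketch computes a weighted total. The 0-100 scale, the component names and the example scores are assumptions made for illustration; the bonus points for participation are left out because their size is not stated.

```python
# Weights in percent, taken from the evaluation list above.
WEIGHTS = {
    "initial_presentation": 10,
    "final_presentation": 20,
    "report": 30,
    "implementation": 40,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) using the listed weights."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one participant.
example = {
    "initial_presentation": 80,
    "final_presentation": 90,
    "report": 70,
    "implementation": 85,
}
print(weighted_score(example))  # 81.0
```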


Monday, 13:30-15:00

Room H-2.58

  • 24.04.2017, 13:30-15:00: Presentation of the topics (PDF)
  • By 27.04.2017: Choice of topics (sign up on Doodle)
  • Announcement of the topic and group assignments
  • Individual meetings with the supervisor
  • Technology talks and guided discussion (15+5 min each)
  • Presentation of the final results (15+5 min each)
  • By mid-August: Submission of implementation and documentation
  • By the end of September: Grading