Hasso-Plattner-Institut

Practical Applications of Multimedia Retrieval (Winter Semester 2015/2016)

Lecturers: Prof. Dr. Christoph Meinel (Internet Technologies and Systems), Dr. Haojin Yang (Internet Technologies and Systems)
Tutor: Dr. Haojin Yang

General Information

  • Weekly contact hours (SWS): 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment deadline: 23.10.2015
  • Teaching format: SP (seminar/project)
  • Enrollment type: Compulsory elective module

Degree Programs, Module Groups & Modules

IT-Systems Engineering BA
IT-Systems Engineering MA
  • IT-Systems Engineering A
  • IT-Systems Engineering B
  • IT-Systems Engineering C
  • IT-Systems Engineering D
  • IT-Systems Engineering Analyse


Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly: according to the official statistics of the popular video portal YouTube, more than 300 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become a very important and challenging task.

In our current research, we focus on video analysis and multimedia information retrieval (MIR) using Deep Learning techniques. Deep Learning (DL), an area of machine learning that has developed rapidly since 2006, has already had a strong impact on a wide range of multimedia information processing tasks. Recently, DL-based techniques have achieved substantial progress in fields such as speech recognition, image classification and natural language processing.

Topics in this seminar:

  • Indoor human activity recognition

The number of surveillance cameras, the importance of video analytics, the retention time of surveillance data and the strategic value of video surveillance are all increasing significantly. Indoor human activity recognition is an important part of event detection in surveillance videos. LIRIS provides a typical human activity recognition dataset containing grayscale, RGB and depth videos of people performing various everyday activities (discussing, making telephone calls, handing over an item, etc.).

  • Video classification from audio cues

This project aims to explore and analyze the audio information in videos for classification. It focuses mainly on two kinds of feature extraction: MFCCs (Mel-Frequency Cepstral Coefficients) and deep CNN (Convolutional Neural Network) features. Classifiers are then trained on the extracted audio features.
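To make the MFCC step concrete, here is a minimal pure-Python sketch of MFCC extraction for a single short audio frame, following the textbook pipeline: Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT-II. All function names are hypothetical, the parameter defaults (512-point FFT, 26 mel filters, 13 coefficients) are common choices rather than values prescribed by the seminar, pre-emphasis and liftering are omitted, and the naive O(n²) DFT is used only for readability.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each mel point to the FFT bin at or below its frequency.
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):  # rising edge of the triangle
            bank[j - 1][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            bank[j - 1][k] = (right - k) / (right - center)
    return bank


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """Periodogram |DFT|^2 / n_fft of the zero-padded frame (naive DFT for clarity)."""
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = sum(x * math.sin(2.0 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def dct_ii(values: List[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc_frame(frame: List[float], sample_rate: int, n_fft: int = 512,
               n_filters: int = 26, n_coeffs: int = 13) -> List[float]:
    """MFCCs of one frame: window -> power spectrum -> mel energies -> log -> DCT."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = power_spectrum(windowed, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spectrum))
                for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    # A small constant avoids log(0) for filters that receive no energy.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies, n_coeffs)
```

In practice a whole clip would be cut into overlapping frames (for 16 kHz audio, e.g. 25 ms frames with a 10 ms hop), and the per-frame vectors pooled (for instance averaged) into one feature vector per video before training a classifier; a library routine such as librosa's `librosa.feature.mfcc` would replace this hand-written version.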

  • Sentence boundary detection from unpunctuated speech transcripts

In this task, the goal is to segment long unpunctuated transcripts into sentences. The corresponding audio files can also be used. The task should be solved with Deep Learning methods; both prosodic cues and lexical features can be used.
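One common way to frame this task is sequence labeling: for every word of the transcript, a model predicts whether a sentence boundary follows it. The sketch below covers only the data preparation and decoding around such a model, under the assumption that training labels are derived from punctuated text (a token ending in `.`, `!` or `?` marks a boundary). The function names are hypothetical, the network itself (and any prosodic features such as pause durations) is left out, and this simple rule would mislabel abbreviations like "Dr.".

```python
import re
from typing import List, Tuple

# A token ending in '.', '!' or '?' is treated as the end of a sentence.
_SENTENCE_END = re.compile(r"[.!?]+$")


def make_boundary_labels(punctuated: str) -> Tuple[List[str], List[int]]:
    """Strip punctuation and case from a punctuated transcript.

    Returns (tokens, labels) where labels[i] == 1 if a sentence ends after tokens[i].
    """
    tokens: List[str] = []
    labels: List[int] = []
    for raw in punctuated.split():
        is_end = bool(_SENTENCE_END.search(raw))
        word = re.sub(r"[^\w']", "", raw).lower()
        if not word:  # stand-alone punctuation such as "?" closes the previous word
            if is_end and labels:
                labels[-1] = 1
            continue
        tokens.append(word)
        labels.append(1 if is_end else 0)
    return tokens, labels


def context_windows(tokens: List[str], size: int = 2, pad: str = "<pad>") -> List[List[str]]:
    """Lexical input for a classifier: the word plus `size` neighbors on each side."""
    padded = [pad] * size + tokens + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(tokens))]


def split_sentences(tokens: List[str], labels: List[int]) -> List[str]:
    """Decode 0/1 boundary predictions back into sentence strings."""
    sentences: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        current.append(tok)
        if lab == 1:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without a predicted boundary
        sentences.append(" ".join(current))
    return sentences
```

For example, `make_boundary_labels("Hello world. How are you?")` yields the tokens `hello world how are you` with labels `0 1 0 0 1`; a trained network would receive `context_windows(tokens)` (possibly combined with audio features) and its predictions would be passed to `split_sentences`.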

Prerequisites

  • Strong interest in video/image processing, machine learning and/or computer vision

  • Software development skills in C/C++

  • Experience with OpenCV and machine learning applications is a plus

Literature

  • Yoshua Bengio, Ian J. Goodfellow and Aaron Courville, "Deep Learning", online version: http://www.iro.umontreal.ca/~bengioy/dlbook/
  • Karen Simonyan and Andrew Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos", Advances in Neural Information Processing Systems (NIPS), 2014
  • Caffe: deep learning framework by the Berkeley Vision and Learning Center (BVLC)


The final evaluation will be based on:

  • Initial implementation / idea presentation, 10%

  • Final presentation, 20%

  • Report/Documentation, 12-18 pages, 30%

  • Implementation, 40%

  • Participation in the seminar (bonus points)
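As a worked example of how the weights above combine into a final score, assume hypothetical component scores on a 0-100 scale; the symbols are illustrative, and the participation bonus is not modeled because its weight is not specified.

```latex
S = 0.10\,s_{\text{idea}} + 0.20\,s_{\text{pres}} + 0.30\,s_{\text{report}} + 0.40\,s_{\text{impl}}

% e.g. s_idea = 80, s_pres = 90, s_report = 70, s_impl = 85:
S = 0.10 \cdot 80 + 0.20 \cdot 90 + 0.30 \cdot 70 + 0.40 \cdot 85 = 8 + 18 + 21 + 34 = 81
```

Because the weights sum to 1, the implementation and the report together account for 70% of the grade.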

Schedule

Thursdays, 13:30-15:00

Room H-E.52

  • 15.10.2015, 13:30-15:00: Presentation of the topics

  • By 23.10.2015, 23:59: Topic selection (register via Doodle)

  • Announcement of topic and group assignments

  • Individual meetings with the supervisor

  • Early December: Technology talks and guided discussion (15 + 5 min each)

  • Presentation of final results (15 + 5 min each)

  • By 15.02.2016: Submission of implementation and documentation

  • By the end of February: Grading