Practical Applications of Multimedia Retrieval (Winter Semester 2015/2016)

Lecturers: Prof. Dr. Christoph Meinel (Internet Technologies and Systems), Dr. Haojin Yang (Internet Technologies and Systems)
Tutor: Dr. Haojin Yang

General Information

Weekly contact hours (SWS): 4
ECTS: 6
Graded: Yes
Enrollment deadline: 23.10.2015
Teaching format: SP
Enrollment type: Compulsory elective module

Degree Programs, Module Groups & Modules

IT-Systems Engineering BA

IT-Systems Engineering MA

IT-Systems Engineering A
IT-Systems Engineering B
IT-Systems Engineering C
IT-Systems Engineering D
IT-Systems Engineering Analyse

Description

Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to YouTube's official statistics, more than 300 hours of video are uploaded to the portal every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.

Our current research focuses on video analysis and multimedia information retrieval (MIR) using deep learning techniques. Deep learning (DL), a relatively young area of machine learning that has developed rapidly since 2006, already influences a wide range of multimedia information processing tasks. DL-based techniques have recently achieved substantial progress in fields such as speech recognition, image classification, and natural language processing.

Topics in this seminar:

Indoor human activity recognition

The number of surveillance cameras, the importance of video analytics, the retention time of surveillance data, and the strategic value of video surveillance are all increasing significantly. Indoor human activity recognition is an important part of event detection in surveillance videos. The LIRIS human activities dataset is a typical benchmark for this task: it contains grayscale, RGB, and depth videos of people performing various everyday activities (e.g., discussing, making telephone calls, handing over an item).
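One approach from the literature list, the two-stream ConvNet of Simonyan and Zisserman, trains a spatial network on individual RGB frames and a temporal network on stacked optical flow, then fuses their class scores, for example by averaging. The following is a minimal sketch of only that late-fusion step, assuming both networks are already trained. The logits are made-up placeholders, and `temporal_weight` is a hypothetical knob (0.5 gives a plain average); this is not the seminar's prescribed method.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float],
                temporal_weight: float = 0.5) -> List[float]:
    # Weighted average of the per-class probabilities of both streams;
    # temporal_weight = 0.5 is a plain average (hypothetical parameter)
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [(1.0 - temporal_weight) * s + temporal_weight * t
            for s, t in zip(p_spatial, p_temporal)]


def predict(spatial_logits: Sequence[float],
            temporal_logits: Sequence[float],
            labels: Sequence[str],
            temporal_weight: float = 0.5) -> Tuple[str, List[float]]:
    # Pick the class with the highest fused probability
    fused = late_fusion(spatial_logits, temporal_logits, temporal_weight)
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best], fused


if __name__ == "__main__":
    # Hypothetical class labels and made-up network outputs for one clip
    labels = ["discussion", "telephone call", "giving an item"]
    spatial = [2.0, 0.5, 0.1]   # appearance alone favours "discussion"
    temporal = [0.2, 0.3, 3.0]  # motion strongly favours "giving an item"
    label, probs = predict(spatial, temporal, labels)
    print(label, [round(p, 3) for p in probs])
```

Fusing probabilities rather than raw logits keeps the two streams on a comparable scale. The depth channel of the dataset could in principle feed a further stream, but that is outside this sketch.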

Video classification from audio cues

This project explores and analyzes the audio information in videos for classification. It focuses mainly on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction and deep convolutional neural network (CNN) feature extraction, followed by training classifiers on the extracted audio features.
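The classic MFCC pipeline consists of pre-emphasis, framing, windowing, a power spectrum, a mel-spaced triangular filterbank, a logarithm, and a DCT. Below is a minimal, dependency-free sketch of those steps. The parameter values (8 kHz audio, 256-sample frames with a 128-sample hop, 20 filters, 13 coefficients, pre-emphasis factor 0.97) are illustrative assumptions, not values specified for the project. The naive O(N^2) DFT is only for readability; a real implementation would use an FFT library and usually add liftering and delta features.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    # Naive O(N^2) DFT over bins 0..n_fft//2 (illustration only)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for n, x in enumerate(frame):
            angle = 2.0 * math.pi * k * n / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    # Triangular filters whose centres are evenly spaced on the mel scale (0 Hz .. Nyquist)
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):   # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def dct_ii(values: List[float], n_coeffs: int) -> List[float]:
    # Unnormalised DCT-II, keeping only the first n_coeffs coefficients
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal: List[float], sample_rate: int = 8000, frame_len: int = 256,
         hop: int = 128, n_filters: int = 20, n_coeffs: int = 13) -> List[List[float]]:
    n_fft = frame_len
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Hamming window reduces spectral leakage at the frame edges
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97 * x[n-1]
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(emph[start:start + frame_len], window)]
        spectrum = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
        features.append(dct_ii(log_energies, n_coeffs))
    return features


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(sr // 4)]  # 0.25 s, 440 Hz
    feats = mfcc(tone, sample_rate=sr)
    print(len(feats), "frames x", len(feats[0]), "coefficients")
```

Each row of the result is one frame's MFCC vector; such per-frame vectors (or statistics pooled over a clip) would then serve as classifier input.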

Sentence boundary detection from unpunctuated speech transcripts

The goal of this task is to segment long, unpunctuated transcripts into sentences. The corresponding audio files can also be used. The task should be solved with deep learning methods; both prosodic cues and lexical features may be used.
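A common way to frame this task is per-word binary classification: for each word, predict whether a sentence boundary follows it. This framing is an assumption here, not a requirement of the seminar. The sketch below covers only the data handling around such a model. It derives labels from punctuated reference text, builds a context window per word that combines lexical context with a pause duration as a simple prosodic cue, and rebuilds sentences from (predicted) labels. The neural network itself is omitted, and the function names, the `<pad>` token, and the pause values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

SENTENCE_END = (".", "?", "!")


def make_training_pairs(punctuated_text: str) -> List[Tuple[str, int]]:
    # Derive per-word labels from punctuated reference text:
    # 1 = a sentence boundary follows this word, 0 = no boundary.
    # Naive: abbreviations such as "Dr." would be mislabelled as boundaries.
    pairs = []
    for token in punctuated_text.split():
        core = token.strip("\"'()")
        label = 1 if core.endswith(SENTENCE_END) else 0
        word = core.rstrip(".?!,;:").lower()
        if word:
            pairs.append((word, label))
    return pairs


def context_windows(words: Sequence[str], pauses: Sequence[float],
                    size: int = 2) -> List[Dict[str, object]]:
    # One feature record per word: surrounding words (lexical cue) plus the
    # pause after the word in seconds (a simple prosodic cue from the audio)
    padded = ["<pad>"] * size + list(words) + ["<pad>"] * size
    return [{"window": padded[i:i + 2 * size + 1], "pause_after": pauses[i]}
            for i in range(len(words))]


def segment(words: Sequence[str], labels: Sequence[int]) -> List[str]:
    # Rebuild sentences from (predicted) boundary labels
    sentences, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label == 1:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without a predicted boundary
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    pairs = make_training_pairs("Hello, world. How are you? Fine.")
    words = [w for w, _ in pairs]
    labels = [y for _, y in pairs]
    pauses = [0.1, 0.6, 0.05, 0.0, 0.7, 0.9]  # hypothetical pause durations (s)
    print(context_windows(words, pauses, size=1)[0])
    print(segment(words, labels))
```

During training, the reference labels would come from punctuated text. At inference time, `segment` would receive the network's predicted labels for an unpunctuated transcript instead.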

Prerequisites

Strong interest in video/image processing, machine learning and/or computer vision
Software development in C/C++
Experience with OpenCV and machine learning applications is a plus

Literature

Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville, "Deep Learning", online version: http://www.iro.umontreal.ca/~bengioy/dlbook/
Karen Simonyan and Andrew Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos", Advances in Neural Information Processing Systems (NIPS), 2014
Caffe: deep learning framework by the Berkeley Vision and Learning Center (BVLC)

Assessment

The final evaluation will be based on:

Initial implementation / idea presentation, 10%
Final presentation, 20%
Report/Documentation, 12-18 pages, 30%
Implementation, 40%
Participation in the seminar (bonus points)
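The four weights above sum to 100%, so the graded score is a weighted average of the components. As a worked example with hypothetical component scores on a 0-100 scale (the page specifies neither the scale nor how bonus points are added, so the bonus is left out):

```latex
\begin{aligned}
G &= 0.10\,s_{\text{idea}} + 0.20\,s_{\text{final}} + 0.30\,s_{\text{report}} + 0.40\,s_{\text{impl}} \\
  &= 0.10 \cdot 80 + 0.20 \cdot 90 + 0.30 \cdot 70 + 0.40 \cdot 85 \\
  &= 8 + 18 + 21 + 34 = 81
\end{aligned}
```

Because the implementation carries the largest weight, it has the greatest influence on the result.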

Dates

Thursdays, 13:30-15:00

Room H-E.52

15.10.2015 13:30-15:00	Presentation of the topics
until 23.10.2015 23:59	Topic selection (sign up on Doodle)
26.10.2015	Announcement of topic and group assignments
weekly	Individual meetings with the supervisor
early December	Technology talks and guided discussion (15+5 min each)
04.02.2016	Presentation of final results (15+5 min each)
until 15.02.2016	Submission of implementation and documentation
until end of February	Grading
