Practical Applications of Multimedia Retrieval (Winter Semester 2015/2016)
Lecturers: Prof. Dr. Christoph Meinel (Internet-Technologien und -Systeme), Dr. Haojin Yang (Internet-Technologien und -Systeme)
Tutors: Dr. Haojin Yang
General Information
- Weekly Hours: 4
- Credits: 6
- Graded: yes
- Enrolment Deadline: 23.10.2015
- Teaching Form: SP
- Enrolment Type: Compulsory Elective Module
Programs, Module Groups & Modules
- Internet & Security Technology
- Operating Systems & Information Systems Technology
- IT-Systems Engineering A
- IT-Systems Engineering B
- IT-Systems Engineering C
- IT-Systems Engineering D
- IT-Systems Engineering Analyse
Description
Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to official statistics from the video portal YouTube, more than 300 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.
Our current research focuses on video analysis and multimedia information retrieval (MIR) using Deep Learning techniques. Deep Learning (DL), an area of machine learning that has emerged since 2006, already influences a wide range of multimedia information processing. Recently, DL-based techniques have achieved substantial progress in fields including speech recognition, image classification, and language processing.
Topics in this seminar:
- Indoor human activity recognition
The number of surveillance cameras, the importance of video analytics, the storage time for surveillance data, and the strategic value of video surveillance are all increasing significantly. Indoor human activity recognition is an important part of event detection in surveillance videos. LIRIS provides a typical human activity recognition dataset containing grayscale, RGB, and depth videos of people performing various everyday activities (discussing, making telephone calls, handing over an item, etc.).
- Video classification from audio cues
This project aims to explore and analyze the audio information within videos for classification. It focuses mainly on MFCC (Mel-Frequency Cepstral Coefficients) feature extraction and deep CNN (Convolutional Neural Network) feature extraction. Classifiers are then trained on the extracted audio features.
- Sentence boundary detection from unpunctuated speech transcripts
In this task, we want to segment long unpunctuated transcripts into sentences. Audio files can also be included. The task should be accomplished within a Deep Learning framework; both prosodic cues and lexical features can be used.
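One common way to frame the sentence boundary task is as per-token binary labeling: each word in the transcript is marked 1 if a sentence ends after it and 0 otherwise. The following is a minimal sketch of that framing in Python, not the seminar's prescribed method. The function names `make_training_pair` and `segment` are hypothetical, and the sketch covers only the lexical side (no prosodic features and no model); it shows how training labels could be derived from punctuated text and how predicted labels could be turned back into sentences.

```python
import re

# Assumption: these three marks are treated as sentence-final punctuation.
SENTENCE_END = (".", "?", "!")


def make_training_pair(text):
    """Turn punctuated text into (tokens, labels).

    tokens: lowercased words with punctuation stripped, imitating an
            unpunctuated speech transcript.
    labels: 1 if a sentence ends after the token, else 0.
    """
    tokens, labels = [], []
    for raw in text.split():
        # Keep only word characters and apostrophes.
        word = re.sub(r"[^\w']", "", raw).lower()
        if not word:
            continue
        tokens.append(word)
        labels.append(1 if raw.endswith(SENTENCE_END) else 0)
    return tokens, labels


def segment(tokens, labels):
    """Rebuild sentences from tokens and per-token boundary labels."""
    sentences, current = [], []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == 1:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing words without a predicted boundary form a last segment.
        sentences.append(" ".join(current))
    return sentences


tokens, labels = make_training_pair("How are you? I am fine. Thanks")
sentences = segment(tokens, labels)
```

In a Deep Learning setup, a sequence model would predict `labels` from `tokens` (and, optionally, from prosodic features such as pause durations), and `segment` would then be applied to the predictions.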
Requirements
- Strong interest in video/image processing, machine learning and/or computer vision
- Software development in C/C++
- Experience with OpenCV and machine learning applications is a plus
Literature
- Yoshua Bengio and Ian J. Goodfellow and Aaron Courville, "Deep Learning", online version: http://www.iro.umontreal.ca/~bengioy/dlbook/
- Karen Simonyan and Andrew Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos", Advances in Neural Information Processing Systems (NIPS) 2014
- Caffe: Deep learning framework by the
Examination
The final evaluation will be based on:
- Initial implementation / idea presentation, 10%
- Final presentation, 20%
- Report/Documentation, 12-18 pages, 30%
- Implementation, 40%
- Participation in the seminar (bonus points)
Dates
Thursday, 13:30-15:00
Room H-E.52
15.10.2015 13:30-15:00 | Presentation of the topics |
until 23.10.2015 23:59 | Topic selection (registration via Doodle) |
26.10.2015 | Announcement of topic and group assignments |
weekly | Individual meetings with the supervisor |
early December | Technology talks and guided discussion (15+5 min each) |
04.02.2016 | Presentation of the final results (15+5 min each) |
until 15.02.2016 | Submission of implementation and documentation |
until the end of February | Grading |