Practical Video Analyses (Summer Semester 2017)

Lecturer: Dr. Haojin Yang (Internet-Technologien und -Systeme)

General Information

Weekly Hours: 4
Credits: 6
Graded: yes
Enrolment Deadline: 28.04.2017
Teaching Form: Seminar
Enrolment Type: Compulsory Elective Module
Maximum number of participants: 12

Programs, Module Groups & Modules

IT-Systems Engineering MA

ISAE: Internet, Security & Algorithm Engineering
- HPI-ISAE-S Spezialisierung
ISAE: Internet, Security & Algorithm Engineering
- HPI-ISAE-T Techniken und Werkzeuge
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Konzepte und Methoden
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Spezialisierung
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniken und Werkzeuge
IT-Systems Engineering
- HPI-ITSE-A Analyse
IT-Systems Engineering
- HPI-ITSE-E Entwurf
IT-Systems Engineering
- HPI-ITSE-K Konstruktion
IT-Systems Engineering
- HPI-ITSE-M Maintenance

Description

Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to the official statistics of the video portal YouTube, more than 400 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.

In our current research we focus on video analysis and multimedia information retrieval (MIR) using Deep Learning techniques. Deep Learning (DL), a relatively new area of machine learning (since 2006), already has an impact on a wide range of multimedia information processing tasks. Recently, DL-based techniques have achieved substantial progress in fields including computer vision, speech recognition, image classification and natural language processing (NLP).

Topics in this seminar:

A general LSTM (Long Short-Term Memory) framework for NLP applications: In this topic we plan to develop a general framework that works with various NLP (Natural Language Processing) applications, such as machine translation, sentiment analysis and word sense disambiguation. The technical core is "LSTM network + word vectors", which has proven highly effective in many cases, including Google Translate. You are expected to create a sequence-to-sequence LSTM model and adapt it to a sequence-to-point structure in order to handle different NLP tasks. We believe that by investing time and effort in this topic you will not only learn the theory and improve your programming skills on specific tasks, but also come to understand the current trends in NLP and DL.
Adversarial training for medical image segmentation applications: Medical imaging is an important step in diagnosis and in planning surgical or chemical treatment. Magnetic resonance imaging (MRI) provides rich information before and during treatment for evaluating the treatment and the progress of lesions. In medical image analysis, automated lesion segmentation is an important and very challenging clinical diagnostic task. Inspired by the promising results achieved by deep learning in many application fields, an automated application based on adversarial training is a very practical and interesting topic. We currently have two datasets, one for brain tumor segmentation and one for liver tumor segmentation, which will be selected and applied in this topic. [6,7]
PlaceRecognizer: If you like to travel, you have almost certainly stood somewhere in an unfamiliar city and asked yourself: What kind of building is this? What is it for? Who was its architect? Well, fear no more! Thanks to modern computer vision technology, we might be able to answer these questions for you right on your smartphone!
In this seminar topic we want to look at how to create a robust deep learning model that can recognize buildings in a given image. To do this, we will need to gather training data (e.g. from street view images) and think about a good network architecture and training method for such a model. So if you are interested in data gathering, training deep neural networks and maybe also Android development, this topic is perfectly suited for you!
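
For the LSTM topic above, the step from a sequence-to-sequence to a sequence-to-point structure can be illustrated with a minimal sketch. It is not the seminar's framework: it uses plain Python with randomly initialised, untrained weights, and it treats "sequence-to-sequence" in the simplified sense of one output vector per input word rather than a full encoder-decoder. All names (LSTMCell, sequence_to_sequence, sequence_to_point, toy_vectors) and the word vectors themselves are made up for illustration; a real project would more likely build on a framework such as Chainer [4].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell with random, untrained weights (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        width = input_size + hidden_size  # each gate sees [x_t, h_{t-1}]
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.biases = {gate: [0.0] * hidden_size for gate in self.GATES}

    def _affine(self, gate, z):
        # Matrix-vector product plus bias for one gate.
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, x, h, c):
        """Advance one time step; returns the new hidden and cell state."""
        z = list(x) + list(h)
        i = [sigmoid(a) for a in self._affine("input", z)]
        f = [sigmoid(a) for a in self._affine("forget", z)]
        o = [sigmoid(a) for a in self._affine("output", z)]
        g = [math.tanh(a) for a in self._affine("candidate", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, sequence):
    """Feed a list of word vectors through the cell, collecting hidden states."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def sequence_to_sequence(cell, sequence):
    """Simplified seq-to-seq: one output vector per input word."""
    return run_lstm(cell, sequence)


def sequence_to_point(cell, sequence):
    """Seq-to-point: keep only the last hidden state as a sentence summary."""
    return run_lstm(cell, sequence)[-1]


# Hypothetical 3-dimensional word vectors; real ones would be pre-trained.
toy_vectors = {
    "a": [0.1, 0.0, 0.2],
    "great": [0.9, 0.3, 0.1],
    "movie": [0.2, 0.8, 0.5],
}
inputs = [toy_vectors[word] for word in ["a", "great", "movie"]]
cell = LSTMCell(input_size=3, hidden_size=4)

per_word = sequence_to_sequence(cell, inputs)  # e.g. input to a tagger
summary = sequence_to_point(cell, inputs)      # e.g. input to a classifier
print(len(per_word), len(summary))
```

The only structural difference between the two variants is which hidden states are kept: all of them for per-word outputs, or just the last one as a fixed-size summary that a classifier (for example for sentiment analysis) could consume.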

Requirements

Strong interest in video/image processing, machine learning (Deep Learning) and/or computer vision
Software development in C/C++ or Python
Experience with OpenCV and machine learning applications is a plus

Literature

[1] Yoshua Bengio, Ian J. Goodfellow and Aaron Courville, "Deep Learning", online version: http://www.deeplearningbook.org/
[2] cs231n tutorials: Convolutional Neural Networks for Visual Recognition
[3] Caffe: Deep learning framework by the BVLC
[4] Chainer: A flexible framework of neural networks
[5] ENCP/CNNdroid: Open Source Library for GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android
[6] Semantic Segmentation using Adversarial Networks
[7] Pixel level domain transfer

Examination

The final evaluation will be based on:

Initial implementation / idea presentation, 10%
Final presentation, 20%
Report/Documentation, 12-18 pages, 30%
Implementation, 40%
Participation in the seminar (bonus points)

Dates

Monday, 13:30-15:00

Room H-2.58

24.04.2017 13:30-15:00	Presentation of the topics (PDF)
until 27.04.2017	Topic selection (sign up via Doodle)
28.04.2017	Announcement of topic and group assignments
weekly	Individual meetings with the supervisor
29.05.2017	Technology talks and guided discussion (15+5 min each)
24.07.2017	Presentation of the final results (15+5 min each)
until mid-August	Submission of implementation and documentation
until end of September	Grading
