Advanced Image & Video Analytics Techniques (Winter Semester 2022/2023)

Lecturers: Dr. Matthias Trapp (Computergrafische Systeme), Max Reimann (Computergrafische Systeme), Wattasseril Jobin Indiculla

General Information

Weekly Hours: 4
Credits: 6
Graded: yes
Enrolment Deadline: 01.10.2022 - 31.10.2022
Examination time §9 (4) BAMA-O: 14.12.2022
Teaching Form: Seminar / Project
Enrolment Type: Compulsory Elective Module
Course Language: German

Programs, Module Groups & Modules

IT-Systems Engineering MA

HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-K Konzepte und Methoden
HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-S Spezialisierung
HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-T Techniken und Werkzeuge
ISAE: Internet, Security & Algorithm Engineering
- HPI-ISAE-K Konzepte und Methoden
ISAE: Internet, Security & Algorithm Engineering
- HPI-ISAE-T Techniken und Werkzeuge

Data Engineering MA

DANA: Data Analytics
- HPI-DANA-K Konzepte und Methoden
DANA: Data Analytics
- HPI-DANA-T Techniken und Werkzeuge
DANA: Data Analytics
- HPI-DANA-S Spezialisierung
DAPP: Data Applications
- HPI-DAPP-K Konzepte und Werkzeuge
DAPP: Data Applications
- HPI-DAPP-T Techniken und Werkzeuge
DAPP: Data Applications
- HPI-DAPP-S Spezialisierung

Description

This project seminar is aimed at Master's students who wish to build upon fundamental image/video processing, computer vision, and computer graphics skills to design, develop, and deploy GPU-accelerated image and video processing techniques for mobile, desktop, and server systems. A short video showcasing results of recent courses can be found here: https://youtu.be/YNgGWarBFEY.

The course is mainly project-based and is divided into two parts:

The first part of the course is organized as a lecture series. The lecture topics are chosen together with the seminar students and can include introductions to the following basic concepts and foundations:

A short introduction to the field of image and video analytics
Techniques for image and video processing
Application development for mobile and desktop/server systems

Using specific image and video processing operations, the course teaches how advanced image/video analysis techniques can be designed, developed, and tested.

In the second part of the course, participants will work individually or in teams (max. 2 members) to implement assigned topics in the field of interactive image and video processing. For all target systems, we provide optional middleware for development: a C++ framework for desktop applications, Android and iOS frameworks for mobile applications, and JS (Angular, Node framework) or Python (FastAPI framework) for service-based browser applications. Topics for this project seminar include, but are not limited to, the following domains:

Convolutional Neural Networks for image analysis and transformation*.
LSTM and Attention-based networks for sequence modeling of videos.
Image and video processing for VR (Virtual Reality) and AR (Augmented Reality) applications.
Generative models (GANs, diffusion models) for image/video generation.
Web-based image processing using WebGPU or WebGL.
Integration of interactive rendering techniques in third-party applications.
Implementation of interactive image stylization and editing tools for desktop systems.
Service-based image and video processing*.
Web-app development for service-based image and video processing.
Integration of deep learning frameworks into visual computing pipelines for videos.
Implementing effects for visual media abstraction*.
Automated video summarization approaches to efficiently and effectively shorten videos*:
- Shot boundary detection using neural networks (e.g., TransNetV2)
- Scene boundary segmentation using neural networks (e.g., SceneSeg)
- Image/video captioning using deep learning (e.g., CLIP)
- Multimodal video analysis (e.g., movienet-tools)
- Query-based image and video retrieval approaches for video summarization (e.g., CLIP embedding-based retrieval)
- Efficient deep-learning-based video classification models that can run on mobile devices (e.g., MoViNets)

Topics marked with an asterisk (*) are related to a joint research project of the Hasso Plattner Institute with Digital Masterpieces and the German Federal Ministry of Education, which investigates new concepts and techniques for multidimensional video processing and automatic video abstraction.
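
To give a rough idea of what the embedding-based retrieval topic listed above involves, the following minimal sketch ranks video frames by cosine similarity to a query embedding. It is an illustration only, not course material: the function names, the toy three-dimensional vectors, and the `top_k` parameter are assumptions made for this example. In an actual project, the embeddings would come from a model such as CLIP rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_frames(query_embedding, frame_embeddings, top_k=3):
    """Return (frame_index, score) pairs for the top_k most similar frames."""
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(frame_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for per-frame model outputs.
frame_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical embedding of a text query.
query_embedding = [1.0, 0.1, 0.0]

top = rank_frames(query_embedding, frame_embeddings, top_k=2)
for index, score in top:
    print(f"frame {index}: similarity {score:.3f}")
```

A summarization pipeline could keep only the highest-ranked frames (or the shots containing them) for a given query; how candidates are selected and merged is a design decision left to the individual project.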

Requirements

Basic knowledge of OpenGL (ES) Shading Languages or Metal Shading Language for image and video processing topics.
Basic knowledge/understanding of Neural Networks and/or Computer Vision algorithms for image and video analysis topics.
Basic working knowledge of OpenCV for computer vision topics.
Basic working knowledge of PyTorch/TensorFlow for deep learning topics.
For service/web app development: basic knowledge/understanding of Angular, Node.js, JavaScript (or alternatively, Python, Django/FastAPI), and Docker.
For Android mobile development: basic knowledge of the Kotlin or Java programming language.
For iOS development: basic knowledge of Swift development.
For desktop development: basic knowledge of C++ development.

Literature

C++11/C++14 reference: Stroustrup, Programming: Principles and Practice Using C++
JS reference: Haverbeke, Eloquent JavaScript (3rd edition)
Deep learning references:
- General: Glassner, Deep Learning: A Visual Approach
- PyTorch: Stevens et al., Deep Learning with PyTorch
- TensorFlow 2.0: Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition)
Computer vision references:
- General: Klette, Concise Computer Vision: An Introduction into Theory and Algorithms
- 3D vision: Hartley and Zisserman, Multiple View Geometry in Computer Vision (2nd edition)
- OpenCV 4: Howse and Minichino, Learning OpenCV 4 Computer Vision with Python 3 (3rd edition)
Topic-specific material will be provided throughout the course.

Learning

Project seminar (4 SWS/6 ECTS)

Examination

The final grade will be determined as follows:

50% Documented source code & prototypical application
15% Concept presentation (approx. 10 minutes)
25% Final presentation (approx. 25 minutes)
10% Project management

Dates

The seminar topics will be presented in the kick-off meeting, which will be held on-site and via Zoom on Wednesday, 19.10.2022, 13:30-15:00.

The rest of the seminar is organized as follows:

The individual topics will be assigned no later than 26.10.2022. After topic assignment, the project phase will begin.
The project phase is self-organized. Appointments are arranged with the individual supervisors.
The midterm presentation will take place on Wed., 14.12.2022, 13:30 in K-1.04.
Based on a vote among the students, the final presentation will take place in April 2023.

Please note: To participate in the Zoom meeting, please register for the corresponding Moodle course: https://moodle.hpi.de/course/view.php?id=357
