Visual Media Analysis and Processing Techniques (Summer Semester 2023)

Lecturers: Dr. Matthias Trapp (Computergrafische Systeme), Max Reimann (Computergrafische Systeme), Wattasseril Jobin Indiculla

General Information

Weekly hours per semester (SWS): 4
ECTS: 6
Graded: Yes
Enrollment period: 01.04.2023 - 07.05.2023
Teaching format: Lecture / Exercise
Enrollment type: Compulsory elective module
Language of instruction: German

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA

HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-K Concepts and Methods
HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-T Techniques and Tools
HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-S Specialization
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Concepts and Methods
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniques and Tools
OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Specialization

Data Engineering MA

DANA: Data Analytics
- HPI-DANA-K Concepts and Methods
DANA: Data Analytics
- HPI-DANA-T Techniques and Tools
DANA: Data Analytics
- HPI-DANA-S Specialization
DAPP: Data Applications
- HPI-DAPP-K Concepts and Tools
DAPP: Data Applications
- HPI-DAPP-T Techniques and Tools
DAPP: Data Applications
- HPI-DAPP-S Specialization

Digital Health MA

SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-C Concepts and Methods
SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-T Technologies and Tools
SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-S Specialization
APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-C Concepts and Methods
APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-T Technologies and Tools
APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-S Specialization

Description

This project seminar is intended for Master students who wish to acquire fundamental knowledge and skills in image/video processing, computer vision, and computer graphics to design, develop and implement GPU-accelerated image and video processing techniques, for use on mobile, desktop, and server systems. A short video showcasing results of recent courses can be found here: https://youtu.be/YNgGWarBFEY.

The course is mainly project-based and is divided into two parts:

The first part of the course is organized as a lecture series. The lecture topics are chosen together with the seminar students and can include introductions to the following basic concepts and foundations:

A short introduction to the field of image and video analytics,
Techniques for image and video processing,
Application development for mobile and desktop/server systems.

Using specific image and video processing operations, the course teaches how advanced image/video analysis techniques can be designed, developed, and tested.

In the second part of the course, participants work individually or in teams (max. 2 members) to implement assigned topics in the field of interactive image and video processing. For all target systems, we provide development middleware that participants can use: for example, a C++ framework for desktop applications, Android and iOS frameworks for mobile applications, and JS (Angular, Node framework) or Python (FastAPI framework) for service-based browser applications. Topics for this project seminar include, but are not limited to, the following domains:

Convolutional Neural Networks for image analysis and transformation.
LSTM and Attention-based networks for sequence modeling of videos.
Image and video processing for VR (Virtual Reality) and AR (Augmented Reality) applications.
Generative models (GANs, diffusion models) for image/video generation.
Web-based image processing using WebGPU or WebGL.
Integration of interactive rendering techniques in 3rd party applications.
Implementation of interactive image stylization and editing tools for desktop systems.
Service-based image and video processing.
Web-app development for service-based image and video processing.
Integration of deep learning frameworks into visual computing pipelines for videos.
Implementing effects for visual media abstraction.
Automated video summarization approaches to efficiently and effectively shorten videos:
- Shot boundary detection using neural networks (e.g., TransNetV2)
- Scene boundary segmentation using neural networks (e.g., SceneSeg)
- Image/video captioning using deep learning (e.g., CLIP)
- Multimodal video analysis (e.g., movienet-tools)
- Query-based image and video retrieval approaches for video summarization (e.g., CLIP embedding-based retrieval)
- Efficient deep learning-based video classification models that can run on mobile devices (e.g., MoViNets)

Prerequisites

Basic knowledge of OpenGL (ES) Shading Languages or Metal Shading Language for image and video processing topics.
Basic knowledge/understanding of Neural Networks and/or Computer Vision algorithms for image and video analysis topics.
Basic working knowledge of OpenCV for computer vision topics.
Basic working knowledge of PyTorch/TensorFlow for deep learning topics.
For Service/WebApps development: basic knowledge/understanding of Angular, Node.js, JavaScript (or alternatively, Python, Django/FastAPI), and Docker.
For Android mobile development: basic knowledge of Kotlin/Java programming language.
For iOS development: basic knowledge of Swift development.
For Desktop development: basic knowledge of C++ development.

Literature

C++11/C++14 reference: Stroustrup, Programming: Principles and Practice Using C++
JS reference: Haverbeke, Eloquent JavaScript (3rd edition)
Deep learning references:
- General: Glassner, Deep Learning: A Visual Approach
- PyTorch: Stevens et al., Deep Learning with PyTorch
- TensorFlow 2.0: Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition)
Computer vision references:
- General: Klette, Concise Computer Vision: An Introduction into Theory and Algorithms
- 3D vision: Hartley and Zisserman, Multiple View Geometry in Computer Vision (2nd edition)
- OpenCV 4: Howse and Minichino, Learning OpenCV 4 Computer Vision with Python 3 (3rd edition)
Topic-specific material will be provided throughout the course.

Learning and Teaching Formats

Project seminar (4 SWS/6 ECTS)

Assessment

The final grade will be determined as follows:

50% Documented source code & prototypical application
15% Concept presentation (approx. 10 minutes)
25% Final presentation (approx. 25 minutes)
10% Project management
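
As a worked illustration of the weighting above, the sketch below combines four component grades into a weighted average. It rests on assumptions not stated in the course description: that all components are graded on the same numeric scale, that they are combined linearly, and that no rounding rule is applied. The component keys and the function name `final_grade` are hypothetical.

```python
# Weights taken from the grading breakdown; the key names are illustrative.
WEIGHTS = {
    "source_code_and_prototype": 0.50,
    "concept_presentation": 0.15,
    "final_presentation": 0.25,
    "project_management": 0.10,
}


def final_grade(component_grades):
    """Weighted average of the component grades (assumes one common numeric scale)."""
    missing = set(WEIGHTS) - set(component_grades)
    if missing:
        raise ValueError(f"missing component grades: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_grades[name] for name in WEIGHTS)


# Hypothetical component grades, used only to show the arithmetic.
example_grades = {
    "source_code_and_prototype": 1.0,
    "concept_presentation": 2.0,
    "final_presentation": 2.0,
    "project_management": 1.0,
}
example_result = final_grade(example_grades)  # 0.5*1.0 + 0.15*2.0 + 0.25*2.0 + 0.1*1.0
```

With these example values the weighted sum is 0.5 + 0.3 + 0.5 + 0.1 = 1.4 on the assumed scale.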

Dates

The seminar topics will be presented in the kick-off meeting, which will be held on-site (A1.2) and via Zoom on Monday, 24.04.2023, 11:00 - 12:30.

The rest of the seminar is organized as follows:

The individual topics will be assigned no later than 30.04.2023. After topic assignment, the project phase will kick off.
The project phase is self-organized. Appointments are coordinated with the individual supervisors.
The midterm presentation will take place between 14.06. and 25.06.2023.
Based on a student vote, the final presentation will take place in September 2023.

Please note: To participate in the Zoom meeting, please register in the corresponding Moodle course: https://moodle.hpi.de/course/view.php?id=437
