Hasso-Plattner-Institut (25 Years of HPI)

Visual Media Analysis and Processing Techniques (Summer Semester 2023)

Lecturers: Dr. Matthias Trapp (Computer Graphics Systems), Max Reimann (Computer Graphics Systems), Wattasseril Jobin Indiculla

General Information

  • Weekly hours per semester (SWS): 4
  • ECTS: 6
  • Graded: Yes
  • Enrollment period: 01.04.2023 - 07.05.2023
  • Course format: Lecture / Exercise
  • Enrollment type: Compulsory elective module
  • Teaching language: German

Degree Programs, Module Groups & Modules

IT-Systems Engineering MA
  • HCGT: Human Computer Interaction & Computer Graphics Technology
    • HPI-HCGT-K Concepts and Methods
    • HPI-HCGT-T Techniques and Tools
    • HPI-HCGT-S Specialization
  • OSIS: Operating Systems & Information Systems Technology
    • HPI-OSIS-K Concepts and Methods
    • HPI-OSIS-T Techniques and Tools
    • HPI-OSIS-S Specialization
Data Engineering MA
Digital Health MA


This project seminar is intended for Master's students who want to acquire fundamental knowledge and skills in image/video processing, computer vision, and computer graphics in order to design, develop, and implement GPU-accelerated image and video processing techniques for mobile, desktop, and server systems. A short video showcasing results from recent courses is available at https://youtu.be/YNgGWarBFEY.



The course is primarily project-based and is divided into two parts:

The first part of the course is organized as a lecture series. The lecture topics are chosen together with the seminar students and can include introductions to the following basic concepts and foundations:

  • A short introduction to the field of image and video analytics,
  • Techniques for image and video processing,
  • Application development for mobile and desktop/server systems.

Using specific image and video processing operations, the course teaches how advanced image/video analysis techniques can be designed, developed, and tested.

In the second part of the course, participants work individually or in teams (max. 2 members) to implement assigned topics in the field of interactive image and video processing. We provide development middleware for all target systems: a C++ framework for desktop applications, an Android and an iOS framework for mobile applications, and JS (Angular, Node framework) or Python (FastAPI framework) for service-based browser applications. Topics for this project seminar include, but are not limited to, the following domains:


  • Convolutional neural networks for image analysis and transformation.
  • LSTM and attention-based networks for sequence modeling of videos.
  • Image and video processing for VR (Virtual Reality) and AR (Augmented Reality) applications.
  • Generative models (GANs, diffusion models) for image/video generation.
  • Web-based image processing using WebGPU or WebGL.
  • Integration of interactive rendering techniques into third-party applications.
  • Implementation of interactive image stylization and editing tools for desktop systems.
  • Service-based image and video processing.
  • Web-app development for service-based image and video processing.
  • Integration of deep learning frameworks into visual computing pipelines for videos.
  • Implementation of effects for visual media abstraction.
  • Automated video summarization approaches to efficiently and effectively shorten videos:
    • Shot boundary detection using neural networks (e.g., TransNetV2)
    • Scene boundary segmentation using neural networks (e.g., SceneSeg)
    • Image/video captioning using deep learning (e.g., CLIP)
    • Multimodal video analysis (e.g., movienet-tools)
    • Query-based image and video retrieval approaches for video summarization (e.g., CLIP embedding-based retrieval)
    • Efficient deep-learning-based video classification models that can run on mobile devices (e.g., MoViNets)


Prerequisites

  • Basic knowledge of the OpenGL (ES) Shading Language or the Metal Shading Language for image and video processing topics.
  • Basic knowledge/understanding of neural networks and/or computer vision algorithms for image and video analysis topics.
  • Basic working knowledge of OpenCV for computer vision topics.
  • Basic working knowledge of PyTorch/TensorFlow for deep learning topics.
  • For service/web-app development: basic knowledge/understanding of Angular, Node.js, JavaScript (or alternatively, Python, Django/FastAPI), and Docker.
  • For Android mobile development: basic knowledge of the Kotlin/Java programming languages.
  • For iOS development: basic knowledge of Swift development.
  • For desktop development: basic knowledge of C++ development.


Literature

  • C++11/C++14 reference: Stroustrup, Programming: Principles and Practice Using C++
  • JS reference: Haverbeke, Eloquent JavaScript (3rd edition)
  • Deep learning references:
    • General: Glassner, Deep Learning: A Visual Approach
    • PyTorch: Stevens et al., Deep Learning with PyTorch
    • TensorFlow 2.0: Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition)
  • Computer vision references:
    • General: Klette, Concise Computer Vision: An Introduction into Theory and Algorithms
    • 3D vision: Hartley and Zisserman, Multiple View Geometry in Computer Vision (2nd edition)
    • OpenCV 4: Howse and Minichino, Learning OpenCV 4 Computer Vision with Python 3 (3rd edition)
  • Topic-specific material will be provided throughout the course.

Learning and Teaching Formats

Project seminar (4 SWS/6 ECTS)


The final grade will be determined as follows:

  • 50% Documented source code & prototypical application
  • 15% Concept presentation (approx. 10 minutes)
  • 25% Final presentation (approx. 25 minutes)
  • 10% Project management


The seminar topics will be presented at the kick-off meeting, which will take place on Monday, 24.04.2023, 11:00 - 12:30. The meeting will be held on-site (A1.2) and via Zoom.

The rest of the seminar is organized as follows: 

  • The individual topics will be assigned no later than 30.04.2023. The project phase begins after topic assignment.
  • The project phase is self-organized. Appointments are arranged directly with the respective supervisor.
  • The midterm presentations will take place between 14.06. and 25.06.2023.
  • Based on a vote among the students, the final presentation will take place in September 2023.

Please note: To participate in the Zoom meeting, please register for the corresponding Moodle course: https://moodle.hpi.de/course/view.php?id=437