Visual Media Analysis and Processing Techniques (Summer Semester 2023)
Lecturer: Dr. Matthias Trapp (Computergrafische Systeme), Max Reimann (Computergrafische Systeme), Wattasseril Jobin Indiculla
General Information
- Weekly Hours: 4
- Credits: 6
- Graded: yes
- Enrolment Deadline: 01.04.2023 - 07.05.2023
- Teaching Form: Lecture / Exercise
- Enrolment Type: Compulsory Elective Module
- Course Language: German
Programs, Module Groups & Modules
- HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-K Konzepte und Methoden
- HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-T Techniken und Werkzeuge
- HCGT: Human Computer Interaction & Computer Graphics Technology
- HPI-HCGT-S Spezialisierung
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-K Konzepte und Methoden
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-T Techniken und Werkzeuge
- OSIS: Operating Systems & Information Systems Technology
- HPI-OSIS-S Spezialisierung
- DANA: Data Analytics
- HPI-DANA-K Konzepte und Methoden
- DANA: Data Analytics
- HPI-DANA-T Techniken und Werkzeuge
- DANA: Data Analytics
- HPI-DANA-S Spezialisierung
- DAPP: Data Applications
- HPI-DAPP-K Konzepte und Werkzeuge
- DAPP: Data Applications
- HPI-DAPP-T Techniken und Werkzeuge
- DAPP: Data Applications
- HPI-DAPP-S Spezialisierung
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-C Concepts and Methods
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-T Technologies and Tools
- SCAD: Scalable Computing and Algorithms for Digital Health
- HPI-SCAD-S Specialization
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-C Concepts and Methods
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-T Technologies and Tools
- APAD: Acquisition, Processing and Analysis of Health Data
- HPI-APAD-S Specialization
Description
This project seminar is intended for Master's students who wish to acquire fundamental knowledge and skills in image/video processing, computer vision, and computer graphics in order to design, develop, and implement GPU-accelerated image and video processing techniques for mobile, desktop, and server systems. A short video showcasing results of recent courses can be found here: https://youtu.be/YNgGWarBFEY.
The course is mainly project-based and is divided into two parts:
The first part of the course is organized as a lecture series. The lecture topics are chosen together with the seminar students and can cover the following basic concepts and foundations:
- A short introduction to the field of image and video analytics,
- Techniques for image and video processing,
- Application development for mobile and desktop/server systems.
Using specific image and video processing operations, the course teaches how advanced image/video analysis techniques can be designed, developed, and tested.
In the second part of the course, participants work individually or in teams (max. 2 members) to implement assigned topics in the field of interactive image and video processing. For all target systems, we provide middleware that can be used for development: for example, a C++ framework for desktop applications, Android and iOS frameworks for mobile applications, and JS (Angular, Node framework) or Python (FastAPI framework) for service-based browser applications. Topics for this project seminar include, but are not limited to, the following domains:
- Convolutional Neural Networks for image analysis and transformation.
- LSTM and Attention-based networks for sequence modeling of videos.
- Image and video processing for VR (Virtual Reality) and AR (Augmented Reality) applications.
- Generative models (GANs, diffusion models) for image/video generation.
- Web-based image processing using WebGPU or WebGL.
- Integration of interactive rendering techniques in third-party applications.
- Implementation of interactive image stylization and editing tools for desktop systems.
- Service-based image and video processing.
- Web-app development for service-based image and video processing.
- Integration of deep learning frameworks into visual computing pipelines for videos.
- Implementing effects for visual media abstraction.
- Automated video summarization approaches to efficiently and effectively shorten videos:
- Shot boundary detection using neural networks (e.g., TransNetV2)
- Scene boundary segmentation using neural networks (e.g., SceneSeg)
- Image/video captioning using deep learning (e.g., CLIP)
- Multimodal video analysis (e.g., movienet-tools)
- Query-based image and video retrieval approaches for video summarization (e.g., CLIP embedding-based retrieval)
- Efficient deep-learning-based video classification models that can run on mobile devices (e.g., MoViNets)
Requirements
- Basic knowledge of the OpenGL (ES) Shading Language or the Metal Shading Language for image and video processing topics.
- Basic knowledge/understanding of Neural Networks and/or Computer Vision algorithms for image and video analysis topics.
- Basic working knowledge of OpenCV for computer vision topics.
- Basic working knowledge of PyTorch/TensorFlow for deep learning topics.
- For Service/WebApps development: basic knowledge/understanding of Angular, Node.js, JavaScript (or alternatively, Python, Django/FastAPI), and Docker.
- For Android mobile development: basic knowledge of Kotlin or Java.
- For iOS development: basic knowledge of Swift development.
- For Desktop development: basic knowledge of C++ development.
Literature
- C++11/C++14 reference: Stroustrup, Programming: Principles and Practice Using C++
- JS reference: Haverbeke, Eloquent JavaScript (3rd edition)
- Deep learning references:
- General: Glassner, Deep Learning: A Visual Approach
- PyTorch: Stevens et al., Deep Learning with PyTorch
- TensorFlow 2.0: Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition)
- Computer vision references:
- General: Klette, Concise Computer Vision: An Introduction into Theory and Algorithms
- 3D vision: Hartley and Zisserman, Multiple View Geometry in Computer Vision (2nd edition)
- OpenCV 4: Howse and Minichino, Learning OpenCV 4 Computer Vision with Python 3 (3rd edition)
- Topic-specific material will be provided throughout the course
Learning
Project seminar (4 SWS/6 ECTS)
Examination
The final grade will be determined as follows:
- 50% Documented source code & prototypical application
- 15% Concept presentation (approx. 10 minutes)
- 25% Final presentation (approx. 25 minutes)
- 10% Project management
Dates
The seminar topics will be presented in the kick-off meeting, which will be held on-site (A1.2) and via Zoom on Monday, 24.04.2023, 11:00 - 12:30.
The rest of the seminar is organized as follows:
- The individual topics will be assigned no later than 30.04.2023. After topic assignment, the project phase will kick off.
- The project part will start in a self-organized way. Appointments are coordinated with the individual supervisors.
- The midterm presentation will take place between 14.06. and 25.06.2023.
- Based on a student vote, the final presentation will take place in September 2023.
Please note: In order to participate in the Zoom meeting, please register in the respective Moodle course: https://moodle.hpi.de/course/view.php?id=437