PD Dr. Haojin Yang

Head of Multimedia and Machine Learning Research Group

Hasso Plattner Institute for Digital Engineering gGmbH
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam
Germany

email: haojin.yang(at)hpi.de

Google Scholar DBLP

Website of Multimedia and Machine Learning Research Group

Research Interests

Efficient AI, Edge AI, deep model acceleration and compression

Multimedia Analysis with Deep Learning

Learning and understanding multimedia content is a challenging task in information retrieval and multimedia analysis. Deep Learning (DL), an area of machine learning that has grown rapidly since 2006, already influences a wide range of multimedia information processing. Techniques based on DL have recently achieved substantial progress in fields including speech recognition, image classification and language processing. It has been shown that learning features hierarchically (layer by layer) from large-scale data, with models that simulate human neural networks, can significantly improve analytic results. In this project, we focus on developing multimedia retrieval approaches based on DL technologies.

Current research topics:

Multimedia analysis and computer vision

End-to-end scene text detection and recognition in real-time using deep neural networks
Image/video captioning
Multimodal data retrieval
Deep Learning in medical image processing e.g. brain abnormality detection
Handwriting analysis in art historical data
Other computer vision applications

Research in deep learning algorithms

Efficient AI systems with a focus on edge computing
Binary neural networks
Label noise, weakly supervised representation learning, label synthesis
Efficient deep models for NLP applications

Current Projects

KI-Leuchttürme project for environment, climate, nature and resources

EKAPEx: New efficient AI algorithms for innovative forecasting methods for extreme weather events (2023-2025)

HPI will act as the coordinator for the project and collaborate with machine learning experts from the Technical University of Munich (TUM) and atmospheric and meteorological experts from the GeoForschungsZentrum Potsdam (GFZ). The aim of the project is to develop AI-based precipitation forecasting for Germany, with a special emphasis on extreme weather events. To accomplish this, the team will develop the most efficient and powerful AI algorithms possible, while also significantly reducing resource consumption. Unique datasets, such as Integrated Water Vapor and Slant Integrated Water Vapor obtained from GNSS observations, will also be utilized to enhance forecasting capabilities. The project aims to develop an accessible platform that will contribute significantly to the improvement of climate adaptation measures and the sustainable use of AI.

The team will employ efficient AI algorithm designs, such as few-shot learning, zero-shot learning, and open-set recognition methods, to reduce the dependence on large amounts of data and manual annotation for weather forecasting. Additionally, neural networks will be applied that operate at a lower bit width: the parameters and intermediate results of the network are converted from the 32-bit values of previous models to binary values of only one bit, while minimizing accuracy loss. The project will also specifically address the power consumption of AI methods as a source of greenhouse gas emissions.
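The idea of replacing 32-bit values with single bits can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the function names are hypothetical, and the choice of sign binarization with a mean-absolute-value scaling factor is an assumption taken from common practice in the binary neural network literature. The sketch packs one bit per weight into an integer and computes a dot product of two binarized vectors with XNOR and a population count.

```python
def binarize(weights):
    """Sign-binarize a list of floats (hypothetical helper).

    Returns (bits, scale, n): one bit per weight packed into an int,
    a single float scale, and the number of weights.
    """
    # Assumption: scale by the mean absolute value to limit approximation error.
    scale = sum(abs(w) for w in weights) / len(weights)
    bits = 0
    for i, w in enumerate(weights):
        if w >= 0:  # bit 1 encodes +1, bit 0 encodes -1
            bits |= 1 << i
    return bits, scale, len(weights)


def dequantize(bits, scale, n):
    """Recover the approximated weights (+scale or -scale) from the packed bits."""
    return [scale if (bits >> i) & 1 else -scale for i in range(n)]


def binary_dot(bits_a, bits_b, n):
    """Dot product of two {-1, +1} vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1
    agreements = bin(~(bits_a ^ bits_b) & mask).count("1")
    # Each agreeing position contributes +1, each disagreeing position -1.
    return 2 * agreements - n


bits, scale, n = binarize([0.5, -1.5, 2.0, -1.0])
print(bits, scale)                 # 5 1.25
print(dequantize(bits, scale, n))  # [1.25, -1.25, 1.25, -1.25]
print(binary_dot(bits, 0b0011, n)) # 0
```

In this sketch each weight occupies one bit instead of 32, at the cost of one shared scale value per vector; real binary networks typically apply such a scheme per layer or per channel and train with the binarization in the loop.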

Deep Learning for Enterprise NLP Applications

Project partner: SAP Conversational AI team (2017-2020)

In this project, we will develop a framework for building generally applicable as well as domain-specific NLP models using state-of-the-art deep learning technology. We will study textual representation learning with the aim of finding the most efficient solution for deep neural network design and system implementation. An evaluation protocol will be defined and developed for qualitative and quantitative evaluation.

Project partner: SAP ICN Machine Learning team (2020-2022)

The recently emerged large-scale pre-trained language models based on the Transformer model, such as GPT-3 (175 billion parameters) and Switch Transformer (1600 billion parameters), have brought about a series of breakthroughs in many Natural Language Processing (NLP) tasks. However, the training of these large-scale models is computationally expensive. Moreover, these models generally have billions of parameters, making it challenging to conduct inference on resource-limited devices. In this project, we will dive into how such large-scale models work, study different approaches to decrease their space and time complexity during training and inference, and evaluate them on different Natural Language Understanding (NLU) and Natural Language Generation (NLG) benchmarks.

Binary Neural Networks, Deep Model Compression and Acceleration

Project partner: PyTorch, NICSEFC, MXNet

In recent years, deep learning technologies have achieved excellent performance and many breakthroughs in both academia and industry. However, state-of-the-art deep models are computationally expensive and consume large amounts of storage space. At the same time, deep learning is in strong demand for numerous applications in areas such as mobile platforms, wearable devices, autonomous robots and IoT devices. How to apply deep models efficiently on such low-power devices is a challenging research problem. In this project, we explore several approaches to this problem, such as binarized, quantized and lightweight deep neural networks. The development is based on the well-known open-source deep learning libraries PyTorch and Apache MXNet. As an in-progress research result, we have developed two open-source frameworks:

BMXNet: [codes] [examples] [demos]
BITorch: [codes] [examples] [demos]

Image Analysis in a Large Scale Art Historical Database (2019-2023)

Project partner: Wildenstein Plattner Institute

With increasing digitization and storage capacities, it becomes more and more viable to undertake massive digitization projects for analogue archives. Digitization allows easy access to and long-term preservation of old and sensitive physical material, to which access is typically denied. Furthermore, digitization allows the material to be processed more efficiently. In this project, we aim to develop and apply novel automatic processing methods for the digitized archive of the WPI. Since archival material, especially in the art history domain, contains many images and much handwriting, we concentrate on analysing and extracting handwritten information. The challenges to be addressed in this project are the scalability and quality of different approaches to handwriting recognition. The digitization project that the WPI is undertaking covers a document corpus of many million pages in different fonts, languages and physical conditions.

Besides handwriting as one important type of semantic information in an archive, a digitized archive also contains many scans of documents that contain images. These images may be photographs, reproductions of works of art, or even sketches. A digitization pipeline would greatly benefit from additional analysis steps that extract metadata from such documents. In this line of work, further analysis steps, such as classification of documents by visual appearance, automatic creation of textual metadata (i.e. descriptions) of images, and recognition of depicted objects in images, shall be added to the resulting digitization pipeline. All of the developed approaches shall be integrated into the cataloguing software of the WPI so that its researchers can use them.

Intelligent Lecture Video Analysis and Retrieval

Project partner: tele-TASK and openHPI team

Video Lecture Browser: Lecture video content analysis, automatic video indexing, content-based video search, lecture speech recognition, lecture slides recognition.
Automatic E-Lecture Material Enhancement

Former Projects

tele-TASK

tele-TASK (tele-Teaching Anywhere Solution Kit) is an advanced mobile system for the production of Internet streaming videos and podcasts featuring a new and drastically simplified technology.

Semantic Web Research

MEDIAGLOBE - the digital archive is part of the THESEUS research program initiated by the German Federal Ministry of Economy and Technology (BMWi). MEDIAGLOBE deals with digitization, analysis, and semantic retrieval of historical, documentary audiovisual content.

Semantic Media Explorer

Research Team

Supervisor: Prof. Dr. Christoph Meinel
Co-supervisor: PD Dr. habil. Haojin Yang
Jona Otholt (PhD student, G2-E.31)
Weixing Wang (PhD student, G2-E.32)
Hong Guo (PhD student, G2-E.31)

Former Team Members

Dr. Nianhui Guo (PhD student, now with GreenBitAI)
Dr. Ziyun Li (PhD student, now with KTH Royal Institute of Technology)
Dr. Joseph Bethge (PhD student, now with GreenBitAI)
Dr. Ting Hu (PhD student, now with GreenBitAI)
Dr. Christian Bartz (PhD student, now with German Aerospace Center)
Dr. Gonçalo Mordido (PhD student, now with Huawei Research Ireland)
Dr. Mina Rezaei (PhD student, now with LMU)
Dr. Xiaoyin Che (PhD student and PostDoc researcher, now with Siemens Research China)
Dr. Cheng Wang (PhD student, now with Amazon AI)
Zi Yang (PhD student co-supervised with Prof. Guillermo Gallego, TU Berlin; now with GreenBitAI)
Gregor Nickel (PhD student)
Dimitri Korsch (master student, now PhD Student with Friedrich-Schiller-Universität Jena)
Hannes Rantzsch (master student, now with nexenio GmbH)
Tom Herold (master student, now with scalable minds)
Sheng Luo (PhD Student, now with Nvidia Shanghai)
Martin Fritzsche (master student)
Haofang Lu (PhD student)
Larissa Hoffäller (scientific coworker)
Julian Niedermeier (scientific coworker)
Jonathan Sauder (intern)
Benedikt Schenkel (scientific coworker)
Hendrik Rätz (PhD student)
Axel Stebner (scientific coworker)
Prashant Dangwal (scientific coworker)
Paul Mattes (scientific coworker)
Christopher Aust (scientific coworker)
Philipp Hildebrandt (scientific coworker)
Cedric Lorenz (scientific coworker)
Mohammad Yakub (scientific coworker)
Eszter Pai (scientific coworker)
Dimitrije Ristic (scientific coworker)
Maximilian Schulze (scientific coworker)
Elena Gensch (scientific coworker)
Tim Riedel (scientific coworker)
Lamin Touray (scientific coworker)

Reviewed Publications

Thesis:

Habilitation thesis "Deep Representation Learning for Multimedia Data Analysis", Digital Engineering Faculty, University of Potsdam, 2019.
- Submission: Wednesday, 10 October 2018
- Colloquium: "Deep Representation Learning for Multimedia Data Analysis", date: Friday, 10 May 2019, 14:00 in HPI-HS2. [PDF]
- Trial lecture: "A Concise History of Neural Networks", date: Friday, 14 June 2019, 09:00 in HPI-HS3. [PDF]
Ph.D. thesis "Automatic Video Indexing and Retrieval Using Video OCR Technology", Hasso-Plattner-Institute (HPI), University of Potsdam, 2013.
- Submission: 30 April 2013
- Colloquium: 5 November 2013
- Grade: "summa cum laude"
Diploma thesis "Musik Visualisierung mit Hilfe moderner Grafikprozessoren", Fraunhofer IDMT, TU-Ilmenau, 2008.

In Journal:

2024

Yang, Z., Yang, H., Majumder, S., Cardoso, J., & Gallego, G. Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification. Transactions on Machine Learning Research (TMLR).
Hu, Ting, Christoph Meinel, and Haojin Yang. "A flexible BERT model enabling width-and depth-dynamic inference." Computer Speech & Language (2024): 101646. [PDF]
Nianhui Guo, Joseph Bethge, Hong Guo, Christoph Meinel, Haojin Yang, "Towards Optimization-Friendly Binary Neural Network", TMLR (Transactions on Machine Learning Research)

2023

Ziyun Li, Jona Otholt, Ben Dai, Di Hu, Christoph Meinel, Haojin Yang, "Supervised Knowledge May Hurt Novel Class Discovery Performance", TMLR (Transactions on Machine Learning Research)

2019

Mina Rezaei, Haojin Yang and Christoph Meinel, "Recurrent generative adversarial network for learning imbalanced medical image semantic segmentation", International Journal of Multimedia Tools and Applications (MTAP), Special Issue: "Deep Learning for Computer-aided Medical Diagnosis", Online first version: https://doi.org/10.1007/s11042-019-7305-1, 07 Feb. 2019 online version

2018

Cheng Wang, Haojin Yang and Christoph Meinel, "Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning", ACM Transactions on Multimedia Computing Communications and Applications (TOMM) 2018 [link][PDF][BibTex]

2017

Xiaoyin Che, Haojin Yang, Christoph Meinel, "Automatic Online Lecture Highlighting Based on Multimedia Analysis", IEEE Transactions on Learning Technologies (TLT), Publisher: IEEE Computer Society and IEEE Education Society 2017, Volume: PP, Issue: 99, DOI: 10.1109/TLT.2017.2716372, Print ISSN: 1939-1382 [citation] [PDF]

2016

Cheng Wang, Haojin Yang and Christoph Meinel, "A Deep Semantic Framework for Multimodal Representation Learning", International Journal of MULTIMEDIA TOOLS AND APPLICATIONS (MTAP), DOI: 10.1007/s11042-016-3380-8, online ISSN:1573-7721, Print ISSN:1380-7501, Special Issue: "Representation Learning for Multimedia Data Understanding", March 2016 [link] [PDF] [BibTex]

2014

Haojin Yang, Christoph Meinel, "Content Based Lecture Video Retrieval Using Speech and Video Text Information", IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES (TLT), DOI: 10.1109/TLT.2014.2307305, online ISSN: 1939-1382, pp. 142-154, volume 7, number 2, April-June 2014, Publisher: IEEE Computer Society and IEEE Education Society [citation BibTex] [PDF]

Xiaoyin Che, Haojin Yang, Christoph Meinel, "The Automated Generation and Further Application of Tree-Structure Outline for Lecture Videos with Synchronized Slides", International Journal of Technology and Educational Marketing, Volume 4, Number 1, 2014, IGI Global

2012

Haojin Yang, Bernhard Quehl and Harald Sack, "A Framework for Improved Video Text Detection and Recognition", International Journal of MULTIMEDIA TOOLS AND APPLICATIONS (MTAP), online ISSN:1573-7721, Print ISSN:1380-7501, online available Oct. 2012, special issue "Computer Vision for Multimedia", Volume 69 Number 1, pp 217-245. Publisher: Springer US, DOI: http://dx.doi.org/10.1007/s11042-012-1250-6 [citation BibTex] [PDF], 2014

Haojin Yang, Harald Sack, Christoph Meinel, "Lecture Video Indexing and Analysis Using Video OCR Technology", International Journal of Multimedia Processing and Technologies (JMPT), Volume: 2, Issue:4, pp. 176-196, Print ISSN: 0976-4127, Online ISSN: 0976-4135, Dec. 2011 [citation BibTex]

In Conference, Workshop and arXiv:

2025

Weixing Wang, Zifeng Ding, Jindong Gu, Rui Cao, Christoph Meinel, Gerard de Melo, Haojin Yang, "Image Token Matters: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing", The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego Convention Center and Mexico City, 2025

2024

Hong Guo, Nianhui Guo, Christoph Meinel and Haojin Yang, "Low-Bit CUTLASS GEMM Template Auto-Tuning Using Neural Network" In Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA) China, 2024 (BEST PAPER AWARD)
Nianhui Guo, Hong Guo, Christoph Meinel and Haojin Yang, "Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent". In Proceedings of the 18th European Conference on Computer Vision (ECCV) 2024 at MiCo Milano. [PDF]
Ziyun Li, Ben Dai, Christoph Meinel, Haojin Yang, "Generalized Category Discovery on Imbalanced Data Distribution". In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024
Guo, N., Bethge, J., Yang, H., Zhong, K., Ning, X., Meinel, C., & Wang, Y. (2024, February). BoolNet: Towards Energy-Efficient Binary Neural Networks Design and Optimization. In 2nd AAAI Workshop on Sustainable AI.
Wang, W., Yang, H., Meinel, C., Özkan, H. Y., Bermudez Serna, C., & Mas-Machuca, C. (2024). Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection. In Network Traffic Measurement and Analysis Conference.
Otholt, J., Meinel, C., & Yang, H. (2024). Guided Cluster Aggregation: A Hierarchical Approach to Generalized Category Discovery. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2618-2627).

2023

Li, Z., Meinel, C., & Yang, H. (2023). Generalized Categories Discovery for Long-tailed Recognition. arXiv preprint arXiv:2401.05352.
Li, Z., Dai, B., Simsek, F., Meinel, C., & Yang, H. (2023). ImbaGCD: Imbalanced Generalized Category Discovery. arXiv preprint arXiv:2401.05353.
Hu, T., Meinel, C., & Yang, H. (2023). Scaled Prompt-Tuning for Few-Shot Natural Language Generation. arXiv preprint arXiv:2309.06759.
Hu, T., Meinel, C., & Yang, H. (2023, June). Flexible BERT with Width-and Depth-dynamic Inference. In 2023 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
Li, Z., Wang, X., Robertson, N. M., Clifton, D. A., Meinel, C., & Yang, H. (2023, June). SMKD: Selective Mutual Knowledge Distillation. In 2023 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
Hu, T., Meinel, C., & Yang, H. (2023, June). Boosting Bert Subnets with Neural Grafting. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
Simsek, F., Pfitzmann, B., Raetz, H., Otholt, J., Yang, H., & Meinel, C. (2023). DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents. arXiv preprint arXiv:2305.02208.

2022

Guo, N., Bethge, J., Meinel, C., & Yang, H. (2022). Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket. arXiv preprint arXiv:2211.12933. [pdf] [code]
Hu, T., Meinel, C., & Yang, H. (2022). Empirical Evaluation of Post-Training Quantization Methods for Language Tasks. arXiv preprint arXiv:2210.16621. [pdf]
Li, Z., Wang, X., Meinel, C., Robertson, N. M., Clifton, D. A., & Yang, H. (2022, October). Not all knowledge is created equal: mutual distillation of confident knowledge. In NeurIPS 2022 Workshop on Trustworthy and Socially Responsible Machine Learning.
Li, Z., Otholt, J., Dai, B., Meinel, C., & Yang, H. (2022). A Closer Look at Novel Class Discovery from the Labeled Set. arXiv preprint arXiv:2209.09120. [pdf] [code coming soon]
Bartz, C., Raetz, H., Otholt, J., Meinel, C., & Yang, H. (2022, August). Synthesis in Style: Semantic Segmentation of Historical Documents using Synthetic Data. In 2022 26th International Conference on Pattern Recognition (ICPR) (pp. 3878-3884). IEEE. [code] [pdf]

2021

Bartz, C., Bethge, J., Yang, H., & Meinel, C. (2020). One Model to Reconstruct Them All: A Novel Way to Use the Stochastic Noise in StyleGAN. The 32nd British Machine Vision Conference (BMVC), 22nd - 25th November 2021 [pdf][code]
Hu, Ting, Haojin Yang, and Christoph Meinel. "Denoising AutoEncoder Based Delete and Generate Approach for Text Style Transfer." International Conference on Artificial Neural Networks. Springer, Cham, 2021. [pdf]
N Guo, J Bethge, H Yang, K Zhong, X Ning, C Meinel, Y Wang. (2021). BoolNet: Minimizing The Energy Consumption of Binary Neural Networks. arXiv preprint arXiv:2106.06991 [pdf][code][video]
Li, Z., Wang, X., Yang, H., Hu, D., Robertson, N. M., Clifton, D. A., & Meinel, C. (2021). Not All Knowledge Is Created Equal. arXiv preprint arXiv:2106.01489. [pdf][code]
H Yang, Z Shen, Y Zhao, AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks, MAI@CVPR 2021 [pdf][code][video]
G Mordido, H Yang, C Meinel, Evaluating Post-Training Compression in GANs using Locality-Sensitive Hashing, arXiv preprint arXiv:2103.11912 [pdf]
Bethge, J., Bartz, C., Yang, H., Meinel, C. An Improved Network Architecture for Binary Neural Networks, WACV 2021 [pdf] [code]

2020

Bethge, J., Bartz, C., Yang, H., Meinel, C. (2020). MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?. arXiv preprint arXiv:2001.05936. [pdf][code]
Bartz, C., Bethge, J., Yang, H., & Meinel, C. (2020). One Model to Reconstruct Them All: A Novel Way to Use the Stochastic Noise in StyleGAN. arXiv preprint arXiv:2010.11113. [pdf][code]
Bartz, C., Bethge, J., Yang, H., & Meinel, C. (2020). KISS: Keeping It Simple for Scene Text Recognition. arXiv preprint arXiv:1911.08400. [pdf][code]
G. Mordido, H. Yang and C. Meinel: microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination. In IEEE Winter Conference on Applications of Computer Vision (WACV’20), Snowmass Village, Colorado, March 2-5, 2020
J Bethge, C Bartz, H Yang, C Meinel, BMXNet 2: An Open Source Framework for Low-bit Networks-Reproducing, Understanding, Designing and Showcasing. In Proceedings of the 28th ACM International Conference on Multimedia, 2020 [PDF][code]
Jonathan Sauder, Ting Hu, Xiaoyin Che, Gonçalo Mordido, Haojin Yang, Christoph Meinel, Best student forcing: A simple training mechanism in adversarial language generation. In Proceedings of The 12th Language Resources and Evaluation Conference, 2020. [PDF] [code]
Bartz, C., Seidel, L., Nguyen, D. H., Bethge, J., Yang, H., & Meinel, C. Synthetic Data for the Analysis of Archival Documents: Handwriting Determination. DICTA 2020.

2019

Bethge, J., Yang, H., Bornstein, M., & Meinel, C. BinaryDenseNet: Developing an Architecture for Binary Neural Networks. International Conference on Computer Vision (ICCV'19), Neural Architects'19, Oct. 27- Nov. 2 2019, Seoul, Korea
Bethge, J., Yang, H., Bornstein, M., & Meinel, C. Back to Simplicity: How to Train Accurate BNNs from Scratch?. arXiv preprint arXiv:1906.08637. [Demo] [code][pdf]
Joseph Bethge, Haojin Yang, Christoph Meinel, Training Accurate Binary Neural Networks From Scratch, In IEEE International Conference on Image Processing (ICIP'19) in Taipei, Taiwan, September 22-25, 2019
Mina Rezaei, Haojin Yang, Konstantine Harmuth, Christoph Meinel: Conditional Generative Adversarial Refinement Networks for Unbalanced Medical Image Semantic Segmentation. In IEEE Winter Conference on Applications of Computer Vision (WACV’19), pages 1836-1845, Waikoloa Village, HI, USA, January 7-11, 2019 [code]
Mina Rezaei, Haojin Yang, Christoph Meinel: Learning Imbalanced Semantic Segmentation through Cross-Domain Relations of Multi-Agent Generative Adversarial Networks. SPIE Medical Imaging - Computer Aided Diagnosis (SPIE’19), pages 1-6, San Diego, California, United States 16 - 21 February 2019

2018

Jonathan Sauder, Xiaoyin Che, Gonçalo Mordido, Haojin Yang and Christoph Meinel. Pseudo-Ground-Truth Training for Adversarial Text Generation with Reinforcement Learning. Deep Reinforcement Learning Workshop at NeurIPS 2018 (Deep RL workshop)
Mina Rezaei, Haojin Yang, Christoph Meinel: Generative Adversarial Framework for Learning Multiple Clinical Tasks. Accepted by Machine Learning for Health Workshop at NeurIPS 2018 (ML4H)
Mina Rezaei, Haojin Yang and Christoph Meinel, Generative Adversarial Framework for Learning Multiple Clinical Tasks. Digital Image Computing: Techniques and Applications (DICTA 2018)
Christian Bartz, Haojin Yang, Joseph Bethge and Christoph Meinel. LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks. 1st International Workshop on Advanced Machine Vision for Real-life and Industrially Relevant Applications (AMV 2018), in conjunction with the Asian Conference on Computer Vision (ACCV), 2-6 December 2018, Perth, Australia
Mina Rezaei, Haojin Yang, Christoph Meinel: voxel-GAN: Adversarial Framework for Learning Imbalanced Brain Tumor Segmentation. Accepted by BrainLes@MICCAI 2018 [code]
G. Mordido, H. Yang and C. Meinel. Dropout-GAN: Learning from a Dynamic Ensemble of Discriminators. ACM KDD'18 Deep Learning Day (KDD DLDay 2018), London UK, 2018 [PDF]
Mina Rezaei, Haojin Yang and Christoph Meinel "Instance Tumor Segmentation using Multitask Convolutional Neural Network" International Joint Conference on Neural Networks (IJCNN) 2018
Mina Rezaei, Haojin Yang, Christoph Meinel "Whole Heart and Great Vessel Segmentation with Context-aware of Generative Adversarial Networks" Bildverarbeitung für die Medizin (BVM) 2018
Mina Rezaei, Haojin Yang, Christoph Meinel, "Automatic Cardiac MRI Segmentation via Context-aware Recurrent Generative Adversarial Neural Network", Computer Assisted Radiology and Surgery (CARS 2018)

2017

Christian Bartz, Haojin Yang, Christoph Meinel "SEE: Towards Semi-Supervised End-to-End Scene text Recognition", the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), February 2–7, 2018 New Orleans, Louisiana, USA (PDF) (codes)

Christian Bartz, Haojin Yang, Christoph Meinel “STN-OCR: A single Neural Network for Text Detection and Text Recognition”, arXiv:1707.08831v1 2017 (codes)

Christian Bartz, Tom Herold, Haojin Yang and Christoph Meinel "Language Identification Using Deep Convolutional Recurrent Neural Networks", 24th International Conference on Neural Information Processing (ICONIP 2017), November 14-18, 2017, Guangzhou, China

Mina Rezaei, Haojin Yang and Christoph Meinel "Deep Neural Network with l2-norm Unit for Brain Lesions Detection", 24th International Conference on Neural Information Processing (ICONIP 2017), November 14-18, 2017, Guangzhou, China

Xiaoyin Che, Nico Ring, Willi Raschkowski, Haojin Yang and Christoph Meinel, "Traversal-Free Word Vector Evaluation in Analogy Space", RepEval workshop at EMNLP 17 (Empirical Methods in Natural Language Processing), September 7–11, 2017, Copenhagen, Denmark. [PDF copy]

Haojin Yang, Martin Fritzsche, Christian Bartz, Christoph Meinel, "BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet" ACM International Conference on Multimedia (ACM MM 2017), Open Source Software Competition, October 23-27, 2017, Mountain View, CA USA. [PDF] [project][Amazon AI Blog]

Xiaoyin Che, Nico Ring, Willi Raschkowski, Haojin Yang and Christoph Meinel "Automatic Lecture Subtitle Generation and How It Helps", 17th IEEE International Conference on Advanced Learning Technologies (ICALT 2017), July 3-7, 2017, Timisoara, Romania. [PDF copy][BibTex]

2016

Haojin Yang, Cheng Wang, Christian Bartz, Christoph Meinel "SceneTextReg: A Real-Time Video OCR System", ACM international conference on Multimedia (ACM MM 2016), system demonstration session, 15-19 October 2016, Amsterdam, The Netherlands [PDF copy][demo video] [BibTex]

Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel "Image Captioning with Deep Bidirectional LSTMs", ACM international conference on Multimedia (ACM MM 2016), full paper in the deep learning session of the main conference track, 15-19 October 2016, Amsterdam, The Netherlands [PDF copy] [demo video]

Xiaoyin Che, Cheng Wang, Haojin Yang and Christoph Meinel, "Punctuation Prediction for Unsegmented Transcript Based on Word Vector", "the 10th International Conference on Language Resources and Evaluation (LREC 2016)", Portorož (Slovenia), 23-28 May 2016 [Dataset]

Haojin Yang, "Real-Time Video OCR System", system demonstration at 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), Show&Tell session, Shanghai China, 20-25 March 2016

Cheng Wang, Haojin Yang and Christoph Meinel, "Exploring Multimodal Video Representation for Action Recognition", the annual International Joint Conference on Neural Networks (IJCNN 2016), Vancouver, Canada, July 24-29, 2016

Xiaoyin Che, Thomas Staubitz, Haojin Yang and Christoph Meinel, "Pre-Course Key Segment Analysis of Online Lecture Videos", 16th IEEE International Conference on Advanced Learning Technologies (ICALT-2016), Austin, Texas, USA, July 25-28, 2016

Xiaoyin Che, Sheng Luo, Haojin Yang and Christoph Meinel, "Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models", INTERSPEECH 2016, San Francisco, California, USA in September 8-12, 2016

Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che, and Christoph Meinel, "Action Recognition in Surveillance Video Using ConvNets and Motion History Image", International Conference on Artificial Neural Networks (ICANN 2016), Barcelona Spain, 6th-9th of September 2016

Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che and Christoph Meinel, "Real-time action recognition in surveillance videos using ConvNets", in the 23rd International Conference on Neural Information Processing (ICONIP 2016), in Kyoto (Japan), 16th-21st of October 2016

Hannes Rantzsch, Haojin Yang and Christoph Meinel "Signature Embedding: Writer Independent Offline Signature Verification with Deep Metric Learning" in 12th International Symposium on Visual Computing (ISVC'16), Las Vegas USA, December 12-14, 2016. [PDF copy] [Poster]

Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel "Sentence-Level Automatic Lecture Highlighting Based on Acoustic Analysis" 16th IEEE International Conference on Computer and Information Technology (IEEE CIT 2016), Shangri-La's Fijian Resort, Fiji, 7-10 December 2016

2015

Cheng Wang, Haojin Yang, Xiaoyin Che and Christoph Meinel, "Concept-Based Multimodal Learning for Topic Generation", the 21st MultiMedia Modelling Conference (MMM2015), Sydney, Australia, Jan 5-7, 2015

Sheng Luo, Haojin Yang and Christoph Meinel, "Reward-based Intermittent Reinforcement in Gamification for E-learning", 7th International Conference on Computer Supported Education (CSEDU), Lisbon, Portugal, May 23-25, 2015

H.J.Yang, C.Wang, X.Y.Che, S.Luo and Ch.Meinel. “An Improved System For Real-Time Scene Text Recognition”, ACM International Conference on Multimedia Retrieval (ICMR 2015), system demonstration session, Shanghai, June 23-26, 2015

Cheng Wang, Haojin Yang and Christoph Meinel, "Does Multilevel Semantic Representation Improve Text Categorization?", the 26th International Conference on Database and Expert Systems Applications (DEXA 2015), Valencia, Spain, September 1-4, 2015

Cheng Wang, Haojin Yang and Christoph Meinel, "Visual-Textual Late Semantic Fusion Using Deep Neural Network for Document Categorization", the 22nd International Conference on Neural Information Processing (ICONIP2015), Istanbul, Turkey, November 9-12, 2015

Cheng Wang, Haojin Yang, Christoph Meinel, "Deep Semantic Mapping for Cross-Modal Retrieval", the 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2015), Vietri sul Mare, Italy, November 9-11, 2015

Xiaoyin Che, Haojin Yang and Christoph Meinel, "Adaptive E-Lecture Video Outline Extraction Based on Slides Analysis", the 14th International Conference on Web-based Learning (ICWL 2015), Guangzhou, China, November 5-8, 2015

Xiaoyin Che, Haojin Yang and Christoph Meinel, "Table Detection from Slide Images", 7th Pacific Rim Symposium on Image and Video Technology (PSIVT2015), 23-27 November, 2015, Auckland, New Zealand

2014

Bernhard Quehl, Haojin Yang and Harald Sack, "Improving text recognition by distinguishing scene and overlay text", the 7th International Conference on Machine Vision (ICMV 2014), Milan, Italy, November 19-21, 2014

2013

Xiaoyin Che, Haojin Yang, Christoph Meinel, "Lecture Video Segmentation by Automatically Analyzing the Synchronized Slides", The 21st ACM International Conference on Multimedia (ACM MM13), Grand Challenge: "Temporal Segmentation and Annotation Grand Challenge" October 21-25, 2013, Barcelona, Spain. [copy PDF]

Xiaoyin Che, Haojin Yang, Christoph Meinel, "Tree-Structure Outline Generation for Lecture Videos with Synchronized Slides", The Second International Conference on E-Learning and E-Technologies in Education (ICEEE2013), 23-25th September 2013, Lodz Poland. [copy PDF]

Franka Grünewald, Haojin Yang, Christoph Meinel, "Evaluating the Digital Manuscript Functionality - User Testing For Lecture Video Annotation Features", 12th International Conference on Web-based Learning (ICWL 2013), 6 - 9th October 2013, Kenting, Taiwan. Springer lecture notes, 2013. (best student paper award) [copy PDF]

Haojin Yang, Franka Grünewald, Matthias Bauer, Christoph Meinel, "Lecture Video Browsing Using Multimodal Information Resources", 12th International Conference on Web-based Learning (ICWL 2013), 6 - 9th October 2013, Kenting, Taiwan. Springer lecture notes.

Franka Grünewald, Haojin Yang, Elnaz Mazandarani, Matthias Bauer and Christoph Meinel, "Next Generation Tele-Teaching: Latest Recording Technology, User Engagement and Automatic Metadata Retrieval", International Conference on Human Factors in Computing and Informatics (southCHI), Lecture Notes in Computer Science (LNCS) Springer, 01–03 July, 2013, Maribor, Slovenia

2012

Haojin Yang, Christoph Oehlke and Christoph Meinel, "An Automated Analysis and Indexing Framework for Lecture Video Portal", 11th International Conference on Web-based Learning (ICWL 2012), 2 - 4th September 2012, Sinaia, Romania. Springer lecture notes, Volume 7558, 2012. [citation BibTex] (accept rate: 26%) (best student paper award)

Haojin Yang, Bernhard Quehl, Harald Sack, "A skeleton based binarization approach for video text recognition", 13th International Workshop on Image analysis for multimedia interactive services (WIAMIS 2012), 23rd - 25th May 2012, IEEE Press, Dublin, Ireland. [poster] [citation BibTex]

C. Hentschel, J. Hercher, M. Knuth, J. Osterhoff, B. Quehl, H. Sack, N. Steinmetz, J. Waitelonis, H-J. Yang, "Open Up Cultural Heritage in Video Archives with Mediaglobe", 12th International Conference on Innovative Internet Community Services (I2CS 2012), June 13-15, 2012, Trondheim (Norway) [citation BibTex] (best paper award)

Haojin Yang, Franka Gruenewald and Christoph Meinel, "Automated extraction of lecture outlines from lecture videos: a hybrid solution for lecture video indexing", 4th International Conference on Computer Supported Education (CSEDU 2012) (indexation by Thomson Reuters Conference Proceedings Citation Index (ISI) and Elsevier Index (EI)), SciTePress, April 16-18, 2012, Porto, Portugal [citation BibTex] (accept rate: 12%)

Haojin Yang, Bernhard Quehl and Harald Sack, "Text detection in video images using adaptive edge detection and stroke width verification", 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012), IEEE Press, Vienna, Austria, April 11-13, 2012 [citation BibTex]

2011

Haojin Yang, Maria Siebert, Patrick Lühne, Harald Sack and Christoph Meinel, "Lecture Video Indexing and Analysis Using Video OCR Technology", 7th International Conference on Signal Image Technology and Internet Based Systems (SITIS 2011), Track Internet Based Computing and Systems, IEEE Press, Dijon (France), Nov.28 - Dec. 1, 2011. [citation BibTex]

Haojin Yang, Maria Siebert, Patrick Lühne, Harald Sack and Christoph Meinel, "Automatic Lecture Video Indexing Using Video OCR Technology", IEEE International Symposium on Multimedia 2011 (ISM 2011), IEEE Press, Dana Point, CA, USA, Dec. 5-7, 2011. [citation BibTex]

Haojin Yang, Christoph Oehlke and Christoph Meinel, "A Solution for German Speech Recognition for Analysis and Processing of Lecture Videos", 10th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2011), IEEE Press, Sanya, Hainan Island, China, May 2011 [citation BibTex]

Teaching

Concluded PhD Theses

Dr. Nianhui Guo: "Towards Efficient Ultra Low-bitwidth Neural Networks: A Systematic Study of Architecture Design, Training Optimization and Deployment" 2025
Dr. Ziyun Li: "Navigating Data Limitations in Open World: Exhaustiveness, Balance, and Correctness" 2024
Dr. Joseph Bethge: "Binary Neural Networks: Improving the World of Neural Networks Bit by Bit" 2024
Dr. Ting Hu: "Towards Effective and Efficient Language Models: RNN-based Generative Model Enhancements, Transfer Learning, and Inference Optimization" 2024
Dr. Christian Bartz: "Reducing the Annotation Burden: Deep Learning for Optical Character Recognition using less Manual Annotations" 2022
Dr. Goncalo Mordido: "Diversification, Compression, and Evaluation Methods for Generative Adversarial Networks" 2021
Dr. Mina Rezaei: "Deep Representation Learning from Imbalanced Medical Imaging" 2020
Dr. Xiaoyin Che: "E-Lecture Material Enhancement Based on Automatic Multimedia Analysis" 2019
Dr. Cheng Wang: "Deep Learning of Multimodal Representations" 2018

Current Master Theses

Concluded Master Theses

Furkan Simsek, "LTGCD: Long-tailed Generalized Category Discovery", 2023
Weixing Wang, "Network Intrusion Detection using pre-trained tabular representation models", co-supervision with Prof. Wolfgang Kellerer from TUM, 2023
Jonas Krah, "Accelerating Monocular Depth Estimation using Binary Neural Networks", 2023
Tobias Bredow, "Synthetic Data for the Segmentation of Medical Images", 2022
Alexander Kromer, "Quantized Ensemble Neural Networks", 2022
Erik Ziegler, "Multi-Task and Zero-Shot Learning with Question Answering Transformer Models", 2022
Emanuel Metzenthin, "Weakly Supervised Text Localization using Deep Reinforcement Learning", 2022
Jona Otholt, "Automatic Categorization of Scanned Documents", 2021
Hendrik Rätz, "Handwriting Classification on Archival Documents using Deep Neural Networks", 2020
Julian Niedermeier, "Manifold Learning for the Evaluation of Generative Models", 2019
Felix Wolff, "Online Activity Prediction with Long-short-term Memory Recurrent Networks" (co-supervision with Prof. Mathias Weske and Dr. Luise Pufahl), 2019
Adrian Loy, "Adaptive Precision of Deep Neural Networks", 2019
Marvin Bornstein, "Evaluation of Quantized Deep Neural Networks", 2019
Thorben Meyer, "Handwriting Detection/Recognition from Art-Historical Documents", 2018
Tom Herold: "Language identification in audio files using deep learning", 2017
Martin Fritzsche: "Quantized Deep Neural Networks", 2017
Hannes Rantzsch: "A deep learning approach to signature verification", 2016
Dimitri Korsch: "Perspective rectification of scene text with the help of analytical and deep learning approaches", 2016
Christian Bartz: "Scene text recognition using deep learning", 2016

Lecture

WS 2023/2024

Master Seminar: Machine Intelligence with Deep Learning

SS 2023

Master Seminar: Practical Applications of Deep Learning

WS 2022/2023

Master Seminar: Machine Intelligence with Deep Learning

SS 2022

Master Seminar: Practical Applications of Deep Learning

WS 2021/2022

Master Seminar: Machine Intelligence with Deep Learning

SS 2021

Master Seminar: Practical Applications of Deep Learning

WS 2020/2021

Master Seminar "Machine Intelligence with Deep Learning"

SS 2019

Master Seminar: Practical Applications of Deep Learning

WS 2018/2019

Master Seminar "Machine Intelligence with Deep Learning"

SS 2018

Master Lecture/Project "Competitive Problem Solving with Deep Learning"

WS 2017/2018

Master Seminar "Machine Intelligence with Deep Learning"
Master Project "Natural Language Generation Using Generative Adversarial Networks"

SS 2017

Master Seminar "Practical Video Analysis"
Bachelor Seminar: Weiterführende Themen zu Internet- und WWW-Technologien

WS 2016/2017

Master Seminar "Practical Applications of Multimedia Retrieval"

SS 2016

Master Seminar "Practical video analysis"
Bachelor Seminar: Weiterführende Themen zu Internet- und WWW-Technologien

WS 2015/2016

Master Seminar "Practical Applications of Multimedia Retrieval"

SS 2015

Master Seminar: Practical video analysis
Master Project: Video Classification with Convolutional Neural Networks
Bachelor Seminar: Weiterführende Themen zu Internet- und WWW-Technologien

SS 2014

Master Seminar: Weiterführende Themen zu Internet- und WWW-Technologien

SS 2013

Bachelor Seminar: Weiterführende Themen zu Internet- und WWW-Technologien

WS 2012/2013

Bachelor Seminar: Web-Programmierung und Web-Frameworks

SS 2012

Bachelor seminar: Multimedia Analysis
Bachelor Project: "tele-TASK for Kids - Integration von tele-TASK in den Schulalltag"

WS 2011/2012

Master seminar: Large Scale Processing for Multimedia Analysis

SS 2011

Bachelor seminar: Multimedia Analysis

SS 2010

Bachelor seminar: Multimedia Analysis

Professional Activities

Program Committee Member and Reviewer

ICML 2024
ICLR 2024
NeurIPS 2023
ICML 2023
ICLR 2023
AAAI 2023 (Senior PC)
NeurIPS 2022
ICML 2022 (Senior PC)
ICLR 2022 (Highlighted Reviewer)
NeurIPS 2021
ICCV 2021
ICML 2021
CVPR 2021
NeurIPS 2020 (top 10% high scored reviewer)
ICML 2020
NeurIPS 2019 (top 35% high scored reviewer)
NLPCC Workshop on Explainable Artificial Intelligence 2019
IEEE Transactions on Neural Networks and Learning Systems 2019
IEEE Transactions on Multimedia 2018
International Journal on Signal Processing: Image Communication 2017
ACM Computing Surveys 2017
IEEE Transactions on Multimedia 2016
Neurocomputing 2016
Computer Vision and Image Understanding 2016
Computer Vision and Image Understanding 2015
Neurocomputing 2015
IET Image Processing 2015
IEEE Transactions on Image Processing 2015
ACM International Conference on LEARNING @SCALE 2014
African Journal of Business Management
Computer Vision and Image Understanding 2014
Neurocomputing 2014
ACM ICMR 2013
ICIP 2013
ICIP 2012