
Dr. Haojin Yang

Head of Multimedia Analysis & Deep Learning Research Team
Senior Researcher at HPI

Hasso-Plattner-Institut für
Softwaresystemtechnik GmbH, Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

office:      H-1.22
phone:     +49 (0)331-5509-511
fax:         +49 (0)331-5509-325
email:     haojin.yang@hpi.de

Research Interests

Multimedia Analysis and Modeling, Deep Learning, Multimodal Video Representation, Computer Vision, Lecture Video.

Current Projects

Multimedia Analysis with Deep Learning

Learning and understanding multimedia content is a challenging task in information retrieval and multimedia analysis. Deep Learning (DL), a relatively new area of machine learning (since 2006), has already influenced a wide range of multimedia information processing tasks. Recently, DL-based techniques have achieved substantial progress in fields including speech recognition, image classification and language processing. It has been shown that models loosely inspired by biological neural networks, which learn features hierarchically (layer by layer) from large-scale data, can significantly improve analytic results. In this project, we focus on developing multimedia retrieval approaches based on DL technologies.

Current research topics:

Multimedia analysis and computer vision based on Deep Learning and intelligent data synthesis

  • End-to-end scene text detection and recognition in real-time using deep neural networks
  • Neural visual translator: image/video captioning
  • Human action recognition, event detection in surveillance video
  • Multimodal data retrieval with deep neural networks
  • Semantic text analysis using Word2Vec and ConvNet (e.g. sentence boundary detection in speech transcript)
  • Deep Learning in medical image processing e.g. brain abnormality detection

Research in deep learning algorithms

  • Binary Neural Networks: enable deep learning models on low power devices
  • Generative model learning

Lecture video analysis

Video Lecture Browser: Lecture video content analysis, automatic video indexing, content-based video search, lecture speech recognition, lecture slides recognition etc.


tele-TASK (tele-Teaching Anywhere Solution Kit) is an advanced mobile system for producing Internet streaming videos and podcasts, featuring a new and drastically simplified technology.

Deep Learning for Enterprise NLP Applications

Project partner: SAP AG

In this project we will develop a framework for building generally applicable as well as domain-specific word vectors using state-of-the-art deep learning technology. We will study the research problem of textual representation learning with the aim of finding the most efficient solution for deep neural network design and system implementation. An evaluation protocol will be defined and developed for qualitative and quantitative evaluation. Where labelled data is provided, combining the trained word vectors with the labelled datasets would facilitate various NLP applications, such as in-domain keyword disambiguation, general sentiment analysis, customer satisfaction analysis and user query pre-classification.

Former Projects

 MEDIAGLOBE - the digital archive is part of the THESEUS research program initiated by the German Federal Ministry of Economy and Technology (BMWi). MEDIAGLOBE deals with digitization, analysis, and semantic retrieval of historical, documentary audiovisual content.

Research Team

  • Dr. Haojin Yang (Senior Researcher, H-1.22)
  • Xiaoyin Che (PhD Student, H-1.22)
  • Cheng Wang (PhD Student, H-1.22)
  • Christian Bartz (PhD Student, H-1.11)
  • Sheng Luo (PhD Student, H-1.21)
  • Mina Razaei (PhD Student, H-1.22)
  • Martin Fritzsche (Scientific coworker and Master Thesis)
  • Larrisa Hoffäller (Scientific coworker)
  • Marco Schaarschmidt (Scientific coworker)
  • Tom Herold (Master Thesis)

Former Team Members

  • Dimitri Korsch (now PhD Student with Friedrich-Schiller-Universität Jena)
  • Hannes Rantzsch (now with nexenio GmbH)


Job Offers

  • Ph.D. Scholarship [PDF]

Reviewed Publications


  • PhD thesis "Automatic Video Indexing and Retrieval Using Video OCR Technology", Hasso-Plattner-Institute (HPI), Uni-Potsdam, 2013. ("summa cum laude")
  • Diploma thesis "Musik Visualisierung mit Hilfe moderner Grafikprozessoren",  Fraunhofer IDMT, TU-Ilmenau, 2008.

In Journal:


  • Cheng Wang, Haojin Yang and Christoph Meinel, "Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning", ACM Transactions on Multimedia Computing Communications and Applications (TOMM) 2017 [link][PDF][BibTex] (accepted)
  • Xiaoyin Che, Haojin Yang, Christoph Meinel, "Automatic Online Lecture Highlighting Based on Multimedia Analysis", IEEE Transactions on Learning Technologies (TLT), Publisher: IEEE Computer Society and IEEE Education Society 2017 [citation BibTex] [PDF] (accepted)


  • Cheng Wang, Haojin Yang and Christoph Meinel, "A Deep Semantic Framework for Multimodal Representation Learning", International Journal of MULTIMEDIA TOOLS AND APPLICATIONS (MTAP), DOI: 10.1007/s11042-016-3380-8, online ISSN:1573-7721, Print ISSN:1380-7501,  Special Issue: "Representation Learning for Multimedia Data Understanding", March 2016 [link] [PDF] [BibTex]


  • Haojin Yang, Christoph Meinel, "Content Based Lecture Video Retrieval Using Speech and Video Text Information", IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES (TLT), DOI: 10.1109/TLT.2014.2307305, online ISSN: 1939-1382, pp. 142-154, volume 7, number 2, April-June 2014, Publisher: IEEE Computer Society and IEEE Education Society [citation BibTex] [PDF]


In Conferences: 


  • Haojin Yang, Martin Fritzsche, Christian Bartz, Christoph Meinel, "BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet" [source zip][PDF copy][project]
  • Xiaoyin Che, Sheng Luo, Haojin Yang and Christoph Meinel "Automatic Lecture Subtitle Generation and How It Helps", 17th IEEE International Conference on Advanced Learning Technologies (ICALT 2017), July 3-7, 2017, Timisoara, Romania. [PDF copy][BibTex]


  • Haojin Yang, Cheng Wang, Christian Bartz, Christoph Meinel "SceneTextReg: A Real-Time Video OCR System", ACM international conference on Multimedia (ACM MM 2016), system demonstration session, 15-19 October 2016, Amsterdam, The Netherlands [PDF copy][demo video] [BibTex]
  • Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel "Image Captioning with Deep Bidirectional LSTMs", ACM international conference on Multimedia (ACM MM 2016), full paper in the deep learning session of the main conference track, 15-19 October 2016, Amsterdam, The Netherlands [PDF copy] [demo video]
  • Xiaoyin Che, Cheng Wang, Haojin Yang and Christoph Meinel, "Punctuation Prediction for Unsegmented Transcript Based on Word Vector", the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož (Slovenia), 23-28 May 2016
  • Haojin Yang, "Real-Time Video OCR System", system demonstration at 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), Show&Tell session, Shanghai China, 20-25 March 2016
  • Cheng Wang, Haojin Yang and Christoph Meinel, "Exploring Multimodal Video Representation for Action Recognition", the annual International Joint Conference on Neural Networks (IJCNN 2016), Vancouver, Canada, July 24-29, 2016
  • Xiaoyin Che, Thomas Staubitz, Haojin Yang and Christoph Meinel, "Pre-Course Key Segment Analysis of Online Lecture Videos", 16th IEEE International Conference on Advanced Learning Technologies (ICALT-2016), Austin, Texas, USA, July 25-28, 2016
  • Xiaoyin Che, Sheng Luo, Haojin Yang and Christoph Meinel, "Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models", INTERSPEECH 2016, San Francisco, California, USA, September 8-12, 2016
  • Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che, and Christoph Meinel, "Action Recognition in Surveillance Video Using ConvNets and Motion History Image", International Conference on Artificial Neural Networks (ICANN 2016), Barcelona Spain, 6th-9th of September 2016 
  • Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che and Christoph Meinel, "Real-time action recognition in surveillance videos using ConvNets", the 23rd International Conference on Neural Information Processing (ICONIP 2016), Kyoto (Japan), 16th-21st of October 2016
  • Hannes Rantzsch, Haojin Yang and Christoph Meinel "Signature Embedding: Writer Independent Offline Signature Verification with Deep Metric Learning" in 12th International Symposium on Visual Computing (ISVC'16), Las Vegas USA, December 12-14, 2016. [PDF copy] [Poster]
  • Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel "Sentence-Level Automatic Lecture Highlighting Based on Acoustic Analysis" 16th IEEE International Conference on Computer and Information Technology (IEEE CIT 2016), Shangri-La's Fijian Resort, Fiji, 7-10 December 2016


  • Cheng Wang, Haojin Yang, Xiaoyin Che and Christoph Meinel, "Concept-Based Multimodal Learning for Topic Generation", the 21st MultiMedia Modelling Conference (MMM2015), Sydney, Australia, Jan 5-7, 2015
  • Sheng Luo, Haojin Yang and Christoph Meinel, "Reward-based Intermittent Reinforcement in Gamification for E-learning", 7th International Conference on Computer Supported Education (CSEDU), Lisbon, Portugal, May 23-25, 2015
  • H.J. Yang, C. Wang, X.Y. Che, S. Luo and Ch. Meinel, "An Improved System For Real-Time Scene Text Recognition", ACM International Conference on Multimedia Retrieval (ICMR 2015), system demonstration session, Shanghai, June 23-26, 2015
  • Cheng Wang, Haojin Yang and Christoph Meinel, "Does Multilevel Semantic Representation Improve Text Categorization?", the 26th International Conference on Database and Expert Systems Applications (DEXA 2015), Valencia, Spain, September 1-4, 2015
  • Cheng Wang, Haojin Yang and Christoph Meinel, "Visual-Textual Late Semantic Fusion Using Deep Neural Network for Document Categorization",  the 22nd International Conference on Neural Information Processing (ICONIP2015), Istanbul, Turkey, November 9-12, 2015
  • Cheng Wang, Haojin Yang, Christoph Meinel, "Deep Semantic Mapping for Cross-Modal Retrieval",  the 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2015), Vietri sul Mare, Italy, November 9-11, 2015
  • Xiaoyin Che, Haojin Yang and Christoph Meinel, "Adaptive E-Lecture Video Outline Extraction Based on Slides Analysis", the 14th International Conference on Web-based Learning (ICWL 2015), Guangzhou, China, November 5-8, 2015
  • Xiaoyin Che, Haojin Yang and Christoph Meinel, "Table Detection from Slide Images", 7th Pacific Rim Symposium on Image and Video Technology (PSIVT2015), 23-27 November, 2015, Auckland, New Zealand


  • Bernhard Quehl, Haojin Yang and Harald Sack, "Improving text recognition by distinguishing scene and overlay text", the 7th International Conference on Machine Vision (ICMV 2014), Milan, Italy, November 19-21, 2014


  • Xiaoyin Che, Haojin Yang, Christoph Meinel, "Lecture Video Segmentation by Automatically Analyzing the Synchronized Slides", The 21st ACM International Conference on Multimedia (ACM MM13), Grand Challenge: "Temporal Segmentation and Annotation Grand Challenge" October 21-25, 2013, Barcelona, Spain. [copy PDF]
  • Xiaoyin Che, Haojin Yang, Christoph Meinel, "Tree-Structure Outline Generation for Lecture Videos with Synchronized Slides", The Second International Conference on E-Learning and E-Technologies in Education (ICEEE2013), 23-25th September 2013, Lodz Poland. [copy PDF]
  • Franka Grünewald, Haojin Yang, Christoph Meinel, "Evaluating the Digital Manuscript Functionality - User Testing For Lecture Video Annotation Features", 12th International Conference on Web-based Learning (ICWL 2013), 6 - 9th October 2013,  Kenting, Taiwan. Springer lecture notes, 2013. (best student paper award) [copy PDF]
  • Haojin Yang, Franka Grünewald, Matthias Bauer, Christoph Meinel, "Lecture Video Browsing Using Multimodal Information Resources", 12th International Conference on Web-based Learning (ICWL 2013), 6 - 9th October 2013, Kenting, Taiwan. Springer lecture notes.
  • Franka Grünewald, Haojin Yang, Elnaz Mazandarani, Matthias Bauer and Christoph Meinel, "Next Generation Tele-Teaching: Latest Recording Technology, User Engagement and Automatic Metadata Retrieval", International Conference on Human Factors in Computing and Informatics (southCHI), Lecture Notes in Computer Science (LNCS) Springer, 01–03 July, 2013 Maribor, Slovenia


  • Haojin Yang, Christoph Oehlke and Christoph Meinel, "An Automated Analysis and Indexing Framework for Lecture Video Portal", 11th International Conference on Web-based Learning (ICWL 2012), 2 - 4th September 2012, Sinaia, Romania. Springer lecture notes, Volume 7558, 2012. [citation BibTex] (accept rate: 26%) (best student paper award)
  • Haojin Yang, Bernhard Quehl, Harald Sack, "A skeleton based binarization approach for video text recognition", 13th International Workshop on Image analysis for multimedia interactive services (WIAMIS 2012), 23rd - 25th May 2012, IEEE Press, Dublin Ireland. [poster] [citation BibTex]
  • C. Hentschel, J. Hercher, M. Knuth, J. Osterhoff, B. Quehl, H. Sack, N. Steinmetz, J. Waitelonis, H-J.Yang:
    "Open Up Cultural Heritage in Video Archives with Mediaglobe", 12th International Conference on Innovative Internet Community Services (I2CS 2012), June 13-15, 2012, Trondheim (Norway) [citation BibTex] (best paper award)
  • Haojin Yang, Franka Gruenewald and Christoph Meinel, "Automated extraction of lecture outlines from lecture videos: a hybrid solution for lecture video indexing", 4th International Conference on Computer Supported Education (CSEDU 2012) (indexation by Thomson Reuters Conference Proceedings Citation Index (ISI) and Elsevier Index (EI)), SciTePress, April 16-18, 2012, Porto Portugal [citation BibTex] (accept rate: 12%)
  • Haojin Yang, Bernhard Quehl and Harald Sack, "Text detection in video images using adaptive edge detection and stroke width verification", 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012), IEEE Press, Vienna, Austria, April 11-13, 2012 [citation BibTex]


  • Haojin Yang, Maria Siebert, Patrick Lühne, Harald Sack and Christoph Meinel, "Lecture Video Indexing and Analysis Using Video OCR Technology", 7th International Conference on Signal Image Technology and Internet Based Systems (SITIS 2011), Track Internet Based Computing and Systems, IEEE Press, Dijon (France), Nov.28 - Dec. 1, 2011. [citation BibTex]
  • Haojin Yang, Maria Siebert, Patrick Lühne, Harald Sack and Christoph Meinel, "Automatic Lecture Video Indexing Using Video OCR Technology" IEEE International Symposium on Multimedia 2011 (ISM 2011), IEEE Press, Dana Point, CA, USA, Dec. 5-7, 2011. [citation BibTex]
  • Haojin Yang, Christoph Oehlke and Christoph Meinel, "A Solution for German Speech Recognition for Analysis and Processing of Lecture Videos", 10th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2011), IEEE Press, Sanya, Hainan Island, China, May 2011 [citation BibTex]



Current Master Theses

  • Tom Herold: "Language identification in audio files using deep learning"
  • Martin Fritzsche: "Quantized Deep Neural Networks"

Former Master Theses

  • Christian Bartz: "Scene text recognition using deep learning", 2016 (now PhD Student with HPI)
  • Dimitri Korsch: "Perspective rectification of scene text with the help of analytical and deep learning approaches", 2016 (now PhD Student with Friedrich-Schiller-Universität Jena)
  • Hannes Rantzsch: "A deep learning approach to signature verification", 2016

Reviewed Master Thesis

  • Dominik Müller: "Analyzing Neuroevolution Algorithms", 2016

SS 2017

SS 2016

SS 2015


Master project:

  • Video Classification with Convolutional Neural Networks

SS 2014


  • Weiterführende Themen zu Internet- und WWW-Technologien

SS 2013

  • Seminar: Weiterführende Themen zu Internet- und WWW-Technologien (Bachelor seminar)

WS 2012/2013

  • Web-Programmierung und Web-Frameworks (Bachelor seminar)

SS 2012

  • Multimedia Analysis (Bachelor seminar)
  • Bachelor project: "tele-TASK for Kids - Integration von tele-TASK in den Schulalltag"

WS 2011/12

  • Large Scale Processing for Multimedia Analysis (Master seminar, 4 SWS)

SS 2011

  • Multimedia Analysis (Bachelor seminar)

SS 2010

  • Multimedia Analysis (Bachelor seminar)
  • Seminar: Multimedia-Analyse (Bachelor)