Hasso-Plattner-InstitutSDG am HPI

Automatic Video Indexing and Retrieval Using Video OCR Technology

Haojin Yang

Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to YouTube's official statistics, more than 6 billion hours of video are watched each month and about 100 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.

Text displayed in a video is an essential part of the high-level semantic information of the video content. Video text can therefore serve as a valuable source for automated video indexing in video portals and digital video libraries.

In this thesis, we address both text detection and text recognition in video images. For text detection, we have developed a new localization-verification scheme in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate. The detected candidate text lines are then refined using an image entropy-based filter. Finally, Stroke Width Transform (SWT)-based and Support Vector Machine (SVM)-based verification procedures are applied to eliminate false alarms. For text recognition, we have developed a novel skeleton-based binarization method that separates text from complex backgrounds so that it can be processed by standard Optical Character Recognition (OCR) software.
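The entropy-based refinement step can be illustrated with a minimal sketch. It is not the implementation from the thesis: it assumes that each candidate region is given as a flat list of grey levels, that the filter uses the Shannon entropy of the grey-level histogram, and that regions below a threshold are rejected. The function names and the threshold value `min_entropy` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a flat pixel list."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_candidates(candidates, min_entropy=0.5):
    """Keep candidate regions whose entropy reaches the threshold.

    Assumption: a uniform background patch has entropy close to zero, while a
    region containing text mixes at least two grey-level populations.
    The default threshold is a placeholder, not a value from the thesis.
    """
    return [region for region in candidates if shannon_entropy(region) >= min_entropy]
```

A perfectly uniform region has zero entropy and is discarded, whereas a region split evenly between two grey levels has an entropy of one bit and is kept. In practice the threshold would have to be tuned on annotated data.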

The second part of the thesis (cf. Chapter 6) introduces several novel applications based on our proposed video analysis techniques. The first application is a semantic video search engine that applies state-of-the-art video analysis techniques to search through the visual content of a video and provides semantic entity-based search recommendations to users. The proposed video OCR software is one of the most important components for automatic textual metadata generation in this system. The second application aims to provide an efficient way of indexing lecture videos and searching for them in a large lecture video archive. We have implemented a complete workflow comprising structural segmentation of lecture videos, video OCR analysis, automated lecture outline extraction from OCR transcripts, speech-to-text analysis, content-based keyword browsing, and video search using OCR and Automatic Speech Recognition (ASR) results.
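How OCR and ASR results can feed a keyword search over video segments may be sketched with a small inverted index. This is an illustrative assumption rather than the system described in the thesis: the segment layout (dictionaries with `ocr` and `asr` text fields), the tokenizer, and the function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments):
    """Map each token to the set of (segment id, source) pairs where it occurs.

    `segments` is a list of dicts with optional 'ocr' and 'asr' text fields
    (a hypothetical layout chosen for this sketch).
    """
    index = defaultdict(set)
    for seg_id, segment in enumerate(segments):
        for source in ("ocr", "asr"):
            for token in tokenize(segment.get(source, "")):
                index[token].add((seg_id, source))
    return index


def search(index, query):
    """Return sorted ids of segments that contain every query token,
    regardless of whether it came from the OCR or the ASR transcript."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = [{seg_id for seg_id, _ in index.get(token, set())} for token in tokens]
    return sorted(set.intersection(*hits))
```

Keeping the source label next to each segment id would allow a ranking step to weight slide text and spoken text differently; the sketch itself only performs a Boolean AND over the query tokens.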

The operability and accuracy of the proposed methods have been evaluated using publicly available test data sets. A user study completes the evaluation.