Over the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to the official statistics published by the video portal YouTube, more than 6 billion hours of video are watched each month and about 100 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.
In this seminar, various methods for automatic video analysis and retrieval will be developed based on state-of-the-art computer vision technologies. The accuracy and performance of the resulting systems will be evaluated on open benchmarks.
Our current research focuses on state-of-the-art techniques for video analysis and multimedia information retrieval (MIR). Potential topics include video Shot Boundary Detection (SBD), where a video stream is segmented into shots, each of which can then be represented by a set of key-frames. SBD often serves as a basis for further video analysis tasks. Video Text Detection (Video OCR) is one of the most active research topics in the MIR domain. Here we focus on improving existing approaches by using Deep Learning techniques. Video Genre Classification is another topic that has attracted increasing attention recently. An approach will be developed based on multimodal video information such as video key-frames, frame concepts, and topics extracted from video texts. The last topic is Real-time Video Object Tracking Applications. Various applications can be developed on top of an existing object tracking approach, for example interactive web navigation driven by an object tracking algorithm.
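To make the SBD topic more concrete, the following is a minimal sketch of one classical baseline: thresholding the histogram difference between consecutive frames to find hard cuts, then picking the middle frame of each shot as its key-frame. It is not the method used in the seminar. The frame representation (a flat sequence of grayscale values in 0..255), the function names, the bin count and the threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of grayscale pixel values in 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of a frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid division by zero for empty frames
    return [count / total for count in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return indices of frames that start a new shot (hard cuts only)."""
    boundaries = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None and histogram_distance(previous, current) > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


def key_frames(frames: Sequence[Frame], boundaries: Sequence[int]) -> List[int]:
    """Pick the middle frame index of every shot as its key-frame."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(frames)]
    return [(start + end - 1) // 2 for start, end in zip(starts, ends) if end > start]


if __name__ == "__main__":
    dark, bright = [10] * 64, [200] * 64
    video = [dark] * 3 + [bright] * 2 + [dark] * 4  # synthetic three-shot clip
    cuts = detect_shot_boundaries(video)
    print(cuts, key_frames(video, cuts))
```

A fixed global threshold like this only catches abrupt cuts; gradual transitions such as fades and dissolves usually need adaptive thresholds or learned features, which is where the approaches studied in the seminar come in.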
Yang, H.; Quehl, B.; Sack, H., "A Framework for Improved Video Text Detection and Recognition," Multimedia Tools and Applications (MTAP), special issue "Computer Vision for Multimedia", vol. 69, no. 1, pp. 217-245, Springer US, 2014. DOI: http://dx.doi.org/10.1007/s11042-012-1250-6
Epshtein, B.; Ofek, E.; Wexler, Y., "Detecting text in natural scenes with stroke width transform," 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2963-2970, 13-18 June 2010. DOI: 10.1109/CVPR.2010.5540041
Wang, T.; Wu, D.J.; Coates, A.; Ng, A.Y., "End-to-end text recognition with convolutional neural networks," 2012 21st International Conference on Pattern Recognition (ICPR), pp. 3304-3308, 11-15 Nov. 2012
Karpathy, A.; Shetty, S.; Toderici, G.; Sukthankar, R.; Leung, T.; Fei-Fei, L., "Large-scale Video Classification with Convolutional Neural Networks," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Sidiropoulos, P.; Mezaris, V.; Kompatsiaris, I.; Meinedo, H.; Bugalho, M.; Trancoso, I., "Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features," IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 8, pp. 1163-1177, Aug. 2011. DOI: 10.1109/TCSVT.2011.2138830
Kalal, Z.; Mikolajczyk, K.; Matas, J., "Tracking-Learning-Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409-1422, July 2012. DOI: 10.1109/TPAMI.2011.239