Hasso-Plattner-Institut
 

Martin Malchow

Hasso-Plattner-Institut (HPI) für
Softwaresystemtechnik GmbH
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam
Germany 

office: H-1.21
phone: +49 (0)331-5509-461
fax: +49 (0)331-5509-325
e-mail: Martin.Malchow(at)hpi.de

Teaching - IT Systems Engineering

Publications

Enhance Lecture Archive Search with OCR Slide Detection and In-Memory Database Technology

Malchow, Martin; Bauer, Matthias; Meinel, Christoph in 2015 IEEE 18th International Conference on Computational Science and Engineering (CSE), pages 176-183. IEEE, 2015.

The Web hosts many frequently used video lecture archives that have grown quickly over the last few years. As a result, a large number of lecture recordings are available that cover a wide variety of subjects. These videos are typically searched by title and description. Unfortunately, not all important keywords and facts are mentioned in the title or description, if these are available at all. Furthermore, there is no way to analyze how important the detected keywords are for the video as a whole. Another characteristic specific to lecture archives is that every regular university lecture is repeated yearly, which normally leads to duplicate lecture recordings. Duplicates in the search results are distracting for students who want to watch the most recent lectures. This paper addresses these problems by analyzing the recorded lecture slides with Optical Character Recognition (OCR). In addition to the title and description, the OCR data is used in a full-text analysis to create an index for the lecture archive search. Furthermore, a fuzzy search is introduced to cope with misspelled search requests and OCR detection defects. The paper also deals with the performance of full-text search on an in-memory database, issues in OCR detection, and the handling of duplicate recordings of lectures that are repeated every year. Finally, the search performance is evaluated in comparison with other database approaches besides the in-memory database, and a user acceptability survey was conducted on the search results, with the aim of improving the learning experience on lecture archives. As a result, this paper shows how to handle the large amount of OCR data for a full-text live search on an in-memory database in reasonable time. During this search, a fuzzy search is additionally performed to resolve spelling mistakes and OCR detection problems.
In conclusion, this paper shows a solution for an enhanced video lecture archive search that supports students in online research processes and enhances their learning experience.
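The idea described in the abstract, fuzzy matching over slide OCR text while showing only the most recent recording of a repeated lecture, can be illustrated with a minimal sketch. This is not the paper's implementation: the paper uses an in-memory database, whereas the sketch below substitutes Python's standard-library difflib for the fuzzy comparison. The names Lecture, fuzzy_contains and search, the 0.8 similarity threshold, and the assumption that repeated lectures share the same title are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Lecture:
    # Hypothetical record: one recording with the OCR text of its slides.
    title: str
    year: int
    ocr_text: str


def fuzzy_contains(query: str, text: str, threshold: float = 0.8) -> bool:
    """Return True if any word in text is similar enough to the query.

    Similarity is difflib's ratio (0.0 to 1.0), so small spelling
    mistakes in the query or OCR defects in the text can still match.
    """
    q = query.lower()
    return any(
        SequenceMatcher(None, q, word).ratio() >= threshold
        for word in text.lower().split()
    )


def search(lectures: list[Lecture], query: str, threshold: float = 0.8) -> list[Lecture]:
    """Fuzzy-search title and OCR text, keeping only the newest recording per title."""
    latest: dict[str, Lecture] = {}
    for lecture in lectures:
        searchable = f"{lecture.title} {lecture.ocr_text}"
        if not fuzzy_contains(query, searchable, threshold):
            continue
        # Assumption: yearly repeats of a lecture share the same title.
        key = lecture.title.lower()
        if key not in latest or lecture.year > latest[key].year:
            latest[key] = lecture
    # Most recent lectures first.
    return sorted(latest.values(), key=lambda l: l.year, reverse=True)
```

Comparing the query against every word at query time is only workable for small collections. The abstract's point is that a full-text index held in an in-memory database makes this kind of live search feasible on large amounts of OCR data.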
Further information
Tags Distance_Learning E-Learning Fuzzy_Search In-Memory_Database OCR_Search Tele-Lecturing Teleteaching its