With increasing digitization and storage capacities, it is becoming more and more viable to undertake massive digitization projects for analogue archives. Digitization allows easy access to, and long-term preservation of, old and sensitive physical material to which access is typically denied. Furthermore, digitization allows the material to be processed more efficiently. In this project, we aim to develop and apply novel automatic processing methods for the digitized archive of the WPI. Since archival material, especially in the art history domain, contains many images and much handwriting, we concentrate on analysing and extracting handwritten information. The challenges to be addressed in this project are the scalability and quality of different approaches to handwriting recognition. The digitization project that the WPI is undertaking covers a document corpus of many millions of pages in different fonts, languages, and physical conditions.
Besides handwriting, which is one important type of semantic information in an archive, a digitized archive also contains many scans of documents that include images. These images may be photographs, reproductions of works of art, or even sketches. A digitization pipeline would greatly benefit from additional analysis steps that extract metadata from such documents. In this line of work, further analysis steps shall be added to the resulting digitization pipeline: classification of documents by visual appearance, automatic creation of textual metadata (i.e. descriptions) of images, and recognition of objects depicted in images. All of the developed approaches shall be made usable by the researchers of the WPI by integrating them into their cataloguing software.