The current data deluge demands fast, real-time processing of large datasets to support various applications, including those that work with textual data such as scientific publications. Natural language processing (NLP) is the field concerned with automatically processing textual documents. Processing and semantically annotating large text collections is a time-consuming and tedious task that requires integrating various tools. In-memory database (IMDB) technology offers an alternative, given its ability to process large document collections in real time.
Olelo is our NLP platform; it integrates various NLP-related tasks for the biomedical domain. It is built on top of the SAP HANA database. It currently indexes more than 15 million abstracts from the PubMed database of biomedical publications, as well as the MeSH ontology and various terminologies from UMLS. Olelo's current NLP services include document retrieval, automatic text summarization, and question answering (cf. figure on the right).