In-Memory Natural Language Processing

The current data deluge demands fast, real-time processing of large datasets to support various applications, including those that rely on textual data such as scientific publications. Natural language processing (NLP) is the field concerned with the automatic processing of textual documents. Processing and semantically annotating large text collections is a time-consuming and tedious task that requires the integration of various tools. In-memory database (IMDB) technology offers an alternative, given its ability to process large document collections in real time.

Olelo is our NLP platform; it integrates various NLP-related tasks for the biomedical domain.


NLP includes a variety of tasks, such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases) and syntactic parsing (construction of a syntactic tree for a sentence). NLP also involves semantic tasks, such as named-entity recognition (delimitation of mentions of predefined entity types, e.g., person and organization names), relation extraction (identification of predefined relations in text) and semantic role labeling (determination of predefined semantic arguments). We have implemented many NLP methods in the SAP HANA database, as follows:

Many NLP applications can be developed for various scenarios and domains, such as automatically generating summaries of one or more documents (summarization), retrieving documents relevant to a particular query (information retrieval), extracting specific information from a huge document collection (information extraction) and automatically answering questions posed by users (question answering). We have developed NLP methods and applications for many of these tasks, as follows:

Challenges and Shared Tasks

We evaluated our methods on challenges and shared tasks organized by the scientific community:


We developed resources to support training and evaluation of NLP methods:

Other activities

We are involved in the organization of various activities:

Publications (since Oct/2013)

