Prof. Dr. h.c. Hasso Plattner

Olelo - Natural Language Processing Platform

The current data deluge demands fast, real-time processing of large datasets to support various applications, including those built on textual data such as scientific publications. Natural language processing (NLP) is the field concerned with automatically processing textual documents. Processing and semantically annotating large text collections is a time-consuming and tedious task that requires integrating various tools. In-memory database (IMDB) technology offers an alternative, given its ability to process large document collections in real time.

Olelo is our NLP platform; it integrates various NLP-related tasks for the biomedical domain. It is built on top of the SAP HANA database. It currently indexes more than 15 million abstracts from the PubMed database of biomedical publications, as well as the MeSH ontology and various terminologies from UMLS. Olelo's current NLP services include document retrieval, automatic text summarization and question answering (cf. figure on the right).

Our NLP Projects

NLP includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases) and syntactic parsing (construction of a syntactic tree for a sentence). NLP also involves semantic tasks such as named-entity recognition (delimitation of mentions of predefined entity types, e.g., person and organization names), relation extraction (identification of predefined relations in text) and semantic role labeling (determining predefined semantic arguments). We have implemented many NLP methods in the SAP HANA database, as follows:

  • Chunking/shallow parsing and semantic role labeling (MP ss2015)
  • Named-entity recognition (BP 2015/2016)
  • Relation extraction (MP ws2015/2016)
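
To make two of the tasks above concrete, the following is a minimal Python sketch of tokenization followed by dictionary-based named-entity recognition. It is only an illustration and not the SAP HANA implementation used in these projects: the lexicon, the entity type labels and the function names `tokenize` and `recognize_entities` are all hypothetical, and a real system would rely on terminologies such as MeSH or UMLS rather than a hand-written dictionary.

```python
import re

# Hypothetical toy lexicon (assumption); real systems use large terminologies.
LEXICON = {
    "aspirin": "Chemical",
    "breast cancer": "Disease",
    "brca1": "Gene",
}


def tokenize(text):
    """Tokenization: split text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def recognize_entities(tokens, lexicon=LEXICON, max_len=3):
    """Named-entity recognition by greedy longest-match dictionary lookup.

    At each position, try the longest n-gram first (up to max_len tokens)
    and fall back to shorter ones; matching is case-insensitive.
    """
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in lexicon:
                entities.append((" ".join(tokens[i:i + n]), lexicon[candidate]))
                i += n  # skip past the matched mention
                break
        else:
            i += 1  # no entity starts at this token
    return entities


tokens = tokenize("BRCA1 mutations increase breast cancer risk.")
entities = recognize_entities(tokens)
```

Here `entities` holds the matched mentions paired with their types. The longest-match strategy ensures that a multi-word term such as "breast cancer" is reported as one mention instead of being split into separate tokens.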

Many NLP applications can be developed for various scenarios and domains, such as automatically generating summaries of one or more documents (summarization), retrieving documents relevant to a particular query (information retrieval), extracting specific information from a huge document collection (information extraction) and automatically answering questions posed by users (question answering). We have developed NLP applications for many of these tasks, as follows:

  • Medicate: intelligent navigation through the biomedical scientific literature (BP 2015/2016)
  • TextAI: intelligent annotation tool (MP ws2015/2016)
  • Generation of summaries for question answering system (MP ss2015 & Master thesis Schulze)
  • Generation of summaries for genes (Master thesis Schulze) (check our #GeneOfTheWeek summaries)
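
As an illustration of the summarization task mentioned above, the sketch below ranks sentences by the average frequency of their content words and keeps the top-scoring ones in their original order. This is a generic frequency-based extractive baseline chosen for brevity, not the method used in the projects listed here; the stopword list and the function names are assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, far from complete).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "are",
             "to", "for", "with", "that"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Extractive summary: keep the sentences whose content words are,
    on average, the most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


text = ("BRCA1 is a tumor suppressor gene. "
        "Mutations in BRCA1 increase cancer risk. "
        "The weather was nice.")
summary = summarize(text, max_sentences=2)
```

With this input, the two sentences mentioning BRCA1 score highest because that term recurs, so the unrelated third sentence is dropped. Averaging over the sentence length keeps long sentences from winning merely by containing more words.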

We also developed resources to support training and evaluation of NLP methods:

Challenges and Shared Tasks

We evaluated our methods on challenges and shared tasks organized by the scientific community:


We are involved in the organization of challenges and symposiums:

Publications (since Oct/2013)

  • Neves M. In-memory database for passage retrieval in biomedical question answering, Journal of Biomedical Semantics. [accepted]
  • Schulze F and Neves M. Entity-Supported Summarization of Biomedical Abstracts, Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining, Coling 2016, Osaka, Japan. [accepted]
  • Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K and Zampieri M. Findings of the 2016 Conference on Machine Translation, ACL 2016, Proceedings of the First Conference on Machine Translation (WMT16), pp. 131-198, 2016, Berlin, Germany.
  • Grundke M, Jasper J, Perchyk M, Sachse J P, Krestel R, Neves M. TextAI: Enhancing TextAE with Intelligent Annotation Support, 7th International Symposium on Semantic Mining for Biomedicine (SMBM), 2016, Potsdam, Germany.
  • Schulze F, Schüler R, Draeger T, Dummer D, Ernst A, Flemming P, Perscheid C, Neves M. Biomedical Question Answering Based on In-Memory Technology, ACL 2016, BioASQ Challenge, 2016, Berlin, Germany.
  • Neves M, Jimeno-Yepes A and Névéol A. The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine, International Conference on Language Resources and Evaluation (LREC), 2016, Portoroz, Slovenia.
  • Neves M. HPI Question Answering System in the BioASQ 2015 Challenge, Working Notes for the CLEF BioASQ Challenge, 2015, Toulouse, France.
  • Neves M and Leser U. Question Answering for Biology, Methods, 2015.
  • Neves M. HPI in-memory-based database system in Task 2b of BioASQ, Working Notes for the CLEF BioASQ Challenge, 2014.
  • Herbst K, Fähnrich C, Neves M and Schapranow M-P. Applying In-Memory Technology for Automatic Template Filling in the Clinical Domain, CLEF 2014 Evaluation Labs and Workshop, Online Working Notes, 2014.
  • Neves M, Herbst K, Uflacker M and Plattner H. Preliminary evaluation of passage retrieval in biomedical multilingual question answering, BioTxtM 2014, Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing, 2014.
  • Neves M. Preliminary evaluation of question answering to support biological curation, Poster in the BioCuration Conference (ISB2014), 2014, Toronto, Canada.