Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

In-Memory Natural Language Processing

The current data deluge demands fast, real-time processing of large datasets to support various applications, including those built on textual data such as scientific publications. Natural language processing (NLP) is the field of automatically processing textual documents. Processing and semantically annotating large textual collections is a time-consuming and tedious task that requires the integration of various tools. In-memory database (IMDB) technology offers an alternative, given its ability to process large document collections in real time.

Contact: Dr. Mariana Neves

Research Area: In-Memory Data Management for Life Sciences

We have an open position for students. Please contact us!

Applications

Olelo is our NLP platform and integrates various NLP-related tasks for the biomedical domain.

Projects

NLP includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases) and syntactic parsing (construction of a syntactic tree for a sentence). NLP also involves semantic tasks such as named-entity recognition (delimitation of predefined entity types, e.g., person and organization names), relation extraction (identification of predefined relations in text) and semantic role labeling (determination of predefined semantic arguments). We have implemented many NLP methods in the SAP HANA database, as follows:

  • Chunking/shallow parsing and semantic role labeling (MP ss2015)
  • Named-entity recognition (BP 2015/2016)
  • Relation extraction (MP ws2015/2016)
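
To make two of the tasks named above concrete, the following is a minimal sketch of tokenization and dictionary-based named-entity recognition in plain Python. It is not the SAP HANA implementation used in the projects listed above; the regular expression, the `LEXICON` entries and the function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; real systems rely on
# trained models or curated biomedical terminologies.
LEXICON = {"brca1": "GENE", "aspirin": "CHEMICAL"}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]


def tag_entities(tokens):
    """Label tokens found in LEXICON using a case-insensitive lookup."""
    return [
        (tok, start, end, LEXICON[tok.lower()])
        for tok, start, end in tokens
        if tok.lower() in LEXICON
    ]


sentence = "Aspirin may affect BRCA1 expression."
tokens = tokenize(sentence)
entities = tag_entities(tokens)

for tok, start, end, label in entities:
    print(f"{tok} [{start}:{end}] -> {label}")
```

Keeping character offsets with each token means an annotation can always be mapped back to the original text, which is why the sketch returns triples rather than bare strings.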

Many NLP applications can be developed for various scenarios and domains, such as automatically generating summaries of one or more documents (summarization), retrieving documents relevant to a particular query (information retrieval), extracting specific information from a huge document collection (information extraction) and automatically answering questions posed by users (question answering). We have developed NLP methods and applications for many of these tasks, as follows:

  • Deep learning to extract exact answers (Master thesis Georg Wiese)
  • Semantic role labeling to support question answering (Master thesis Fabian Eckert)
  • Olelo: intelligent navigation through the biomedical scientific literature (BP 2015/2016)
  • TextAI: intelligent annotation tool (MP ws2015/2016)
  • Generation of summaries for question answering system (MP ss2015 & Master thesis Frederik Schulze)
  • Generation of summaries for genes (Master thesis Frederik Schulze) (check our #GeneOfTheWeek summaries)
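
As an illustration of the information retrieval task mentioned above, here is a minimal sketch that ranks documents against a query with a simple TF-IDF score. It is a toy example under stated assumptions, not the method used in Olelo or in any of the projects listed: the whitespace tokenizer, the smoothed IDF formula, the `rank` function and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def simple_tokens(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return indices of matching documents, best TF-IDF score first."""
    doc_counts = [Counter(simple_tokens(d)) for d in documents]
    n = len(documents)
    scored = []
    for index, counts in enumerate(doc_counts):
        score = 0.0
        for term in set(simple_tokens(query)):
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in doc_counts if term in c)
            if df:
                # Smoothed inverse document frequency (an assumed variant).
                idf = math.log((1 + n) / (1 + df)) + 1.0
                score += counts[term] * idf
        scored.append((score, index))
    # Sort by descending score, then by index for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored if score > 0]


# Hypothetical sample collection for illustration.
documents = [
    "aspirin reduces fever",
    "gene expression in yeast",
    "aspirin and gene regulation",
]
ranking = rank("gene expression", documents)
print(ranking)
```

Documents that share no term with the query receive a score of zero and are left out of the ranking.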

Challenges and Shared Tasks

We evaluated our methods on challenges and shared tasks organized by the scientific community:

Resources

We developed resources to support training and evaluation of NLP methods:

Other activities

We are involved in the organization of various activities:

Publications (since Oct/2013)

  • Wiese G, Weissenborn D and Neves M. Neural Question Answering at BioASQ 5B, Biomedical Natural Language Processing (BioNLP) Workshop at ACL'17, accepted, Vancouver, Canada.
  • Neves M, Eckert F, Folkerts H and Uflacker M. Assessing the performance of Olelo, a real-time biomedical question answering application, Biomedical Natural Language Processing (BioNLP) Workshop at ACL'17, accepted, Vancouver, Canada.
  • Kraus M, Niedermeier J, Jankrift M, Tietböhl S, Stachewicz T, Folkerts H, Uflacker M and Neves M. Olelo: a web application for intuitive exploration of biomedical literature, Nucleic Acids Research Web service issue.
  • Neves M. A parallel collection of clinical trials in Portuguese and English, 10th Workshop on Building and Using Comparable Corpora (BUCC) at ACL'17, accepted, Vancouver, Canada.
  • Neves M, Folkerts H, Jankrift M, Niedermeier J, Stachewicz T, Tietböhl S, Kraus M and Uflacker M. Olelo: A Question Answering Application for Biomedicine, ACL'17 Demo, Vancouver, Canada. (accepted)
  • Habibi M, Weber L, Neves M, Wiegandt D L and Leser U. Deep Learning with Word Embeddings improves Biomedical Named Entity Recognition, ISMB/ECCB 2017, Prague, Czech Republic. (accepted)
  • Folkerts H and Neves M. Olelo’s named-entity recognition web service in the BeCalm TIPS task, BeCalm Workshop 2017, Barcelona, Spain.
  • Nentidis A, Yang Z, Neves M, Kim J-D, Krithara A, Paliouras G and Kakadiaris I. BioASQ and PubAnnotation: Using linked annotations in biomedical question answering, BLAH3 workshop, 2017, Tokyo, Japan.
  • Neves M and Kraus M. BioMedLAT Corpus: Annotation of the Lexical Answer Type for Biomedical Questions, Open Knowledge Base and Question Answering Workshop, Coling 2016, Osaka, Japan.
  • Schulze F and Neves M. Entity-Supported Summarization of Biomedical Abstracts, Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining, Coling 2016, Osaka, Japan.
  • Neves M, Rey M and Wittig U. Text Mining to Support Data Curation for SABIO-RK, BLAHmuc workshop, 2016, Munich, Germany.
  • Cohen K B, Demner-Fushman D, Fort K, Grouin C, Hunter L E, Leser U, Neveol A, Neves M and Zweigenbaum P. Towards the Last Annotation Tool, BLAHmuc workshop, 2016, Munich, Germany.
  • Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K and Zampieri M. Findings of the 2016 Conference on Machine Translation, ACL 2016, Proceedings of the First Conference on Machine Translation (WMT16), pp. 131-198, 2016, Berlin, Germany.
  • Grundke M, Jasper J, Perchyk M, Sachse J P, Krestel R and Neves M. TextAI: Enhancing TextAE with Intelligent Annotation Support, 7th International Symposium on Semantic Mining for Biomedicine (SMBM), 2016, Potsdam, Germany.
  • Schulze F, Schüler R, Draeger T, Dummer D, Ernst A, Flemming P, Perscheid C and Neves M. Biomedical Question Answering Based on In-Memory Technology, ACL 2016, BioASQ Challenge, 2016, Berlin, Germany.
  • Neves M, Jimeno-Yepes A and Névéol A. The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine, International Conference on Language Resources and Evaluation (LREC), 2016, Portoroz, Slovenia.
  • Neves M. HPI Question Answering System in the BioASQ 2015 Challenge, Working Notes for the CLEF BioASQ Challenge, 2015, Toulouse, France.
  • Neves M and Leser U. Question Answering for Biology, Methods, 2015.
  • Neves M. HPI in-memory-based database system in Task 2b of BioASQ, Working Notes for the CLEF BioASQ Challenge, 2014.
  • Herbst K, Fähnrich C, Neves M and Schapranow M-P. Applying In-Memory Technology for Automatic Template Filling in the Clinical Domain, CLEF 2014 Evaluation Labs and Workshop, Online Working Notes, 2014.
  • Neves M, Herbst K, Uflacker M and Plattner H. Preliminary evaluation of passage retrieval in biomedical multilingual question answering, BioTxtM 2014, Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing, 2014.
  • Neves M. Preliminary evaluation of question answering to support biological curation, Poster in the BioCuration Conference (ISB2014), 2014, Toronto, Canada.