NLP includes a variety of tasks such as tokenization (delimitation of words), part-of-speech tagging (assignment of syntactic categories to words), chunking (delimitation of phrases) and syntactic parsing (construction of a syntactic tree for a sentence). NLP also involves semantic tasks such as named-entity recognition (delimitation of predefined entity types, e.g., person and organization names), relation extraction (identification of predefined relations in text) and semantic role labeling (determination of predefined semantic arguments). We have implemented many NLP methods in the SAP HANA database, as follows:
- Chunking/shallow parsing and semantic role labeling (MP ss2015)
- Named-entity recognition (BP 2015/2016)
- Relation extraction (MP ws2015/2016)
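
To make two of the tasks above concrete, the following is a minimal sketch of tokenization and of a dictionary-based form of named-entity recognition. It is an illustration only and not the SAP HANA implementation mentioned above: the regular expression, the `GAZETTEER` entries, and the function names `tokenize` and `tag_entities` are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens (a simple regex heuristic)."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical gazetteer: token sequences mapped to predefined entity types.
GAZETTEER = {
    ("SAP",): "ORGANIZATION",
    ("Ada", "Lovelace"): "PERSON",
}


def tag_entities(tokens):
    """Return (start, end, type) spans for gazetteer entries, longest match first."""
    spans = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                spans.append((i, i + n, GAZETTEER[key]))
                i += n  # skip past the matched entity
                break
        else:
            i += 1  # no entity starts at this token
    return spans


tokens = tokenize("Ada Lovelace visited SAP.")
spans = tag_entities(tokens)
```

Here `tokens` holds the delimited words and punctuation, and each entry of `spans` gives the token range and type of one recognized entity. A lookup table like this only finds names it already contains; practical systems typically rely on statistical or neural models instead.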
Many NLP applications can be developed for various scenarios and domains, such as automatic generation of summaries of one or more documents (summarization), retrieval of documents relevant to a particular query (information retrieval), extraction of specific information from a large document collection (information extraction) and automatic answering of questions posed by users (question answering). We have developed NLP methods and applications for many of these tasks, as follows:
- Deep learning to extract exact answers (Master thesis Georg Wiese)
- Semantic role labeling to support question answering (Master thesis Fabian Eckert)
- Olelo: intelligent navigation through the biomedical scientific literature (BP 2015/2016)
- TextAI: intelligent annotation tool (MP ws2015/2016)
- Generation of summaries for a question answering system (MP ss2015 & Master thesis Frederik Schulze)
- Generation of summaries for genes (Master thesis Frederik Schulze) (check our #GeneOfTheWeek summaries)