Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

In-Memory Data Management for Life Sciences

General Information

  • Application deadline extended to April 22nd at 1 p.m. Have a look at the topic list below or at the slides from our kick-off meeting and send a wish list of your three preferred topics to Milena.Kraus(at)hpi.de

  • Teaching staff: Dr. Matthieu-P. Schapranow, Milena Kraus, Dr. Mariana Neves, Dr. Matthias Uflacker 
  • Location: Campus II
  • 4 semester hours per week (SWS), 6 ECTS (graded)
  • First session: Apr 12, 2016, 9:15 - 10:45 a.m. in D-E.9/10
  • Next regular meeting: Tue, Apr 19, 9:15 - 10:45 a.m. in D-E.9/10, with an extended introduction to computational methods for processing RNAseq data (for students interested in topics A-C). Students who were not able to attend our kick-off meeting are also welcome to attend this lecture and ask questions about the provided topics.

Scope of the seminar

This seminar is intended to improve your research skills and to broaden your horizons in the fields of life sciences and in-memory database technology. We will introduce you to computational problems in processing and analyzing clinical and medical data, and we expect you to solve these problems using in-memory database technology.

You will be able to choose from a number of distinct sub-topics. We will coach you throughout the semester and help you improve your research and presentation skills.

Our sub-topics can be grouped into two main foci:

1. Processing of large RNAseq data sets to elucidate causes of heart failure

2. Natural Language Processing of biomedical data sets

In our kick-off meeting we presented the respective sub-topics. For most of the sub-topics it will be possible to extend the work and divide it among 2-3 students.

 

Grading

The final grade will be determined by the following parts, each of which must be passed individually (concrete percentages are yet to be assigned):

  • Research article (40%), 
  • Seminar results and their presentation, e.g. mid-term and final presentations (40%), and
  • Individual commitment (20%).

Topic List

A: Processing of large RNAseq data sets to elucidate causes of heart failure

B: Statistical analysis of the transcriptome and differentially expressed genes 

C: Integration and harmonization of medical data 

D: Semantic role labeling to support question answering 

E: Natural language processing to support clinical decision making

F: Automatic translation of scientific publications 

G: Relation extraction based on distant supervision 

Y: IMDBfs - Adaptation and evaluation of a shared high-performance file system built on in-memory technology

Z: Distributed Execution - Adaptation and evaluation of a distributed IMDB execution engine