Hasso-Plattner-Institut
Prof. Dr. h.c. mult. Hasso Plattner
 

In-Memory Computing for Life Sciences

General Information

Vision

In this seminar, you will practice familiarizing yourself with a specific research topic on your own. We will give a brief introduction to the topic and then coach you throughout the semester while you work on it. Several presentations ensure that your presentation skills improve, too. You can apply for one of the topics on a list that will be presented during the first seminar meeting on Apr 8, 2014.

Grading

The final grade is determined by the following individual parts, each of which must be passed:

  • Seminar results and research article (40%),
  • Presentations, e.g. mid-term and final (20%), and
  • Methodical research approach and individual commitment (40%)

Material

Selected Seminar Topics

Clinical Trial Recruitment Process

Patient data in clinical systems consists of a variety of structured data, such as therapy plans or blood screenings, and unstructured data, such as diagnoses or tumor reports, all of which evolve during a treatment or in the course of a clinical trial. It is crucial to identify patients with similar preconditions and similar therapy progress to assess treatment alternatives for complex diseases, such as cancer. In the course of the project, you will get access to pseudonymized patient data to extract relevant criteria for building a similarity metric for patient cases. You will apply your metric to the analysis of patient cases, e.g. clustering, and to enable graphical exploration of patient data. For this project, you will be able to build on the existing infrastructure of the high-performance in-memory genome project at our chair.
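As a rough illustration of what a similarity metric for patient cases could look like, the following minimal sketch computes a Gower-style similarity over mixed numeric and categorical attributes. The attribute names, the value ranges, and the function name `patient_similarity` are assumptions made for this example; the actual criteria are to be derived from the pseudonymized data during the project.

```python
from typing import Dict, Union

Value = Union[float, int, str]


def patient_similarity(
    a: Dict[str, Value],
    b: Dict[str, Value],
    numeric_ranges: Dict[str, float],
) -> float:
    """Gower-style similarity in [0, 1] over attributes present in both cases.

    numeric_ranges maps each numeric attribute to the width of its value
    range (hypothetical values, used to scale differences to [0, 1]).
    All other shared attributes are treated as categorical.
    """
    shared = [key for key in a if key in b]
    if not shared:
        return 0.0

    total = 0.0
    for key in shared:
        if key in numeric_ranges:
            span = numeric_ranges[key]
            if span > 0:
                diff = abs(float(a[key]) - float(b[key]))
                # Scaled difference, capped at 1 so outliers cannot go negative.
                total += 1.0 - min(diff / span, 1.0)
            else:
                total += 1.0
        else:
            # Categorical attribute: exact match or no match.
            total += 1.0 if a[key] == b[key] else 0.0
    return total / len(shared)


# Hypothetical example cases (not real patient data).
case_1 = {"age": 60, "diagnosis": "C50", "hemoglobin": 12.0}
case_2 = {"age": 70, "diagnosis": "C50", "hemoglobin": 10.0}
ranges = {"age": 100.0, "hemoglobin": 10.0}
similarity = patient_similarity(case_1, case_2, ranges)
```

A pairwise similarity of this kind can be turned into a distance (1 minus the similarity) and passed to a clustering method. Weighting individual attributes is one obvious refinement.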

Search and Information Extraction in Unstructured Medical Documents

Medical documents are often available as unstructured text, e.g. publications, diagnosis reports, and clinical trials. Identifying and assessing relevant documents is a time-consuming task today, which ties up medical experts. However, this process can be enhanced by relying on text collections previously annotated with pre-defined entity types and relationships. Your task will be to identify and extract specific entities (e.g., drugs, diseases, medications) as well as relationships that might occur between them, such as a side effect reported for a particular drug, attributes of a patient's disease (e.g., severity, body location), or information related to the administration of a medication (e.g., dosage, duration). You will rely on previously selected medical documents and will use existing standard terminologies, such as MeSH or UMLS, to enable automatic pre-processing of the texts and to develop a ranking metric with the help of an in-memory database system. Thus, the end user receives a list of results ranked according to their relevance. As a result, medical experts can start by exploring the more relevant documents with analysis tools and give their evaluation, which reduces the overall time for this task. For this project, you will be able to build on the existing infrastructure of the high-performance in-memory genome project at our chair.
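To make the idea of a ranking metric concrete, here is a minimal sketch that scores documents by how densely they mention a set of terms of interest. The tiny term set stands in for a lookup against a terminology such as MeSH or UMLS and is purely hypothetical, as are the function names. A real solution would match multi-word concepts and synonyms and would run inside the database.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(
    docs: Dict[str, str], terms: Set[str]
) -> List[Tuple[str, float]]:
    """Rank documents by the fraction of their tokens that are terms of interest.

    Returns (document id, score) pairs sorted by descending score.
    """
    scored = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        counts = Counter(tokens)
        hits = sum(counts[term] for term in terms)
        score = hits / len(tokens) if tokens else 0.0
        scored.append((doc_id, score))
    # sorted() is stable, so documents with equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents and terms, for illustration only.
documents = {
    "report_1": "Patient recovered well after surgery",
    "report_2": "Aspirin caused nausea and mild headache",
    "report_3": "Nausea reported after aspirin and ibuprofen, aspirin stopped",
}
ranking = rank_documents(documents, {"aspirin", "nausea"})
```

Normalizing by document length keeps long reports from dominating simply because they contain more words. Whether that is desirable depends on the document types involved.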

Identification of Mutations in Man and Mouse

Medical knowledge about DNA, genes, and their function in human beings has been evolving only in the last decade. However, researchers have already obtained a variety of additional knowledge from experiments with cell lines or animals that has not yet been verified in humans. Your task is to combine data sources on known variants for different species to enable a more holistic view on genetic variants. Together with your project coordinators, you will identify relevant data sources, combine them in an in-memory database, and build associations between individual data sources to link them. For this project, you will be able to build on the existing infrastructure of the high-performance in-memory genome project at our chair.

Analysis of Medical Side Effects

Taking different medical substances at the same time may reduce the activity of active ingredients or cause side effects in patients. Today, national databases exist that document patient cases and observed side effects, e.g. maintained by the U.S. FDA or the German BfArM. Furthermore, the leaflets of pharmaceutical products contain details about side effects observed in the course of their clinical trials. Your task is to combine these data sources and automatically integrate their updates within an in-memory database. As a result, researchers and physicians can instantly analyze and explore existing side-effect reports to make a more specific decision when they can choose from a set of alternative pharmaceuticals to find the best strategy for a patient. For this project, you will be able to build on the existing infrastructure of the high-performance in-memory genome project at our chair.

Real-time Analysis of Sensor Data for High-Risk Patients

Intensive-care patients require continuous monitoring of specific parameters, such as blood pressure, oxygen saturation, and heart rate. However, it is known that patients might recover faster when they can return to their familiar environment, e.g. at home with their families. Nowadays, a market for wearable sensors is emerging; such sensors can be used to monitor high-risk patients from a telemedicine center without the need for them to stay permanently in the intensive-care unit. Your task is to design and develop an in-memory database prototype that regularly acquires data from selected wearable sensor devices, stores it to build a history, and combines it to detect relevant pattern changes, e.g. using support vector machines. As a result, significant changes in the readings of a sensor should be detected and medical experts should be informed about them automatically.
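The detection of significant changes in a sensor reading can be illustrated with a much simpler technique than the support vector machines mentioned above: a rolling z-score over a sliding window. The sketch below is only a baseline under assumed parameters (window size, threshold, and the function name `detect_changes` are all hypothetical) and is not meant as the method to be used in the project.

```python
from collections import deque
from statistics import mean, stdev
from typing import List, Sequence


def detect_changes(
    readings: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices of readings that deviate strongly from recent history.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the previous `window` readings.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a deviation")

    history = deque(maxlen=window)  # keeps only the most recent readings
    alerts = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat history gives sigma == 0; skip to avoid
            # dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append(index)
        history.append(value)
    return alerts


# Hypothetical heart-rate series in beats per minute with one sudden spike.
heart_rate = [70, 71, 69, 70, 72, 71, 70, 120, 71, 70]
alert_indices = detect_changes(heart_rate)
```

In a prototype, each flagged index would trigger a notification to the telemedicine center. Note that an outlier stays in the window for a while and inflates the deviation, which makes this baseline less sensitive right after an alert.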

Evaluating Influence Factors for Identifying Genetic Variants

Severe illnesses such as cancer have their root causes in mutations at particular positions in the genetic code. To identify these for a specific patient, a DNA sample is analyzed and compared to a reference in a process called variant calling. However, simply comparing the data to a single reference leads to inaccurate results because the data is error-prone and there are many genetic differences between populations that should be considered in the calculations. Your task is to evaluate the influence of different aspects, such as known variants or population data, on the accuracy of the detected variants. You will discuss and identify relevant data sources and aspects with your supervisor, make them available within an in-memory database, and run variant detection directly inside the database. For this project, you will build on an existing implementation for variant detection and refine its underlying statistical model.
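How prior knowledge such as known variants can influence a call may be sketched with a small Bayesian model. This is an illustrative assumption, not the statistical model of the existing implementation: reads at one position are treated as independent, a heterozygous variant is assumed to show the alternative allele in half of the reads, and the error rate and prior values are made up for the example.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def variant_posterior(
    alt_reads: int, depth: int, prior: float, error_rate: float = 0.01
) -> float:
    """Posterior probability that a position carries a heterozygous variant.

    alt_reads  -- number of reads that differ from the reference
    depth      -- total number of reads covering the position
    prior      -- prior probability of a variant at this position
                  (e.g. higher if the variant is listed as known)
    error_rate -- assumed per-read sequencing error rate (hypothetical value)
    """
    like_variant = binom_pmf(alt_reads, depth, 0.5)
    like_reference = binom_pmf(alt_reads, depth, error_rate)
    evidence = prior * like_variant + (1.0 - prior) * like_reference
    return prior * like_variant / evidence if evidence > 0 else 0.0


# Same read evidence, two hypothetical priors: a position listed in a
# database of known variants versus a position with no prior knowledge.
known_site = variant_posterior(alt_reads=2, depth=10, prior=0.1)
novel_site = variant_posterior(alt_reads=2, depth=10, prior=0.001)
```

With identical read evidence, the higher prior yields a higher posterior, so a fixed decision threshold can accept the known site and reject the novel one. Population-specific allele frequencies could enter the model in the same way, as position-specific priors.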