The current SARS-CoV-2 pandemic is an example of how quickly infectious diseases spread in our modern, connected world. Tracking infection chains helps to identify and isolate sources of infection and superspreaders, and facilitates future prevention.
In addition to epidemiological contact tracing, molecular methods are gaining importance for revealing transmission paths that were not recognized or disclosed. The similarity of the continuously changing pathogen genomes suggests infection chains. While there is a variety of bioinformatics methods to analyze and visualize the molecular raw data (e.g. nextstrain.org), reliable computational models and techniques to infer an infection chain from the similarity of the pathogen genomes are missing.
The goal of the project is to develop and implement algorithms that infer possible infection chains. One of the challenges is to infer those chains starting from mere genome sequences. A comparison of the sequenced RNA yields a similarity/distance matrix between the genomes, which can be read as a weighted graph; infection chains form trees in this graph. A variety of approaches will be tested on different simulated and real-world data sets. We also aim at a rigorous analysis of their correctness and asymptotic run-time behaviour.
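To make the step from sequences to a tree concrete, here is a minimal sketch under simplifying assumptions. It is not the method developed in the project. It assumes the sequences are already aligned to equal length, uses the Hamming distance as a stand-in for the distance measure, and takes a minimum spanning tree (Prim's algorithm) as one simple candidate tree in the weighted graph. The function names hamming, distance_matrix and minimum_spanning_tree are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances (assumed measure)."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(seqs[i], seqs[j])
    return d


def minimum_spanning_tree(d):
    """Prim's algorithm on a dense distance matrix, O(n^2) time.

    Returns a list of edges (parent, child, weight), grown from vertex 0.
    The tree is undirected; the parent/child order only reflects the order
    in which Prim's algorithm attached the vertices, not who infected whom.
    """
    n = len(d)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet part of the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, d[parent[u]][u]))
        # Update the connection costs of the remaining vertices.
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges
```

For example, minimum_spanning_tree(distance_matrix(["AAAA", "AAAT", "AATT", "CAAA"])) connects the four toy sequences with three edges. A minimum spanning tree carries no information about the direction or timing of transmission, so it is at best a baseline against which more refined models could be compared.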
In cooperation with the research group for Data Analytics & Computational Statistics and the department for Methodology and Research Infrastructure at the Robert Koch Institute, the algorithms can be evaluated on relevant data sets for SARS-CoV-2 and other infectious diseases.
Participants take an active part in every step of developing algorithms for a real-world problem. This includes reviewing and understanding existing tools and how they can be used in the process. The goal is to develop and implement algorithms, rigorously prove their correctness and analyze relevant theoretical properties. The grade is determined by a talk of 20-30 minutes and a report of 8-12 pages on a previously agreed topic, in the proportion 1:2.
The course will be managed via our Moodle, and enrolment in this Moodle course is required for course participation. We meet on Fridays, 9:15-10:45 am, starting on 6 November. Due to the current situation, we meet online; the link to our discussion room will be announced in Moodle.