Hasso-Plattner-InstitutSDG am HPI

Hendrik Raetz, M.Sc.

Doctoral student in the Data Analytics and Computational Statistics Group

Supervisor: Prof. Dr. Bernhard Renard

Topic: Machine Learning in Clinical Proteomics and Metaproteomics


Phone: +49 331 5509-4944




In recent years, mass spectrometry (MS)-based proteomics has made it feasible to process proteomics data on a large scale, so that the field is starting to rival genomics in analysis depth and scale. Analyzing protein data is desirable because proteins are much closer to the phenotype than genes and transcripts: they are the molecules that carry out the functions of an organism. However, their behavior cannot easily be inferred from the genome alone, because many regulatory steps before, between, and after transcription and translation alter the structure and function of the encoded proteins. The regulatory events that take place after translation, called post-translational modifications (PTMs), are of special interest to my research. Recent studies have shown that PTMs contribute to diseases such as cancer and Alzheimer's disease. Thus, analyzing PTMs can substantially improve the accuracy of disease prediction algorithms.

Proteomics data are usually obtained through MS experiments that produce large amounts of high-dimensional data. With current methods, this sheer mass of data is often hard to analyze because the analysis is either time- or resource-intensive. These and other factors leave algorithms unable to identify the underlying peptide sequence for a majority (<70%) of the tandem MS spectra acquired from a standard sample. It is therefore important to develop improved processing methods that are not only faster but also sufficiently sensitive while keeping false discovery rates low. In recent years, machine learning methods have proven successful in the analysis of complex proteomics data, for example in detecting peptide features and estimating their intensities. The logical next step is to research methods that can also be used in a classification setting, such as disease prediction.



I want to apply machine learning methods to efficiently analyze large amounts of MS data and to research how they can be used to predict sample conditions without preprocessing, e.g., whether a sample belongs to a healthy or a disease-affected organism. For this, I will represent samples as image-like data structures so that they can be processed with fine-tuned deep-learning models from the computer vision domain. The pixels in these pseudo-images represent the abundances of the detected peptides and thus help to understand the sequences of, and changes to, the proteins involved in the disease. The changes that different PTMs introduce to peptides will also be represented in different pixels. This peptide and PTM information is the key to explaining the difference in health between samples.
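The pseudo-image idea can be illustrated with a minimal sketch. The text above does not specify how peptides are assigned to pixels, so everything concrete here is an assumption made for illustration: the function names build_layout and sample_to_pseudo_image, the row-by-row pixel layout in sorted order, the log transform of abundances, and the example peptides and values are all hypothetical. The one property the sketch tries to preserve is that each peptide form (sequence plus modification) always maps to the same pixel, so that modified and unmodified forms occupy different pixels and images are comparable across samples.

```python
import math


def build_layout(peptide_forms, width):
    """Assign every peptide form a fixed (row, col) pixel.

    Assumption: forms are placed row by row in sorted order. Any layout works
    as long as it is identical for all samples.
    """
    ordered = sorted(set(peptide_forms))
    return {form: divmod(index, width) for index, form in enumerate(ordered)}


def sample_to_pseudo_image(abundances, layout, width):
    """Turn one sample's abundances into a 2D grid of log-scaled intensities."""
    height = max((row for row, _ in layout.values()), default=-1) + 1
    image = [[0.0] * width for _ in range(height)]
    for form, abundance in abundances.items():
        if form in layout:  # forms outside the layout are ignored
            row, col = layout[form]
            # log1p compresses the large dynamic range of MS intensities
            image[row][col] = math.log1p(abundance)
    return image


# Hypothetical peptide forms: (sequence, modification)
forms = [("AAK", "none"), ("AAK", "phospho"), ("LLR", "none")]
layout = build_layout(forms, width=2)

# Hypothetical abundances for two samples
healthy = {("AAK", "none"): 1200.0, ("LLR", "none"): 300.0}
diseased = {("AAK", "none"): 400.0, ("AAK", "phospho"): 900.0, ("LLR", "none"): 310.0}

healthy_image = sample_to_pseudo_image(healthy, layout, width=2)
diseased_image = sample_to_pseudo_image(diseased, layout, width=2)
```

In this toy example, the phosphorylated form of AAK occupies its own pixel, which is zero in the healthy sample and non-zero in the diseased one. A grid of this kind could then be passed to an image classifier, and pixels that drive a prediction could be traced back to a specific peptide and modification.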



Winter term 2022/2023: