Prof. Dr. h.c. Hasso Plattner

Cindy Perscheid, M. Sc.

Research Assistant, PhD Candidate

 Phone: +49 (0)331 5509 - 1315
 Fax: +49 (0)331 5509 - 579
 Organization: Hasso Plattner Institute, University of Potsdam
 Address: August-Bebel-Str. 88, 14482 Potsdam, Germany
 Room: HPI Campus II, Villa, V-1.19


Providing Biological Context to Biomarker Detection from Gene Expression Data Sets

Gene expression data sets provide a snapshot of a cell’s gene activity by measuring the expression level of each gene. Such data is often analyzed to identify biomarkers, e.g. relevant genes, that are of diagnostic, predictive, or prognostic use. However, identifying robust biomarkers from RNA-seq data remains an open challenge: most approaches to biomarker detection focus exclusively on the statistical significance of a gene in the data set, although statistical significance does not necessarily imply biological relevance. As a consequence, many distinct alleged biomarker gene sets have been published that show low diagnostic or predictive performance when applied to new data sets. Integrative approaches address this issue by providing biological context, either from established biological knowledge or from multiple *omics data sets retrieved from the same sample.

Our research covers both aspects of integrative analysis and examines how biomarker detection improves when additional biological context is provided during the analysis. First, we integrate prior biological knowledge, e.g. known gene-disease associations or biological network information, at multiple levels of the analysis to derive a set of genes that can serve as biomarkers. Second, we combine genomic data (genetic variants) with transcriptomic data (gene expression); here, instead of relevant genes, we aim to derive signaling pathways as biomarkers. We evaluate our approaches with regard to robustness and biological relevance on real-world data sets from multiple cancer types.
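To make the first idea more concrete, the following is a minimal sketch of one possible way to combine a purely statistical gene score with a prior-knowledge score. It is an illustration under stated assumptions, not the method used in the publications listed below: the gene names, the toy expression values, the prior scores (imagined here as gene-disease association scores in [0, 1]), the function names, and the weighted-sum combination are all hypothetical.

```python
from statistics import mean, stdev


def statistical_scores(expression, labels):
    """Absolute Welch-style t statistic per gene between classes 0 and 1."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        # Standard error of the difference of the two class means.
        se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
        scores[gene] = abs(mean(a) - mean(b)) / se if se > 0 else 0.0
    return scores


def min_max(scores):
    """Rescale scores to [0, 1] so they are comparable to the prior scores."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span > 0 else 0.0 for g, s in scores.items()}


def rank_genes(expression, labels, prior, weight=0.5):
    """Rank genes by a weighted sum of statistical and prior-knowledge scores.

    weight=0.0 uses statistics only; weight=1.0 uses prior knowledge only.
    Genes without a prior score are assumed to have a prior of 0.0.
    """
    stat = min_max(statistical_scores(expression, labels))
    combined = {
        g: (1 - weight) * stat[g] + weight * prior.get(g, 0.0) for g in stat
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical toy data: three genes, six samples (three per class).
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "GENE_B": [2.0, 2.1, 1.9, 2.6, 2.4, 2.5],
    "GENE_C": [5.0, 4.0, 6.0, 5.1, 4.9, 5.0],
}
labels = [0, 0, 0, 1, 1, 1]
prior = {"GENE_A": 0.1, "GENE_B": 0.9}  # assumed association scores

print(rank_genes(expression, labels, prior, weight=0.0))
print(rank_genes(expression, labels, prior, weight=0.5))
```

With `weight=0.0` the ranking is driven by the statistics alone; raising the weight lets a gene with strong prior evidence overtake a gene that is only statistically prominent in this one data set. How the two sources are weighted is a design choice that would itself need to be evaluated.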

Keywords: Gene Expression, Genetic Variants, Biological Networks, Prior Knowledge, Integrative Analysis, Feature Selection, Machine Learning, Okoa, SORMAS



  • 1.
    Perscheid, C.: Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets. BMC Bioinformatics. 22, (2021).
  • 2.
    Perscheid, C.: Integrative Biomarker Detection on High-Dimensional Gene Expression Data Sets: A Survey on Prior Knowledge Approaches. Briefings in Bioinformatics. (2020).
  • 3.
    Perscheid, C., Uflacker, M.: Integrating Biological Context into the Analysis of Gene Expression Data. In: Rodríguez, S., Prieto, J., Faria, P., Kłos, S., Fernández, A., Mazuelas, S., Jiménez-López, M.D., Moreno, M.N., and Navarro, E.M. (eds.) Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. pp. 339–343. Springer International Publishing, Cham (2019).
  • 4.
    Grasnick, B., Perscheid, C., Uflacker, M.: A Framework for the Automatic Combination and Evaluation of Gene Selection Methods. In: Fdez-Riverola, F., Mohamad, M.S., Rocha, M., Paz, J.F.D., and González, P. (eds.) 12th International Conference on Practical Applications of Computational Biology and Bioinformatics (PACBB) 2018. Springer (2019).
  • 5.
    Perscheid, C., Grasnick, B., Uflacker, M.: Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. Journal of Integrative Bioinformatics. (2018).
  • 6.
    Perscheid, C., Faber, L., Kraus, M., Arndt, P., Janke, M., Rehfeldt, S., Schubotz, A., Slosarek, T., Uflacker, M.: A Tissue-aware Gene Selection Approach for Analyzing Multi-tissue Gene Expression Data. The 9th Workshop on Integrative Data Analysis in Systems Biology (IDASB), IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2018).
  • 7.
    Perscheid, C., Benzler, J., Hermann, C., Janke, M., Moyer, D., Laedtke, T., Adeoye, O., Denecke, K., Kirchner, G., Beerman, S., Schwarz, N., Tom-Aba, D., Krause, G.: Ebola Outbreak Containment: Real-Time Task and Resource Coordination With SORMAS. Frontiers in ICT. 5, 7 (2018).
  • 8.
    Schulze, F., Schüler, R., Draeger, T., Dummer, D., Ernst, A., Flemming, P., Perscheid, C., Neves, M.: HPI Question Answering System in BioASQ 2016. Proceedings of the Fourth BioASQ workshop at the Conference of the Association for Computational Linguistics. pp. 38–44 (2016).
  • 9.
    Fähnrich, C., Denecke, K., Adeoye, O., Benzler, J., Claus, H., Kirchner, G., Mall, S., Richter, R., Schapranow, M.-P., Schwarz, N.G., Tom-Aba, D., Uflacker, M., Poggensee, G., Krause, G.: Surveillance and Outbreak Response Management System (SORMAS) to support the control of the Ebola virus disease outbreak in West Africa. Euro Surveillance. (2015).
  • 10.
    Schapranow, M.-P., Perscheid, C., Wachsmann, A., Siegert, M., Bock, C., Horschig, F., Liedke, F., Brauer, J., Plattner, H.: A Federated In-Memory Database System For Life Sciences. Proceedings of the 9th International Workshop on Business Intelligence for the Real Time Enterprise (BIRTE) (2015).
  • 11.
    Schapranow, M.-P., Kraus, M., Perscheid, C., Bock, C., Liedtke, F., Plattner, H.: The Medical Knowledge Cockpit: Real-time Analysis of Big Medical Data Enabling Precision Medicine. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 770–775 (2015).
  • 12.
    Schapranow, M.-P., Perscheid, C., Plattner, H.: IT-Aided Business Process Enabling Real-time Analysis of Candidates for Clinical Trials. Proceedings of the 4th International Conference on Global Health Challenges. pp. 67–73. IARIA (2015).
  • 13.
    Fähnrich, C., Schapranow, M.-P., Plattner, H.: Facing the Genome Data Deluge: Efficiently Identifying Genetic Variants with In-Memory Database Technology. Proceedings of the ACM Symposium on Applied Computing (2015).
  • 14.
    Denecke, K., Mall, S., Fähnrich, C., Perscheid, C., Adeoye, O.O., Benzler, J., Claus, H., Kirchner, G., Richter, R., Schapranow, M.-P., Schwarz, N., Reigl, L., Tom-Aba, D., Gidado, S., Waziri, N.E., Uflacker, M., Krause, G., Poggensee, G.: „Surveillance and Outbreak Response Management and Analysis System (SORMAS)“ ermöglicht Kontrolle von Ebola-Infizierten in Westafrika. 10. Jahrestagung der Deutschen Gesellschaft für Epidemiologie (DGEpi) (2015).
  • 15.
    Herbst, K., Fähnrich, C., Neves, M., Schapranow, M.-P.: Applying In-Memory Technology for Automatic Template Filling in the Clinical Domain. CLEF 2014 Evaluation Labs and Workshop, Online Working Notes (2014).
  • 16.
    Fähnrich, C., Schapranow, M.-P., Plattner, H.: Towards Integrating the Detection of Genetic Variants into an In-Memory Database. Proceedings of the International Conference on Big Data (2014).
  • 17.
    Schapranow, M.-P., Klinghammer, K., Fähnrich, C., Plattner, H.: An Optimized Research Process for Real-time Drug Response Analysis. The Third International Conference on Global Health Challenges (2014).
  • 18.
    Schapranow, M.-P., Klinghammer, K., Fähnrich, C., Plattner, H.: In-Memory Technology Enables Interactive Drug Response Analysis. 16th International Conference on e-Health Networking, Applications and Services (Healthcom 2014) (2014).
  • 19.
    Schapranow, M.-P., Haeger, F., Fähnrich, C., Ziegler, E., Plattner, H.: In-Memory Computing Enabling Real-time Genome Data Analysis. International Journal on Advances in Life Sciences, vol. 6, no. 1–2 (2014).
  • 20.
    Fähnrich, C., Lorey, J., Naumann, F., Forchhammer, B., Mascher, A., Retzlaff, P., Farahani, A.Z., Discher, S., Lemme, S., Papenbrock, T., Peschel, R.C., Stephan, R., Stening, T., Viehmeier, S.: Black Swan: Augmenting Statistics with Event Data. Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM ’11). pp. 2517–2520 (2011).
  • 21.
    Schapranow, M.-P., Fähnrich, C., Zeier, A., Plattner, H.: Simulation of RFID-aided Supply Chains: Case Study of the Pharmaceutical Supply Chain. Third International Conference on Computational Intelligence, Modelling and Simulation. pp. 340–345 (2011).