Hasso-Plattner-Institut
Prof. Dr. Bernhard Renard
 

Cindy Perscheid, M. Sc.

Research Assistant & PhD Student

 Phone: +49 (0)331 5509 - 1315
 Email: cindy.perscheid(at)hpi.de
 Address: August-Bebel-Str. 88, 14482 Potsdam, Germany
 Room: HPI Campus II, Villa, V-1.19
 Profiles: LinkedIn, Google Scholar, ResearchGate

Research

Integrative Biomarker Detection Using Prior Knowledge on Gene Expression Data Sets

Gene expression data is analyzed to identify biomarkers, e.g., relevant genes, that serve diagnostic, predictive, or prognostic purposes. Traditional approaches to biomarker detection select distinctive features based exclusively on the signals in the data and therefore suffer from overfitting, limited biomarker robustness, and uncertain biological relevance. Prior knowledge approaches are expected to address these issues by incorporating prior biological knowledge, e.g., on gene-disease associations, into the analysis. However, they are currently not widely applied in practice because they are often use-case specific and seldom applicable in a different scope. This limits the comparability of prior knowledge approaches, which in turn makes it currently impossible to assess their effectiveness in a broader context.
Our work addresses these issues with three contributions. Our first contribution provides formal definitions for both prior knowledge and its flexible integration into the feature selection process. Central to these concepts is the automatic retrieval of prior knowledge from online knowledge bases, which streamlines the retrieval process and establishes a uniform definition of prior knowledge. We subsequently describe novel, generalized prior knowledge approaches that are flexible regarding the prior knowledge used and applicable to varying use case domains.
Our second contribution is the benchmarking platform Comprior. Comprior applies the aforementioned concepts in practice and allows for flexibly setting up comprehensive benchmarking studies for examining the performance of existing and novel prior knowledge approaches. It streamlines the retrieval of prior knowledge and allows for combining it with prior knowledge approaches. Comprior demonstrates the practical applicability of our concepts and further fosters the overall development and comparability of prior knowledge approaches.
Our third contribution is a comprehensive case study on the effectiveness of prior knowledge approaches. Using Comprior, we tested a broad range of both traditional and prior knowledge approaches in combination with multiple knowledge bases on data sets from multiple disease domains. Our case study constitutes a thorough assessment of a) the suitability of selected knowledge bases for integration, b) the impact of prior knowledge being applied at different integration levels, and c) the improvements in terms of classification performance, biological relevance, and overall robustness.
In summary, our contributions demonstrate that generalized concepts for prior knowledge and a streamlined retrieval process improve the applicability of prior knowledge approaches. Results from our case study show that the integration of prior knowledge positively affects biomarker results, particularly regarding their robustness. Our findings provide the first in-depth insights into the effectiveness of prior knowledge approaches and build a valuable foundation for future research.
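To make the idea of integrating prior knowledge into feature selection more concrete, the following is a minimal, purely illustrative sketch. It is not Comprior's implementation or API. It assumes one simple way of integrating prior knowledge: mixing a data-driven score per gene with a prior knowledge score (e.g., a gene-disease association score between 0 and 1). The function names, the toy statistic, and the `weight` parameter are all hypothetical.

```python
from statistics import mean, pstdev


def separation_score(group_a, group_b):
    """Toy data-driven statistic (an assumption for illustration):
    absolute difference of class means, scaled by the overall spread."""
    spread = pstdev(group_a + group_b) or 1.0  # avoid division by zero
    return abs(mean(group_a) - mean(group_b)) / spread


def rank_genes(expression, labels, prior_scores, weight=0.5):
    """Rank genes by a weighted mix of data-driven and prior knowledge scores.

    expression:   dict mapping gene name -> list of expression values per sample
    labels:       list of 0/1 class labels, one per sample
    prior_scores: dict mapping gene name -> prior knowledge score in [0, 1];
                  genes without an entry are treated as having score 0
    weight:       share of the prior knowledge score in the combined score
                  (0 = data only, 1 = prior knowledge only)
    """
    data_scores = {}
    for gene, values in expression.items():
        class_0 = [v for v, y in zip(values, labels) if y == 0]
        class_1 = [v for v, y in zip(values, labels) if y == 1]
        data_scores[gene] = separation_score(class_0, class_1)

    # Scale data-driven scores to [0, 1] so both score types are comparable.
    top = max(data_scores.values()) or 1.0
    combined = {
        gene: (1 - weight) * score / top + weight * prior_scores.get(gene, 0.0)
        for gene, score in data_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    expression = {
        "G1": [1, 1, 5, 5],
        "G2": [1, 2, 1, 2],
        "G3": [2, 3, 3, 4],
    }
    labels = [0, 0, 1, 1]
    prior = {"G2": 1.0}  # hypothetical gene-disease association score
    print(rank_genes(expression, labels, prior, weight=0.0))
    print(rank_genes(expression, labels, prior, weight=0.7))
```

With `weight=0.0` the ranking relies on the data alone. Increasing `weight` lets genes with strong prior evidence move up even when their signal in the data is weak, which is the kind of trade-off a benchmarking study would need to evaluate.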

 

Publications

  • 1.
    Perscheid, C.: Comprior: facilitating the implementation and automated benchmarking of prior knowledge-based feature selection approaches on gene expression data sets. BMC Bioinformatics. 22, (2021).
     
  • 2.
    Perscheid, C.: Integrative Biomarker Detection on High-Dimensional Gene Expression Data Sets: A Survey on Prior Knowledge Approaches. Briefings in Bioinformatics. (2020).
     
  • 3.
    Perscheid, C., Uflacker, M.: Integrating Biological Context into the Analysis of Gene Expression Data. In: Rodríguez, S., Prieto, J., Faria, P., Kłos, S., Fernández, A., Mazuelas, S., Jiménez-López, M.D., Moreno, M.N., and Navarro, E.M. (eds.) Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. pp. 339–343. Springer International Publishing, Cham (2019).
     
  • 4.
    Grasnick, B., Perscheid, C., Uflacker, M.: A Framework for the Automatic Combination and Evaluation of Gene Selection Methods. In: Fdez-Riverola, F., Mohamad, M.S., Rocha, M., Paz, J.F.D., and González, P. (eds.) 12th International Conference on Practical Applications of Computational Biology and Bioinformatics (PACBB) 2018. Springer (2019).
     
  • 5.
    Perscheid, C., Grasnick, B., Uflacker, M.: Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. Journal of Integrative Bioinformatics. (2018).
     
  • 6.
    Perscheid, C., Faber, L., Kraus, M., Arndt, P., Janke, M., Rehfeldt, S., Schubotz, A., Slosarek, T., Uflacker, M.: A Tissue-aware Gene Selection Approach for Analyzing Multi-tissue Gene Expression Data. The 9th Workshop on Integrative Data Analysis in Systems Biology (IDASB), IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2018).
     
  • 7.
    Perscheid, C., Benzler, J., Hermann, C., Janke, M., Moyer, D., Laedtke, T., Adeoye, O., Denecke, K., Kirchner, G., Beerman, S., Schwarz, N., Tom-Aba, D., Krause, G.: Ebola Outbreak Containment: Real-Time Task and Resource Coordination With SORMAS. Frontiers in ICT. 5, 7 (2018).
     
  • 8.
    Schulze, F., Schüler, R., Draeger, T., Dummer, D., Ernst, A., Flemming, P., Perscheid, C., Neves, M.: HPI Question Answering System in BioASQ 2016. Proceedings of the Fourth BioASQ workshop at the Conference of the Association for Computational Linguistics. pp. 38–44 (2016).
     
  • 9.
    Fähnrich, C., Denecke, K., Adeoye, O., Benzler, J., Claus, H., Kirchner, G., Mall, S., Richter, R., Schapranow, M.-P., Schwarz, N.G., Tom-Aba, D., Uflacker, M., Poggensee, G., Krause, G.: Surveillance and Outbreak Response Management System (SORMAS) to support the control of the Ebola virus disease outbreak in West Africa. Euro Surveillance. (2015).
     
  • 10.
    Schapranow, M.-P., Perscheid, C., Wachsmann, A., Siegert, M., Bock, C., Horschig, F., Liedke, F., Brauer, J., Plattner, H.: A Federated In-Memory Database System For Life Sciences. Proceedings of the 9th International Workshop on Business Intelligence for the Real Time Enterprise (BIRTE) (2015).
     
  • 11.
    Schapranow, M.-P., Kraus, M., Perscheid, C., Bock, C., Liedtke, F., Plattner, H.: The Medical Knowledge Cockpit: Real-time Analysis of Big Medical Data Enabling Precision Medicine. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 770–775 (2015).
     
  • 12.
    Schapranow, M.-P., Perscheid, C., Plattner, H.: IT-Aided Business Process Enabling Real-time Analysis of Candidates for Clinical Trials. Proceedings of the 4th International Conference on Global Health Challenges. pp. 67–73. IARIA (2015).
     
  • 13.
    Fähnrich, C., Schapranow, M.-P., Plattner, H.: Facing the Genome Data Deluge: Efficiently Identifying Genetic Variants with In-Memory Database Technology. Proceedings of the ACM Symposium on Applied Computing (2015).
     
  • 14.
    Denecke, K., Mall, S., Fähnrich, C., Perscheid, C., Adeoye, O.O., Benzler, J., Claus, H., Kirchner, G., Richter, R., Schapranow, M.-P., Schwarz, N., Reigl, L., Tom-Aba, D., Gidado, S., Waziri, N.E., Uflacker, M., Krause, G., Poggensee, G.: „Surveillance and Outbreak Response Management and Analysis System (SORMAS)“ ermöglicht Kontrolle von Ebola-Infizierten in Westafrika. 10. Jahrestagung der Deutschen Gesellschaft für Epidemiologie (DGEpi) (2015).
     
  • 15.
    Herbst, K., Fähnrich, C., Neves, M., Schapranow, M.-P.: Applying In-Memory Technology for Automatic Template Filling in the Clinical Domain. CLEF 2014 Evaluation Labs and Workshop, Online Working Notes (2014).
     
  • 16.
    Fähnrich, C., Schapranow, M.-P., Plattner, H.: Towards Integrating the Detection of Genetic Variants into an In-Memory Database. Proceedings of the International Conference on Big Data (2014).
     
  • 17.
    Schapranow, M.-P., Klinghammer, K., Fähnrich, C., Plattner, H.: An Optimized Research Process for Real-time Drug Response Analysis. The Third International Conference on Global Health Challenges (2014).
     
  • 18.
    Schapranow, M.-P., Klinghammer, K., Fähnrich, C., Plattner, H.: In-Memory Technology Enables Interactive Drug Response Analysis. 16th International Conference on e-Health Networking, Applications and Services (Healthcom 2014) (2014).
     
  • 19.
    Schapranow, M.-P., Haeger, F., Fähnrich, C., Ziegler, E., Plattner, H.: In-Memory Computing Enabling Real-time Genome Data Analysis. International Journal on Advances in Life Sciences, Vol 6, Nr 1-2. (2014).
     
  • 20.
    Fähnrich, C., Lorey, J., Naumann, F., Forchhammer, B., Mascher, A., Retzlaff, P., Farahani, A.Z., Discher, S., Lemme, S., Papenbrock, T., Peschel, R.C., Stephan, R., Stening, T., Viehmeier, S.: Black Swan: Augmenting Statistics with Event Data. Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM ’11). pp. 2517–2520 (2011).
     
  • 21.
    Schapranow, M.-P., Fähnrich, C., Zeier, A., Plattner, H.: Simulation of RFID-aided Supply Chains: Case Study of the Pharmaceutical Supply Chain. Third International Conference on Computational Intelligence, Modelling and Simulation, pp. 340-345 (2011).