Prof. Dr. h.c. Hasso Plattner

Identifying Discriminant Cancer Genes

News: The OKOA app is now publicly available! Try it out yourself and explore public cancer data at http://okoa.epic-hpi.de/

Cancer is a Heterogeneous Disease

Cancer is the second leading cause of death worldwide, yet we still know little about it. One reason is that there is no single cancer disease: the word “cancer” refers to any of some 200 diseases characterized by an uncontrolled growth of cells that invade and damage the body’s normal tissues. As no cancer is like another, researchers strive to identify the main actors behind the uncontrolled cell growth. These main actors are typically genes that are abnormally expressed in cancer cells, at higher or lower levels than normal, and thus negatively affect cell processes. Researchers are especially interested in changes in gene expression, as these reveal a lot about the molecular processes and relationships in cancerous cells, e.g. gene functions and interactions.

RNA sequencing (RNA-Seq) delivers a complete snapshot of gene expression in a cell; a single experiment can contain expression levels of tens of thousands of genes from several hundred samples. The nature of gene expression data, however, makes its analysis challenging: the data is high-dimensional and noisy, and the underlying biological processes to be detected are complex. Researchers aim to identify genes that reliably discriminate sample groups from each other, e.g. cancerous from healthy samples. The current state of the art for gene selection is to apply traditional statistical and machine learning approaches, e.g. Support Vector Machines (SVM).
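To make the idea of a "traditional statistical approach" to gene selection concrete, here is a minimal sketch that ranks genes by Welch's t-statistic between two sample groups. It is an illustration only, not the method used in the project: the gene names, expression values, and the helper names `welch_t` and `rank_genes` are all made up for this example, and real analyses would also normalize the data and correct for multiple testing.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(expression, labels, group_a, group_b):
    """Rank genes by |t| between two sample groups, most discriminating first.

    expression: dict mapping gene name -> list of expression values (one per sample)
    labels:     list of group labels, aligned with the sample order
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data (hypothetical, for illustration only): 3 genes, 6 samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # clearly higher in tumor
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],  # roughly unchanged
    "GENE_C": [1.2, 0.9, 1.1, 6.8, 7.1, 6.5],  # clearly lower in tumor
}

ranking = rank_genes(expression, labels, "tumor", "normal")
for gene, t in ranking:
    print(f"{gene}: t = {t:+.2f}")
```

The sign of the statistic indicates the direction of the change (positive means higher expression in the first group), while the absolute value determines the rank. With tens of thousands of genes and only a few hundred samples, such per-gene scores are noisy, which is one reason the analysis is challenging.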

Identifying Discriminant Cancer Genes

The aim of the project is to design and implement a technique that identifies markers for given clusters of cancer types. We use state-of-the-art and extended machine learning techniques to analyze genetic cancer data, e.g. gene expression profiles, that have previously been grouped into cancer types. For each cluster, i.e. cancer type, we aim to identify the cluster-discriminant features, e.g. those genes whose expression pattern is unique to the respective cluster. Moreover, we enable users to easily specify algorithms and parameters in our web application. The results can be explored in an interactive diagram and assessed with supplementary fitness scores. Additionally, the genes found are automatically evaluated for their cancer relevance based on external biological knowledge bases, and can be further examined with regard to their function in the cell and their role in causing cancer.
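As a rough illustration of what "cluster-discriminant" can mean, the following sketch compares each cluster against all remaining samples (one-vs-rest) and keeps the genes with the largest absolute log2 fold change. This is an assumption-laden toy, not the algorithm implemented in the web application: the function name `discriminant_genes`, the cluster and gene names, the pseudocount, and the choice of fold change as the score are all hypothetical.

```python
from math import log2
from statistics import mean


def discriminant_genes(expression, clusters, top_k=1, pseudocount=1.0):
    """For each cluster, return the top_k genes by |log2 fold change| vs. all other samples.

    expression: dict mapping gene name -> list of expression values (one per sample)
    clusters:   list of cluster labels, aligned with the sample order
    """
    result = {}
    for cluster in sorted(set(clusters)):
        scored = []
        for gene, values in expression.items():
            inside = [v for v, c in zip(values, clusters) if c == cluster]
            outside = [v for v, c in zip(values, clusters) if c != cluster]
            # The pseudocount avoids division by zero for unexpressed genes.
            lfc = log2((mean(inside) + pseudocount) / (mean(outside) + pseudocount))
            scored.append((gene, lfc))
        scored.sort(key=lambda item: abs(item[1]), reverse=True)
        result[cluster] = scored[:top_k]
    return result


# Toy data (hypothetical): 3 genes, 6 samples grouped into 3 cancer types.
clusters = ["type_1", "type_1", "type_2", "type_2", "type_3", "type_3"]
expression = {
    "GENE_X": [120, 130, 10, 12, 9, 11],   # high only in type_1
    "GENE_Y": [15, 14, 200, 190, 16, 13],  # high only in type_2
    "GENE_Z": [50, 52, 49, 51, 3, 2],      # low only in type_3
}

markers = discriminant_genes(expression, clusters)
for cluster, genes in markers.items():
    for gene, lfc in genes:
        print(f"{cluster}: {gene} (log2 fold change {lfc:+.2f})")
```

Fold change ignores the variance within each group, so in practice it would be combined with a statistical test or a trained classifier; the sketch only shows the one-vs-rest structure of the problem.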

The project resulted in an interactive, exploratory web application named OKOA (Hawaiian for “different”). We have imported public gene expression data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) program and prepared it for analysis in our app. Try it out yourself at http://okoa.epic-hpi.de/!

Contact: Cindy Perscheid, Milena Kraus, Matthias Uflacker