Introduction Methods Experimental Results
Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering
Phuong Dao (∗, 1), Recep Colak (∗, 3), Raheleh Salari (1), Flavia Moser (4), Elai Davicioni (5), Alexander Schönhuth (†, 2), Martin Ester (1, †)
1 School of Computing Science, Simon Fraser University, Canada
2 Centrum Wiskunde & Informatica, Amsterdam, Netherlands
3 Department of Computing Science, University of Toronto, Canada
4 Center for Disease Control, University of British Columbia
5 GenomeDX Biosciences Inc.
∗: Joint first authors; †: Joint corresponding, last authors
Introduction: Personalized Medicine
• Determination of disease status based on patient genetics/genomics
• Goal: specific, individual choice of treatment
• Necessary: reliable disease markers
• Monogenic: each marker is a single gene
• Multigenic: each marker is a set of genes
Single Gene Markers
[Figure: expression of Genes 1 to 6 in Controls 1 to 3 and Cases 1 to 3, split into differentially expressed and non-differentially expressed genes]
Caveat: single-gene markers vary significantly across different studies.
Marker Selection: Multigenic Traits
[Figure: gene expression profiles of Genes 1 to 4 (Controls 1 to 3, Cases 1 to 3) next to an interaction/association network over G1 to G4 with weighted edges]
Solution: differentially expressed genes participating in the same pathway [Chuang et al., 2007], [Chowdhury et al., 2010]
Our Approach
Each of our subnetwork markers:
• is a densely connected subnetwork
  ☞ Disease-related genes have more PPI interactions than expected [Goh et al., PNAS (2007)]
• contains genes that are differentially expressed in a subset of samples
  ☞ Cancer tumors vary greatly in phenotype, even within the same (sub)type [Hampton et al., GR (2009)]
Density-Constrained Biclusters
Definition: a network G = (V, E) with edge weights w_e is called α-dense if $\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \ge \alpha \ge 0.5$.
[Figure: weighted PPI network over genes G1 to G7 next to a binary differential-expression matrix over samples S1 to S3; G1 to G4 are jointly differentially expressed in S1 and S2, G4 to G7 in S2 and S3]
Our markers are α-densely connected subnetworks of genes that are differentially expressed in a common subset of at least k patients (here: k = 2).
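A minimal Python sketch of the marker condition above, assuming a gene set qualifies when it is α-dense, connected, and differentially expressed in at least k common samples. The edge weights are hypothetical, the per-gene sample sets are an illustrative reading of the slide's binary matrix, and the function names (`density`, `is_connected`, `shared_de_samples`, `is_marker`) are ours, not the authors'.

```python
from collections import deque
from itertools import combinations

def density(genes, w):
    """theta(G): total edge weight divided by the number of possible edges C(|V|, 2).
    w maps frozenset({u, v}) -> confidence weight; missing pairs count as 0."""
    n = len(genes)
    total = sum(w.get(frozenset(pair), 0.0) for pair in combinations(genes, 2))
    return total / (n * (n - 1) / 2)

def is_connected(genes, w):
    """Breadth-first search restricted to the subnetwork induced by `genes`."""
    genes = set(genes)
    start = next(iter(genes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in genes - seen:
            if frozenset((u, v)) in w:
                seen.add(v)
                queue.append(v)
    return seen == genes

def shared_de_samples(genes, de):
    """Samples in which every gene of the set is differentially expressed."""
    return set.intersection(*(de[g] for g in genes))

def is_marker(genes, w, de, alpha=0.5, k=2):
    """Hypothetical check: alpha-dense, connected, and jointly DE in >= k samples."""
    return (len(genes) >= 2
            and density(genes, w) >= alpha
            and is_connected(genes, w)
            and len(shared_de_samples(genes, de)) >= k)

# Hypothetical weighted PPI edges (confidence scores in [0, 1]).
W = {frozenset(edge): score for edge, score in [
    (("G1", "G2"), 0.95), (("G1", "G3"), 0.8), (("G2", "G3"), 0.85),
    (("G3", "G4"), 0.9), (("G2", "G4"), 0.75), (("G4", "G5"), 0.7),
    (("G3", "G5"), 0.65), (("G5", "G6"), 0.9), (("G6", "G7"), 0.95),
    (("G4", "G7"), 0.85), (("G5", "G7"), 0.8)]}

# Samples in which each gene is differentially expressed (binary matrix as sets).
DE = {"G1": {"S1", "S2"}, "G2": {"S1", "S2"}, "G3": {"S1", "S2", "S3"},
      "G4": {"S1", "S2", "S3"}, "G5": {"S2", "S3"}, "G6": {"S2", "S3"},
      "G7": {"S2", "S3"}}

print(is_marker({"G1", "G2", "G3", "G4"}, W, DE))        # True: dense, shares S1 and S2
print(is_marker({"G1", "G2", "G3", "G4", "G5"}, W, DE))  # False: only S2 is shared
```

In this toy data the five-gene set is still 0.56-dense and connected, so it is rejected only by the k = 2 sample constraint, which is the biclustering part of the definition.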
Methods
Density-Constrained Biclustering: Search Strategy
Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.
[Figure: a four-node network on A, B, C, D with edge weights 0.4, 0.6, 0.8 and 0.9, and its subnetworks labeled "wDCB", "maximal wDCB", "Not Dense" and "Not Connected"]
Example: the full network has density (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45 and is therefore not dense.
Search strategy: breadth-first search.
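The theorem is what makes a level-wise search complete: every α-densely connected subnetwork of size n can be reached from an α-densely connected subnetwork of size n − 1 by adding one node, and since the larger network is connected, that node must neighbor the smaller one. A breadth-first search can therefore grow candidates one neighbor at a time and never extend non-dense ones. The Python sketch below assumes that reading; it covers only the density/connectivity part (the differential-expression constraint is omitted), and the placement of the slide's four weights on the edges is a hypothetical layout that reproduces its 0.45 example.

```python
from itertools import combinations

def density(nodes, w):
    """Total edge weight over the C(|V|, 2) possible edges; w: frozenset pair -> weight."""
    n = len(nodes)
    total = sum(w.get(frozenset(p), 0.0) for p in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)

def neighbors(u, w):
    """All nodes sharing an edge with u."""
    return {v for edge in w if u in edge for v in edge if v != u}

def dense_connected_subnetworks(w, alpha=0.5):
    """Level-wise (breadth-first) growth: each level holds the alpha-densely
    connected node sets of one size; each set is extended by one neighbor."""
    level = {edge for edge, weight in w.items() if weight >= alpha}  # size-2 seeds
    found = set(level)
    while level:
        next_level = set()
        for s in level:
            frontier = set().union(*(neighbors(u, w) for u in s)) - s
            for v in frontier:
                t = s | {v}  # stays connected because v neighbors s
                if t not in found and density(t, w) >= alpha:
                    next_level.add(t)
        found |= next_level
        level = next_level
    return found

def maximal(sets):
    """Keep the sets not strictly contained in another found set."""
    return [s for s in sets if not any(s < t for t in sets)]

# Hypothetical layout of the slide's four edge weights on nodes A to D.
W = {frozenset(p): x for p, x in [
    (("A", "B"), 0.9), (("B", "C"), 0.8), (("C", "D"), 0.6), (("A", "D"), 0.4)]}

print(round(density(frozenset("ABCD"), W), 2))  # 0.45, so the whole network is not dense
print(sorted("".join(sorted(s)) for s in maximal(dense_connected_subnetworks(W))))
# ['ABC', 'CD']
```

With this layout, {A, B, C} (density 1.7 / 3 ≈ 0.57) is the largest dense connected set, while every four-node or D-containing three-node candidate falls below 0.5.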
Classification
1. Marker computation: feature space creation (marker = dimension)
2. Construct classifier using training data
3. Perform classification on test data
Cross-platform study: marker computation and test data come from different platforms.
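The three steps can be sketched in Python under a few stated assumptions: markers are given as gene sets, each marker becomes one feature whose value is the mean expression of its genes, and a simple nearest-centroid classifier stands in for the SVM named in the slides so that the example needs only the standard library. The marker sets and the "platform A" (training) and "platform B" (test) profiles are hypothetical toy data.

```python
from statistics import mean

def to_features(profile, markers):
    """Step 1: one dimension per marker (here: mean expression of its genes)."""
    return [mean(profile[g] for g in m) for m in markers]

def train_centroids(X, y):
    """Step 2 (stand-in classifier): mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Step 3: assign x to the class with the nearest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

MARKERS = [{"G1", "G2"}, {"G3", "G4"}]  # hypothetical markers

# Hypothetical training profiles ("platform A") and test profiles ("platform B").
train = [({"G1": 2.0, "G2": 1.8, "G3": 0.1, "G4": 0.3}, "case"),
         ({"G1": 1.6, "G2": 2.2, "G3": 0.2, "G4": 0.0}, "case"),
         ({"G1": 0.2, "G2": 0.1, "G3": 1.9, "G4": 2.1}, "control"),
         ({"G1": 0.0, "G2": 0.3, "G3": 2.2, "G4": 1.7}, "control")]
test = [({"G1": 1.5, "G2": 1.4, "G3": 0.5, "G4": 0.4}, "case"),
        ({"G1": 0.4, "G2": 0.6, "G3": 1.6, "G4": 1.3}, "control")]

X_train = [to_features(p, MARKERS) for p, _ in train]
y_train = [label for _, label in train]
model = train_centroids(X_train, y_train)

predictions = [predict(model, to_features(p, MARKERS)) for p, _ in test]
accuracy = sum(pred == label for pred, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)  # ['case', 'control'] 1.0
```

The point of the cross-platform setup is that the markers (and here the classifier) are fixed before the test platform's data is seen; only the mapping from genes to marker features is applied to the new samples.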
Experimental Results
Network Data
Confidence-scored PPI network [STRING, von Mering et al., NAR 2009]
• Edges reflect physical protein-protein interactions
• Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)
• Scoring system based on KEGG pathways
[Figure: example network with confidence-scored edges]
Gene Expression Data
Colon cancer
• GSE8671: 32 patients / tissue pairs
• GSE10950: 24 patients / tissue pairs
• GSE6988: 123 samples across several cancer subtypes
Breast cancer
• GSE3494: 251 patients with different TP53 mutation status (wild-type vs. mutant)
Colon Cancer Prediction: GSE8671 → GSE6988
[Figure: AUC vs. number of subnetworks/genes (0 to 50) for SGM, GMI, NETCOVER and wDCB]
Colon Cancer Prognosis: GSE8671 → GSE6988
[Figure: prognosis AUC vs. number of subnetworks/genes (0 to 50) for SGM, GMI, NETCOVER and wDCB]
Colon Cancer: Prognosis Accuracy (K = number of subnetworks/genes)
      8671 → 6988, Prognosis    10950 → 6988, Prognosis
K     SGM   GMI   NC    wDCB    SGM   GMI   NC    wDCB
1     0.57  0.57  0.51  0.56    0.57  0.68  N/A   0.47
5     0.74  0.62  0.74  0.6     0.63  0.81  N/A   0.68
10    0.76  0.77  0.74  0.88    0.57  0.77  N/A   0.74
20    0.72  0.62  0.77  0.83    0.61  0.79  N/A   0.85
30    0.65  0.74  0.83  0.88    0.63  0.81  N/A   0.85
40    0.67  0.79  0.83  0.90    0.78  0.85  N/A   0.89
50    0.74  0.77  0.81  0.92    0.76  0.85  N/A   0.91
Breast Cancer: TP53 Wild-type vs. Mutant (GSE3494, Miller et al.)
[Figure: accuracy vs. number of subnetworks/genes (0 to 25) for SGM (mappable), GMI (mappable), wDCB (mappable) and SPM (not mappable)]
Subnetwork Marker Statistics
        8671 Subnetworks             10950 Subnetworks
Method  # Subnetworks  Enrichment    # Subnetworks  Enrichment
GMI     806            0.38          755            0.34
NC      923            0.12          N/A            N/A
wDCB    282            0.76          216            0.74
GMI = Greedy Mutual Information (Chuang et al.)
NC = NetCover (Chowdhury et al.)
wDCB = weighted Density-Constrained Biclustering
# Subnetworks = total number of subnetworks computed
Enrichment = enrichment rate of the top-50 markers
Top Markers in GSE8671
• Enriched in DNA replication initiation (p = 6.39e-14) and DNA metabolic process (p = 6.15e-12)
• TP53, BRCA1: tumor suppressor genes
• Minichromosome maintenance (MCM) complex
• MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)
Outlook / Acknowledgments
Outlook:
• Analyze subnetwork signatures
• ncRNA-protein interaction data
Acknowledgments:
• Mehmet Koyutürk
• David DesJardins, Google Inc.
• Lab for Mathematical and Computational Biology, UC Berkeley
Thank you for your attention!
Densely Connected Subnetworks: Properties
Let G = (V, E) be a network with edge weights w_e, e ∈ E.
• The density θ(G) of G is
  $\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \sum_{e \in E} w_e}{|V|\,(|V| - 1)}$,
  where $\binom{|V|}{2}$ is the number of possible edges in G.
• G is called α-dense if θ(G) ≥ α ≥ 0.5.
• An α-dense, connected network G is called α-densely connected.
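As a worked check that the two forms of θ(G) agree, take the four-node example from the search-strategy slide (nodes A, B, C, D with edge weights 0.8, 0.9, 0.6 and 0.4). Its density falls below 0.5, so it is not α-dense for any admissible α:

```latex
\theta(G) = \frac{0.8 + 0.9 + 0.6 + 0.4}{\binom{4}{2}}
          = \frac{2.7}{6}
          = \frac{2 \cdot 2.7}{4 \cdot (4 - 1)}
          = 0.45 < 0.5 \le \alpha
```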
Classifier Construction
1. Rank density-constrained biclusters according to density significance
2. Keep only high-ranked subnetworks with little overlap
3. Feature space dimension = number of markers
4. SVM classification
[Figure: two marker subnetworks over G1 to G7; a gene expression profile (Gene 1: 1.25, Gene 2: 1.5, Gene 3: 1.0, Gene 4: 1.25, Gene 5: 0.5, Gene 6: 0.0, Gene 7: 0.25) is mapped to an average gene expression profile (Marker 1: 1.25, Marker 2: 0.5)]
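Steps 2 and 3 can be sketched in Python. The slide does not say how density significance is computed, so the ranked list below is hypothetical, and "little overlap" is modeled, as an assumption, with a Jaccard-index threshold (`max_jaccard`, our name). The gene values are the ones shown in the figure; assigning G1 to G4 to Marker 1 and G4 to G7 to Marker 2 is our reading of it, and it reproduces the figure's averages of 1.25 and 0.5.

```python
from statistics import mean

def select_markers(ranked, max_jaccard=0.5):
    """Greedy filter over subnetworks sorted best-first: keep one only if its
    Jaccard overlap with every already-kept subnetwork is <= max_jaccard."""
    kept = []
    for genes in ranked:
        if all(len(genes & k) / len(genes | k) <= max_jaccard for k in kept):
            kept.append(genes)
    return kept

def average_profile(expression, markers):
    """Feature value of each marker = average expression of its member genes."""
    return [mean(expression[g] for g in m) for m in markers]

# Hypothetical ranking (best first); the second set overlaps the first too much.
ranked = [{"G1", "G2", "G3", "G4"}, {"G1", "G2", "G3"}, {"G4", "G5", "G6", "G7"}]
markers = select_markers(ranked)

# One sample's gene expression profile, with the values from the figure.
profile = {"G1": 1.25, "G2": 1.5, "G3": 1.0, "G4": 1.25,
           "G5": 0.5, "G6": 0.0, "G7": 0.25}

print(len(markers))                       # 2
print(average_profile(profile, markers))  # [1.25, 0.5]
```

Each sample thus becomes a vector with one entry per kept marker; those vectors are what the SVM in step 4 would be trained on.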
Colon Cancer: Prediction Accuracy (K = number of subnetworks/genes)
      8671 → 6988               10950 → 6988
K     SGM   GMI   NC    wDCB    SGM   GMI   NC    wDCB
1     0.56  0.84  0.72  0.84    0.63  0.37  N/A   0.77
5     0.73  0.72  0.72  0.82    0.82  0.68  N/A   0.86
10    0.76  0.76  0.83  0.85    0.82  0.81  N/A   0.88
20    0.80  0.84  0.86  0.89    0.84  0.83  N/A   0.89
30    0.80  0.83  0.84  0.91    0.83  0.85  N/A   0.85
40    0.85  0.85  0.87  0.90    0.84  0.84  N/A   0.89
50    0.85  0.84  0.85  0.93    0.81  0.82  N/A   0.89