Analysis of Gene Expression Profiles and Drug Activity Patterns for the Molecular Pharmacology of Cancer
Jeong-Ho Chang, Kyu-Baek Hwang, and Byoung-Tak Zhang
School of Computer Science and Engineering, Seoul National University, 151-742 Seoul, Korea
http://bi.snu.ac.kr

Outline
- Introduction
- Analyzing Cell-Cell Relations through Clustering
  ♦ Experimental Results
- Analyzing Gene-Drug Relations Using Bayesian Networks
  ♦ Experimental Results
- Concluding Remarks

Mining on Gene Expression and Drug Activity Data
- Relationships among human cancer, gene expression, and drug activity
[Figure: diagram linking human cancer, gene expression, and drug activity]
- Revealing these relationships leads to
  ♦ Causes and mechanisms of cancer development
  ♦ New molecular targets for anti-cancer drugs

NCI60 Cell Lines Data Set
- From 60 human cancer cell lines [Scherf 00]
  ♦ Colorectal, renal, ovarian, breast, prostate, lung, and central nervous system origin cancers, as well as leukemias and melanomas
- Gene expression patterns
  ♦ cDNA microarray
- Individual targets
  ♦ Analysis of molecular characteristics other than mRNA expression
- Drug activity patterns
  ♦ Sulphorhodamine B assay: changes in total cellular protein after 48 hours of drug treatment

Analytical Effort
- Analysis of cell-cell relationships using cluster analysis
  ♦ Clustering of cell lines based on
    - Gene expression patterns only.
    - Drug activity patterns only.
    - Both patterns combined with weighted similarity.
- Analysis of gene-drug correlations using Bayesian networks
  ♦ Analysis of gene expression-drug activity dependencies
    - Each cell line is represented by its gene expression profile and drug activity pattern.
    - Bayesian networks are constructed and analyzed to discover dependencies between gene expressions and drug activities.

Analyzing Cell-Cell Relations through Clustering

Clustering Methods
- Soft Topographic Vector Quantization [Graepel 98]
  ♦ Based on statistical physics
  ♦ Soft clustering + topographic mapping
  ♦ Clustering as an optimization
  ♦ Learned by deterministic annealing
- Probability that data point $x_i$ is assigned to cluster $C_j$:
$$P(x_i \in C_j) = \frac{\exp\left(-\beta \sum_k h_{jk}\, e_{ik}(x_i, c_k)\right)}{\sum_{j'} \exp\left(-\beta \sum_k h_{j'k}\, e_{ik}(x_i, c_k)\right)}$$
  $h_{jk}$: neighborhood function between clusters $j$ and $k$
[Figure: deterministic annealing and phase transitions]
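
The assignment probability on this slide can be illustrated with a minimal sketch. It assumes the distortions e(x, c_k) have already been computed for one data point, and it treats β as the inverse temperature of the annealing schedule. The function name, the toy neighborhood matrix, and the toy distortions are hypothetical and are not taken from the slides.

```python
import math

def soft_assignments(errors, neighborhood, beta):
    """Return P(x in C_j) for a single data point x.

    errors[k]          -- e(x, c_k), distortion of x with respect to cluster k
    neighborhood[j][k] -- h_jk, neighborhood function between clusters j and k
    beta               -- inverse temperature (assumed annealing parameter)
    """
    # Neighborhood-weighted cost of assigning x to each cluster j.
    costs = [sum(h_jk * e_k for h_jk, e_k in zip(row, errors))
             for row in neighborhood]
    # Shift by the minimum cost before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    c_min = min(costs)
    weights = [math.exp(-beta * (c - c_min)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy setup: three clusters on a line.
h = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
e = [0.2, 1.0, 3.0]

hot = soft_assignments(e, h, beta=0.01)   # small beta: nearly uniform
cold = soft_assignments(e, h, beta=50.0)  # large beta: nearly hard assignment
```

At small β the assignments are almost uniform, and as β grows they sharpen toward a single cluster, which is the behavior deterministic annealing exploits.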

Clustering of Cell Lines based on Gene Expression Profiles
- Of ten runs, the result with the best cost value is shown here.
- Neighboring clusters show similar patterns, as in the SOM.
- The clusters formed tend to reflect the tissue of origin.
  ♦ CNS, RE, ME, LE, and CO

Using Drug Activity Information in the Analysis of Cell-Cell Relations (1/3)
- Questions
  ♦ Are drug activity patterns in cell lines also related to the tissue of origin?
  ♦ Is this relationship similar to that of gene expression profiles?
- Cluster analysis based on gene-drug information
$$e_{jk} = (1 - \alpha)\, e^{g}_{jk} + \alpha\, e^{d}_{jk}$$
  - A linear interpolation of the distances based on gene expression ($e^{g}_{jk}$) and drug activity ($e^{d}_{jk}$).
  - If both patterns depend on the tissue of origin, the cluster structure will not differ strongly.
[Figure: distances from cluster $k$ computed separately on gene expressions and on drug activities]
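
The linear interpolation of the two distances can be written as a one-line function. This is only a sketch: the function name and the sample values are hypothetical, and the two input distances are assumed to be on comparable scales.

```python
def combined_error(e_gene, e_drug, alpha):
    """Interpolate between a gene-expression distance and a drug-activity distance.

    alpha = 0 uses gene expression only; alpha = 1 uses drug activity only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * e_gene + alpha * e_drug

# Hypothetical distances of one cell line to one cluster.
gene_only = combined_error(2.0, 4.0, alpha=0.0)
drug_only = combined_error(2.0, 4.0, alpha=1.0)
mixed = combined_error(2.0, 4.0, alpha=0.25)
```

Sweeping alpha from 0 to 1 moves the clustering from a purely gene-based one to a purely drug-based one.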

Using Drug Activity Information in the Analysis of Cell-Cell Relations (2/3)
- Quantitative comparison between the clustering analyses
  ♦ Entropy
$$E = \sum_{j=1}^{m} \frac{n_j}{n} E_j, \qquad E_j = -\sum_{i} p_{ij} \log p_{ij} \qquad (0 \le E_j \le \log n_j)$$
    - $p_{ij}$: the ratio of members in cluster $j$ which belong to class $i$
    - $n_j$: the number of members in cluster $j$
    - If the number of clusters is fixed, a higher entropy means that the clusters reflect the original class structure less well.
  ♦ Averaged Pearson correlation
$$R = \sum_{j=1}^{m} \frac{n_j}{n} R_j, \qquad R_j = \frac{2}{n_j (n_j - 1)} \sum_{i < k} r(x_i, x_k)$$
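
Both comparison measures can be sketched in plain Python. The function names and input layout are assumptions made for illustration: each cluster is given as a list of class labels for the entropy, and as a list of numeric profiles for the correlation. Clusters with fewer than two members contribute zero to the averaged correlation in this sketch, which is a choice the slides do not specify.

```python
import math
from collections import Counter
from itertools import combinations

def clustering_entropy(clusters):
    """Size-weighted average of the class entropy E_j within each cluster."""
    n = sum(len(members) for members in clusters)
    total = 0.0
    for members in clusters:
        n_j = len(members)
        if n_j == 0:
            continue
        e_j = -sum((c / n_j) * math.log(c / n_j)
                   for c in Counter(members).values())
        total += (n_j / n) * e_j
    return total

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def averaged_correlation(clusters):
    """Size-weighted average of the mean pairwise correlation R_j per cluster."""
    n = sum(len(profiles) for profiles in clusters)
    total = 0.0
    for profiles in clusters:
        n_j = len(profiles)
        if n_j < 2:
            continue  # assumption: singleton clusters contribute nothing
        pairs = list(combinations(profiles, 2))  # n_j * (n_j - 1) / 2 pairs
        r_j = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        total += (n_j / n) * r_j
    return total

# Hypothetical labels and profiles.
pure = clustering_entropy([["CNS", "CNS"], ["LE", "LE", "LE"]])
mixed = clustering_entropy([["CNS", "LE"], ["LE", "CNS"]])
corr = averaged_correlation([[[1, 2, 3], [2, 4, 6]], [[1, 2, 3], [3, 2, 1]]])
```

Clusters that each contain a single class give an entropy of zero, while evenly mixed two-class clusters give log 2.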

Using Drug Activity Information in the Analysis of Cell-Cell Relations (3/3)
[Figure: two panels plotted against the value of α (0.0 to 1.0): entropy with varying α, and average Pearson correlation with varying α. Series: 15Clusters_Gene, 15Clusters_Drug, 11Clusters_Gene, 11Clusters_Drug.]

Clustering of Cell Lines based on Drug Activity Patterns
- Of ten runs, the result with the best cost value is shown here.
- The clusters do not reflect the tissue of origin as well as those based on gene expression profiles.

Analyzing Gene-Drug Relations Using Bayesian Networks

Bayesian Networks
- The joint probability distribution over all the variables in the Bayesian network [Heckerman 96]
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa_i)$$
  ♦ $P(X_i \mid Pa_i)$: local probability distribution for $X_i$
  ♦ $Pa_i$: the set of parents of $X_i$
  ♦ $\Theta_i = (\theta_{i1}, \ldots, \theta_{iq_i})$: parameter for $P(X_i \mid Pa_i)$
  ♦ $P(\theta_{ij}) = \mathrm{Dir}(\theta_{ij} \mid \alpha_{ij1}, \ldots, \alpha_{ijr_i})$
  ♦ $q_i$: number of configurations for $Pa_i$
  ♦ $r_i$: number of states for $X_i$
- Example with five variables
$$P(A, B, C, D, E) = P(A)\, P(B \mid A)\, P(C \mid A, B)\, P(D \mid A, B, C)\, P(E \mid A, B, C, D)$$
$$= P(A)\, P(B)\, P(C \mid A, B)\, P(D \mid B)\, P(E \mid C)$$
[Figure: example network over the nodes A, B, C, D, and E]
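
The five-variable factorization can be checked numerically. The sketch below assumes binary variables and uses made-up conditional probabilities; only the parent sets come from the example on the slide.

```python
from itertools import product

# Parent sets from the example: P(A) P(B) P(C|A,B) P(D|B) P(E|C).
parents = {"A": (), "B": (), "C": ("A", "B"), "D": ("B",), "E": ("C",)}

# Hypothetical tables: P(node = 1 | parent values), all variables binary.
p_one = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.05, (1,): 0.8},
}

def joint(assignment):
    """Joint probability as the product of the local distributions."""
    prob = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[p] for p in pa)
        p1 = p_one[node][key]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

names = list(parents)
# Summing over all 2^5 assignments should give 1.
total = sum(joint(dict(zip(names, values)))
            for values in product((0, 1), repeat=len(names)))
all_ones = joint({name: 1 for name in names})
```

For binary variables the factorized form needs 1 + 1 + 4 + 2 + 2 = 10 free parameters, compared with 31 for an unrestricted joint table over five variables.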

Bayesian Network Learning
- Learning for the local probability distribution
$$P(\theta_{ij}) = \mathrm{Dir}(\theta_{ij} \mid \alpha_{ij1}, \ldots, \alpha_{ijr_i})$$
$$P(\theta_{ij} \mid D) = \mathrm{Dir}(\theta_{ij} \mid \alpha_{ij1} + N_{ij1}, \ldots, \alpha_{ijr_i} + N_{ijr_i})$$
- Learning for the network structure [Friedman and Goldszmidt 99]
  ♦ Search for the best-scoring network structure (greedy search)
  ♦ BD (Bayesian Dirichlet) score [Heckerman et al. 95]
$$p(D, S) = p(S) \cdot p(D \mid S) = p(S) \cdot \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$
    - $D$: training data
    - $S$: network structure
    - $\alpha_{ijk}$: prior
    - $N_{ijk}$: sufficient statistics calculated from $D$
    - $\alpha_{ij} = \sum_k \alpha_{ijk}$, $N_{ij} = \sum_k N_{ijk}$
    - $\Gamma(1) = 1$, $\Gamma(x + 1) = x\, \Gamma(x)$
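
The Dirichlet update and the per-node factor of the BD score can be sketched with log-gamma functions, which avoid overflow for realistic counts. The function names and the nested-list layout (one row per parent configuration j, one column per state k) are assumptions for illustration; the structure prior p(S) is left out.

```python
from math import lgamma

def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    return [a + n for a, n in zip(alpha, counts)]

def log_bd_family(alpha, counts):
    """Log of the BD factor for one node.

    alpha[j][k]  -- prior pseudo-count for parent configuration j, state k
    counts[j][k] -- N_ijk, number of cases with configuration j and state k
    """
    score = 0.0
    for a_j, n_j in zip(alpha, counts):
        a_ij, n_ij = sum(a_j), sum(n_j)
        score += lgamma(a_ij) - lgamma(a_ij + n_ij)
        score += sum(lgamma(a + n) - lgamma(a) for a, n in zip(a_j, n_j))
    return score

# Hypothetical binary node without parents, uniform prior, three observations.
posterior = dirichlet_posterior([1.0, 1.0], [2, 1])
log_score = log_bd_family([[1.0, 1.0]], [[2, 1]])
```

Summing this quantity over all nodes and adding log p(S) gives log p(D, S), so a greedy search only needs to rescore the nodes whose parent sets change.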

Schematic View of the Modeling Approach
[Figure: modeling pipeline]
- Input: gene expression data and drug activity data
- Preprocessing
  ♦ Thresholding
  ♦ Clustering
  ♦ Discretization
- Result of preprocessing: selected genes, drugs, and a cancer type node
- Bayesian network learning
  ♦ Learned Bayesian network over nodes such as Gene A, Gene B, Drug A, Drug B, and Cancer
- Analysis of the learned network
  ♦ Dependency analysis
  ♦ Probabilistic inference

Data Preparation
- cDNA microarray data
  ♦ Gene expression profiles on 60 cell lines
  ♦ 1376 × 60 matrix (1376 genes, 60 samples)
- Drug activity data
  ♦ Drug activity patterns on 60 cell lines
  ♦ 118 × 60 matrix (118 drugs, 60 samples)
- Combined: (1376 + 118) × 60 data matrix

Preprocessing
- Thresholding (60 samples throughout)
  ♦ Elimination of unknown ESTs: 1376 genes reduced to 805 genes
  ♦ Elimination of drugs which have more than 4 missing values: 118 drugs reduced to 84 drugs
- Discretization
  ♦ Local probability model for Bayesian networks: multinomial distribution
[Figure: values discretized into three states (-1, 0, 1) around $\mu$, with boundaries at $\mu - c \cdot \sigma$ and $\mu + c \cdot \sigma$]
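
A three-state discretization around the mean can be sketched as follows. The slides do not give the value of c, say whether μ and σ are computed per gene or per drug, or explain how missing values are handled, so the per-profile statistics, the default c = 0.5, and the function name are all assumptions.

```python
import statistics

def discretize(values, c=0.5):
    """Map each value to -1, 0, or 1 using the boundaries mu - c*sigma and mu + c*sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    low, high = mu - c * sigma, mu + c * sigma
    return [-1 if v < low else 1 if v > high else 0 for v in values]

# Hypothetical expression profile of one gene across five cell lines.
states = discretize([-2.0, 0.0, 0.1, -0.1, 2.0])
```

Each discretized variable then has three states, which fits a multinomial local probability model.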

Bayesian Network Learning for Gene-Drug Analysis
- Large-scale Bayesian network
  ♦ Several hundred nodes (up to 890)
  ♦ General greedy search is inapplicable because of its time and space complexity.
- Search heuristics
  ♦ Local-to-global search heuristics
  ♦ Exploit the locality of Bayesian networks to reduce the entire search space.
    - The local structure: Markov blanket [Pearl 88]
    - Find the candidate Markov blanket (of pre-determined size $k$) of each node, which reduces the global search space.
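
The slides do not say how candidate Markov blanket members are chosen. Purely as an illustration, the sketch below assumes that each node's k candidates are the other variables with the highest pairwise mutual information on the discretized data. The scoring rule, the function names, and the toy data are assumptions, not the authors' procedure.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def candidate_blankets(data, k):
    """For each variable, return the k other variables with the highest score.

    data -- dict mapping a variable name to its list of discrete values
    """
    result = {}
    for node, column in data.items():
        scored = sorted(((mutual_information(column, other), name)
                         for name, other in data.items() if name != node),
                        reverse=True)
        result[node] = [name for _, name in scored[:k]]
    return result

# Hypothetical discretized data for six cell lines.
data = {
    "gene1": [-1, -1, 0, 0, 1, 1],
    "drug1": [-1, -1, 0, 0, 1, 1],
    "gene2": [0, 1, -1, 0, 1, -1],
}
blankets = candidate_blankets(data, k=1)
```

A structure search restricted to edges inside these candidate sets considers far fewer structures than an unrestricted greedy search over all node pairs.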