Biology-Driven Clustering of Microarray Data: Applications to the NCI60 Data Set
K.R. Coombes, K.A. Baggerly, D.N. Stivers, J. Wang, D. Gold, H.G. Sung, and S.J. Lee
Introduction
• Microarray data is more than a large, unstructured matrix.
  – We already know many genes that are important in cancer because of their involvement in specific biological processes.
  – We also know that reproducible chromosomal abnormalities play an important role in cancer.
• We need analytical methods that use this biological information early in the analysis.
Methods
• First, updated the annotations of the genes on the microarray
• Performed separate analyses
  – using genes on individual chromosomes
  – using genes involved in different biological processes
• Developed ways to assess how well each set of genes classified samples
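A minimal sketch of how the separate analyses might be organized, assuming each gene carries one chromosome label and a (possibly empty) list of functional categories. The helper `build_gene_sets` and the gene identifiers are hypothetical. A gene sits on exactly one chromosome but can belong to several functional categories, so the function-based sets may overlap.

```python
from collections import defaultdict

def build_gene_sets(chromosome_of, categories_of):
    """Group gene IDs into chromosome-based and function-based gene sets.

    chromosome_of: dict gene_id -> chromosome label (e.g. "1", "X")
    categories_of: dict gene_id -> iterable of functional category names
    Returns two dicts mapping set name -> sorted list of gene IDs.
    """
    by_chromosome = defaultdict(set)
    by_function = defaultdict(set)
    for gene, chrom in chromosome_of.items():
        by_chromosome[chrom].add(gene)
    for gene, cats in categories_of.items():
        # A gene annotated to several categories lands in several sets.
        for cat in cats:
            by_function[cat].add(gene)
    return ({k: sorted(v) for k, v in by_chromosome.items()},
            {k: sorted(v) for k, v in by_function.items()})

# Hypothetical annotations for four genes.
chromosome_of = {"g1": "1", "g2": "1", "g3": "X", "g4": "17"}
categories_of = {"g1": ["cell cycle"], "g2": ["cell cycle", "apoptosis"],
                 "g4": ["apoptosis"]}
chrom_sets, func_sets = build_gene_sets(chromosome_of, categories_of)
print(chrom_sets)  # {'1': ['g1', 'g2'], 'X': ['g3'], '17': ['g4']}
print(func_sets)   # {'cell cycle': ['g1', 'g2'], 'apoptosis': ['g2', 'g4']}
```

Each resulting gene set would then be clustered and assessed on its own.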
Quality of Annotations
• Problem:
  – I.M.A.G.E. clone IDs and GenBank accession numbers are archival (they do not change)
  – UniGene clusters, gene names, descriptions, functions, etc., change over time
• Solution:
  – Download the latest UniGene (build 137) and LocusLink to update the annotations
How many genes on the array have good annotations?

Number of spots   Current UniGene status
      294         None (control spots)
      128         Only 3’ – unknown to UniGene
     1379         Only 3’ – known to UniGene
        1         Only 5’ – unknown
        6         Only 5’ – known
      399         Both – unknown
      763         Both – 3’ known, 5’ unknown
      291         Both – 3’ unknown, 5’ known
      646         Both known, but disagree
     6093         Both known, and agree

Only trust the 7478 spots where the UniGene clusters match.
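The counts in the table can be checked with simple arithmetic. The rows sum to 10,000 spots, and 7478 equals 6093 + 1379 + 6. That total suggests, as an inference from the numbers rather than a stated rule, that a "trusted" spot is one where every sequenced clone end maps to a UniGene cluster and, when both ends are sequenced, they map to the same cluster. The dictionary keys below are shorthand for the table rows.

```python
# Spot counts transcribed from the annotation table.
spot_counts = {
    "none (control spots)": 294,
    "3' only, unknown": 128,
    "3' only, known": 1379,
    "5' only, unknown": 1,
    "5' only, known": 6,
    "both unknown": 399,
    "both, 3' known, 5' unknown": 763,
    "both, 3' unknown, 5' known": 291,
    "both known, disagree": 646,
    "both known, agree": 6093,
}

# Assumed definition of "trusted": every sequenced end is known to UniGene,
# and if both ends are sequenced they fall in the same cluster.
trusted = ["3' only, known", "5' only, known", "both known, agree"]

total = sum(spot_counts.values())
n_trusted = sum(spot_counts[k] for k in trusted)
print(total, n_trusted)  # 10000 7478
```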
Where are the genes located?
[Figure: (Observed - Expected) / SD for each chromosome (1 to 22, X, Y); chi^2 = 148.8, p < 10^(-10).]
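A minimal sketch of a chi-squared goodness-of-fit test of this kind. It assumes the plotted SD is the Pearson scale sqrt(E), and that expected counts come from some reference distribution of genes across chromosomes, rescaled to the array's total. Neither assumption is confirmed by the slide. The function name and the three-chromosome counts are hypothetical, not NCI60 data.

```python
import numpy as np
from scipy.stats import chi2

def chromosome_enrichment(observed, reference):
    """Pearson residuals (O - E)/sqrt(E) and a chi-squared goodness-of-fit test.

    observed:  genes on the array per chromosome
    reference: reference counts per chromosome (assumed, e.g. all known genes),
               rescaled so that expected counts sum to the observed total
    """
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    expected = reference / reference.sum() * observed.sum()
    residuals = (observed - expected) / np.sqrt(expected)
    stat = float(np.sum(residuals ** 2))
    df = len(observed) - 1
    p_value = float(chi2.sf(stat, df))   # upper-tail probability
    return residuals, stat, p_value

# Hypothetical counts for three chromosomes.
residuals, stat, p = chromosome_enrichment([30, 50, 20], [100, 100, 100])
print(stat)  # 14.0 (up to floating-point rounding)
```

The residuals are what the bar plot would show per chromosome, and their sum of squares is the test statistic.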
How do we determine the functions of genes?
• UniGene -> LocusLink -> GeneOntology
• GeneOntology is a structured, hierarchical vocabulary to describe gene functions in three broad areas:
  – biological process (why)
  – molecular function (what)
  – cellular component (where)
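Because Gene Ontology terms form a hierarchy (strictly, a directed acyclic graph in which a term can have several parents), a gene annotated to a specific term is usually also counted under every broader ancestor term. A minimal sketch of that propagation follows. It uses a small, hypothetical, hand-written fragment of the hierarchy rather than the real GO files, and the function names are illustrative.

```python
def ancestors(term, parents):
    """Return all ancestor terms of `term`, following every parent link."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def propagate(gene_terms, parents):
    """Map each gene to its direct terms plus all of their ancestors."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in gene_terms.items()}

# Hypothetical fragment of the hierarchy: child term -> parent terms.
parents = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["cellular process"],
    "DNA repair": ["response to stress", "DNA metabolism"],
    "DNA metabolism": ["nucleic acid metabolism"],
}
genes = {"geneA": ["mitotic cell cycle"], "geneB": ["DNA repair"]}
full = propagate(genes, parents)
print(sorted(full["geneB"]))
# ['DNA metabolism', 'DNA repair', 'nucleic acid metabolism', 'response to stress']
```

With propagation, a broad category such as "cell cycle" also collects genes annotated only to its more specific descendants.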
What kinds of genes are on the microarray?

Function                            Ann.   Spots
Oncogenesis                          140     180
Apoptosis                            128     138
Physiological processes              180     210
Perception of external stimuli       238     150
Ectoderm development                 129     152
Mesoderm development                  92     102
Cell adhesion                        111     140
Cell-cell signaling                  137     166
Signal transduction                  222     228
Intracellular signaling cascade      110     110
Cell motility                        120     153
Cell organization                     98     118
Cell shape and size                   78     101
Protein traffic                      157     188
Transport                            146     136
Cell proliferation                   197     249
Stress response                      599     372
Radiation response                   147     136
Cell cycle                           494     283
Nucleic acid metabolism              695     595
Protein metabolism                   471     567
Lipid metabolism                     146     156
Carbohydrate metabolism              103      97
Energy pathways                       88      98
Data Preprocessing
• Remove spots with poor annotations and spots whose median intensity is below the 97th percentile of the empty spots.
• Normalize each array so that the median log ratio between channels is zero (a median ratio of one).
• Center each gene so that its mean log ratio across experiments is zero.
• Use (1 - correlation)/2, which ranges from 0 to 1, as the distance metric.
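A minimal NumPy sketch of these steps. It assumes log ratios are stored as a genes-by-arrays matrix, per-spot median intensities are available in a matching matrix, empty-spot intensities are pooled into one vector, and annotation filtering has already been done. Keeping a gene only if it clears the intensity threshold on every array is one possible reading of the filter, not a confirmed detail. The function names are hypothetical.

```python
import numpy as np

def preprocess(log_ratios, intensity, empty_intensity, pct=97.0):
    """Filter, normalize, and center a genes-by-arrays log-ratio matrix.

    log_ratios:      (n_genes, n_arrays) log ratios between channels
    intensity:       (n_genes, n_arrays) median spot intensities
    empty_intensity: 1-D intensities of empty spots (pooled; an assumption)
    """
    threshold = np.percentile(empty_intensity, pct)
    # Assumption: keep a gene only if it clears the threshold on every array.
    keep = np.all(intensity >= threshold, axis=1)
    x = log_ratios[keep]
    # Normalize each array (column): median log ratio becomes 0.
    x = x - np.median(x, axis=0, keepdims=True)
    # Center each gene (row): mean log ratio across arrays becomes 0.
    x = x - x.mean(axis=1, keepdims=True)
    return x, keep

def correlation_distance(x):
    """(1 - r) / 2 between arrays (columns); values lie in [0, 1]."""
    r = np.corrcoef(x, rowvar=False)
    return (1.0 - r) / 2.0

# Usage on synthetic data: 50 genes, 6 arrays.
rng = np.random.default_rng(0)
log_ratios = rng.normal(size=(50, 6))
intensity = rng.uniform(100, 1000, size=(50, 6))
empty = rng.uniform(0, 150, size=200)
x, keep = preprocess(log_ratios, intensity, empty)
d = correlation_distance(x)
```

The resulting sample-by-sample matrix `d` is what a clustering or MDS step would consume.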
How well does a set of genes distinguish types of cancer?
• Three methods for assessment:
  – Qualitative (PCA, MDS)
  – Quantitative (PCA + ANOVA)
  – Semi-quantitative (Grading Dendrograms)
Multidimensional Scaling
[Figure: the cell lines plotted on MDS coordinates 1 and 2, each point labeled with a one-letter cancer-type code (B, C, L, M, N, O, P, R, S).]
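The slide does not say which MDS variant was used. As an illustration only, here is a minimal sketch of classical (Torgerson) MDS, which places samples in a few dimensions so that their Euclidean distances approximate a given distance matrix, such as the (1 - correlation)/2 distances from the preprocessing step. The function name is hypothetical.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical MDS: k-dimensional coordinates whose distances approximate d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]          # indices of the k largest
    vals_k = np.clip(vals[order], 0.0, None)    # drop small negative eigenvalues
    return vecs[:, order] * np.sqrt(vals_k)

# Four points on a line at 0, 1, 2, 3; one dimension recovers them exactly
# (up to sign and translation).
pts = np.array([0.0, 1.0, 2.0, 3.0])
d = np.abs(pts[:, None] - pts[None, :])
coords = classical_mds(d, k=1)
```

Because correlation-based distances need not be exactly Euclidean, some eigenvalues can be negative; clipping them, as here, is a common but not unique choice.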
PCANOVA (PCA + ANOVA)
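The assessment slide names PCA + ANOVA as the quantitative method without giving details. A hypothetical sketch of the idea: project the samples onto their leading principal components, then run a one-way ANOVA of each component's scores against cancer type. A large F statistic means that component separates the types well. The number of components and the use of per-component F statistics are assumptions for illustration, not the authors' documented procedure.

```python
import numpy as np
from scipy.stats import f_oneway

def pc_anova(x, labels, n_components=3):
    """One-way ANOVA (F, p) of each principal-component score by label.

    x:      (n_samples, n_genes) data matrix, one row per array
    labels: length-n_samples sequence of cancer types
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    xc = x - x.mean(axis=0)                          # center each gene
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # PC scores per sample
    results = []
    for i in range(scores.shape[1]):
        groups = [scores[labels == g, i] for g in np.unique(labels)]
        res = f_oneway(*groups)
        results.append((float(res.statistic), float(res.pvalue)))
    return results

# Synthetic example: two groups that differ on the first ten genes.
rng = np.random.default_rng(1)
labels = ["colon"] * 5 + ["renal"] * 5
x = rng.normal(size=(10, 20))
x[:5, :10] += 3.0
stats = pc_anova(x, labels, n_components=2)
print(stats[0])  # F statistic and p-value for the first component
```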
How good is a dendrogram?
• A = cluster contains all and only one kind of cancer
• B = all, with extras
• C = all except one
• D = all except one, with extras
• E = all except two
• F = all except two, with extras
[Figure: example dendrogram of the NCI60 cell lines (height axis 0.0 to 0.6), with a score for each cancer type (B, C, L, M, N, O, P, R, S).]
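The grading scale can be written as a small function that scores one cluster against one cancer type. This is a sketch of the scale only. The slide does not say which clusters of a dendrogram are graded. Note that the root of any tree contains every sample, so some restriction on the candidate clusters would be needed, and that choice is left open here. The function name and sample sets are hypothetical.

```python
def grade_cluster(cluster, members):
    """Grade one dendrogram cluster against one cancer type.

    cluster: set of sample names in the cluster
    members: set of sample names belonging to the cancer type
    Returns "A" to "F" following the grading scale, or None when
    more than two members of the type are missing from the cluster.
    """
    missing = len(members - cluster)
    extras = len(cluster - members)
    if missing > 2:
        return None
    # missing 0 -> A/B, 1 -> C/D, 2 -> E/F; any extras give the later letter.
    return "ABCDEF"[2 * missing + (1 if extras else 0)]

colon = {"colon.hct116", "colon.ht29", "colon.km12"}
print(grade_cluster({"colon.hct116", "colon.km12"}, colon))  # C
print(grade_cluster(colon | {"breast.mcf7"}, colon))         # B
```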
Can cancers be distinguished by genes on one chromosome?

Grades received by each chromosome (column order B C L M N O P R S; blank cells omitted)
Chr   Grades
1     B A D F D
2     E C D D E D E
3     C E D E F
4     E E E E
5     A A D F E
6     C A D E E D
7     E A D E C E
8     E C D
9     B C C E E E
10    D E
11    E C C D
12    B C C E E E
13    D E
14    A A F
15    C B C F C
16    (none)
17    A A D F E E
18    E D
19    D D
20    E C
21    (none)
22    A E E
X     B A D E D
Heterogeneity of different types of cancer
• Some cancers (colon, leukemia) are fairly easy to distinguish from the others.
• Some (breast, lung) are so heterogeneous that they are almost impossible to distinguish.
• Some chromosomes (1, 2, 6, 7, 9, 12, 17) can distinguish many types of cancer.
• Others (16, 21) give essentially random clusterings.
Chromosome 2
[Figure: dendrogram of the NCI60 cell lines clustered using genes on chromosome 2; height axis 0.0 to 0.8.]
Chromosome 16
[Figure: dendrogram of the NCI60 cell lines clustered using genes on chromosome 16; height axis 0.0 to 0.8.]
Can cancers be distinguished by genes of one function?
• The table of grades for functional categories looks a lot like the table for chromosomes.
• Some biological process categories (signal transduction, cell proliferation, cell cycle, protein metabolism) can distinguish many types of cancer.
• Others (apoptosis, energy pathways) cannot.
Cell surface receptor linked signal transduction
[Figure: dendrogram of the NCI60 cell lines clustered using genes in this functional category; height axis 0.0 to 0.8.]
Protein metabolism and modification
[Figure: dendrogram of the NCI60 cell lines clustered using genes in this functional category; height axis 0.0 to 0.8.]