Integration of Genetic and Genomic Approaches for the Analysis of Chronic Fatigue Syndrome Implicates Forkhead Box N1
Angela Presson, Jeanette Papp, Eric Sobel, and Steve Horvath
Biostatistics and Human Genetics, University of California, Los Angeles
CAMDA 2006
CAMDA 2006 Challenge
Data levels:
• DNA level: ~50 pre-selected SNPs
• mRNA level: ~20K genes/array
• Organism level: ~70 clinical traits
Tasks:
1. Relate expression data to clinical trait data.
2. Relate SNP data to expression data.
3. Integrate results to find CFS-relevant genes.
Analysis Overview
1. Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
2. Identify module of interest using trait data.
3. Determine informative SNPs and relate them to gene co-expression network.
4. Identify genes with statistical and biological significance.
5. Choose subset of CFS and control samples for validating the candidate biomarker.
Network = Adjacency Matrix
A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes the connection strength between each pair of genes.
• Two genes have high connection strength if they have similar expression patterns.
• $A$ is a symmetric matrix with entries in [0,1].
• Two network models:
  • Unweighted: $a_{ij} = 1$ if two genes are adjacent (connected) and 0 otherwise.
  • Weighted: each $a_{ij}$ gives the connection strength between a gene pair.
Important Task in Many Genomic Applications:
Given a network (pathway) of interacting genes, how do we find the central players?
Identifying Key Players of Interest
Imagine you wanted to recruit students to your science program. Popularity alone might suggest the head cheerleader or quarterback.
[Figure: head cheerleader, star quarterback]
But the head of the chess club would probably be a better bet!
[Figure: chess club president, cheerleader, quarterback]
Two Network Definitions
1. Number of friends = "Connectivity"
• Gene connectivity = row sum of the adjacency matrix, i.e. the sum of gene i's connection strengths: $k_i = \sum_j a_{ij}$.
2. Chess club, sports teams = "Modules"
• Gene module = cluster of highly connected (similarly expressed) genes in a network.
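The row-sum definition of connectivity can be sketched in a few lines of Python. The adjacency values below are made up for illustration, and excluding the diagonal (a gene's connection to itself) is an assumption that the slide does not state.

```python
def connectivity(adjacency):
    """Return k_i, the sum of row i of the adjacency matrix.

    Assumption: the diagonal entry a_ii is left out of the sum.
    """
    n = len(adjacency)
    return [sum(adjacency[i][j] for j in range(n) if j != i) for i in range(n)]


# Toy weighted network of four genes (hypothetical values).
A = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.7, 0.0],
    [0.6, 0.7, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]

k = connectivity(A)
# The gene with the largest k_i is the most connected one.
most_connected = max(range(len(k)), key=lambda i: k[i])
```

In this toy network the first three genes are tied for the highest connectivity, and the fourth gene is only weakly connected.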
Gene connectivity vs. Module connectivity
• Whole network connectivity: Whole network connectivity is largely driven by the size of the module containing the gene.
• Intra-modular connectivity: Connectivity within a module is biologically & mathematically more meaningful than whole network connectivity.
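To make the contrast concrete, here is a minimal sketch of intra-modular connectivity, where each gene's sum is restricted to genes carrying the same module label. The adjacency values and the module labels are hypothetical, and the diagonal is again excluded by assumption.

```python
def intramodular_connectivity(adjacency, labels):
    """Sum each gene's connection strengths over genes in its own module only."""
    n = len(adjacency)
    return [
        sum(
            adjacency[i][j]
            for j in range(n)
            if j != i and labels[j] == labels[i]
        )
        for i in range(n)
    ]


# Toy network: three "green" genes and two "brown" genes (hypothetical).
A = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.3, 0.2],
    [0.1, 0.1, 0.3, 1.0, 0.6],
    [0.2, 0.1, 0.2, 0.6, 1.0],
]
labels = ["green", "green", "green", "brown", "brown"]

k_in = intramodular_connectivity(A, labels)
```

A brown gene can only accumulate strength from one partner here, which shows how whole-network sums would otherwise favor genes that happen to sit in larger modules.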
Analysis Overview, step 1: Construct gene co-expression network from microarray data (Zhang and Horvath 2005).
Revisiting the Adjacency Matrix
Connection strength (adjacency) vs. correlation
[Figure: adjacency (y-axis) plotted against |correlation| (x-axis)]
• Step function (hard thresholding) is indicated by the black, solid line.
• Adjacency: $a_{ij} = |\mathrm{cor}(\mathrm{gene}_i, \mathrm{gene}_j)|^{\beta}$.
• Power adjacency functions (soft thresholding) are indicated by colored, dashed lines.
Once we found an appropriate β (according to the methodology outlined in Zhang and Horvath 2005), we found that our network results were robust to small changes in β.
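The two thresholding schemes can be compared with a small sketch. The expression profiles, the power beta = 6, and the hard cutoff tau = 0.85 are illustrative assumptions, not values taken from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(expr, beta):
    """Weighted network: a_ij = |cor(gene_i, gene_j)| ** beta."""
    n = len(expr)
    return [
        [abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
        for i in range(n)
    ]


def hard_adjacency(expr, tau):
    """Unweighted network: a_ij = 1 if |cor| >= tau, else 0."""
    n = len(expr)
    return [
        [1 if abs(pearson(expr[i], expr[j])) >= tau else 0 for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical genes measured on five arrays.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

soft = soft_adjacency(expr, beta=6)    # beta is a hypothetical choice
hard = hard_adjacency(expr, tau=0.85)  # tau is a hypothetical choice
```

Genes 0 and 2 have |correlation| of 0.8: the hard threshold drops that edge entirely, while the power function keeps it with a reduced weight of 0.8 to the sixth power.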
Four Modules Identified Using Hierarchical Clustering
[Figure labels: Brown, Red, Turquoise, Green]
• Grey colors indicate genes outside of any module.
• MDS plot indicates clear separation of the brown, green, and turquoise modules.
Analysis Overview, step 2: Identify module of interest using trait data.
A clinical trait gives rise to a "Trait Significance" measure
TraitSignificance(i) = |cor(x(i), TRAIT)|, where x(i) is the gene expression profile of the i-th gene.
Module Trait Significance = Average(Trait Significance values for genes in a module).
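The two definitions above can be sketched as follows. The expression profiles, the trait values, and the module labels are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trait_significance(expr, trait):
    """TraitSignificance(i) = |cor(x(i), TRAIT)| for every gene i."""
    return [abs(pearson(profile, trait)) for profile in expr]


def module_trait_significance(significance, labels):
    """Average the per-gene trait significance within each module."""
    totals, counts = {}, {}
    for value, label in zip(significance, labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical trait measured on five samples, and three gene profiles.
trait = [1.0, 2.0, 3.0, 4.0, 5.0]
expr = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the trait
    [5.0, 3.0, 4.0, 1.0, 2.0],   # correlation of -0.8
    [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated
]
labels = ["green", "green", "grey"]

ts = trait_significance(expr, trait)
module_ts = module_trait_significance(ts, labels)
```

Because the absolute value is taken, a strongly negative correlation counts as much as a strongly positive one.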
Trait Significance Results
• Table shows the average trait significance for each module.
• Every module was characterized in terms of a group of clinical traits.
• We were interested in the CFS severity trait "CLUSTER" because it contained the information from 14 clinical traits (evaluation responses).
• We focused on the green module (184 genes) since it was related to the CLUSTER trait.
Analysis Overview, step 3: Determine informative SNPs and relate them to gene co-expression network.
Finding SNPs associated with the CLUSTER trait
We chose the two SNPs with the highest CLUSTER correlation.
• SNP12 = hCV245410 on 12q21 (p-value = 0.01)
• SNP17 = hCV7911132 on 17q21 (p-value = 0.001)
[Figure: "SNP & Cluster Correlation P-Values"; y-axis: -Log(P-Value); x-axis: SNPs colored by chromosome (2p24, 5q34, 7p15, 11p15, 12q21, 17q, 22q11.1, X); labeled points: hCV7911132 (17q21) and hCV245410 (12q21)]
Correlation with relevant SNPs defines SNP Significance of the i-th gene
SNPSignificance(i) = |cor(x(i), SNP)| (where SNP data is additively coded).
• Conceptually related to a LOD* score at the SNP marker for the i-th gene expression.
• Why correlate SNP and gene expression data?
  • It puts the SNP effect on the same footing as the trait effect and gene-gene connection strengths. Effect sizes are important in our analysis.
*LOD = "logarithmic odds", a traditional measure of linkage between genetic loci.
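A minimal sketch of SNP significance under additive coding follows. Here additive coding is taken to mean the number of copies (0, 1, or 2) of one allele; the genotype strings, the allele names, and the choice of which allele to count are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def additive_code(genotypes, counted_allele):
    """Code each genotype as the number of copies of one allele (0, 1, or 2)."""
    return [g.count(counted_allele) for g in genotypes]


def snp_significance(expr, snp_codes):
    """SNPSignificance(i) = |cor(x(i), SNP)| for every gene i."""
    return [abs(pearson(profile, snp_codes)) for profile in expr]


# Hypothetical genotypes for five samples at one SNP.
genotypes = ["AA", "AG", "GG", "AG", "AA"]
snp = additive_code(genotypes, "G")  # [0, 1, 2, 1, 0]

# Three hypothetical gene expression profiles over the same samples.
expr = [
    [1.0, 2.0, 3.0, 2.0, 1.0],  # rises with each copy of G
    [3.0, 2.0, 1.0, 2.0, 3.0],  # falls with each copy of G
    [1.0, 2.0, 3.0, 4.0, 5.0],  # unrelated to genotype
]

sig = snp_significance(expr, snp)

# Counting the other allele flips the sign of the correlation only,
# so the absolute value is unchanged.
sig_other_allele = snp_significance(expr, additive_code(genotypes, "A"))
```

Since the measure is an absolute correlation, it lives on the same 0-to-1 scale as trait significance and the unpowered connection strengths.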
SNP Filtering & Significance Results
• Table shows the average SNP significance for each module.
• Green module genes are the most correlated with SNP12.
• "SNP12 Sub-sample" = average module correlations with SNP12 among samples that have a particular SNP12 and SNP17 genotype.
• The correlation between the green module and SNP12 is higher in the sample subset (0.186 vs. 0.128).

Module SNP Significance (Standard Error)
SNPs               Turquoise       Grey            Red             Brown           Green
SNP12              0.052 (0.002)   0.077 (0.001)   0.036 (0.004)   0.091 (0.004)   0.128 (0.004)
SNP17              0.056 (0.002)   0.064 (0.001)   0.045 (0.005)   0.039 (0.003)   0.04 (0.002)
SNP12 Sub-sample   0.128 (0.005)   0.144 (0.002)   0.067 (0.009)   0.203 (0.007)   0.186 (0.007)
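Table entries of the form "mean (standard error)" could be produced along the following lines. The slides do not say how the standard error was computed; this sketch assumes the sample standard deviation of the per-gene significance values divided by the square root of the module size. All numbers are hypothetical.

```python
import math
import statistics


def module_summary(significance, labels):
    """Return {module: (mean, standard error)} of per-gene significance.

    Assumption: standard error = sample SD / sqrt(number of genes).
    Each module needs at least two genes for the SD to be defined.
    """
    groups = {}
    for value, label in zip(significance, labels):
        groups.setdefault(label, []).append(value)
    return {
        label: (
            statistics.mean(values),
            statistics.stdev(values) / math.sqrt(len(values)),
        )
        for label, values in groups.items()
    }


# Hypothetical per-gene SNP significance values and module labels.
significance = [0.10, 0.14, 0.12, 0.16, 0.05, 0.09]
labels = ["green", "green", "green", "green", "brown", "brown"]

summary = module_summary(significance, labels)
for module, (mean, se) in summary.items():
    print(f"{module}: {mean:.3f} ({se:.3f})")
```

Larger modules give smaller standard errors for the same spread, which is consistent with the grey column having the smallest errors in the table if grey contains the most genes.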