Ranking candidate genes from perturbation experiments. Niko Beerenwinkel
Gene ranking. Goal: identify (or prioritize) genes that affect the readout, i.e., genes involved in the biological process of interest. Issues: noise (readout, siRNA specificity); design (siRNA library, replicates, validation screens); limited resources. Procedure: normalization (quantiles, z-scores, error models); rank genes by normalized readout or by p-value.
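The normalization and ranking steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function names are hypothetical, ties in quantile normalization are broken by position, and ranking direction (lowest or highest readout first) is left as a parameter because it depends on the screen.

```python
import statistics


def quantile_normalize(columns):
    """Quantile-normalize equally long columns (e.g. plates or replicates).

    Each value is replaced by the mean, across columns, of the values that share
    its rank, so all columns end up with the same empirical distribution.
    Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the i-th smallest value over all columns.
    rank_means = [statistics.fmean(col[i] for col in sorted_cols) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def z_scores(values):
    """Standardize one column: subtract its mean, divide by its sample standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def rank_genes(scores, descending=False):
    """Order genes by normalized score (lowest first unless descending=True)."""
    return sorted(scores, key=scores.get, reverse=descending)
```

For example, `rank_genes({"a": 0.5, "b": -1.2, "c": 2.0})` puts `"b"` first, which suits a screen where knockdowns that lower the readout are of interest.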
RSA: Redundant siRNA activity analysis. Rank all siRNAs (wells) by readout. Assign a p-value to each gene based on the rank distribution of all siRNAs targeting it (hypergeometric model). König et al., 2007
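A simplified RSA-style score can be sketched as below, assuming the hypergeometric model works like this: if a gene's j-th best siRNA sits at rank i among N siRNAs, compute the probability that at least j of the gene's n siRNAs land in the top i by chance, and keep the minimum over j. The published method (König et al., 2007) may include further details not reproduced here; the function names are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, n, i):
    """P(X >= k) for X ~ Hypergeometric: i draws from N items, n of them 'successes'."""
    total = comb(N, i)
    return sum(comb(n, x) * comb(N - n, i - x) for x in range(k, min(n, i) + 1)) / total


def rsa_pvalues(ranked_sirnas, gene_of):
    """Simplified RSA-style gene p-values (a sketch, not the reference implementation).

    ranked_sirnas: siRNA ids ordered from strongest to weakest readout (rank 1 first).
    gene_of: mapping siRNA id -> target gene.
    Returns {gene: min_j P(X >= j)} with X ~ Hypergeom(N, n_gene, rank of j-th best siRNA).
    """
    N = len(ranked_sirnas)
    positions = {}
    for rank, sirna in enumerate(ranked_sirnas, start=1):
        # Ranks are appended in increasing order, so each list is already sorted.
        positions.setdefault(gene_of[sirna], []).append(rank)
    pvals = {}
    for gene, ranks in positions.items():
        n = len(ranks)
        pvals[gene] = min(hypergeom_tail(j, N, n, i) for j, i in enumerate(ranks, start=1))
    return pvals
```

With four siRNAs where gene A's two siRNAs occupy ranks 1 and 2, A gets p = 1/6 (both of its siRNAs in the top 2), while gene B, at ranks 3 and 4, gets p = 1.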
Comparing gene rankings: intersection metric; Spearman's footrule
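Both ranking distances can be sketched in a few lines. The footrule sums the absolute position differences of each gene. For the intersection metric we assume one common definition for top-k lists: the average over depths d = 1..k of |A_d Δ B_d| / (2d), where A_d and B_d are the two top-d sets. The function names are hypothetical.

```python
def spearman_footrule(rank_a, rank_b):
    """Sum over genes of |position in rank_a - position in rank_b| (same gene set in both)."""
    pos_b = {g: i for i, g in enumerate(rank_b)}
    return sum(abs(i - pos_b[g]) for i, g in enumerate(rank_a))


def intersection_metric(rank_a, rank_b, k):
    """Average over depths d = 1..k of |A_d symmetric-difference B_d| / (2d).

    0 means the top-k prefixes agree at every depth; 1 means they are disjoint at every depth.
    """
    total = 0.0
    for d in range(1, k + 1):
        a, b = set(rank_a[:d]), set(rank_b[:d])
        total += len(a ^ b) / (2 * d)
    return total / k
```

Swapping the top two of three genes gives a footrule of 2 and, for k = 2, an intersection distance of 0.5 (disagreement at depth 1, agreement at depth 2). The intersection metric weights the top of the list most heavily, which matches the interest in high-scoring genes.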
Stable variables. Let $\Lambda$ be the set of all (reasonable) cut-offs for a given ranking (i.e., $\lambda \in \Lambda$ is a regularization parameter). The set of selected genes $\hat{S}^{\lambda} = \hat{S}^{\lambda}(I)$ is a function of the samples $I$. For a given threshold $\pi$, the set of stable variables is $\hat{S}^{\mathrm{stable}} = \{ k : \max_{\lambda \in \Lambda} P(k \in \hat{S}^{\lambda}) \ge \pi \}$. $P$ can be estimated by sub- or re-sampling.
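The definition above can be turned into a subsampling estimate. This sketch assumes the samples $I$ are replicates, that a gene is selected at cut-off $\lambda$ if it ranks among the top $\lambda$ genes by mean readout on a random half of the replicates, and that higher readout means a stronger hit. For a screen without replicates one would resample something else (e.g. siRNAs); the function name and parameters are hypothetical.

```python
import random
import statistics


def stable_genes(data, cutoffs, pi, n_subsamples=100, seed=0):
    """Estimate max_lambda P(k in S_lambda) by subsampling and return the stable set.

    data: dict gene -> list of replicate readouts (all of equal length m).
    cutoffs: the set Lambda of top-lambda cut-offs.
    pi: stability threshold.
    Returns ({gene: max over lambda of estimated selection probability}, stable set).
    """
    rng = random.Random(seed)
    genes = list(data)
    m = len(next(iter(data.values())))
    half = max(1, m // 2)
    counts = {lam: dict.fromkeys(genes, 0) for lam in cutoffs}
    for _ in range(n_subsamples):
        idx = rng.sample(range(m), half)  # random subsample I of the replicates
        mean = {g: statistics.fmean(data[g][i] for i in idx) for g in genes}
        ranking = sorted(genes, key=mean.get, reverse=True)
        for lam in cutoffs:
            for g in ranking[:lam]:  # S_lambda(I): the top-lambda genes
                counts[lam][g] += 1
    max_prob = {g: max(counts[lam][g] / n_subsamples for lam in cutoffs) for g in genes}
    stable = {g for g, p in max_prob.items() if p >= pi}
    return max_prob, stable
```

Taking the maximum over $\Lambda$ means a gene counts as stable if it is reproducibly selected at any reasonable cut-off, so the exact choice of a single cut-off matters less.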
Stability selection (Meinshausen & Bühlmann, 2010). Under certain assumptions, the expected number of falsely selected variables $V$ is bounded by $E(V) \le \frac{1}{2\pi - 1} \frac{q_\Lambda^2}{p}$ (for $1/2 < \pi \le 1$), where $p$ is the total number of genes and $q_\Lambda = E\,\lvert \cup_{\lambda \in \Lambda} \hat{S}^{\lambda}(I) \rvert$ is the expected number of genes selected for at least one cut-off $\lambda \in \Lambda$ on a subsample. In practice, we can set $\pi$ and $\Lambda$ to control false positives.
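The bound is simple arithmetic, so it can be used in both directions: to evaluate $E(V)$ for given $q_\Lambda$, $\pi$ and $p$, or to find the largest $q_\Lambda$ (and hence how large the cut-offs in $\Lambda$ may be) that keeps $E(V)$ below a target. The helper names are hypothetical.

```python
import math


def expected_false_positives_bound(q, pi, p):
    """Upper bound on E(V): q^2 / ((2*pi - 1) * p), valid for 0.5 < pi <= 1."""
    if not 0.5 < pi <= 1:
        raise ValueError("pi must lie in (0.5, 1]")
    return q * q / ((2 * pi - 1) * p)


def max_q_for_target(ev, pi, p):
    """Largest q_Lambda for which the bound stays at or below the target E(V) = ev."""
    if not 0.5 < pi <= 1:
        raise ValueError("pi must lie in (0.5, 1]")
    return math.sqrt(ev * (2 * pi - 1) * p)
```

For example, with p = 12,000 genes and π = 0.9, keeping E(V) ≤ 1 requires q_Λ ≤ sqrt(0.8 · 12,000) ≈ 98.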
Data sets. Hardt lab: Salmonella screen in human cells; 19,000 genes; read-out: infection rate; ~4 different siRNAs per gene, no replicates. Merdes lab (Saj et al., Dev Cell, 2010): Notch screen in Drosophila; 12,000 genes; read-out: Notch activity; 4 replicates; secondary and in vivo validation screens.
Salmonella screen: ranking. Quantile normalization; rankings compared by Kendall's tau distance (figure).
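Kendall's tau distance counts the gene pairs that two rankings order differently. This sketch uses the direct pairwise count, optionally normalized by n(n-1)/2 so the result lies in [0, 1]; the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b, normalized=True):
    """Number of discordant gene pairs between two rankings of the same genes.

    With normalized=True the count is divided by n(n-1)/2, giving a value in [0, 1].
    """
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    discordant = sum(
        1
        for g, h in combinations(rank_a, 2)
        if (pos_a[g] - pos_a[h]) * (pos_b[g] - pos_b[h]) < 0  # pair ordered oppositely
    )
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2) if normalized else discordant
```

The pairwise loop is quadratic, which is slow for ~19,000 genes in pure Python; an O(n log n) merge-sort count of inversions is the usual alternative, and `scipy.stats.kendalltau` computes the related tau correlation coefficient.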
Salmonella screen: stability (figure; axes labeled Λ)
Notch screen: raw data
Notch screen: quantile normalization
Notch screen: normalization. Replicate correlation for raw data, quantile-normalized data, and z-scores (figure).
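One way to summarize the replicate correlation compared on this slide is the mean Pearson correlation over all pairs of replicates, computed once for the raw readouts and once for each normalized version. This is a sketch under that assumption; the slide does not state which correlation summary was plotted, and the function names are hypothetical.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_replicate_correlation(replicates):
    """Average Pearson correlation over all pairs of replicate readout vectors."""
    return statistics.fmean(pearson(x, y) for x, y in combinations(replicates, 2))
```

A normalization that removes plate or replicate effects should raise this value relative to the raw data.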
Notch screen: ranking. Quantile-normalized data; ranking distance (Kendall's tau) (figure).
Notch screen: reproducibility. Leave-one-out: ranking based on three replicates, validated with the fourth replicate. Cut-off 300 for validation.
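The leave-one-out scheme can be sketched as follows, assuming that higher readout means a stronger hit, that genes are scored by the mean of the three remaining replicates, and that the top 300 genes of the held-out replicate serve as "true" hits. The AUC is computed with the Mann-Whitney statistic (ties count 1/2). Function names are hypothetical.

```python
import statistics


def auc(scores, positives):
    """ROC AUC via the Mann-Whitney statistic.

    scores: dict gene -> score (higher = ranked higher); positives: set of 'true' genes.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_auc(data, cutoff=300):
    """Per-fold AUCs: score genes on all but one replicate, validate on the held-out one.

    data: dict gene -> list of replicate readouts (higher = stronger hit, an assumption).
    """
    m = len(next(iter(data.values())))
    aucs = []
    for r in range(m):
        # 'Validated' genes: top `cutoff` genes of the held-out replicate r.
        held_out = sorted(data, key=lambda g: data[g][r], reverse=True)[:cutoff]
        # Ranking score: mean of the remaining replicates.
        scores = {
            g: statistics.fmean(v for i, v in enumerate(vals) if i != r)
            for g, vals in data.items()
        }
        aucs.append(auc(scores, set(held_out)))
    return aucs
```

Averaging the per-fold results (or the per-fold ROC curves) gives one summary per normalization method. The pairwise AUC loop is quadratic in the number of genes; a rank-sum formula is faster for genome-wide data.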
Notch screen: average leave-one-out ROC curves for different normalizations
Notch screen: ROC analysis of validation screen. Secondary screen of 900 genes, focusing on down-regulation. Panels: all 12,000 genes; top 254 genes.
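ROC points for a ranking against a validation screen can be sketched as below. We assume that only genes re-tested in the secondary screen carry labels (confirmed or not confirmed), that untested genes are skipped, and that the curve can be restricted to the first `top` genes of the ranking, analogous to the "all genes" versus "top 254 genes" panels. The function name and parameters are hypothetical.

```python
def roc_points(ranking, positives, negatives, top=None):
    """ROC curve of a ranking against validation labels.

    ranking: genes ordered from strongest to weakest candidate.
    positives / negatives: genes confirmed / not confirmed in the validation screen;
    genes in neither set (not re-tested) are skipped.
    top: if given, only the first `top` genes of the ranking are considered.
    Returns (false positive rate, true positive rate) points, starting at (0, 0).
    """
    tested = [g for g in ranking[:top] if g in positives or g in negatives]
    n_pos = sum(1 for g in tested if g in positives)
    n_neg = len(tested) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for g in tested:
        if g in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg if n_neg else 0.0, tp / n_pos if n_pos else 0.0))
    return points
```

Restricting to the top of the ranking shows how well the highest-scoring genes are confirmed, which is the part of the list that is followed up in practice.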
Notch screen: stability, in vivo validation. Median ranking of the top 2,000 genes (figure over the cut-off set Λ).
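The slide does not specify how the median ranking is formed; one plausible reading is to aggregate several rankings (e.g. from subsamples) by each gene's median position. This sketch follows that assumption, breaking ties by gene name; the function name is hypothetical.

```python
import statistics


def median_ranking(rankings, top=None):
    """Aggregate rankings of the same genes by each gene's median position (1 = best).

    Returns (gene, median position) pairs sorted by median position (ties by name),
    optionally truncated to the first `top` entries.
    """
    positions = {}
    for ranking in rankings:
        for pos, gene in enumerate(ranking, start=1):
            positions.setdefault(gene, []).append(pos)
    med = {g: statistics.median(p) for g, p in positions.items()}
    ordered = sorted(med.items(), key=lambda item: (item[1], item[0]))
    return ordered[:top]
```

The median is less sensitive than the mean to a single subsample in which a gene drops far down the list.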
Conclusions. Both quantile and z-score normalization improve correlation and reproducibility. Selecting stable genes complements the selection of high-scoring genes: stable sets quantify the reproducibility of being among the top k in a ranking and come with an upper bound on the expected number of false positives. RSA produced fairly unstable sets.
Acknowledgements. Computational Biology Group, www.cbg.ethz.ch: Juliane Siebourg, Edgar Delgado-Eckert. Collaborators: Gunter Merdes (D-BSSE, ETH Zurich), Wolf-Dietrich Hardt (D-BIOL, ETH Zurich), InfectX consortium. Funding: InfectX, SystemsX.ch