Ranking candidate genes from perturbation experiments

Ranking candidate genes from perturbation experiments - PowerPoint PPT Presentation

Ranking candidate genes from perturbation experiments. Niko Beerenwinkel. Gene ranking. Goal: identify (or prioritize) genes that affect the readout, i.e., are involved in the biological process of interest.


  1. Ranking candidate genes from perturbation experiments, Niko Beerenwinkel

  2. Gene ranking
     - Goal: identify (or prioritize) genes that affect the readout, i.e., are involved in the biological process of interest
     - Issues
       - Noise (readout, siRNA specificity)
       - Design (siRNA library, replicates, validation screens)
       - Limited resources
     - Procedure
       - Normalization: quantiles, z-score, error models
       - Rank by normalized readout or p-value
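
The two named normalizations can be sketched as follows. This is a minimal illustration that assumes readouts arrive as plain lists of numbers with one list per replicate or plate; the function names are hypothetical, and error models are not covered.

```python
from statistics import mean, stdev

def z_scores(values):
    """Centre and scale readouts to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def quantile_normalize(replicates):
    """Map every replicate onto one shared empirical distribution.

    `replicates` is a list of equal-length lists. Ties are broken by
    position, which is a simplification.
    """
    n = len(replicates[0])
    sorted_cols = [sorted(col) for col in replicates]
    # Reference distribution: mean of the i-th smallest value across replicates.
    reference = [mean(col[i] for col in sorted_cols) for i in range(n)]
    out = []
    for col in replicates:
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        out.append(normalized)
    return out
```

After normalization, genes can be ranked by the normalized readout, for example by its mean across replicates.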

  3. RSA: Redundant siRNA activity analysis
     - Rank all siRNAs (wells) by readout
     - Assign a p-value to each gene based on the rank distribution of all siRNAs targeting it (hypergeometric model)
     - König et al., 2007
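
The hypergeometric idea can be illustrated with a simplified sketch. It is written in the spirit of RSA and is not the published implementation: for each gene it takes the smallest enrichment p-value over the rank positions of that gene's siRNAs, and any further details of the original method are not reproduced. The function names and the input layout are assumptions.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def rsa_like_pvalues(ranked_genes):
    """ranked_genes[i] is the gene targeted by the siRNA at rank i (strongest first).

    Returns {gene: minimum enrichment p-value over that gene's siRNAs}.
    """
    N = len(ranked_genes)
    positions = {}
    for rank, gene in enumerate(ranked_genes, start=1):
        positions.setdefault(gene, []).append(rank)
    pvals = {}
    for gene, ranks in positions.items():
        K = len(ranks)
        # At the j-th siRNA of this gene (rank r), j of the top r wells target it.
        pvals[gene] = min(hypergeom_sf(j, N, K, r) for j, r in enumerate(ranks, start=1))
    return pvals
```

Because the minimum is taken over several cut-offs, the resulting value is best read as a ranking score rather than a calibrated p-value.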

  4. Comparing gene rankings
     - Intersection metric
     - Spearman’s footrule
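
Both measures are short to compute. The sketch below assumes that the two rankings list the same genes and that "intersection metric" refers to the top-k measure that averages the normalized symmetric difference of the top-d sets over d = 1..k; the slide does not spell out the definition.

```python
def spearman_footrule(rank_a, rank_b):
    """Sum of absolute position differences between two rankings of the same genes."""
    pos_b = {g: i for i, g in enumerate(rank_b)}
    return sum(abs(i - pos_b[g]) for i, g in enumerate(rank_a))

def intersection_metric(rank_a, rank_b, k):
    """Average over depths d = 1..k of |top_d(a) symmetric-difference top_d(b)| / (2d)."""
    total = 0.0
    for d in range(1, k + 1):
        a, b = set(rank_a[:d]), set(rank_b[:d])
        total += len(a ^ b) / (2 * d)
    return total / k
```

The footrule weighs all positions equally, whereas the intersection metric emphasizes agreement at the top of the lists, which is usually what matters when prioritizing hits.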

  5. Stable variables
     - Let $\Lambda$ be the set of all (reasonable) cut-offs for a given ranking (i.e., $\lambda \in \Lambda$ is a regularization parameter).
     - The set of selected genes $\hat{S}^{\lambda} = \hat{S}^{\lambda}(I)$ is a function of the samples $I$.
     - For a given threshold $\pi$, the set of stable variables is
       $$\hat{S}^{\text{stable}} = \left\{ k : \max_{\lambda \in \Lambda} P\left(k \in \hat{S}^{\lambda}\right) \ge \pi \right\}$$
     - $P$ can be estimated by sub- or re-sampling.
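
A minimal sketch of the subsampling estimate follows. It assumes that each gene has one readout per sample, that a lower mean readout means a stronger hit, and that each subsample uses half of the samples; these choices and all names are illustrative, not taken from the talk.

```python
import random
from statistics import mean

def selection_frequencies(data, cutoffs, n_subsamples=100, seed=0):
    """data: {gene: [readout per sample]}; cutoffs: list of top-k sizes (the set Lambda).

    Returns {gene: max over cutoffs of the fraction of subsamples
    in which the gene is among the top-ranked genes}.
    """
    rng = random.Random(seed)
    genes = list(data)
    n = len(next(iter(data.values())))
    half = max(1, n // 2)
    counts = {lam: {g: 0 for g in genes} for lam in cutoffs}
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), half)
        # Assumption: rank by mean readout, lowest first.
        ranked = sorted(genes, key=lambda g: mean(data[g][i] for i in idx))
        for lam in cutoffs:
            for g in ranked[:lam]:
                counts[lam][g] += 1
    return {g: max(counts[lam][g] for lam in cutoffs) / n_subsamples for g in genes}

def stable_set(freqs, pi):
    """Genes whose maximal selection frequency reaches the threshold pi."""
    return {g for g, f in freqs.items() if f >= pi}
```

A gene that tops the ranking in every subsample gets frequency 1 and is stable for any threshold, while a gene that enters the top list only occasionally is filtered out.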

  6. Stability selection (Meinshausen & Bühlmann, 2010)
     - Under certain assumptions, the expected number of falsely selected variables $V$ is bounded by
       $$E(V) \le \frac{1}{2\pi - 1} \, \frac{q_{\Lambda}^{2}}{p}$$
       where $p$ is the total number of genes, and
       $$q_{\Lambda} = E\left| \bigcup_{\lambda \in \Lambda} \hat{S}^{\lambda}(I) \right|$$
       is the expected number of selected genes.
     - In practice, we can set $\pi$ and $\Lambda$ to control false positives.
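
To get a feel for the bound, it can be evaluated numerically. The helper below is a hypothetical illustration, and the numbers in the usage note are made up for the example, not results from the talk. The bound is only meaningful for a threshold above 1/2.

```python
def expected_false_positives_bound(q, p, pi):
    """Upper bound q^2 / ((2*pi - 1) * p) on the expected number of false selections.

    q:  expected size of the union of selected sets over all cut-offs
    p:  total number of genes
    pi: stability threshold, must lie in (0.5, 1]
    """
    if not 0.5 < pi <= 1:
        raise ValueError("pi must be in (0.5, 1]")
    return q * q / ((2 * pi - 1) * p)
```

For instance, with an assumed p = 12000 genes, q = 100 and threshold 0.9, the bound is 100² / (0.8 × 12000), roughly 1.04 expected false positives. Lowering q (shorter top lists) or raising the threshold tightens it.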

  7. Data sets
     - Hardt lab
       - Salmonella screen in human cells
       - 19,000 genes
       - Read-out: infection rate
       - ~4 different siRNAs per gene, no replicates
     - Merdes lab (Saj et al., Dev Cell, 2010)
       - Notch screen in Drosophila
       - 12,000 genes
       - Read-out: Notch activity
       - 4 replicates
       - Secondary and in vivo validation screens

  8. Salmonella screen: ranking
     - Quantile normalization
     - Rankings (Kendall’s tau distance): [figure]
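
Kendall's tau distance counts the gene pairs that two rankings order differently. A minimal sketch, assuming both rankings contain the same genes without ties (the quadratic loop is fine for illustration but slow for thousands of genes):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b, normalize=True):
    """Number (or fraction) of gene pairs ordered differently by the two rankings."""
    pos_a = {g: i for i, g in enumerate(rank_a)}
    pos_b = {g: i for i, g in enumerate(rank_b)}
    discordant = sum(
        1
        for g, h in combinations(rank_a, 2)
        if (pos_a[g] - pos_a[h]) * (pos_b[g] - pos_b[h]) < 0
    )
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2) if normalize else discordant
```

With normalization, identical rankings give 0 and exactly reversed rankings give 1.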

  9. Salmonella screen: stability [figure; label: Λ]

  10. Notch screen: raw data

  11. Notch screen: quantile normalization

  12. Notch screen: normalization
      - Raw data
      - Quantiles
      - Z-scores
      - [figure; label: correlation]

  13. Notch screen: ranking
      - Quantile-normalized
      - Ranking distance (Kendall’s tau)

  14. Notch screen: reproducibility
      - Leave-one-out: ranking based on three replicates, validated with the fourth replicate
      - Cut-off 300 for validation
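
One way to read the leave-one-out scheme is sketched below. How the cut-off is applied is an assumption here: the top `cutoff` genes of the held-out replicate are treated as the reference set, and the score is the fraction of them recovered by the ranking built from the remaining replicates. Lower readout is assumed to mean a stronger hit.

```python
from statistics import mean

def leave_one_out_overlap(data, cutoff):
    """data: {gene: [readout per replicate]}.

    Returns one overlap fraction per held-out replicate.
    """
    genes = list(data)
    n_rep = len(next(iter(data.values())))
    overlaps = []
    for held_out in range(n_rep):
        train = [r for r in range(n_rep) if r != held_out]
        # Ranking from the remaining replicates (mean readout, lowest first).
        by_train = sorted(genes, key=lambda g: mean(data[g][r] for r in train))
        # Reference ranking from the held-out replicate alone.
        by_test = sorted(genes, key=lambda g: data[g][held_out])
        top_train, top_test = set(by_train[:cutoff]), set(by_test[:cutoff])
        overlaps.append(len(top_train & top_test) / cutoff)
    return overlaps
```

With four replicates this yields four values, which can be averaged or compared across normalization methods.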

  15. Notch screen: average leave-one-out ROC curves for different normalizations
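
A ROC curve for a single ranking can be computed as in the sketch below, which assumes that a higher score means a more likely hit, that labels mark the genes counted as true hits in the validation data, and that scores have no ties. Averaging curves over the leave-one-out runs is not shown.

```python
def roc_curve(scores, labels):
    """Return ROC points (false positive rate, true positive rate), best score first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 1 means every true hit is ranked above every non-hit, while 0.5 corresponds to a random ordering.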

  16. Notch screen: ROC analysis of validation screen
      - Secondary screen of 900 genes
      - Focus on down-regulation
      - [figure panels: All 12,000 genes; Top 254 genes]

  17. Notch screen: stability, in vivo validation
      - Median ranking of top 2000 genes
      - [figure; labels: Λ, Median 20]

  18. Conclusions
      - Both quantile and z-score normalization improve correlation and reproducibility.
      - Selecting stable genes complements selection of high-scoring genes.
      - Stable sets quantify reproducibility of being among top k in ranking
      - Upper bound on expected number of false positives
      - RSA produced fairly unstable sets

  19. Acknowledgements
      - Computational Biology Group, www.cbg.ethz.ch
        - Juliane Siebourg
        - Edgar Delgado-Eckert
      - Collaborators
        - Gunter Merdes (D-BSSE, ETH Zurich)
        - Wolf-Dietrich Hardt (D-BIOL, ETH Zurich)
        - InfectX consortium
      - Funding
        - InfectX, SystemsX.ch
