
February 2008 Differential Expression, Power, Exploratory Analysis - PowerPoint PPT Presentation



  1. February 2008 Differential Expression, Power, Exploratory Analysis. Mauro Delorenzi, Bioinformatics Core Facility (BCF), NCCR Molecular Oncology, Swiss Institute of Bioinformatics (SIB) <Mauro.Delorenzi@isb-sib.ch>. February 10, 2008.

  2. Differential gene expression. Microarray differential gene expression. Figure: Source: Roche. Gene expression (GE) with microarrays: the fluorescence intensity signal is proportional (within some range) to the amount of target RNA hybridized to the array.

  3. Differential gene expression. GE is quantified on a log scale, usually base 2. Comparing two conditions 1 and 2, the log ratio is M = GE2 − GE1. One unit of M corresponds to a factor of 2: M = 2 means a factor of 4, M = −3 a factor of 1/8, and in general M = u a factor of 2^u. Two questions follow: how precisely does GE measure the real gene expression level, and how precisely does M measure the real log ratio?
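The log-ratio arithmetic above can be sketched in a few lines of Python (a minimal illustration; the GE values are hypothetical):

```python
# Log2 expression values for one gene under two conditions (hypothetical)
ge1, ge2 = 8.0, 10.0

m = ge2 - ge1          # log ratio M, in units of log2
fold_change = 2 ** m   # factor on the original (linear) scale

print(m, fold_change)  # M = 2 corresponds to a factor of 4
```

So one unit of M is a doubling, and negative M values give factors below 1 (M = −3 gives 2**-3 = 1/8).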

  4. Differential gene expression. (Figure slide.)

  5. Hypothesis Testing, Classification. Statistical testing. Example: comparing the mean value of a variable between two groups. Quantification of the effect: difference of the means = µ. Null hypothesis H0: µ = 0 vs. alternative hypothesis HA: µ ≠ 0. Test statistic: the t-statistic for µ = 0 measures the effect observed in an experiment. Define a significance level α, typical value 0.05.

  6. Hypothesis Testing, Classification. Define the rejection region. Assume the t statistic follows a Student t distribution and reject when |t| > t0, with t0 determined so that under H0, P[t < −t0] = α/2 and P[t > t0] = α/2. Take a sample, calculate t, decide.

  7. Hypothesis Testing, Classification. p-values. The p-value is the probability, under H0, of observing a test statistic at least as extreme as the value t1 observed in the sample. One-sided test: p = P[t > t1]. Two-sided test: p = P[|t| > t1]. Figure: p-value (one-sided test): the area beyond the observed value.
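The one- and two-sided tail probabilities can be sketched in Python. As a stdlib-only illustration this uses a standard normal in place of the Student t distribution of the slides, and the observed statistic t1 = 2.0 is hypothetical:

```python
import math

def normal_sf(x):
    """Upper-tail probability P[Z > x] for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))

t1 = 2.0                              # hypothetical observed statistic
p_one_sided = normal_sf(t1)           # P[t > t1]
p_two_sided = 2 * normal_sf(abs(t1))  # P[|t| > t1]
print(round(p_one_sided, 4), round(p_two_sided, 4))
```

The two-sided p-value is twice the one-sided tail area because the rejection region is split symmetrically between both tails.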

  8. Hypothesis Testing, Classification. A p-value without a null hypothesis is meaningless. A p-value below α is called significant and corresponds to a rejection of H0. A (highly) significant p-value means that the observed difference would occur (very) rarely if there truly were no difference, i.e. if H0 were correct. But it does not prove that the alternative hypothesis is true, and it does not express the strength of confidence in HA: it is based on the assumption of H0 and can therefore only quantify evidence against the null hypothesis.

  9. Hypothesis Testing, Classification. A (highly) nonsignificant (large) p-value does NOT demonstrate that the null hypothesis is true, only that the evidence against H0 is not (very) strong, not convincing enough for a rejection. Large p-values might be due to small sample sizes. The p-value does not quantify the strength (effect size) of the observed difference: the same difference of two means can correspond to very different p-values. A small effect in a very large study can easily lead to a significant p-value, while a large effect in a small study might well not give significance.

  10. Hypothesis Testing, Classification. Decision errors. The findings of a study comparing two groups can be wrong in two ways: 1. The results lead to the conclusion that there is a difference when in reality there is none: FP error, type I error, controlled by α. 2. The results lead to the conclusion that there is no difference when in reality a difference exists: FN error, type II error, expressed by β. α is the benchmark for the p-values: if the p-value is below the threshold α, the result is called statistically significant. The power of a study is the probability of finding a statistically significant result if there is a true difference: Power = 1 − β. The choice of adequate power is critical, because investigators and funding agencies must be confident that an existing difference can be detected using the study sample. A study that has little chance of reaching this aim might make no sense.
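The relation Power = 1 − β can be made concrete with a rough sketch of a two-sided, two-sample z-test power calculation. The effect size, standard deviation, and group size below are hypothetical, and the normal approximation stands in for the noncentral t distribution a real t-test power calculation would use:

```python
import math

def normal_cdf(x):
    """P[Z <= x] for a standard normal variable."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_two_sample(mu, sigma, n, z_alpha=1.96):
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05.

    mu: true difference of the group means; sigma: common standard
    deviation; n: samples per group.
    """
    delta = mu / (sigma * math.sqrt(2.0 / n))  # standardized effect
    return normal_cdf(delta - z_alpha) + normal_cdf(-delta - z_alpha)

# Hypothetical: true difference 1.0, sd 1.5, 20 samples per group
print(round(power_two_sample(1.0, 1.5, 20), 3))
```

Note that when the true difference is zero the "power" reduces to the type I error rate α, as it should.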

  11. Hypothesis Testing, Classification. (Figure slide.)

  12. Hypothesis Testing, Classification. Classification example: a test for pregnancy (the condition). A test result that suggests a pregnancy is called positive; the opposite is a negative result. If 200 cases with the condition are tested and the test has a sensitivity of 90%, the test result is positive in 180 of these cases. If 1,200 cases without the condition are tested and the test has a specificity of 95%, the test result is negative in 1,140 of these cases.
Real status vs. test outcome:
                      Test Positive   Test Negative    Sum
  Condition present        180              20          200
  Condition absent          60            1140         1200
  Sum                      240            1160         1400

  13. Hypothesis Testing, Classification. Classification, terms.
Real status vs. test outcome:
                      Test Positive         Test Negative         Sum
  Condition present   True Positive (TP)    False Negative (FN)    C
  Condition absent    False Positive (FP)   True Negative (TN)     A
  Sum                        P                     N                M
Prevalence (PREV): frequency of the condition in the population = C / M. Sensitivity (SENS): ability (power) to diagnose presence of the condition = TP / C. Specificity (SPEC): ability to diagnose absence of the condition = TN / A. Accuracy (ACC): proportion of cases classified correctly by the test = (TP + TN) / M. Error rate (ER): (FP + FN) / M = 1 − Accuracy.

  14. Hypothesis Testing, Classification. Positive predictive value (PPV): proportion among those classified positive that really do have the condition = TP / P. False discovery proportion (FDP): proportion among those classified positive that in reality do not have the condition = FP / P = 1 − PPV. Negative predictive value (NPV): proportion among those classified negative that really do not have the condition = TN / N.
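All of these metrics follow directly from the counts in the pregnancy-test example (TP = 180, FN = 20, FP = 60, TN = 1140):

```python
tp, fn = 180, 20      # condition present: true positives, false negatives
fp, tn = 60, 1140     # condition absent: false positives, true negatives

c, a = tp + fn, fp + tn   # row sums: with / without the condition
p, n = tp + fp, fn + tn   # column sums: classified positive / negative
m = c + a                 # total cases

sens = tp / c             # sensitivity: 180/200
spec = tn / a             # specificity: 1140/1200
acc = (tp + tn) / m       # accuracy: 1320/1400
ppv = tp / p              # positive predictive value: 180/240
npv = tn / n              # negative predictive value: 1140/1160
fdp = fp / p              # false discovery proportion = 1 - PPV
print(sens, spec, round(acc, 3), ppv, round(npv, 3), fdp)
```

Note that PPV (0.75) is well below the sensitivity (0.90) here: even a good test yields many false positives when the condition is rarer than its absence.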

  15. Hypothesis Testing, Classification. Multiple testing. Aim: avoid false decisions, reduce their frequency. Suppose the same experiment tests N null hypotheses at level α. If the nulls are all correct and the tests independent, we will wrongly reject at least one with probability P = 1 − (1 − α)^N, and on average we will wrongly reject Nα of the null hypotheses. For α = 0.05:
  N    P(at least 1 false rejection)    Expected false rejections
  10              40%                            0.5
  50              92%                            2.5
  90              99%                            4.5
This would create much confusion and contradictions between different studies.
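The entries in the table follow directly from the two formulas just given:

```python
alpha = 0.05
for n_tests in (10, 50, 90):
    # Probability of at least one false rejection among independent tests
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    # Expected number of false rejections when all nulls are true
    expected_fp = n_tests * alpha
    print(n_tests, round(p_at_least_one, 2), expected_fp)
```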

  16. Hypothesis Testing, Classification. Differential Expression. Multiple testing (MT). With M = 20,000 genes and α = 0.05, when no gene is DE (H0 true for all) we expect FP = M × α = 1,000 genes. One approach to MT: make the test more stringent by adapting α. Bonferroni method: use α / M = 2.5E-6 (or keep α and multiply the p-values by M). Bonferroni gives strong control: P[FP ≥ 1] ≤ α. It is conservative: when the number of tests exceeds about 10 there might be little power left. It is also difficult when the number of tests is large, because very small probabilities are not necessarily estimated accurately.
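A minimal sketch of the Bonferroni correction described above (the three raw p-values are hypothetical):

```python
m = 20000                        # number of genes tested
alpha = 0.05
threshold = alpha / m            # per-test level: 2.5e-6

p_values = [1e-7, 3e-6, 0.001]   # hypothetical raw p-values
# Equivalent view: multiply p-values by M, capping at 1
adjusted = [min(p * m, 1.0) for p in p_values]
significant = [p < threshold for p in p_values]
print(adjusted, significant)
```

Only the first gene survives: 3e-6 would easily pass an uncorrected α = 0.05 but fails the Bonferroni threshold, illustrating the method's conservatism.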

  17. Hypothesis Testing, Classification. Multiple testing: many other approaches. Control the false discovery rate FDR (the expected proportion of false discoveries among the rejected hypotheses). The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful. A simple method based on p-values: Benjamini-Hochberg (BH) FDR (in R: p.adjust). There are also multiple-testing corrections with permutation methods. Control the false discovery proportion FDP (the observed proportion of false discoveries among the rejected hypotheses): Pr(FP / P ≥ γ) ≤ α. FDP relates directly to the information in the observed data. Control false discovery counts FDC (the number of false discoveries among the rejected hypotheses): Korn's permutation method, Korn et al. (2001), controls Pr(FP > k) ≤ α (when k = 0 this reduces to the FWER).

  18. Hypothesis Testing, Classification. Benjamini-Hochberg (BH) FDR. This correction is not as stringent as Bonferroni and accepts more false positives, but there will also be fewer false negative genes. 1) The p-values of the n genes are ranked from the smallest to the largest. 2) The largest p-value remains as it is. 3) The second-largest p-value is multiplied by the total number of genes divided by its rank: corrected p-value = p-value × n / (n − 1). If below 0.05, it is significant. 4) The third-largest p-value is adjusted as in step 3, corrected p-value = p-value × n / (n − 2), and so on, keeping each adjusted value no larger than the one above it.
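The steps above can be written as a short function. This sketch mirrors what R's p.adjust(method = "BH") computes: it walks from the largest p-value down, multiplying each by n / rank and enforcing monotonicity with a running minimum:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Indices ordered from the largest p-value to the smallest
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top            # rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min            # enforce monotonicity
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called significant when its adjusted p-value falls below the chosen FDR level (e.g. 0.05).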

  19. Hypothesis Testing, Classification. Figure: Source: Pounds (2006) [4].

  20. Confidence Intervals and Power. Confidence intervals. The observed value is affected by measurement error. What is the real value? Observed value ≠ real value, most of the time. A confidence interval is a range of values (from a lower to an upper confidence limit) that will include the true parameter most of the time. 95% CI: if we took an infinite number of samples of the same kind, the true parameter would be included in the CI 95% of the time (coverage 95%). There remains a 5% chance that the true parameter is outside the CI. Multiple measurements are needed to estimate the CI.
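A 95% CI from replicate measurements can be sketched as follows. The data are hypothetical, and for simplicity the 1.96 normal quantile is used; with only n = 5 replicates a t quantile would be more appropriate:

```python
import math

# Hypothetical replicate measurements of one gene's log2 expression
values = [7.9, 8.3, 8.1, 8.4, 7.8]
n = len(values)

mean = sum(values) / n
# Sample standard deviation (n - 1 denominator)
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
# Normal approximation to the 95% CI of the mean
half_width = 1.96 * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)
print(round(mean, 3), tuple(round(x, 3) for x in ci))
```

As the slide notes, the interval cannot be estimated from a single measurement: the standard deviation needs replicates, and the interval shrinks with sqrt(n).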

  21. Confidence Intervals and Power. (Figure slide.)
