
False discovery rate and model selection

Elisabeth Gnatowski


23.06.2006

Outline
1 Definition of the FDR (Multiple Testing; FDR and pFDR; Controlling the FDR)
2 Estimation of the FDR (Gene-specific FDR)
3 Variable Selection
4 A decision theoretic framework
5 Simulation studies (p < n; p > n)

Problem
Goal: find differentially expressed genes using DNA microarrays.
The number of genes is much larger than the number of independent samples in the study (p >> n).
This is a problem of testing multiple hypotheses simultaneously.
Analysing microarray data requires control of type 1 errors, balancing between finding too many false-positive results and too few significant results ⇒ FDR.

Multiple Testing
We test $m$ hypotheses; for $m_0$ of them the null hypothesis is true.
$H_0$: the gene is not differentially expressed.
V: number of type 1 errors (false-positive results).
T: number of type 2 errors (false-negative results).
W: number of hypotheses not rejected.
R: number of rejected hypotheses.
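The counts V, T, W and R are easiest to see on a single simulated experiment, where (unlike with real microarray data) we know which null hypotheses are true. The following is a minimal sketch. `Outcomes` and `count_outcomes` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Sequence


class Outcomes(NamedTuple):
    """Counts from the multiple-testing table, named as in the text."""
    V: int  # true nulls that were rejected (type 1 errors, false positives)
    T: int  # false nulls that were not rejected (type 2 errors, false negatives)
    W: int  # hypotheses not rejected
    R: int  # hypotheses rejected


def count_outcomes(null_is_true: Sequence[bool], rejected: Sequence[bool]) -> Outcomes:
    """Tabulate V, T, W and R; needs the true null status, so only works in simulations."""
    if len(null_is_true) != len(rejected):
        raise ValueError("need exactly one truth label per test decision")
    pairs = list(zip(null_is_true, rejected))
    V = sum(1 for h0, rej in pairs if h0 and rej)
    T = sum(1 for h0, rej in pairs if not h0 and not rej)
    R = sum(1 for rej in rejected if rej)
    W = len(rejected) - R
    return Outcomes(V=V, T=T, W=W, R=R)


# m = 6 genes; H0 (not differentially expressed) is true for the first m0 = 4.
truth = [True, True, True, True, False, False]
decisions = [False, True, False, False, True, False]
print(count_outcomes(truth, decisions))  # → Outcomes(V=1, T=1, W=4, R=2)
```

By construction W + R = m, so only V and T depend on knowing which null hypotheses are true.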
FDR and pFDR (positive false discovery rate)
FDR: expected rate of false-positive results among all positive results (rejections):

$$\mathrm{FDR} = E\left[\begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases}\right] = E\left[\frac{V}{R} \,\middle|\, R > 0\right] P(R > 0)$$

If $P(R = 0) > 0$, the FDR is scaled down by the factor $P(R > 0)$, so it does not describe the error rate among the discoveries actually made → pFDR:

$$\mathrm{pFDR} = E\left[\frac{V}{R} \,\middle|\, R > 0\right],$$

the rate at which discoveries are false.

Controlling the FDR
Benjamini and Hochberg (1995) propose an algorithm for selecting the significant hypotheses that controls the FDR:
Let $H_1, \dots, H_G$ denote the null hypotheses to be tested, and let $p_1 \le p_2 \le \dots \le p_G$ denote the corresponding ordered, independent p-values.
Let $\alpha$ denote the rate at which the FDR is to be controlled.
To select the significant hypotheses, first fix the level $\alpha$ and find

$$\hat{k} = \max\left\{1 \le k \le G : p_k \le \frac{\alpha k}{G}\right\}.$$

Reject all null hypotheses with indices $1, \dots, \hat{k}$ (none, if no such $k$ exists).
This gives strong control of the FDR at level $\alpha$ when the p-values are independent and those of the true null hypotheses are uniformly distributed.

Basics (Estimation of the FDR)
Estimate the FDR by estimating $\pi_0$ (the proportion of true null hypotheses) and the joint distribution of the p-values.
The p-values of the true null hypotheses are uniformly distributed on the interval [0, 1].
Bayes' theorem:

$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{\int f(x \mid \theta)\, g(\theta)\, d\theta}$$

$\pi(\theta \mid x)$: posterior distribution
$g(\theta)$: prior distribution
$f(x \mid \theta)$: joint distribution of the data given $\theta$ (likelihood)
Sample from the posterior distribution by MCMC.
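The relation $\mathrm{FDR} = E[V/R \mid R > 0]\, P(R > 0) = \mathrm{pFDR} \cdot P(R > 0)$ can be checked by Monte Carlo. This is a sketch under stated assumptions:

- The rejection rule is the naive per-test rule "reject if p ≤ α", not Benjamini and Hochberg.
- True-null p-values are uniform, as stated above.
- False-null p-values follow a Beta(0.1, 1) distribution. This is an arbitrary choice made only for illustration.
- `simulate_fdr_pfdr` is a hypothetical name.

```python
import random
from typing import Tuple


def simulate_fdr_pfdr(m: int, m0: int, alpha: float, n_rep: int,
                      seed: int = 0) -> Tuple[float, float, float]:
    """Monte Carlo estimates of (FDR, pFDR, P(R > 0)) for the naive rule "reject if p <= alpha".

    Illustrative assumptions: the m0 true-null p-values are Uniform(0, 1); the m - m0
    false-null p-values are Beta(0.1, 1), i.e. concentrated near 0.
    """
    rng = random.Random(seed)
    sum_q = 0.0      # sum of V/R over replicates, with V/R := 0 when R = 0
    n_positive = 0   # number of replicates with R > 0
    for _ in range(n_rep):
        null_p = [rng.random() for _ in range(m0)]
        alt_p = [rng.betavariate(0.1, 1.0) for _ in range(m - m0)]
        V = sum(p <= alpha for p in null_p)        # false positives
        R = V + sum(p <= alpha for p in alt_p)     # all rejections
        if R > 0:
            n_positive += 1
            sum_q += V / R
    fdr = sum_q / n_rep                                          # E[V/R], 0 if R = 0
    pfdr = sum_q / n_positive if n_positive else float("nan")   # E[V/R | R > 0]
    p_pos = n_positive / n_rep                                   # P(R > 0)
    return fdr, pfdr, p_pos


# Global null (m0 = m = 10): every discovery is false, so pFDR = 1,
# while FDR = P(R > 0) = 1 - 0.95**10, which is about 0.40.
fdr, pfdr, p_pos = simulate_fdr_pfdr(m=10, m0=10, alpha=0.05, n_rep=20000)
print(round(fdr, 3), pfdr, round(p_pos, 3))
```

Under the global null the two error rates diverge sharply: the pFDR is exactly 1 because every rejection is false, while the FDR only reports how often anything is rejected at all.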
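The Benjamini and Hochberg step-up rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code:

- `benjamini_hochberg` is a hypothetical name.
- Decisions are returned in the original (unsorted) order of the p-values.
- Ties are broken by Python's stable sort.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Return reject (True) / keep (False) decisions in the input order of the p-values."""
    G = len(pvalues)
    # Indices sorted so that p_(1) <= p_(2) <= ... <= p_(G).
    order = sorted(range(G), key=lambda i: pvalues[i])
    k_hat = 0  # 0 means: no k satisfies the condition, reject nothing
    for k in range(1, G + 1):
        if pvalues[order[k - 1]] <= alpha * k / G:
            k_hat = k  # keep the LARGEST k with p_(k) <= alpha * k / G
    rejected = [False] * G
    for rank in range(k_hat):  # reject the hypotheses with the k_hat smallest p-values
        rejected[order[rank]] = True
    return rejected


p = [0.028, 0.3, 0.005, 0.025, 0.9]
print(benjamini_hochberg(p, alpha=0.05))  # → [True, False, True, True, False]
```

In this example the step-up behaviour is visible. The second-smallest p-value, 0.025, exceeds its own threshold 2α/G = 0.02. It is still rejected because the third-smallest, 0.028, satisfies 0.028 ≤ 3α/G = 0.03, so $\hat{k} = 3$.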
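The slides do not state which estimator of $\pi_0$ is used. One common choice, swapped in here only for illustration, is Storey's λ-based estimator. It relies directly on the uniformity of the true-null p-values: about $m \pi_0 (1 - \lambda)$ of them exceed λ. The plug-in FDR estimate for rejecting every hypothesis with p ≤ t is then $\hat{\pi}_0 m t / R(t)$. The function names and the default λ = 0.5 are assumptions.

```python
from typing import Sequence


def estimate_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of pi0, the proportion of true null hypotheses.

    True-null p-values are Uniform(0, 1), so about m * pi0 * (1 - lam) of them exceed
    lam; assuming few false-null p-values lie above lam, count and rescale.
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    n_above = sum(p > lam for p in pvalues)
    return min(1.0, n_above / (m * (1.0 - lam)))


def estimate_fdr(pvalues: Sequence[float], t: float, lam: float = 0.5) -> float:
    """Plug-in FDR estimate when every hypothesis with p <= t is rejected."""
    m = len(pvalues)
    R = sum(p <= t for p in pvalues)                 # number of rejections
    expected_v = estimate_pi0(pvalues, lam) * m * t  # estimated E[V]: null p-values <= t
    return min(1.0, expected_v / max(R, 1))


p = [0.001, 0.002, 0.004, 0.01, 0.03, 0.2, 0.4, 0.6, 0.8, 0.9]
print(estimate_pi0(p), round(estimate_fdr(p, t=0.01), 3))  # → 0.6 0.015
```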
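The Bayes' theorem and MCMC bullets name the ingredients but not a concrete model. As a purely hypothetical illustration, take the following two-group model:

- Under the null, a p-value is Uniform(0, 1), as stated above.
- Otherwise, it is Beta(a, 1) with a fixed, known a = 0.1.
- This gives $f(x \mid \pi_0) = \prod_i [\pi_0 + (1 - \pi_0)\, a\, x_i^{a-1}]$.
- The prior $g$ on $\pi_0$ is Uniform(0, 1).

Random-walk Metropolis then draws from $\pi(\pi_0 \mid x)$ without ever computing the normalising integral in Bayes' theorem. The model, prior, step size and function names are all assumptions.

```python
import math
import random
from typing import List, Sequence


def log_likelihood(pi0: float, pvalues: Sequence[float], a: float) -> float:
    """log f(x | pi0) for the mixture pi0 * Uniform(0, 1) + (1 - pi0) * Beta(a, 1)."""
    total = 0.0
    for p in pvalues:
        p = max(p, 1e-300)                # guard: 0.0 ** (a - 1) would divide by zero
        alt_density = a * p ** (a - 1.0)  # Beta(a, 1) density at p
        total += math.log(pi0 + (1.0 - pi0) * alt_density)
    return total


def sample_pi0_posterior(pvalues: Sequence[float], a: float = 0.1, n_iter: int = 5000,
                         step: float = 0.05, seed: int = 0) -> List[float]:
    """Random-walk Metropolis draws from pi(pi0 | x) under a Uniform(0, 1) prior g."""
    rng = random.Random(seed)
    current = 0.5
    current_ll = log_likelihood(current, pvalues, a)
    draws: List[float] = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        # g(pi0) = 0 outside (0, 1), so such proposals are always rejected.
        if 0.0 < proposal < 1.0:
            proposal_ll = log_likelihood(proposal, pvalues, a)
            # Flat prior + symmetric proposal: accept with prob min(1, f(x|prop) / f(x|cur));
            # the integral in the denominator of Bayes' theorem cancels in this ratio.
            if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
                current, current_ll = proposal, proposal_ll
        draws.append(current)
    return draws


# Hypothetical data: 80 true nulls and 20 false nulls, so the true pi0 is 0.8.
data_rng = random.Random(1)
pvals = ([data_rng.random() for _ in range(80)]
         + [data_rng.betavariate(0.1, 1.0) for _ in range(20)])
kept = sample_pi0_posterior(pvals)[1000:]  # discard the first 1000 draws as burn-in
print(sum(kept) / len(kept))               # posterior mean of pi0
```

With these simulated data the posterior mean should land in the vicinity of the true value 0.8. The exact number depends on the seeds and is not meaningful on its own.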
