

1. Sequential Estimation in the Group Testing
Yaakov Malinovsky, University of Maryland, Baltimore County
Joint work with Gregory Haber (UMBC) and Paul Albert (NCI)
QPRC 2017: The 34th Quality and Productivity Research Conference, Department of Statistics, University of Connecticut, June 13, 2017

2. Group Testing for Estimating the Prevalence Rate
An early example of group testing used to estimate the prevalence of a trait is due to Marion A. Watson (1936). In that study, aphids were grouped onto potential host plants and observations were made on the subsequent development of the disease transmitted by the aphids. The maximum likelihood estimator (MLE) indicated that the probability of disease transmission was about 0.05-0.15.
Watson, M. A. (1936). Factors Affecting the Amount of Infection Obtained by Aphis Transmission of the Virus Hy. III. Trans. Roy. Soc. London, Ser. B 226, 457-489.

3. Probabilistic Model
Let members of a population be represented by independent random variables $\phi_i \sim \mathrm{Bernoulli}(p)$, $i = 1, 2, 3, \ldots$, where $p$ is the quantity we wish to estimate. For group tests with groups of size $k$, we have the new random variable
$$\vartheta^{(k)} = \max\{\phi_{i_1}, \phi_{i_2}, \ldots, \phi_{i_k}\} \sim \mathrm{Bernoulli}(1 - q^k),$$
where $q = 1 - p$.
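
A minimal simulation sketch of this model (Python with NumPy assumed; the function name and seed are our own): each group outcome is the maximum of $k$ independent Bernoulli($p$) indicators, hence Bernoulli($1 - q^k$).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_group_tests(p, k, n, rng=rng):
    """Draw n group-test outcomes: a group of size k tests positive
    iff at least one of its members is positive, so each outcome is
    Bernoulli(1 - (1 - p)^k)."""
    individuals = rng.random((n, k)) < p          # phi_i ~ Bernoulli(p)
    return individuals.any(axis=1).astype(int)    # max over each group

# Sanity check: empirical positive-group rate vs. 1 - q^k
p, k = 0.05, 5
draws = simulate_group_tests(p, k, n=100_000)
print(draws.mean(), 1 - (1 - p)**k)   # both close to 0.226
```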

4. Fisher Information Contained in One Observation
$$I(\theta) = E_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log p(X, \theta)\right)^2\right].$$
For a single group test of size $k$,
$$I_k(p) = \frac{k^2 q^k}{(1 - q^k)\, q^2}, \qquad q = 1 - p.$$

5. Example: Fisher Information Contained in One Observation
[Figure: Fisher information $I_k(p)$ as a function of $p$ for $k = 1$ and $k = 5$.]
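
A short numerical sketch of the information formula (Python assumed; a few grid points stand in for the full curves in the figure):

```python
def fisher_information(p, k):
    """Fisher information about p in one group test of size k:
    I_k(p) = k^2 q^k / ((1 - q^k) q^2), q = 1 - p.
    For k = 1 this reduces to the Bernoulli information 1/(p q)."""
    q = 1.0 - p
    return k**2 * q**k / ((1.0 - q**k) * q**2)

# Compare k = 1 and k = 5, as in the figure
for p in (0.1, 0.3, 0.5):
    print(f"p={p}: I_1={fisher_information(p, 1):.2f}, "
          f"I_5={fisher_information(p, 5):.2f}")
```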

6. Fixed Sample Design: Model (a)
We observe a random sample $\vartheta^{(k)}_1, \vartheta^{(k)}_2, \ldots, \vartheta^{(k)}_n$. Define $X = \sum_{i=1}^{n} \vartheta^{(k)}_i \sim \mathrm{Binomial}(n, 1 - q^k)$. Then
$$\widehat{1 - q^k}_{MLE(a)}(X) = \frac{X}{n}, \qquad \hat{p}_{MLE(a)}(X) = 1 - \left(1 - \frac{X}{n}\right)^{1/k}.$$
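
In code, the MLE is a one-liner (a sketch; the function name is ours):

```python
def p_mle_a(x, n, k):
    """MLE of p under Model (a): X ~ Binomial(n, 1 - q^k), so 1 - q^k
    is estimated by X/n and p by 1 - (1 - X/n)^(1/k)."""
    return 1.0 - (1.0 - x / n) ** (1.0 / k)

print(p_mle_a(x=4, n=10, k=5))   # 4 positive groups of size 5 out of 10
```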

7. Burrows Estimator: Model (a)
An alternative estimator was proposed by Burrows (1987), which removes the bias term of order $1/n$ from the MLE. Choosing $a$ and $b$ to remove the leading term of the bias
$$E\!\left[1 - \left(\frac{n - X + a}{n + b}\right)^{1/k}\right] - p$$
yields $a = b = b_k = \frac{k-1}{2k}$, giving
$$\hat{p}_{B(a)}(X) = 1 - \left(1 - \frac{X}{n + b_k}\right)^{1/k}, \qquad b_k = \frac{k-1}{2k}.$$
Burrows, P. M. (1987). Improved Estimation of Pathogen Transmission Rates by Group Testing. Phytopathology 77, 363-365.
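
The corresponding correction in code (again a sketch with our own naming):

```python
def p_burrows_a(x, n, k):
    """Burrows (1987) estimator under Model (a): replace n by
    n + b_k, b_k = (k - 1)/(2k), to remove the O(1/n) bias term."""
    b_k = (k - 1.0) / (2.0 * k)
    return 1.0 - (1.0 - x / (n + b_k)) ** (1.0 / k)

print(p_burrows_a(x=4, n=10, k=5))   # slightly smaller than the MLE
```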

8. Example: Relative Bias $E\left[\frac{\hat{p} - p}{p}\right]$
[Figure: Relative bias (%) of the MLE and Burrows estimators for $n = 10$, $k = 5$, over $0 \le p \le 0.5$.]

9. Example: MSE
[Figure: MSE of the MLE, Burrows, and individual-testing estimators for $n = 10$, $k = 5$, over $0 \le p \le 0.7$.]
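
Both figures can be approximated by Monte Carlo; a sketch, reusing p_mle_a and p_burrows_a from the earlier sketches (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def mc_bias_mse(estimator, p, k, n, reps=200_000):
    """Monte Carlo relative bias (%) and MSE of an estimator of p
    under fixed sampling with n groups of size k."""
    x = rng.binomial(n, 1 - (1 - p)**k, size=reps)
    est = estimator(x, n, k)
    return 100 * (est.mean() - p) / p, np.mean((est - p)**2)

for p in (0.05, 0.1, 0.3):
    print(p, mc_bias_mse(p_mle_a, p, k=5, n=10),
          mc_bias_mse(p_burrows_a, p, k=5, n=10))
```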

10. Binomial Sampling Plans $S$
All plans begin at the origin and, until a point $\gamma = (X(\gamma), Y(\gamma)) \in B_S$ (the set of boundary points) is reached, the $X$ or $Y$ coordinate is increased by one with probability $\theta$ or $1 - \theta$, respectively. The boundary point $\gamma \in B_S$ at which sampling stops is a sufficient statistic for $\theta$. For each such point $\gamma$, define $N_S(\gamma) = Y(\gamma) + X(\gamma)$; an important characteristic of any plan is then $E(N_S)$.
If $N_S(\gamma) = n$ for some positive integer $n$ and all $\gamma \in B_S$, then $S$ is a fixed binomial sampling plan. If $N_S(\gamma) < M$ for some positive integer $M$ and all $\gamma \in B_S$, then $S$ is a finite binomial sampling plan.
Girshick, M. A., Mosteller, F., and Savage, L. J. (1946). Unbiased Estimates for Certain Binomial Sampling Problems with Applications. Annals of Mathematical Statistics 17, 13-23.
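
An illustrative simulation of one such plan (the curtailed boundary $B_S = \{X = 3\} \cup \{Y = 5\}$ is our own example, not one from the talk):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def run_plan(theta, x_stop=3, y_stop=5, rng=rng):
    """Walk a finite binomial sampling plan: start at the origin, move
    X -> X + 1 w.p. theta (else Y -> Y + 1), and stop on the boundary
    B_S = {X = x_stop} union {Y = y_stop}. Returns the boundary point."""
    x = y = 0
    while x < x_stop and y < y_stop:
        if rng.random() < theta:
            x += 1
        else:
            y += 1
    return x, y

points = [run_plan(theta=0.3) for _ in range(10_000)]
print(np.mean([x + y for x, y in points]))   # Monte Carlo estimate of E(N_S)
```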

11. Unbiased Estimator under Finite Sampling Plans
Result: Let $\mathcal{F}$ be the set of all finite binomial sampling plans with probability of success $\theta$, and let $k$ be any integer greater than one. Then there does not exist an estimator $f$ under any sampling plan $S \in \mathcal{F}$ such that $f$ is an unbiased estimator of $\theta^{1/k}$ or $(1 - \theta)^{1/k}$.
For the group testing problem, where $\theta = 1 - (1-p)^k$ or $\theta = (1-p)^k$, it follows immediately that the non-existence of an unbiased estimator of $p$ extends to this broader class of sampling plans as well.
Remark: A randomized binomial sampling scheme for estimating a function of the form $p^\alpha$, $\alpha > 0$, is presented in Banerjee and Sinha (1979).
Banerjee, P. K. and Sinha, B. K. (1979). Generating an Event with Probability $p^\alpha$, $\alpha > 0$. Sankhyā, Series B 41, 282-285.

12. Inverse Binomial Sampling: Models (b) and (c)
Model (b): Sample the groups $\vartheta^{(k)}_1, \vartheta^{(k)}_2, \ldots$ until $c$ positive groups are observed.
Model (c): Sample the groups $\vartheta^{(k)}_1, \vartheta^{(k)}_2, \ldots$ until $c$ negative groups are observed.
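
A sketch of Model (c) sampling (Model (b) is symmetric, swapping the roles of positive and negative groups; names are ours):

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def sample_model_c(p, k, c, rng=rng):
    """Model (c): test groups of size k one at a time until c negative
    groups are seen; return X, the number of positive groups observed."""
    q_k = (1.0 - p) ** k            # probability a group tests negative
    negatives = positives = 0
    while negatives < c:
        if rng.random() < q_k:
            negatives += 1
        else:
            positives += 1
    return positives

# X is negative binomial with mean c (1 - q^k) / q^k
p, k, c = 0.1, 5, 10
draws = [sample_model_c(p, k, c) for _ in range(20_000)]
q_k = (1 - p) ** k
print(np.mean(draws), c * (1 - q_k) / q_k)   # both about 6.9
```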

13. DeGroot (1959) Result
Result: Let $W \sim \mathrm{NB}(c, \theta)$: $P(W = w) = \binom{c + w - 1}{w}\, \theta^w (1 - \theta)^c$. Then a function $h(\theta)$ is estimable unbiasedly if and only if it can be expanded in a Taylor series on the interval $|\theta| < 1$. If $h(\theta)$ is estimable unbiasedly, then its unique unbiased estimator is given by
$$\hat{h}(w) = \frac{(c-1)!}{(w + c - 1)!} \left[\frac{d^w}{d\theta^w}\, \frac{h(\theta)}{(1 - \theta)^c}\right]_{\theta = 0}, \qquad w = 0, 1, 2, \ldots$$
DeGroot, M. H. (1959). Unbiased Sequential Estimation for Binomial Populations. Annals of Mathematical Statistics 30, 80-101.
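
DeGroot's formula can be checked symbolically (a sketch using SymPy; $c = 10$, $k = 5$ are arbitrary illustrative choices). For $h(\theta) = (1 - \theta)^{1/k}$ it reproduces the product-form estimator derived on the next slide.

```python
import sympy as sp

theta = sp.symbols('theta')
c, k = 10, 5

def degroot_estimate(h, w, c):
    """DeGroot's unique unbiased estimator under negative binomial
    sampling: (c-1)!/(w+c-1)! * d^w/dtheta^w [h/(1-theta)^c] at theta=0."""
    g = h / (1 - theta) ** c
    return sp.factorial(c - 1) / sp.factorial(w + c - 1) \
        * sp.diff(g, theta, w).subs(theta, 0)

h = (1 - theta) ** sp.Rational(1, k)     # h(theta) = (1 - theta)^(1/k) = q
for w in range(4):
    print(w, degroot_estimate(h, w, c))  # 1, 49/50, 1323/1375, ...
```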

14. Construction of Unbiased Estimator: Model (c)
Here $B = \{\gamma : Y(\gamma) = c\}$. Define $X$ to be the number of positive groups observed prior to this event:
$$P(X = x) = \binom{c + x - 1}{x}\, (q^k)^c (1 - q^k)^x, \qquad x = 0, 1, 2, \ldots$$
We have $\theta = 1 - q^k$ and want to estimate $h(\theta) = (1 - \theta)^{1/k} = q$. Applying DeGroot's formula gives
$$\hat{p}_{D(c)}(x) = \begin{cases} 0, & x = 0, \\[4pt] 1 - \displaystyle\prod_{j=1}^{x} \frac{j + c - 1 - 1/k}{j + c - 1}, & x = 1, 2, 3, \ldots \end{cases}$$
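
A direct implementation with a numerical unbiasedness check (pure Python; the negative binomial series is truncated far in the tail):

```python
import math

def p_degroot_c(x, c, k):
    """Unbiased estimator of p under Model (c):
    p_hat(0) = 0; p_hat(x) = 1 - prod_{j=1}^x (j+c-1-1/k)/(j+c-1)."""
    prod = 1.0
    for j in range(1, x + 1):
        prod *= (j + c - 1 - 1.0 / k) / (j + c - 1)
    return 1.0 - prod

# E[p_hat(X)] = sum_x p_hat(x) P(X = x) should equal p exactly
p, k, c = 0.1, 5, 10
theta = 1 - (1 - p) ** k
mean = sum(p_degroot_c(x, c, k)
           * math.comb(c + x - 1, x) * (1 - theta) ** c * theta ** x
           for x in range(1000))
print(mean, p)   # agree up to series truncation
```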

15. Example
$$E(N_{\mathrm{DeGroot}}) = \frac{c}{q^k}.$$
[Figure: MSE ($\times 10^{-3}$) of the MLE, Burrows, and DeGroot estimators for $c = 10$, $k = 5$, over $0 \le p \le 0.5$.]

16. Model (b): No Unbiased Estimator
Here $B = \{\gamma : X(\gamma) = c\}$. Define $Y$ to be the number of negative groups observed prior to this event:
$$P(Y = y) = \binom{c + y - 1}{y}\, (1 - q^k)^c (q^k)^y, \qquad y = 0, 1, 2, \ldots$$
We have $\theta = q^k$, so that $h(\theta) = \theta^{1/k} = q$. However, $h$ does not have a Taylor expansion at the point $\theta = 0$. Therefore, by DeGroot's theorem, no unbiased estimator exists under this model.

17. Extension of Burrows to Models (b) and (c)
We extend the idea of Burrows in the fixed sampling case to the sequential models discussed here, with the modification that we seek to remove terms of order $O(1/E[N])$ from the bias.
Model (b):
$$\hat{p}_{B(b)}(y) = 1 - \left(\frac{y + b_k}{y + c + b_k - 1}\right)^{1/k}, \qquad b_k = \frac{k-1}{2k}.$$
Model (c):
$$\hat{p}_{B(c)}(x) = 1 - \left(\frac{c + b_k - 1}{x + c + b_k - 1}\right)^{1/k}, \qquad b_k = \frac{k-1}{2k}.$$
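
Both corrected estimators in code (a sketch; naming is ours):

```python
def p_burrows_b(y, c, k):
    """Extended Burrows estimator, Model (b): stop at c positive
    groups, y = number of negative groups observed."""
    b_k = (k - 1.0) / (2.0 * k)
    return 1.0 - ((y + b_k) / (y + c + b_k - 1.0)) ** (1.0 / k)

def p_burrows_c(x, c, k):
    """Extended Burrows estimator, Model (c): stop at c negative
    groups, x = number of positive groups observed."""
    b_k = (k - 1.0) / (2.0 * k)
    return 1.0 - ((c + b_k - 1.0) / (x + c + b_k - 1.0)) ** (1.0 / k)

print(p_burrows_b(y=20, c=10, k=5), p_burrows_c(x=7, c=10, k=5))
```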

18. Model (c): Relative Bias
[Figure: Relative bias (%) of the MLE and Burrows estimators under Model (c) for $c = 10$, $k = 5$, over $0 \le p \le 0.7$.]

19. Model (c): MSE
[Figure: MSE ($\times 10^{-3}$) of the MLE, Burrows, and DeGroot estimators under Model (c) for $c = 10$, $k = 5$, over $0 \le p \le 0.7$.]

20. Numerical Comparisons
We present comparisons based on MSE. Comparisons can be challenging due to the number of variables that must be considered (including $p$, $E(N)$, and $k$). To deal with this, we fixed $p$ and $E(N)$ and then chose, for each estimator, the value of $k \in \{2, \ldots, 50\}$ that yields the smallest MSE (a sketch of this selection step follows).
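
A sketch of this selection step for the fixed design, reusing p_burrows_a from the earlier sketch (the Monte Carlo grid search is our reconstruction of the procedure, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def best_k_fixed_design(estimator, p, n, reps=50_000):
    """For fixed p and E(N) = n, return the k in {2, ..., 50} that
    minimizes the simulated MSE of `estimator` under Model (a)."""
    best = None
    for k in range(2, 51):
        x = rng.binomial(n, 1 - (1 - p)**k, size=reps)
        mse = np.mean((estimator(x, n, k) - p)**2)
        if best is None or mse < best[1]:
            best = (k, mse)
    return best

print(best_k_fixed_design(p_burrows_a, p=0.05, n=25))
```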

21. MSE Comparisons for E(N) = 25 (10000 × MSE)

p̂ \ p        0.01     0.05      0.1       0.2       0.3       0.5
p̂_MLE(a)   0.1119   1.9982   7.3243   24.5901   46.1621   82.4696
p̂_MLE(b)   1.3059   4.9489  12.8547   38.6643   62.1209  101.5301
p̂_MLE(c)   0.1010   1.6105   6.0341   22.6446   43.7033   96.6345
p̂_B(a)     0.1039   1.6010   3.6165   13.2301   26.7432   56.3798
p̂_B(b)     0.1477   1.5911   4.8515   17.2066   33.6451   64.2978
p̂_B(c)     0.1046   1.6237   6.1142   22.8252   42.7642   90.0256
p̂_D(c)     0.1046   1.6230   6.1124   22.8217   42.7695   90.0741

22. Thank you!
