Selection Detection and Two-Sample-Testing: Generalized Greenwood Statistics and their Applications

Ðan Daniel Erdmann-Pham, Jonathan Terhorst & Yun S. Song, University of California, Berkeley. July 9, 2019, SPA 2019.


  1. Selection Detection and Two-Sample-Testing: Generalized Greenwood Statistics and their Applications Ðan Daniel Erdmann-Pham, Jonathan Terhorst & Yun S. Song University of California, Berkeley July 9, 2019 SPA 2019

  2. Two Problems (talk sections: Motivation, Framework, Application; running footer: Generalized Greenwood Statistics)

  5. Population Genetics: Detecting Selective Pressure
     Neutral Tree
     ◮ At each depth, leaf set sizes are approximately equidistributed
     Tree with Selection
     ◮ Leaf set sizes are highly unbalanced close to the root
     ◮ Given a tree, how can we tell whether it was generated under selection or not?
     ◮ Data allows computation of sum of squares of leaf set sizes
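The sum of squares of leaf set sizes mentioned on this slide can be illustrated with a small sketch. The tree encoding (nested Python lists, with anything that is not a list treated as a leaf), the use of topological depth rather than coalescent time, and the helper names `leaf_count`, `leaf_set_sizes` and `sum_of_squares` are all assumptions made for illustration; none of them come from the talk.

```python
def leaf_count(node):
    """Number of leaves below `node`; anything that is not a list is a leaf."""
    if not isinstance(node, list):
        return 1
    return sum(leaf_count(child) for child in node)


def leaf_set_sizes(tree, depth):
    """Leaf-set sizes of the subtrees found at the given topological depth.

    Leaves reached before that depth are carried along as subtrees of size 1.
    """
    frontier = [tree]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            if isinstance(node, list):
                next_frontier.extend(node)   # descend one level
            else:
                next_frontier.append(node)   # a leaf stays a leaf
        frontier = next_frontier
    return [leaf_count(node) for node in frontier]


def sum_of_squares(sizes):
    return sum(s * s for s in sizes)


# Hypothetical 8-leaf examples: a balanced tree and a maximally unbalanced one.
balanced = [[["a", "b"], ["c", "d"]], [["e", "f"], ["g", "h"]]]
caterpillar = ["a", ["b", ["c", ["d", ["e", ["f", ["g", "h"]]]]]]]

for depth in (1, 2):
    print(depth,
          sum_of_squares(leaf_set_sizes(balanced, depth)),
          sum_of_squares(leaf_set_sizes(caterpillar, depth)))
```

At every depth the unbalanced tree produces the larger sum of squares, which is the qualitative contrast the slide draws between the neutral tree and the tree with selection.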

  13. Two-Sample Tests: Comparing $\{X_k\}_{k \in [n]}$ and $\{Y_k\}_{k \in [m]}$
      $X_k \sim Y_k$ (Null)
      $\mathbb{E}[X_k] \neq \mathbb{E}[Y_k]$ (Alternative)
      $\operatorname{Var}[X_k] \neq \operatorname{Var}[Y_k]$ (Alternative)
      How can we test the hypothesis that $\{X_k\}$ and $\{Y_k\}$ are identically distributed?
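One classical way to turn this question into a counting problem is sketched below. It is an assumption about how the two-sample problem can be reduced, not something stated on this slide: sort the $X$ sample, count how many $Y$ values land in each of the $n + 1$ gaps, and summarize the counts by their sum of squares. When both samples are independent draws from the same continuous distribution, every interleaving of the two samples is equally likely, so the count vector is uniform over all ways of writing $m$ as an ordered sum of $n + 1$ nonnegative integers. The function names `gap_counts` and `sum_of_squares` are hypothetical.

```python
from bisect import bisect_left


def gap_counts(xs, ys):
    """Count how many ys fall into each of the len(xs) + 1 gaps cut out by xs."""
    cuts = sorted(xs)
    counts = [0] * (len(cuts) + 1)
    for y in ys:
        # bisect_left gives the index of the gap that contains y
        counts[bisect_left(cuts, y)] += 1
    return counts


def sum_of_squares(counts):
    return sum(c * c for c in counts)


# Toy data (made up for illustration only).
xs = [1.0, 2.0, 3.0]
ys = [0.5, 1.5, 1.7, 3.5]
counts = gap_counts(xs, ys)
print(counts, sum_of_squares(counts))
```

A shift in mean or a change in variance tends to pile the $Y$ values into a few gaps, which inflates the sum of squares relative to its null behaviour.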

  15. Sampling uniformly from the $(k-1)$-dimensional simplex $\Delta^{k-1}$
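A minimal sketch of one standard way to draw such a sample: normalize $k$ independent Exp(1) variables, which yields a point distributed uniformly on the simplex of nonnegative vectors with $k$ coordinates summing to one. The function name `sample_simplex` and its `rng` parameter are assumptions for illustration; the slide does not say which sampler the authors have in mind.

```python
import random


def sample_simplex(k, rng=random):
    """Uniform point on the simplex with k coordinates.

    Normalizing k i.i.d. Exp(1) draws gives a Dirichlet(1, ..., 1) vector,
    which is the uniform distribution on the simplex.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(2019)       # seeded only so the run is repeatable
point = sample_simplex(4, rng)
print(point, sum(point))
```

An equivalent construction takes the spacings between $k - 1$ sorted uniform draws on $[0, 1]$; the exponential version is used here because it needs no sorting.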

  22. Balls and bins
      [Figure: $k$ bins with occupancies $S^1_{n,k}, S^2_{n,k}, \dots, S^k_{n,k}$]
      ◮ $S_{n,k} \sim U\left(n \cdot \Delta^{k-1} \cap \mathbb{Z}_+\right)$ (Bose-Einstein Distribution)
      ◮ Can we perform hypothesis testing based on $\|S_{n,k}\|_2^2$? What is the distribution of $\|S_{n,k}\|_2^2$?
      Limit as $n \to \infty$ for fixed $k$
      ◮ Greenwood Statistic (Greenwood ’46)
      ◮ Some moments, CLT, statistical efficiency (Moran ’47, ’51, ’53)
      ◮ Geometry: intersection of $L^1$ and $L^2$ balls
      ◮ Up to $k = 3$ (Gardner ’52)
      ◮ Large deviations (Schechtner, Zinn ’00)
      ◮ Tabulation of $z$-scores up to $k = 20$ (Burrows ’79, Currie ’81, Stephens ’81)
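For small $n$ and $k$ the distribution asked about here can be explored by brute force. The sketch below assumes that the Bose-Einstein distribution means the uniform distribution over vectors of $k$ nonnegative integers summing to $n$, and uses the stars-and-bars bijection (choose $k - 1$ bar positions among $n + k - 1$ slots) both to sample and to enumerate. The function names are hypothetical, and exhaustive enumeration is only an illustration, not the method of the talk.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations


def occupancies_from_bars(bars, n, k):
    """Turn k - 1 sorted bar positions in range(n + k - 1) into k bin counts."""
    edges = [-1, *bars, n + k - 1]
    return [edges[i + 1] - edges[i] - 1 for i in range(k)]


def sample_bose_einstein(n, k, rng=random):
    """One uniform draw of k nonnegative integer occupancies summing to n."""
    bars = sorted(rng.sample(range(n + k - 1), k - 1))
    return occupancies_from_bars(bars, n, k)


def exact_sum_of_squares_law(n, k):
    """Exact law of the squared L2 norm, by enumerating every configuration."""
    counts = Counter()
    for bars in combinations(range(n + k - 1), k - 1):
        s = occupancies_from_bars(bars, n, k)
        counts[sum(x * x for x in s)] += 1
    total = sum(counts.values())
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}


print(sample_bose_einstein(10, 4, random.Random(0)))
for value, prob in exact_sum_of_squares_law(4, 3).items():
    print(value, prob)
```

The number of configurations is $\binom{n + k - 1}{k - 1}$, so enumeration becomes impractical quickly, which is one reason a description of the distribution of $\|S_{n,k}\|_2^2$ for general $n$ and $k$ is of interest.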
