

  1. Learning Statistical Property Testers. Sreeram Kannan, University of Washington, Seattle

  2. Collaborators: Rajat Sen (UT Austin), Karthikeyan Shanmugan (IBM Research), Sudipto Mukherjee, Himanshu Asnani, Arman Rahimzamani (University of Washington, Seattle)

  3. Statistical Property Testing ✤ Closeness testing ✤ Independence testing ✤ Conditional Independence testing ✤ Information estimation

  4. Testing Total Variation Distance [figure: two histograms, P and Q]

  5. Testing Total Variation Distance [figure: histograms P and Q] n samples from each of P and Q

  6. Testing Total Variation Distance n samples from each of P and Q. Estimate D_TV(P, Q)?

  7. Testing Total Variation Distance n samples from each of P and Q. Estimate D_TV(P, Q)? P and Q can be arbitrary. Search beyond traditional density-estimation methods.

  8. Testing Total Variation: Prior Art ✤ Lots of work in CS theory on D_TV testing ✤ Based on closeness testing between P and Q ✤ Sample complexity = O(n^a), where n = alphabet size ✤ Curse of dimensionality: if n = 2^d, the complexity is O(2^(ad)). * Chan et al., Optimal algorithms for testing closeness of discrete distributions, SODA 2014. * Sriperumbudur et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009.

  9. Classifiers beat curse-of-dimensionality ✤ Deep NNs and boosted random forests achieve state-of-the-art performance ✤ Work very well in practice even when X is high-dimensional ✤ Exploit generic inductive biases: ✤ Invariance ✤ Hierarchical structure ✤ Symmetry. Theoretical guarantees lag severely behind practice!

  10. Distance Estimation via Classification [figure: two histograms] n samples ∼ P, n samples ∼ Q

  11. Distance Estimation via Classification n samples ∼ P (label 0), n samples ∼ Q (label 1), fed to a classifier

  12. Distance Estimation via Classification n samples ∼ P (label 0), n samples ∼ Q (label 1). Classification error of the optimal Bayes classifier = 1/2 − (1/2) D_TV(P, Q).

  13. Distance Estimation via Classification Deep NNs, boosted trees, etc. Classification error of the optimal classifier = 1/2 − (1/2) D_TV(P, Q). * Sriperumbudur et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009. * Lopez-Paz et al., Revisiting classifier two-sample tests, ICLR 2017.

  14. Distance Estimation via Classification Deep NNs, boosted trees, etc. Classification error of any classifier ≥ 1/2 − (1/2) D_TV(P, Q). Can get p-value control. * Sriperumbudur et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009. * Lopez-Paz et al., Revisiting classifier two-sample tests, ICLR 2017.
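To make slides 10-14 concrete: a minimal sketch of the classifier-based estimate, assuming scikit-learn and using gradient boosting as a stand-in for the "Deep NNs, boosted trees" above. Inverting error = 1/2 − (1/2) D_TV(P, Q) gives D_TV ≈ 2·accuracy − 1; with a suboptimal classifier this is (in expectation) an underestimate. The names below are illustrative, not from the talk.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def estimate_tv(samples_p, samples_q, seed=0):
    """Estimate D_TV(P, Q) from the test accuracy of a binary classifier.

    Optimal Bayes error = 1/2 - D_TV(P, Q)/2, so accuracy maps back to
    D_TV ~= 2 * accuracy - 1 (an underestimate for a suboptimal classifier).
    """
    X = np.vstack([samples_p, samples_q])
    y = np.r_[np.zeros(len(samples_p)), np.ones(len(samples_q))]  # labels 0 / 1
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    acc = clf.score(X_te, y_te)
    return max(0.0, 2.0 * acc - 1.0)

# Example: two 10-dimensional Gaussians with shifted means
rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(2000, 10))
Q = rng.normal(0.5, 1.0, size=(2000, 10))
print(estimate_tv(P, Q))
```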

  15. Independence Testing n samples {(x_i, y_i)}_{i=1}^n. * Sriperumbudur et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009. * Lopez-Paz et al., Revisiting classifier two-sample tests, ICLR 2017.

  16. Independence Testing n samples {(x_i, y_i)}_{i=1}^n H0: X ⊥ Y (P_CI) vs. H1: X ⊥̸ Y (P)

  17. Independence Testing n samples {(x_i, y_i)}_{i=1}^n H0: X ⊥ Y (P_CI) vs. H1: X ⊥̸ Y (P). Classify P (p(x, y)) against P_CI (p(x) p(y))

  18. Independence Testing n samples {(x_i, y_i)}_{i=1}^n H0: X ⊥ Y (P_CI) vs. H1: X ⊥̸ Y (P). Classify P (p(x, y)) against samples from P_CI (p(x) p(y))

  19. Independence Testing n samples {(x_i, y_i)}_{i=1}^n H0: X ⊥ Y (P_CI) vs. H1: X ⊥̸ Y (P). Classify P (p(x, y)) against P_CI (p(x) p(y)), with P_CI samples obtained by permutation

  20. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally

  21. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally. First half ∼ P (p(x, y))

  22. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally. First half ∼ P (p(x, y)), label 0

  23. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally. First half ∼ P (p(x, y)), label 0. In the second half, the y_i's are permuted

  24. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally. First half ∼ P (p(x, y)), label 0. Second half, y_i's permuted, ∼ P_CI (p(x) p(y))

  25. Independence Testing n samples {(x_i, y_i)}_{i=1}^n Split equally. First half ∼ P (p(x, y)), label 0. Second half, y_i's permuted, ∼ P_CI (p(x) p(y)), label 1

  26. Independence Testing Split equally: first half ∼ P (p(x, y)), label 0; second half, y_i's permuted, ∼ P_CI (p(x) p(y)), label 1. P-value control. * Lopez-Paz et al., Revisiting classifier two-sample tests, ICLR 2017. * Sriperumbudur et al., Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009.
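Slides 20-26 describe a complete recipe; here is a hedged sketch of it, assuming numpy, scipy, and scikit-learn (independence_test is an illustrative name, not from the talk). The p-value uses the null approximation accuracy ≈ N(1/2, 1/(4m)) from Lopez-Paz et al., with m the test-set size.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def independence_test(x, y, seed=0):
    """Classifier two-sample test of X independent of Y.

    Half the pairs keep their joint structure (label 0, ~ p(x, y));
    in the other half the y's are permuted (label 1, ~ p(x) p(y)).
    Under H0 the test accuracy is approximately N(1/2, 1/(4m)),
    giving a one-sided p-value (Lopez-Paz et al., ICLR 2017).
    """
    rng = np.random.default_rng(seed)
    n = len(x) // 2
    joint = np.column_stack([x[:n], y[:n]])          # label 0: ~ p(x, y)
    y_perm = y[n:2 * n][rng.permutation(n)]          # permute to break coupling
    product = np.column_stack([x[n:2 * n], y_perm])  # label 1: ~ p(x) p(y)
    X = np.vstack([joint, product])
    labels = np.r_[np.zeros(n), np.ones(n)]
    X_tr, X_te, l_tr, l_te = train_test_split(
        X, labels, test_size=0.5, random_state=seed, stratify=labels)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, l_tr)
    acc = clf.score(X_te, l_te)
    m = len(l_te)
    p_value = norm.sf(acc, loc=0.5, scale=np.sqrt(1.0 / (4.0 * m)))
    return acc, p_value
```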

  27. Conditional Independence Testing n samples {(x_i, y_i, z_i)}_{i=1}^n H0: X ⊥ Y | Z (P_CI) vs. H1: X ⊥̸ Y | Z (P)

  28. Conditional Independence Testing n samples {(x_i, y_i, z_i)}_{i=1}^n H0: X ⊥ Y | Z (P_CI) vs. H1: X ⊥̸ Y | Z (P). Classify P (p(x, y, z)) against P_CI (p(z) p(x|z) p(y|z))

  29. Conditional Independence Testing Classify P (p(x, y, z)) against P_CI (p(z) p(x|z) p(y|z)). How to get P_CI (p(z) p(x|z) p(y|z))?

  30. Conditional Independence Testing Given samples ∼ p(x, y, z), how to emulate p(y|z)?

  31. Conditional Independence Testing Emulate p(y|z) as q(y|z): ✤ KNN-based methods ✤ Kernel methods

  32. Conditional Independence Testing Emulate p(y|z) as q(y|z), then classify P (p(x, y, z)) against P̃_CI (p(z) p(x|z) q(y|z))

  33. Conditional Independence Testing Emulate p(y|z) as q(y|z): ✤ [KCIT] Gretton et al., Kernel-based conditional independence test and application in causal discovery, NIPS 2008 ✤ [KCIPT] Doran et al., A permutation-based kernel conditional independence test, UAI 2014 ✤ [CCIT] Sen et al., Model-Powered Conditional Independence Test, NIPS 2017 ✤ [RCIT] Strobl et al., Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery, arXiv

  34. Conditional Independence Testing ✤ KNN-based and kernel methods are limited to low-dimensional Z. In practice, Z is often high-dimensional (e.g., in a graphical model, the conditioning set can be the entire rest of the graph). A KNN-style mimic is sketched below.
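To make the KNN idea concrete: a minimal sketch, assuming scikit-learn, in the spirit of the nearest-neighbor bootstrap of Sen et al. (CCIT). The name knn_mimic is illustrative, not from the talk; its reliance on neighbors in z-space is exactly what breaks down when Z is high-dimensional.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_mimic(y, z):
    """Emulate q(y|z) ~= p(y|z): give each point the y of a nearby z.

    Sketch in the spirit of the nearest-neighbor bootstrap
    (Sen et al., NIPS 2017). The 2nd neighbor is used because the
    1st neighbor of a training point is (typically) the point itself.
    """
    nn = NearestNeighbors(n_neighbors=2).fit(z)
    _, idx = nn.kneighbors(z)   # idx[:, 0] is (typically) the point itself
    return y[idx[:, 1]]         # y taken from each point's nearest other z
```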

  35. Generative Models beat curse-of-dimensionality Generator: z (low-dimensional latent space) → x (high-dimensional data space)

  36. Generative Models beat curse-of-dimensionality Generator: z (low-dimensional latent space) → x (high-dimensional data space) ✤ Trained on real samples of x ✤ Can generate any number of new samples

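To make the picture concrete, here is a toy generator sketched in PyTorch (a framework assumption on my part; the slide does not name one). The architecture and sizes are illustrative: a small MLP mapping the latent space to the data space; training (e.g., adversarial, as in a GAN) is omitted.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a low-dimensional latent z to a high-dimensional x."""
    def __init__(self, latent_dim=8, data_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, data_dim),
        )

    def forward(self, z):
        return self.net(z)

# After training on real samples of x, any number of new samples
# is one forward pass away:
G = Generator()
z = torch.randn(1000, 8)   # 1000 draws from the latent space
x_new = G(z)               # 1000 new points in the data space
```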

  38. How loose can the estimate P̃_CI, i.e. q(y|z), be?

  39. How loose can the estimate P̃_CI, i.e. q(y|z), be? Mimic-and-Classify works as long as the density q(y|z) > 0 whenever p(y, z) > 0.

  40. How loose can the estimate P̃_CI, i.e. q(y|z), be? Novel bias-cancellation method: Mimic-and-Classify works as long as the density q(y|z) > 0 whenever p(y, z) > 0. Mimic functions: GANs, regressors, etc. (a regressor-based mimic is sketched below).
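Since the slide names regressors among the possible mimic functions, here is a minimal, illustrative regression mimic, assuming scikit-learn and scalar y (regression_mimic is not a name from the talk). Resampling residuals keeps q(y|z) supported wherever the residual distribution reaches; a GAN would take its place for complex conditionals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def regression_mimic(y, z, seed=0):
    """Emulate q(y|z) with a regressor plus resampled residuals.

    Fit f(z) ~= E[y|z], then draw y' = f(z) + a bootstrapped residual.
    Sketch assumes scalar y.
    """
    rng = np.random.default_rng(seed)
    reg = GradientBoostingRegressor(random_state=seed).fit(z, y)
    resid = y - reg.predict(z)
    return reg.predict(z) + rng.choice(resid, size=len(y), replace=True)
```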

  41. Mimic and Classify Mimic step + Classify step

  42. Mimic and Classify Mimic step: start from dataset D ∼ p(x, y, z)

  43. Mimic and Classify Mimic step: split D ∼ p(x, y, z) into D1 ∼ p(x, y, z) and D2 ∼ p(x, y, z)

  44. Mimic and Classify Mimic step: apply MIMIC to dataset D2, mapping each (x_i, y_i, z_i) to (x_i, y'_i, z_i) with y'_i generated from z_i; call the result dataset D'

  45. Mimic and Classify Classify step: classify D1 ∼ p(x, y, z) against D' ∼ p(z) p(x|z) q(y|z)
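Assembling slides 41-45 into one hedged end-to-end sketch (numpy, scipy, and scikit-learn assumed; mimic_and_classify is an illustrative name, and the talk's bias-cancellation refinement is not implemented here). Any mimic function, such as the knn_mimic or regression_mimic sketches above, can be plugged in; test accuracy near 1/2, i.e. a large p-value, supports H0: X ⊥ Y | Z.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier

def mimic_and_classify(x, y, z, mimic, seed=0):
    """Conditional independence test: classify D1 ~ p(x, y, z) against
    D' ~ p(z) p(x|z) q(y|z), where q(y|z) comes from `mimic`.

    `mimic` is any function (y, z) -> y', e.g. knn_mimic or
    regression_mimic above, or a conditional GAN.
    """
    rng = np.random.default_rng(seed)
    n = len(x) // 2
    # D1: real triples ~ p(x, y, z), label 0
    d1 = np.column_stack([x[:n], y[:n], z[:n]])
    # D2 -> D': mimic step replaces each y_i by y'_i generated from z_i
    y_prime = mimic(y[n:2 * n], z[n:2 * n])
    d_prime = np.column_stack([x[n:2 * n], y_prime, z[n:2 * n]])  # label 1
    X = np.vstack([d1, d_prime])
    labels = np.r_[np.zeros(n), np.ones(n)]
    perm = rng.permutation(2 * n)
    X, labels = X[perm], labels[perm]
    half = n  # train on half, test on the other half
    clf = GradientBoostingClassifier(random_state=seed)
    clf.fit(X[:half], labels[:half])
    acc = clf.score(X[half:], labels[half:])
    m = 2 * n - half  # test-set size
    p_value = norm.sf(acc, loc=0.5, scale=np.sqrt(1.0 / (4.0 * m)))
    return acc, p_value
```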
