Learning Statistical Property Testers
Sreeram Kannan, University of Washington, Seattle
Collaborators: Rajat Sen (UT Austin), Karthikeyan Shanmugam (IBM Research), Sudipto Mukherjee, Himanshu Asnani, and Arman Rahimzamani (University of Washington, Seattle)
Statistical Property Testing
✤ Closeness testing
✤ Independence testing
✤ Conditional independence testing
✤ Information estimation
Testing Total Variation Distance

[Figure: histograms of two distributions P and Q]

Given n samples from P and n samples from Q, estimate $D_{TV}(P, Q)$?

P and Q can be arbitrary, so we must search beyond traditional density estimation methods.
Testing Total Variation: Prior Art
✤ Lots of work in CS theory on $D_{TV}$ testing
✤ Based on closeness testing between P and Q
✤ Sample complexity = $O(n^a)$, where n = alphabet size
✤ Curse of dimensionality: if $n = 2^d$, the complexity is $O(2^{ad})$

* Chan et al., Optimal Algorithms for Testing Closeness of Discrete Distributions, SODA 2014
* Sriperumbudur et al., Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions, NIPS 2009
Classifiers Beat the Curse of Dimensionality
✤ Deep NNs and boosted random forests achieve state-of-the-art performance.
✤ They work well in practice even when X is high dimensional.
✤ They exploit generic inductive biases: invariance, hierarchical structure, symmetry.

Theoretical guarantees lag severely behind practice!
Distance Estimation via Classification

[Figure: histograms of P and Q]

Draw n samples $\sim P$ (Label 0) and n samples $\sim Q$ (Label 1), and train a classifier (deep NN, boosted trees, etc.) to distinguish them.

Classification error of the optimal Bayes classifier $= \frac{1}{2} - \frac{1}{2} D_{TV}(P, Q)$.

Classification error of any classifier $\geq \frac{1}{2} - \frac{1}{2} D_{TV}(P, Q)$, so the held-out error of any learned classifier yields a lower bound on $D_{TV}(P, Q)$, and p-value control is possible.

* Sriperumbudur et al., Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions, NIPS 2009
* Lopez-Paz et al., Revisiting Classifier Two-Sample Tests, ICLR 2017
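A minimal sketch of this classifier-based $D_{TV}$ lower bound, assuming the two samples arrive as 2-D NumPy arrays and using a boosted-tree classifier from scikit-learn; the function name estimate_tv and the 50/50 train/test split are illustrative choices, not from the talk.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def estimate_tv(xs_p, xs_q, seed=0):
    """Lower-bound D_TV(P, Q) from samples via a two-sample classifier."""
    # Label the samples: P -> 0, Q -> 1.
    X = np.vstack([xs_p, xs_q])
    y = np.concatenate([np.zeros(len(xs_p)), np.ones(len(xs_q))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    # Held-out error err satisfies err >= 1/2 - (1/2) D_TV(P, Q),
    # so 1 - 2 * err lower-bounds D_TV(P, Q).
    err = 1.0 - clf.score(X_te, y_te)
    return max(0.0, 1.0 - 2.0 * err)
```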
Independence Testing

Given n samples $\{x_i, y_i\}_{i=1}^n$, test
$H_0$: $X \perp Y$ ($P_{CI}$, density $p(x)p(y)$) vs. $H_1$: $X \not\perp Y$ ($P$, density $p(x, y)$).

Idea: classify samples from $P$ (density $p(x, y)$) against samples from $P_{CI}$ (density $p(x)p(y)$), where the $P_{CI}$ samples are obtained by permutation.
Independence Testing: Procedure

Split the n samples $\{x_i, y_i\}_{i=1}^n$ equally into two halves:
✤ Half 1 (Label 0): the $y_i$'s are permuted, turning samples from $P$ (density $p(x, y)$) into samples from $P_{CI}$ (density $p(x)p(y)$).
✤ Half 2 (Label 1): kept intact, samples $\sim P$ with density $p(x, y)$.

Classify the two halves; p-value control is available.

* Lopez-Paz et al., Revisiting Classifier Two-Sample Tests, ICLR 2017
* Sriperumbudur et al., Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions, NIPS 2009
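A minimal sketch of this permute-then-classify independence test, assuming paired samples xs, ys as 2-D NumPy arrays; returning held-out accuracy as the test statistic follows the classifier two-sample test recipe, and all names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def independence_test_stat(xs, ys, seed=0):
    """Classifier accuracy for p(x, y) vs. p(x)p(y); near 1/2 supports H0."""
    rng = np.random.default_rng(seed)
    n = len(xs) // 2
    # Half 1: permute the y's -> samples from p(x)p(y) (Label 0).
    perm = rng.permutation(n)
    product = np.hstack([xs[:n], ys[:n][perm]])
    # Half 2: keep (x, y) pairs intact -> samples from p(x, y) (Label 1).
    joint = np.hstack([xs[n:2 * n], ys[n:2 * n]])
    X = np.vstack([product, joint])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```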
Conditional Independence Testing

Given n samples $\{x_i, y_i, z_i\}_{i=1}^n$, test
$H_0$: $X \perp Y \mid Z$ ($P_{CI}$, density $p(z)p(x|z)p(y|z)$) vs. $H_1$: $X \not\perp Y \mid Z$ ($P$, density $p(x, y, z)$).

Again we want to classify $P$ against $P_{CI}$, but how do we get samples from $P_{CI}$? Given samples $\sim p(x, z)$, how do we emulate $p(y|z)$?

Answer: emulate $p(y|z)$ with a learned $q(y|z)$ and classify $P$ against $\tilde{P}_{CI}$ (density $p(z)p(x|z)q(y|z)$). Prior work builds $q(y|z)$ with KNN-based and kernel methods:
✤ [KCIT] Zhang et al., Kernel-based Conditional Independence Test and Application in Causal Discovery, UAI 2011
✤ [KCIPT] Doran et al., A Permutation-based Kernel Conditional Independence Test, UAI 2014
✤ [CCIT] Sen et al., Model-Powered Conditional Independence Test, NIPS 2017
✤ [RCIT] Strobl et al., Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery, arXiv

Limitation: these are limited to low-dimensional Z, whereas in practice Z is often high dimensional (e.g., in a graphical model, the conditioning set can be the entire rest of the graph).
Generative Models Beat the Curse of Dimensionality

[Figure: Generator mapping z (low-dimensional latent space) to x (high-dimensional data space)]

✤ Trained on real samples of x
✤ Can generate any number of new samples
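A minimal sketch of drawing fresh samples from such a model, assuming some already-trained mapping generator (hypothetical here) from latent z to data x, with a standard-normal latent prior as is common for GANs.

```python
import numpy as np

def draw_samples(generator, n, latent_dim, seed=0):
    """Generate n new data points from a trained generator z -> x."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, latent_dim))  # low-dimensional latents
    return generator(z)                       # high-dimensional samples
```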
How loose can the estimate $\tilde{P}_{CI}$, i.e. $q(y|z)$, be?

Mimic-and-Classify works as long as the density $q(y|z) > 0$ whenever $p(y, z) > 0$, thanks to a novel bias-cancellation method.

Mimic functions: GANs, regressors, etc.
Mimic and Classify

Mimic step: split the data $D \sim p(x, y, z)$ equally into $D_1$ and $D_2$. For each $(x_i, y_i, z_i)$ in $D_2$, feed $z_i$ to the mimic function to draw $y'_i \sim q(\cdot \mid z_i)$ and form $(x_i, y'_i, z_i)$, giving a dataset $D' \sim p(z)p(x|z)q(y|z)$.

Classify step: classify $D_1 \sim p(x, y, z)$ against $D' \sim p(z)p(x|z)q(y|z)$.
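A minimal end-to-end sketch of Mimic-and-Classify, assuming xs, ys, zs are 2-D NumPy arrays; here a nearest-neighbour resampler serves as a crude stand-in for the mimic function $q(y|z)$ (the talk's mimic functions are GANs or regressors), and all names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def mimic_and_classify_stat(xs, ys, zs, seed=0):
    """Classifier accuracy: D1 ~ p(x,y,z) vs. D' ~ p(z)p(x|z)q(y|z)."""
    n = len(xs) // 2
    # D1: kept intact -> samples from p(x, y, z) (Label 1).
    d1 = np.hstack([xs[:n], ys[:n], zs[:n]])
    # Mimic step on D2: replace each y_i by the y of a nearby z, a crude
    # stand-in for drawing y'_i ~ q(. | z_i) (Label 0).
    nn = NearestNeighbors(n_neighbors=2).fit(zs[n:2 * n])
    _, idx = nn.kneighbors(zs[n:2 * n])
    y_mimic = ys[n:2 * n][idx[:, 1]]  # 2nd neighbour: skip the point itself
    d_prime = np.hstack([xs[n:2 * n], y_mimic, zs[n:2 * n]])
    # Classify step.
    X = np.vstack([d1, d_prime])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    # Accuracy near 1/2 supports H0: X independent of Y given Z.
    return clf.score(X_te, y_te)
```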