kernel methods for hypothesis testing and inference

  1. Kernel methods for hypothesis testing and inference. MLSS Tübingen, 2015. Arthur Gretton, Gatsby Unit, CSML, UCL

  2. Some motivating questions...

  3. Detecting differences in brain signals. The problem: Do local field potential (LFP) signals change when measured near a spike burst? [Figure: two panels, "LFP near spike burst" and "LFP without spike burst", each plotting LFP amplitude against time.]

  6. Detecting differences in amplitude modulated signals. [Figure: two panels, "Samples from P" and "Samples from Q".]

  7. Adversarial training of deep neural networks
     • From ICML 2015: "Generative Moment Matching Networks", Yujia Li, Kevin Swersky, Richard Zemel (Department of Computer Science, University of Toronto; Canadian Institute for Advanced Research). arXiv:1502.02761v1 [cs.LG], 10 Feb 2015.
     • From UAI 2015: "Training generative neural networks via Maximum Mean Discrepancy optimization", Gintare Karolina Dziugaite (University of Cambridge), Daniel M. Roy (University of Toronto), Zoubin Ghahramani (University of Cambridge).
     • Idea: In adversarial nets (Goodfellow et al. NIPS 2014), replace the discriminator network with the maximum mean discrepancy, a kernel distance between distributions.
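To make "a kernel distance between distributions" concrete, here is a minimal sketch of a biased empirical estimate of the squared maximum mean discrepancy. The Gaussian kernel, the bandwidth of 1.0, the sample sizes, and the function names `gaussian_kernel` and `mmd2_biased` are all assumptions chosen for illustration; the slides do not specify them.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (n x d) and rows of b (m x d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return float(kxx.mean() + kyy.mean() - 2.0 * kxy.mean())

rng = np.random.default_rng(0)
# Two samples from the same distribution, then two from shifted distributions.
same = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = mmd2_biased(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
print(f"same distribution: {same:.4f}, shifted distribution: {shifted:.4f}")
```

The estimate is close to zero when both samples come from the same distribution and grows when they differ, which is the property that lets it stand in for a discriminator.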

  9. Case of discrete domains
     • How do you compare distributions . . .
     • . . . in a discrete domain? [Read and Cressie, 1988]
     X1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet.
     X2: To my pleasant surprise he responded that he had personally visited those wharves and that he had already announced money to fix them. What wharves did the minister visit in my riding and how much additional funding is he going to provide for Delaps Cove, Hampton, Port Lorne, · · ·
     Y1: Now disturbing reports out of Newfoundland show that the fragile snow crab industry is in serious decline. First the west coast salmon, the east coast salmon and the cod, and now the snow crabs off Newfoundland.
     Y2: On the grain transportation system we have had the Estey report and the Kroeger report. We could go on and on. Recently programs have been announced over and over by the government such as money for the disaster in agriculture on the prairies and across Canada. · · ·
     $P_X \overset{?}{=} P_Y$
     Are the pink extracts from the same distribution as the gray ones?

  10. Detecting statistical dependence, continuous domain
     • How do you detect dependence . . .
     • . . . in a continuous domain?
     [Figure: scatter plot of a sample from $P_{XY}$ (Y against X), with the question of whether it comes from a dependent $P_{XY}$ or an independent $P_{XY} = P_X P_Y$.]

  11. Detecting statistical dependence, continuous domain
     • How do you detect dependence . . .
     • . . . in a continuous domain?
     [Figure: a sample from $P_{XY}$ (Y against X) compared with the discretized empirical $P_{XY}$ and the discretized empirical $P_X P_Y$.]

  13. Detecting statistical dependence, continuous domain
     • How do you detect dependence . . .
     • . . . in a continuous domain?
     • Problem: fails even in "low" dimensions! [NIPS07a, ALT08]
       – $X$ and $Y$ in $\mathbb{R}^4$, statistic = power divergence, samples = 1024, cases where dependence detected = 0/500
     • Too few points per bin
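The "too few points per bin" problem on the slide above can be illustrated with simple arithmetic. The sketch below counts the cells of a joint histogram over $(X, Y)$ and the average number of samples per cell. The dimensions (4 and 4) and the sample size (1024) come from the slide; the numbers of bins per axis are assumptions for illustration, since the slide does not say how the space was discretized.

```python
def expected_points_per_cell(n_samples, dim_x, dim_y, bins_per_axis):
    """Return (number of joint cells, average samples per cell)."""
    n_cells = bins_per_axis ** (dim_x + dim_y)
    return n_cells, n_samples / n_cells

# Bin counts per axis are hypothetical; the slide only gives dims and n.
for bins in (2, 3, 4):
    cells, per_cell = expected_points_per_cell(1024, 4, 4, bins)
    print(f"{bins} bins/axis: {cells} cells, {per_cell:.4f} samples per cell")
```

Even the coarsest split, two bins per axis, leaves an average of only four samples per cell, and finer grids leave most cells empty.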

  14. Detecting statistical dependence, discrete domain
     • How do you detect dependence . . .
     • . . . in a discrete domain? [Read and Cressie, 1988]
     X1: Honourable senators, I have a question for the Leader of the Government in the Senate with regard to the support funding to farmers that has been announced. Most farmers have not received any money yet.
     Y1: Honorables sénateurs, ma question s'adresse au leader du gouvernement au Sénat et concerne l'aide financière qu'on a annoncée pour les agriculteurs. La plupart des agriculteurs n'ont encore rien reçu de cet argent.
     X2: No doubt there is great pressure on provincial and municipal governments in relation to the issue of child care, but the reality is that there have been no cuts to child care funding from the federal government to the provinces. In fact, we have increased federal investments for early childhood development. · · ·
     Y2: Il est évident que les ordres de gouvernements provinciaux et municipaux subissent de fortes pressions en ce qui concerne les services de garde, mais le gouvernement n'a pas réduit le financement qu'il verse aux provinces pour les services de garde. Au contraire, nous avons augmenté le financement fédéral pour le développement des jeunes enfants. · · ·
     $P_{XY} \overset{?}{=} P_X P_Y$
     Are the French text extracts translations of the English ones?

  15. Detecting a higher order interaction
     • How to detect V-structures with pairwise weak (or nonexistent) dependence?
     [Figure: graph over nodes X, Y, Z.]

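  17. Detecting a higher order interaction
     • How to detect V-structures with pairwise weak (or nonexistent) dependence?
     • $X \perp\!\!\!\perp Y$, $Y \perp\!\!\!\perp Z$, $X \perp\!\!\!\perp Z$
     • $X, Y \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$,
     • $Z \mid X, Y \sim \mathrm{sign}(XY)\,\mathrm{Exp}(1/\sqrt{2})$
     Faithfulness violated here.
     [Figure: graph over nodes X, Y, Z, with scatter plots "X vs Y", "Y vs Z", "X vs Z" and "XY vs Z".]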
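The data-generating process on the slide above can be simulated to see the effect. This is a rough sketch under two assumptions: $1/\sqrt{2}$ is read as the scale (mean) of the exponential distribution rather than its rate, and sample correlation is used as a crude check. Correlation only captures linear dependence, so it illustrates the pattern but is not the kernel test the lecture develops. The function name `sample_v_structure` is hypothetical.

```python
import numpy as np

def sample_v_structure(n, rng):
    """Draw X, Y ~ N(0, 1) i.i.d. and Z = sign(XY) * E, with E exponential."""
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    # Assumption: 1/sqrt(2) is the scale (mean) of the exponential.
    e = rng.exponential(scale=1.0 / np.sqrt(2.0), size=n)
    z = np.sign(x * y) * e
    return x, y, z

def corr(a, b):
    """Sample Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
x, y, z = sample_v_structure(20000, rng)

pairwise = {"X vs Y": corr(x, y), "Y vs Z": corr(y, z), "X vs Z": corr(x, z)}
joint = corr(x * y, z)

for name, value in pairwise.items():
    print(f"{name}: {value:+.3f}")
print(f"XY vs Z: {joint:+.3f}")
```

The three pairwise correlations stay near zero while the product $XY$ is clearly correlated with $Z$, matching the four scatter plots named on the slide.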

  19. V-structure Discovery. [Figure: graph over nodes X, Y, Z.] Assume $X \perp\!\!\!\perp Y$ has been established. V-structure can then be detected by:
     • CI test: $H_0: X \perp\!\!\!\perp Y \mid Z$ (Zhang et al 2011), or
     • Factorisation test: $H_0: (X, Y) \perp\!\!\!\perp Z \,\lor\, (X, Z) \perp\!\!\!\perp Y \,\lor\, (Y, Z) \perp\!\!\!\perp X$ (multiple two-variable independence tests)
       – compute p-values for each of the marginal tests for $(Y, Z) \perp\!\!\!\perp X$, $(X, Z) \perp\!\!\!\perp Y$, or $(X, Y) \perp\!\!\!\perp Z$
       – apply Holm-Bonferroni (HB) sequentially rejective correction (Holm 1979)
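The Holm-Bonferroni step named above can be sketched as follows. This is the standard step-down procedure, not code from the lecture; the function name `holm_bonferroni`, the level of 0.05, and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's sequentially rejective procedure.

    Returns a list of booleans, in the input order, marking which
    hypotheses are rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first failure; all larger p-values are retained.
    return reject

# Hypothetical p-values for (Y,Z) vs X, (X,Z) vs Y, and (X,Y) vs Z.
decisions = holm_bonferroni([0.001, 0.02, 0.04])
print(decisions, "all rejected:", all(decisions))
```

Because the factorisation null is a disjunction of the three component hypotheses, one natural reading is that it is rejected only when all three components are rejected, which is what the `all(decisions)` check expresses.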

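  20. V-structure Discovery (2)
     • How to detect V-structures with pairwise weak (or nonexistent) dependence?
     • $X \perp\!\!\!\perp Y$, $Y \perp\!\!\!\perp Z$, $X \perp\!\!\!\perp Z$
     • $X_1, Y_1 \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$,
     • $Z_1 \mid X_1, Y_1 \sim \mathrm{sign}(X_1 Y_1)\,\mathrm{Exp}(1/\sqrt{2})$
     • $X_{2:p}, Y_{2:p}, Z_{2:p} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_{p-1})$
     Faithfulness violated here.
     [Figure: graph over nodes X, Y, Z, with scatter plots "X1 vs Y1", "Y1 vs Z1", "X1 vs Z1" and "X1*Y1 vs Z1".]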

  21. V-structure Discovery (3). [Plot: "V-structure discovery: Dataset A"; y-axis: null acceptance rate (Type II error); x-axis: dimension, 1 to 19; curves: "2var: Factor" and "CI: $X \perp\!\!\!\perp Y \mid Z$".] Figure 1: CI test for $X \perp\!\!\!\perp Y \mid Z$ from Zhang et al (2011), and a factorisation test with a HB correction, n = 500

  22. Outline
     • Intro to reproducing kernel Hilbert spaces (RKHS)
     • An RKHS metric on the space of probability measures
       – Distance between means in space of features (RKHS)
       – Characteristic kernels: feature space mappings of probabilities unique
       – Nonparametric two-sample test
     • Dependence detection
       – Covariance in feature space and test
     • Relation with energy distance and distance covariance
     • Advanced topics
       – Interactions with three (or more) variables, conditional indep. test
       – Optimal kernel choice
       – Bayesian inference without models

  23. References: T. Read and N. Cressie. Goodness-Of-Fit Statistics for Discrete Multivariate Analysis. Springer-Verlag, New York, 1988.
