probabilistic symmetry and invariant neural networks



  1. Probabilistic symmetry and invariant neural networks Benjamin Bloem-Reddy , University of Oxford Work with Yee Whye Teh 14 January 2019, UBC Computer Science

  2. Outline
• Symmetry in neural networks
• Permutation-invariant neural networks
• Symmetry in probability and statistics
• Exchangeable sequences
• Permutation-invariant neural networks as exchangeable probability models
• Symmetry in neural networks as probabilistic symmetry

  3. Deep learning and statistics
• Deep neural networks have been applied successfully in a range of settings.
• Effort under way to improve performance in data-poor and semi-/unsupervised domains.
• Focus on symmetry.
• The study of symmetry in probability and statistics has a long history.

  4. Symmetric neural networks
A standard layer computes f_{ℓ,i} = σ( Σ_{j=1}^n w^{(ℓ)}_{i,j} f_{ℓ−1,j} ).
For input X and output Y, model Y = h(X), where h ∈ H is a neural network.
If X and Y are assumed to satisfy a symmetry property, how is H restricted?

  5. Symmetric neural networks
Convolutional neural networks encode translation invariance. Illustration from medium.freecodecamp.org

  6. Why symmetry?
Encoding symmetry in network architecture is a Good Thing∗. Stabler training and better generalization through
• reduction in dimension of parameter space through weight-tying; and
• capturing structure at multiple scales via pooling.
Historical note: Interest in invariant neural networks goes back at least to Minsky and Papert [MP88]; extended by Shawe-Taylor and Wood [Sha89; WS96]. More recent work by a host of others.

  7. Neural networks for permutation-invariant data [Zah+17]
Consider a sequence X_n := (X_1, …, X_n), X_i ∈ X.
Permutation invariance: Y = h(X_n) = h(π · X_n) for all π ∈ S_n.

  8. Neural networks for permutation-invariant data [Zah+17]
Consider a sequence X_n := (X_1, …, X_n), X_i ∈ X.
Permutation invariance: Y = h(X_n) = h(π · X_n) for all π ∈ S_n.
[Figure: invariant architecture; X_1, …, X_4 pooled through h into a single output Y]
Sum-decomposition: Y = h(X_n) ↦ Y = h̃( Σ_{i=1}^n φ(X_i) ).
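The sum-decomposition Y = h̃( Σ_i φ(X_i) ) is easy to check numerically. A minimal numpy sketch, where the tanh embedding and the random weights are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for the two pieces of the sum-decomposition.
W_phi = rng.normal(size=(3, 8))   # per-element embedding weights
W_out = rng.normal(size=(8, 1))   # readout weights

def phi(X):
    """Element-wise embedding phi applied to each row of X."""
    return np.tanh(X @ W_phi)

def h(X):
    """Invariant network: pool per-element embeddings by summation, then read out.
    Summation forgets the order of the rows, which gives permutation invariance."""
    return phi(X).sum(axis=0) @ W_out

X = rng.normal(size=(5, 3))            # n = 5 elements, each in R^3
perm = rng.permutation(5)
assert np.allclose(h(X), h(X[perm]))   # h(pi . X_n) = h(X_n)
```

Any permutation of the rows leaves the pooled sum unchanged, so the assertion holds for every `perm`.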

  9. Neural networks for permutation-invariant data [Zah+17]
Equivariance: Y_n = h(X_n) such that h(π · X_n) = π · h(X_n) for all π ∈ S_n.
[Figure: equivariant architecture; X_1, …, X_4 mapped element-wise to Y_1, …, Y_4]

  10. Neural networks for permutation-invariant data [Zah+17]
Equivariance: Y_n = h(X_n) such that h(π · X_n) = π · h(X_n) for all π ∈ S_n.
[Figure: equivariant architecture; X_1, …, X_4 mapped to Y_1, …, Y_4]
Weight-tying: [h(X_n)]_i = σ( Σ_{j=1}^n w_{i,j} X_j ) ↦ [h(X_n)]_i = σ( w_0 X_i + w_1 Σ_{j=1}^n X_j ).
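The weight-tied layer [h(X_n)]_i = σ( w_0 X_i + w_1 Σ_j X_j ) can likewise be verified to be equivariant; a sketch with arbitrary values for w_0, w_1 and σ = tanh:

```python
import numpy as np

rng = np.random.default_rng(1)

def equivariant_layer(X, w0, w1):
    """[h(X)]_i = tanh(w0 * X_i + w1 * sum_j X_j): each output depends on its
    own input plus a permutation-invariant pooled term shared by all outputs."""
    return np.tanh(w0 * X + w1 * X.sum(axis=0, keepdims=True))

X = rng.normal(size=(4, 2))
perm = rng.permutation(4)
out = equivariant_layer(X, 0.5, -0.3)
out_perm = equivariant_layer(X[perm], 0.5, -0.3)
assert np.allclose(out_perm, out[perm])   # h(pi . X_n) = pi . h(X_n)
```

Permuting the inputs only permutes the per-element term, while the pooled sum is unchanged, so the outputs are permuted in exactly the same way.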

  11. Neural networks for permutation-invariant data …

  12. ⟨⟨ Deep learning hat, off; statistics hat, on ⟩⟩
You could probably make some money making decent hats. Note to students: These were the first Google Image results for "deep learning hat" and "statistics hat".

  13. Statistical models and symmetry
Consider a sequence X_n := (X_1, …, X_n), X_i ∈ X.
A statistical model of X_n is a family of probability distributions on X^n: P = { P_θ : θ ∈ Ω }.
If X_n is assumed to satisfy a symmetry property, how is P restricted?

  14. Exchangeable sequences
A distribution P on X^n is exchangeable if P(X_1, …, X_n) = P(X_{π(1)}, …, X_{π(n)}) for all π ∈ S_n. X_N is infinitely exchangeable if this is true for all prefixes X_n ⊂ X_N, n ∈ ℕ.
de Finetti's theorem: X_N exchangeable ⟺ X_i | Q ~iid Q for some random Q.
Implication for Bayesian inference: our models for X_N need only consist of i.i.d. distributions on X.
Analogous theorems for other symmetries. The book by Kallenberg [Kal05] collects many of them. Some other accessible references: [Dia88; OR15].
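de Finetti's characterization can be illustrated by simulation: drawing a random directing measure Q first, then sampling conditionally i.i.d. from it, produces an exchangeable sequence. A sketch where Q = Bernoulli(θ) with θ ~ Unif[0,1] is an illustrative choice, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

# Mixture of i.i.d. sequences: theta plays the role of the random Q.
N = 200_000
theta = rng.uniform(size=(N, 1))                     # one random Q per replicate
X = (rng.uniform(size=(N, 2)) < theta).astype(int)   # X_1, X_2 | Q ~ iid Q

# Exchangeability: the joint law is invariant under swapping X_1 and X_2,
# so P(X_1=1, X_2=0) and P(X_1=0, X_2=1) agree (both equal E[theta(1-theta)]).
p_10 = np.mean((X[:, 0] == 1) & (X[:, 1] == 0))
p_01 = np.mean((X[:, 0] == 0) & (X[:, 1] == 1))
assert abs(p_10 - p_01) < 0.01
```

Note the draws are not independent marginally: the shared θ induces positive correlation between X_1 and X_2, which is exactly why the model is a mixture of i.i.d. laws rather than a single one.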

  15. Finite exchangeable sequences
de Finetti's theorem may fail for finite exchangeable sequences. What else can we say?
The empirical measure of X_n is M_{X_n}(•) := Σ_{i=1}^n δ_{X_i}(•).
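For discrete data the empirical measure is just the multiset of observed values, forgetting order. A minimal sketch using `collections.Counter` as the data structure:

```python
from collections import Counter

def empirical_measure(xs):
    """M_{X_n} for discrete values: Counter records the total mass that the
    point masses delta_{X_i} place on each observed value."""
    return Counter(xs)

# The empirical measure is invariant under reordering ...
assert empirical_measure([3, 1, 2, 1]) == empirical_measure([1, 1, 2, 3])
# ... but distinguishes genuinely different samples.
assert empirical_measure([3, 1, 2, 1]) != empirical_measure([3, 1, 2, 2])
```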

  16. Finite exchangeable sequences
The empirical measure is a sufficient statistic: P is exchangeable iff P(X_n ∈ • | M_{X_n} = m) = U_m(•), where U_m is the uniform distribution on all sequences (x_1, …, x_n) with empirical measure m.

  17. Finite exchangeable sequences
The empirical measure is a sufficient statistic: P is exchangeable iff P(X_n ∈ • | M_{X_n} = m) = U_m(•), where U_m is the uniform distribution on all sequences (x_1, …, x_n) with empirical measure m.
Consider Y such that (π · X_n, Y) =_d (X_n, Y). The empirical measure is an adequate statistic for any such Y: P(Y ∈ • | X_n = x_n) = P(Y ∈ • | M_{X_n} = M_{x_n}). M_{X_n} contains all information in X_n that is relevant for predicting Y.

  18. A useful theorem
Theorem (Invariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence. Then (π · X_n, Y) =_d (X_n, Y) for all π ∈ S_n if and only if there is a measurable function h̃ : [0,1] × M(X) → Y such that (X_n, Y) =_a.s. (X_n, h̃(η, M_{X_n})), where η ~ Unif[0,1] and η ⊥⊥ X_n.
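The structure the theorem guarantees can be sketched concretely: Y is produced by a function of the empirical measure plus independent outside randomness η. The particular h̃ below (a noisy sum of the multiset's values) is an illustrative choice, not from the talk:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

def h_tilde(eta, m):
    """A measurable function of outside randomness eta and the empirical
    measure m only (illustrative: total mass-weighted value plus noise)."""
    total = sum(value * count for value, count in m.items())
    return total + eta   # eta ~ Unif[0,1] supplies the stochasticity

def sample_Y(X_n):
    eta = rng.uniform()              # eta independent of X_n
    return h_tilde(eta, Counter(X_n))

# With the noise held fixed, the output depends on X_n only through M_{X_n},
# so it is identical for any reordering of the sequence.
eta = 0.42
assert h_tilde(eta, Counter([1, 2, 3])) == h_tilde(eta, Counter([3, 2, 1]))
```

This is the sense in which the theorem upgrades deterministic invariant architectures: randomness enters only through η, never through the order of X_n.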

  19. A useful theorem
Theorem (Invariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence. Then (π · X_n, Y) =_d (X_n, Y) for all π ∈ S_n if and only if there is a measurable function h̃ : [0,1] × M(X) → Y such that (X_n, Y) =_a.s. (X_n, h̃(η, M_{X_n})), where η ~ Unif[0,1] and η ⊥⊥ X_n.
[Figure: deterministic invariant architecture alongside its stochastic counterpart with noise η]
Deterministic invariance [Zah+17] ↦ stochastic invariance [B-R, Teh]: Y = h̃( Σ_{i=1}^n φ(X_i) ) ↦ Y = h̃( η, Σ_{i=1}^n δ_{X_i} ).

  20. Another useful theorem
Theorem (Equivariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence and Y_i ⊥⊥_{X_n} (Y_n \ Y_i). Then (π · X_n, π · Y_n) =_d (X_n, Y_n) for all π ∈ S_n if and only if there is a measurable function h̃ : [0,1] × X × M(X) → Y such that (X_n, Y_n) =_a.s. (X_n, (h̃(η_i, X_i, M_{X_n}))_{i∈[n]}), where η_i ~iid Unif[0,1] and (η_i)_{i∈[n]} ⊥⊥ X_n.

  21. Another useful theorem
Theorem (Equivariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence and Y_i ⊥⊥_{X_n} (Y_n \ Y_i). Then (π · X_n, π · Y_n) =_d (X_n, Y_n) for all π ∈ S_n if and only if there is a measurable function h̃ : [0,1] × X × M(X) → Y such that (X_n, Y_n) =_a.s. (X_n, (h̃(η_i, X_i, M_{X_n}))_{i∈[n]}), where η_i ~iid Unif[0,1] and (η_i)_{i∈[n]} ⊥⊥ X_n.
[Figure: deterministic equivariant architecture alongside its stochastic counterpart with noise η_1, …, η_4]
Deterministic equivariance [Zah+17] ↦ stochastic equivariance [B-R, Teh]: Y_i = σ( w_0 X_i + w_1 Σ_{j=1}^n X_j ) ↦ Y_i = h̃( η_i, X_i, Σ_{j=1}^n δ_{X_j} ).

  22. Outline
• Symmetry in neural networks
• Permutation-invariant neural networks
• Symmetry in probability and statistics
• Exchangeable sequences
• Permutation-invariant neural networks as exchangeable probability models
• Symmetry in neural networks as probabilistic symmetry

  23. A bit of group theory
For a group G acting on a set X:
• The orbit of any x ∈ X is the subset of X generated by applying G to x: G · x = { g · x : g ∈ G }.
• A maximal invariant statistic M : X → S (i) is constant on each orbit, i.e., M(g · x) = M(x) for all g ∈ G and x ∈ X; and (ii) takes a different value on each orbit, i.e., M(x_1) = M(x_2) implies x_1 = g · x_2 for some g ∈ G.
• A maximal equivariant τ : X → G satisfies τ(g · x) = g · τ(x), g ∈ G, x ∈ X.
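For S_n acting on sequences by reordering, a concrete maximal invariant is the sorted sequence: it is constant on each orbit and separates distinct orbits. A small sketch:

```python
from itertools import permutations

def orbit(x):
    """Orbit of a tuple x under S_n: the set of all reorderings of x."""
    return set(permutations(x))

def M(x):
    """Sorting is a maximal invariant for S_n acting on sequences."""
    return tuple(sorted(x))

x = (3, 1, 2)
# (i) constant on the orbit of x ...
assert all(M(y) == M(x) for y in orbit(x))
# (ii) ... and takes different values on different orbits.
assert M((1, 2, 3)) != M((1, 2, 2))
```

Note the connection to the earlier slides: for finite sequences, the empirical measure and the sorted sequence carry the same information, namely the orbit of X_n under S_n.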

  24. A general invariance theorem
Theorem (B-R, Teh). Let G be a compact group and assume that g · X =_d X for all g ∈ G. Let M : X → S be a maximal invariant. Then (g · X, Y) =_d (X, Y) for all g ∈ G if and only if there exists a measurable function h̃ : [0,1] × S → Y such that (X, Y) =_a.s. (X, h̃(η, M(X))), with η ~ Unif[0,1] and η ⊥⊥ X.

  25. Proof by picture
P(g · X, Y) = P(X, Y) for all g ∈ G
[Figure: graphical model with nodes X and Y]

  26. Proof by picture
P(g · X, M(g · X), Y) = P(X, M(X), Y) for all g ∈ G ⇒ Y ⊥⊥_{M(X)} X
[Figure: graphical model with nodes X, M(X), and Y]

  27. A general equivariance theorem
Theorem (Kallenberg; B-R, Teh). Let G be a compact group and assume that g · X =_d X for all g ∈ G. Assume that a maximal equivariant τ : X → G exists. Then (g · X, g · Y) =_d (X, Y) for all g ∈ G if and only if there exists a measurable function h̃ : [0,1] × X → Y such that (X, Y) =_a.s. (X, h̃(η, X)), with η ~ Unif[0,1] and η ⊥⊥ X, where h̃ is equivariant: h̃(η, g · X) =_a.s. g · h̃(η, X), g ∈ G.

  28. Proof by picture
P(g · X, g · Y) = P(X, Y) for all g ∈ G
[Figure: graphical model with nodes X and Y]
