Invariant neural networks and probabilistic symmetry


  1. Invariant neural networks and probabilistic symmetry. Benjamin Bloem-Reddy, University of Oxford. Work with Yee Whye Teh. 5 October 2018, OxWaSP Workshop.

  2. Deep learning and statistics
  • Deep neural networks have been applied successfully in a range of settings.
  • Effort is under way to improve performance in data-poor and semi-/unsupervised domains.
  • Focus on symmetry.
  • The study of symmetry in probability and statistics has a long history.

  3. Symmetric neural networks
  For input $X$ and output $Y$, model $Y = h(X)$, where $h \in \mathcal{H}$ is a neural network with layers
  $f_{\ell,i} = \sigma\big( \sum_{j=1}^{n} w^{(\ell)}_{i,j} f_{\ell-1,j} \big)$.
  If $X$ and $Y$ are assumed to satisfy a symmetry property, how is $\mathcal{H}$ restricted?
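As a concrete rendering of the layer equation above, here is a minimal numpy sketch (my illustration; the function name and the choice of tanh as the nonlinearity are assumptions, not from the talk):

    import numpy as np

    def dense_layer(f_prev, W, sigma=np.tanh):
        # One unconstrained layer: f_{l,i} = sigma(sum_j W[i, j] * f_{l-1, j}).
        return sigma(W @ f_prev)

    rng = np.random.default_rng(0)
    n = 4
    W = rng.normal(size=(n, n))      # a full n x n weight matrix, no symmetry imposed
    f_next = dense_layer(rng.normal(size=n), W)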

  4. Symmetric neural networks
  Convolutional neural networks encode translation invariance. [Illustration from medium.freecodecamp.org]

  5. Why symmetry?
  Encoding symmetry in network architecture is a Good Thing*, i.e., it results in stabler training and better generalization through:
  • reduction in dimension of the parameter space through weight-tying (a worked count follows this slide); and
  • capturing structure at multiple scales via pooling.
  * Oft-stated "fact". Mostly supported by heuristics and intuition, some empirical evidence, and loose connections to learning theory and to what we "know" about high-dimensional data analysis. There is some PAC theory to this end [Sha91; Sha95]; I haven't found anything else.
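To make the weight-tying point concrete (a count implied by the layers on slides 3 and 9, not stated on this slide): an unconstrained layer mapping $n$ inputs to $n$ outputs has $n^2$ free weights $w_{i,j}$, while the permutation-equivariant layer on slide 9 ties them down to just two scalars, $w_0$ and $w_1$, independent of $n$.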

  6. Neural networks for permutation-invariant data [Zah+17]
  Consider a sequence $X_{[n]} := (X_1, \dots, X_n)$, $X_i \in \mathcal{X}$.
  Invariance: $Y = h(X_{[n]}) = h(\pi \cdot X_{[n]})$ for all $\pi \in S_n$.

  7. Neural networks for permutation-invariant data [Zah+17]
  Consider a sequence $X_{[n]} := (X_1, \dots, X_n)$, $X_i \in \mathcal{X}$.
  Invariance: $Y = h(X_{[n]}) = h(\pi \cdot X_{[n]})$ for all $\pi \in S_n$.
  $Y = h(X_{[n]}) \;\mapsto\; Y = \tilde{h}\big( \sum_{i=1}^{n} \phi(X_i) \big)$
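The sum-decomposition above is easy to state in code. A minimal numpy sketch (the particular $\phi$ and $\tilde{h}$ below are toy choices of mine, not from [Zah+17]):

    import numpy as np

    def invariant_net(x, phi, h_tilde):
        # Y = h_tilde(sum_i phi(x_i)): pooling by summation forgets the order of x.
        return h_tilde(np.sum([phi(xi) for xi in x], axis=0))

    phi = lambda xi: np.array([xi, xi ** 2])     # per-element embedding (toy choice)
    h_tilde = lambda s: float(np.tanh(s).sum())  # readout on the pooled embedding

    x = np.array([3.0, 1.0, 2.0])
    assert np.isclose(invariant_net(x, phi, h_tilde),
                      invariant_net(x[[2, 0, 1]], phi, h_tilde))  # permutation-invariant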

  8. Neural networks for permutation-invariant data [Zah+17]
  Consider a sequence $X_{[n]} := (X_1, \dots, X_n)$, $X_i \in \mathcal{X}$.
  Equivariance: $Y_{[n]} = h(X_{[n]})$ such that $h(\pi \cdot X_{[n]}) = \pi \cdot h(X_{[n]})$ for all $\pi \in S_n$.

  9. Neural networks for permutation-invariant data [Zah+17]
  Consider a sequence $X_{[n]} := (X_1, \dots, X_n)$, $X_i \in \mathcal{X}$.
  Equivariance: $Y_{[n]} = h(X_{[n]})$ such that $h(\pi \cdot X_{[n]}) = \pi \cdot h(X_{[n]})$ for all $\pi \in S_n$.
  $[h(X_{[n]})]_i = \sigma\big( \sum_{j=1}^{n} w_{i,j} X_j \big) \;\mapsto\; [h(X_{[n]})]_i = \sigma\big( w_0 X_i + w_1 \sum_{j=1}^{n} X_j \big)$
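A minimal numpy sketch of the weight-tied layer on the right-hand side, with a permutation check (my illustration; the weight values are arbitrary):

    import numpy as np

    def equivariant_layer(x, w0, w1, sigma=np.tanh):
        # [h(x)]_i = sigma(w0 * x_i + w1 * sum_j x_j): permuting x permutes the output.
        return sigma(w0 * x + w1 * x.sum())

    rng = np.random.default_rng(1)
    x = rng.normal(size=5)
    perm = rng.permutation(5)
    assert np.allclose(equivariant_layer(x, 0.5, -0.1)[perm],
                       equivariant_layer(x[perm], 0.5, -0.1))  # equivariance check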

  10. Neural networks for permutation-invariant data [figure slide]

  11. ⟨⟨ Deep learning hat, off; statistics hat, on ⟩⟩
  Note to students: these were the first Google Image results for "deep learning hat" and "statistics hat". You could probably make some money making decent hats.

  12. Statistical models and symmetry
  Consider a sequence $X_{[n]} := (X_1, \dots, X_n)$, $X_i \in \mathcal{X}$.
  A statistical model of $X_{[n]}$ is a family of probability distributions on $\mathcal{X}^n$: $\mathcal{P} = \{ P_\theta : \theta \in \Omega \}$.
  If $X$ is assumed to satisfy a symmetry property, how is $\mathcal{P}$ restricted?

  13. Exchangeable sequences
  A distribution $P$ on $\mathcal{X}^n$ is exchangeable if $P(X_1, \dots, X_n) = P(X_{\pi(1)}, \dots, X_{\pi(n)})$ for all $\pi \in S_n$. $X_{\mathbb{N}}$ is infinitely exchangeable if this is true for every prefix $X_{[n]}$ of $X_{\mathbb{N}}$, $n \in \mathbb{N}$.
  de Finetti's theorem: $X_{\mathbb{N}}$ is infinitely exchangeable $\iff$ $X_i \mid Q \sim_{\text{i.i.d.}} Q$ for some random $Q$. Our models for $X_{\mathbb{N}}$ need only consist of i.i.d. distributions on $\mathcal{X}$.
  Analogous theorems hold for other symmetries; the book by Kallenberg [Kal05] collects many of them. Some other accessible references: [Dia88; OR15].
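In simulation terms, de Finetti's theorem says an infinitely exchangeable sequence can be generated by first drawing a random distribution $Q$ and then sampling i.i.d. from it. A minimal sketch, with a toy mixing law of my choosing (Bernoulli with a uniform random parameter):

    import numpy as np

    rng = np.random.default_rng(2)

    def exchangeable_sequence(n):
        theta = rng.uniform()              # draw a random Q, here Q = Bernoulli(theta)
        return rng.binomial(1, theta, n)   # X_1, ..., X_n i.i.d. given Q

    x = exchangeable_sequence(10)  # any permutation of x has the same joint distribution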

  14. Finite exchangeable sequences
  de Finetti's theorem may fail for finite exchangeable sequences. What else can we say?
  The empirical measure of $X_{[n]}$ is $M_{X_{[n]}}(\cdot) := \sum_{i=1}^{n} \delta_{X_i}(\cdot)$.
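For discrete data the empirical measure is just a multiset of counts; a one-line illustration (my example):

    from collections import Counter

    x = ("a", "b", "a", "c")
    M_x = Counter(x)                             # M_x = 2*delta_a + delta_b + delta_c
    assert M_x == Counter(("c", "a", "b", "a"))  # invariant under permutation of x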

  15. Finite exchangeable sequences
  The empirical measure is sufficient: $P(X_{[n]} \in \cdot \mid M_{X_{[n]}} = m) = U_m(\cdot)$, where $U_m$ is the uniform distribution on all sequences $(x_1, \dots, x_n)$ with empirical measure $m$.
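Sufficiency here has a direct algorithmic reading: conditionally on the empirical measure $m$, the sequence is a uniformly random arrangement of the corresponding multiset. A sketch of sampling from $U_m$ (my illustration):

    import random

    def sample_U_m(m):
        # Lay out the multiset described by the counts m, then shuffle uniformly:
        # each distinct arrangement is equally likely, i.e., a draw from U_m.
        seq = [x for x, count in m.items() for _ in range(count)]
        random.shuffle(seq)
        return tuple(seq)

    sample_U_m({"a": 2, "b": 1})  # one of aab, aba, baa, each with probability 1/3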

  16. Finite exchangeable sequences
  The empirical measure is sufficient: $P(X_{[n]} \in \cdot \mid M_{X_{[n]}} = m) = U_m(\cdot)$, where $U_m$ is the uniform distribution on all sequences $(x_1, \dots, x_n)$ with empirical measure $m$.
  The empirical measure is adequate for any $Y$ such that $(\pi \cdot X_{[n]}, Y) \overset{d}{=} (X_{[n]}, Y)$: $P(Y \in \cdot \mid X_{[n]} = x_{[n]}) = P(Y \in \cdot \mid M_{X_{[n]}} = M_{x_{[n]}})$.
  That is, $M_{X_{[n]}}$ contains all information in $X_{[n]}$ that is relevant for predicting $Y$.

  17. A useful theorem
  Invariance theorem: Suppose $X_{[n]}$ is an exchangeable sequence. Then $(\pi \cdot X_{[n]}, Y) \overset{d}{=} (X_{[n]}, Y)$ for all $\pi \in S_n$ if and only if $(X_{[n]}, Y) \overset{a.s.}{=} (X_{[n]}, \tilde{h}(\eta, M_{X_{[n]}}))$, with $\tilde{h}$ a measurable function and $\eta \sim \mathrm{Unif}[0,1]$, $\eta \perp\!\!\!\perp X_{[n]}$.

  18. A useful theorem
  Invariance theorem (restated): $(\pi \cdot X_{[n]}, Y) \overset{d}{=} (X_{[n]}, Y)$ for all $\pi \in S_n$ if and only if $(X_{[n]}, Y) \overset{a.s.}{=} (X_{[n]}, \tilde{h}(\eta, M_{X_{[n]}}))$, with $\tilde{h}$ measurable and $\eta \sim \mathrm{Unif}[0,1]$, $\eta \perp\!\!\!\perp X_{[n]}$.
  Deterministic invariance [Zah+17] $\mapsto$ stochastic invariance [this work]:
  $Y = \tilde{h}\big( \sum_{i=1}^{n} \phi(X_i) \big) \;\mapsto\; Y = \tilde{h}\big( \eta, \sum_{i=1}^{n} \delta_{X_i} \big)$
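The noise-outsourced form $Y = \tilde{h}(\eta, M_{X_{[n]}})$ can be sketched directly. In the sketch below (my illustration), real-valued data are assumed, so the sorted sequence can stand in for the empirical measure; the particular $\tilde{h}$ is a toy choice:

    import numpy as np

    rng = np.random.default_rng(4)

    def stochastic_invariant(x, h_tilde):
        eta = rng.uniform()   # eta ~ Unif[0, 1], independent of x
        m = np.sort(x)        # sorted values represent the empirical measure
        return h_tilde(eta, m)

    h_tilde = lambda eta, m: float(m.mean() + 0.1 * eta)  # toy choice of h_tilde
    y = stochastic_invariant(np.array([2.0, 0.5, 1.0]), h_tilde)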

  19. Another useful theorem
  Equivariance theorem: $(\pi \cdot X_{[n]}, \pi \cdot Y_{[n]}) \overset{d}{=} (X_{[n]}, Y_{[n]})$ for all $\pi \in S_n$ if and only if $(X_{[n]}, Y_{[n]}) \overset{a.s.}{=} \big( X_{[n]}, (\tilde{h}(\eta_i, X_i, M_{X_{[n]}}))_{i \in [n]} \big)$, with $\tilde{h}$ a measurable function and i.i.d. $\eta_i \sim \mathrm{Unif}[0,1]$, $\eta_i \perp\!\!\!\perp X_{[n]}$.

  20. Another useful theorem
  Equivariance theorem (restated): $(\pi \cdot X_{[n]}, \pi \cdot Y_{[n]}) \overset{d}{=} (X_{[n]}, Y_{[n]})$ for all $\pi \in S_n$ if and only if $(X_{[n]}, Y_{[n]}) \overset{a.s.}{=} \big( X_{[n]}, (\tilde{h}(\eta_i, X_i, M_{X_{[n]}}))_{i \in [n]} \big)$, with $\tilde{h}$ measurable and i.i.d. $\eta_i \sim \mathrm{Unif}[0,1]$, $\eta_i \perp\!\!\!\perp X_{[n]}$.
  Deterministic equivariance [Zah+17] $\mapsto$ stochastic equivariance [this work]:
  $Y_i = \sigma\big( w_0 X_i + w_1 \sum_{j=1}^{n} X_j \big) \;\mapsto\; Y_i = \tilde{h}\big( \eta_i, X_i, \sum_{j=1}^{n} \delta_{X_j} \big)$,
  recovering the deterministic layer as a special case (with the noise unused):
  $Y_i = \sigma\big( w_0 X_i + w_1 \int_{\mathcal{X}} x \sum_{j=1}^{n} \delta_{X_j}(dx) \big)$.
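Similarly, the equivariant representation $Y_i = \tilde{h}(\eta_i, X_i, M_{X_{[n]}})$ admits a direct sketch (my illustration; again the sorted sequence stands in for the empirical measure, and the $\tilde{h}$ below is a toy choice echoing the deterministic layer):

    import numpy as np

    rng = np.random.default_rng(5)

    def stochastic_equivariant(x, h_tilde):
        eta = rng.uniform(size=len(x))  # i.i.d. eta_i ~ Unif[0, 1], independent of x
        m = np.sort(x)                  # sorted values represent the empirical measure
        return np.array([h_tilde(e, xi, m) for e, xi in zip(eta, x)])

    # Toy h_tilde echoing sigma(w0 * x_i + w1 * sum_j x_j), plus a small noise term:
    h_tilde = lambda e, xi, m: np.tanh(0.5 * xi - 0.1 * m.sum()) + 0.01 * e
    y = stochastic_equivariant(np.array([1.0, -2.0, 0.3]), h_tilde)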

  21. Some answers
  • Sufficiency/adequacy provides the magic.
  • Similar results hold for exchangeable graphs/arrays/tensors and some other related structures.
  • The framework is general enough that it captures a lot of existing work as special cases.
  • It suggests some new (stochastic) network architectures.

  22. Many questions
  • For group symmetries that don't involve permutations: what are the analogous results? Equivariance is especially difficult.
  • There are models with sufficient statistics that don't have group symmetry (though they typically have a set of symmetry transformations): what are the analogous results? Are they useful?
  • There is evidence that adding noise during training has beneficial effects; in this context it amounts to the difference between deterministic invariance and distributional invariance. Can we prove anything rigorous in these settings?
  • Relatedly, can we put the "fact" that encoding symmetry in neural networks is a Good Thing on rigorous footing?

  23. Thank you.

  24. References
  [Aus13] Tim Austin. "Exchangeable random arrays". Lecture notes for IISc. 2013. URL: http://www.math.ucla.edu/~tim/ExchnotesforIISc.pdf
  [CW16] Taco Cohen and Max Welling. "Group Equivariant Convolutional Networks". In: Proceedings of The 33rd International Conference on Machine Learning. Ed. by Maria Florina Balcan and Kilian Q. Weinberger. Vol. 48. Proceedings of Machine Learning Research. New York, NY, USA: PMLR, 2016, pp. 2990–2999. URL: http://proceedings.mlr.press/v48/cohenc16.html
  [Coh+18] Taco S. Cohen et al. "Spherical CNNs". In: ICLR. 2018. URL: https://openreview.net/pdf?id=Hkbd5xZRb
  [Dia88] P. Diaconis. "Sufficiency as statistical symmetry". In: Proceedings of the AMS Centennial Symposium. Ed. by F. Browder. American Mathematical Society, 1988, pp. 15–26.
  [GD14] Robert Gens and Pedro M. Domingos. "Deep Symmetry Networks". In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2537–2545. URL: http://papers.nips.cc/paper/5424-deep-symmetry-networks.pdf
  [Har+18] Jason Hartford et al. "Deep Models of Interactions Across Sets". In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 1914–1923.
  [Her+18] Roei Herzig et al. "Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction". In: (Feb. 2018). eprint: 1802.05451. URL: https://arxiv.org/abs/1802.05451
  [Kal05] Olav Kallenberg. Probabilistic Symmetries and Invariance Principles. Springer, 2005.
