Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit



  1. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
     Song Mei, Theodor Misiakiewicz, and Andrea Montanari, Stanford University
     June 26, 2019, COLT 2019
     Song Mei (Stanford University), Mean Field Dynamics for Neural Network, June 26, 2019

  2. Gradient dynamics of two-layers neural network
     ◮ Two layers neural network: $\theta_i = (a_i, w_i) \in \mathbb{R}^D$, $\Theta = (\theta_1, \dots, \theta_N)$,
       $$\hat{y}(x; \Theta) = \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle).$$
     ◮ Risk function:
       $$R_N(\Theta) = \mathbb{E}_{x,y}\Big[\Big(y - \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle)\Big)^2\Big].$$
     ◮ SGD/gradient flow:
       $$\Theta^{k+1} = \Theta^k - \eta_k \nabla \ell_N(\Theta^k; x_k, y_k), \qquad \frac{\mathrm{d}}{\mathrm{d}t} \Theta^t = -\nabla R_N(\Theta^t).$$
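The prediction and risk on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the activation $\sigma = \tanh$, the synthetic Gaussian inputs, the target `y`, and the function names `predict` and `empirical_risk` are all hypothetical choices, not taken from the slides. The expectation in $R_N$ is replaced by a Monte Carlo average over samples.

```python
import numpy as np

def predict(a, W, X, sigma=np.tanh):
    """y_hat(x; Theta) = (1/N) * sum_i a_i * sigma(<w_i, x>) for each row x of X.

    a: (N,) second-layer weights, W: (N, d) first-layer weights, X: (n, d) inputs.
    sigma = tanh is an assumption; the slides leave the activation unspecified.
    """
    N = a.shape[0]
    return sigma(X @ W.T) @ a / N

def empirical_risk(a, W, X, y, sigma=np.tanh):
    """Monte Carlo estimate of R_N(Theta) = E[(y - y_hat(x; Theta))^2]."""
    return np.mean((y - predict(a, W, X, sigma)) ** 2)

# Synthetic data (assumption): Gaussian inputs, target depends on the first coordinate.
rng = np.random.default_rng(0)
N, d, n = 4, 3, 1000            # N = 4 hidden units; each theta_i = (a_i, w_i) has D = d + 1 entries
a = rng.normal(size=N)
W = rng.normal(size=(N, d))
X = rng.normal(size=(n, d))
y = np.tanh(X[:, 0])

risk = empirical_risk(a, W, X, y)
```

Because of the $1/N$ normalization, duplicating every neuron leaves the prediction unchanged, which is the property that lets the network be described by the distribution of the $\theta_i$ rather than by their number.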

  5. Two-layers neural networks
     [Figure: Architecture for $N = 4$, showing the input layer, the hidden layer with first-layer weights $w_1, \dots, w_4$, and the output layer with second-layer weights $a_1, \dots, a_4$; $\theta_i = (a_i, w_i)$.]

  6. Related literature
     ◮ Mean field distributional dynamics:
       $$\partial_t \rho_t(\theta) = \nabla \cdot (\nabla \Psi(\theta; \rho_t) \, \rho_t).$$
       ◮ Non-linear dynamics. Converges in some cases.
       ◮ [Mei, Montanari, Nguyen, 2018], [Rotskoff and Vanden-Eijnden, 2018], [Chizat and Bach, 2018a], [Sirignano and Spiliopoulos, 2018].
     ◮ Neural tangent kernel (NTK) dynamics:
       $$\partial_t \|u_t\|_2^2 = -\langle u_t, H u_t \rangle.$$
       ◮ Linear dynamics. Always converges to $0$ empirical risk.
       ◮ [Jacot, Gabriel, and Hongler, 2018], [Li and Liang, 2018], [Du, Zhai, Poczos, Singh, 2018].
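A short worked step shows why the NTK dynamics reach zero empirical risk. This is a sketch under assumptions that the slide does not state: $u_t$ is read as the vector of residuals on the training set, and $H$ is taken to be a fixed symmetric positive definite matrix with smallest eigenvalue $\lambda_{\min} > 0$.

```latex
% Assumptions (not on the slide): H is constant, symmetric, and H \succeq \lambda_{\min} I with \lambda_{\min} > 0.
\begin{align*}
\langle u_t, H u_t \rangle &\ge \lambda_{\min} \|u_t\|_2^2
  && \text{(smallest eigenvalue bound)} \\
\partial_t \|u_t\|_2^2 = -\langle u_t, H u_t \rangle &\le -\lambda_{\min} \|u_t\|_2^2
  && \text{(substitute into the NTK dynamics)} \\
\|u_t\|_2^2 &\le e^{-\lambda_{\min} t} \, \|u_0\|_2^2
  && \text{(Gr\"onwall's inequality)}
\end{align*}
```

Under these assumptions the residual decays exponentially, so the linear dynamics always converge, whereas the non-linear mean field dynamics admit no such one-line argument.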

  12. This work
      (a) Improved bound for the SGD - PDE interpolation.
      (b) Relationship between the mean field limit and the kernel limit.

  13. SGD and distributional dynamics (DD)
      ◮ SGD for $\Theta^k$, with $(x_k, y_k) \sim \mathbb{P}_{x,y}$, $i \in [N]$:
        $$\theta_i^{k+1} = \theta_i^k - 2 s_k N \nabla_{\theta_i} \ell_N(\Theta^k; x_k, y_k). \tag{SGD}$$
      ◮ [MMN18]: $s_k = \varepsilon \xi(k\varepsilon)$, $k = t/\varepsilon$, $N \to \infty$, $\varepsilon \to 0$:
        $$\hat{\rho}^{(N)}_k \equiv \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i^k} \Rightarrow \rho_t \in \mathcal{P}(\mathbb{R}^D) \times [0, \infty).$$
      ◮ Distributional dynamics (DD) for $\rho_t$:
        $$\partial_t \rho_t(\theta) = 2 \xi(t) \nabla_\theta \cdot (\rho_t(\theta) \nabla_\theta \Psi(\theta; \rho_t)), \tag{DD}$$
        where
        $$\Psi(\theta; \rho) = \frac{\delta R(\rho)}{\delta \rho(\theta)} = V(\theta) + \int U(\theta, \theta') \, \rho(\mathrm{d}\theta').$$
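The (SGD) iteration with step size $s_k = \varepsilon \xi(k\varepsilon)$ can be sketched as follows. Several pieces are assumptions made only for illustration: the per-sample loss is taken to be $\ell_N = (y - \hat{y})^2$ (the slides do not spell it out, and a different constant would only rescale the step size), the activation is $\tanh$, the data come from a hypothetical single-neuron teacher, and the names `sgd_step` and `run_sgd` are not from the source.

```python
import numpy as np

def sgd_step(a, W, x, y, s_k):
    """One update theta_i <- theta_i - 2 * s_k * N * grad_{theta_i} l_N, for all i at once.

    Assumes l_N(Theta; x, y) = (y - y_hat(x; Theta))^2 and sigma = tanh.
    """
    N = a.shape[0]
    act = np.tanh(W @ x)                 # sigma(<w_i, x>), shape (N,)
    resid = y - act @ a / N              # y - y_hat(x; Theta)
    # Each per-neuron gradient carries a factor 1/N from the network normalization.
    grad_a = -2.0 * resid * act / N
    grad_W = (-2.0 * resid * a * (1.0 - act ** 2) / N)[:, None] * x[None, :]
    # The factor N in the step size cancels that 1/N, so each neuron moves by O(s_k).
    return a - 2.0 * s_k * N * grad_a, W - 2.0 * s_k * N * grad_W

def run_sgd(N=200, d=5, eps=1e-3, T=2.0, xi=lambda t: 1.0, seed=0):
    """Run k = T / eps steps of one-pass SGD with s_k = eps * xi(k * eps)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=N)
    W = rng.normal(size=(N, d))
    w_star = np.ones(d) / np.sqrt(d)     # hypothetical teacher direction
    for k in range(int(T / eps)):
        x = rng.normal(size=d)           # fresh sample (x_k, y_k) at every step
        y = np.tanh(x @ w_star)
        a, W = sgd_step(a, W, x, y, eps * xi(k * eps))
    return a, W

# The rows (a_i, w_i) returned here are the atoms of the empirical measure at time T.
a_T, W_T = run_sgd()
```

In this reading, the $N$ rows $(a_i, w_i)$ after $k = t/\varepsilon$ steps are the atoms of $\hat{\rho}^{(N)}_k$, and averages of test functions over those rows are the quantities that (DD) is meant to track as $N$ grows and $\varepsilon$ shrinks.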
