When do neural networks outperform kernel methods? Song Mei

When do neural networks outperform kernel methods?
Song Mei, Stanford University
June 29, 2020
Joint work with Behrooz Ghorbani, Theodor Misiakiewicz, and Andrea Montanari

Neural tangent model
◮ Multi-layer NN: $f_N(x;\theta)$, $x \in \mathbb{R}^d$, $\theta \in \mathbb{R}^N$.
◮ Expanding around $\theta_0$: $f_N(x;\theta) = f_N(x;\theta_0) + \langle \theta - \theta_0, \nabla_\theta f_N(x;\theta_0) \rangle + o(\|\theta - \theta_0\|_2)$.
◮ Neural tangent model: $f_{\mathrm{NT},N}(x;\beta,\theta_0) = \langle \beta, \nabla_\theta f_N(x;\theta_0) \rangle$.
◮ Coupled gradient flow:
$\frac{d}{dt}\theta_t = -\nabla_\theta \hat{\mathbb{E}}[(y - f_N(x;\theta_t))^2]$, with $\theta_t = \theta_0$ at $t = 0$;
$\frac{d}{dt}\beta_t = -\nabla_\beta \hat{\mathbb{E}}[(y - f_{\mathrm{NT},N}(x;\beta_t,\theta_0))^2]$, with $\beta_t = 0$ at $t = 0$.
◮ Under proper initialization and over-parameterization: $\lim_{N\to\infty} |f_N(x;\theta_t) - f_{\mathrm{NT},N}(x;\beta_t,\theta_0)| = 0$.
[Jacot, Gabriel, Hongler, 2018], [Du, Zhai, Poczos, Singh, 2018], [Chizat, Bach, 2018b], ...

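The first-order expansion and the neural tangent model can be checked numerically. Below is a minimal pure-Python sketch. It assumes a concrete instance of $f_N$ that the slides do not fix: a two-layer tanh network $f(x;\theta) = m^{-1/2}\sum_{i=1}^m a_i \tanh(\langle w_i, x\rangle)$, with $\theta = (a, W)$ flattened into one list, so the parameter count is $m + md$. The helper names (`grad_f`, `f_nt`, `taylor_remainder`) are hypothetical. The slide's $f_{\mathrm{NT},N}$ omits the offset $f_N(x;\theta_0)$, so the check adds it back. If the analytic gradient is correct, the remainder divided by eps should shrink roughly in proportion to eps.

```python
import math
import random

# Assumed instance of f_N(x; theta) (not fixed by the slides): a two-layer tanh net
#   f(x; theta) = (1/sqrt(m)) * sum_i a_i * tanh(<w_i, x>),
# with theta = [a_1..a_m, w_11..w_1d, ..., w_m1..w_md] as one flat list.

def unpack(theta, m, d):
    a = theta[:m]
    w = [theta[m + i * d: m + (i + 1) * d] for i in range(m)]
    return a, w

def f(x, theta, m, d):
    a, w = unpack(theta, m, d)
    s = sum(a[i] * math.tanh(sum(wij * xj for wij, xj in zip(w[i], x)))
            for i in range(m))
    return s / math.sqrt(m)

def grad_f(x, theta, m, d):
    """Analytic gradient of f with respect to theta, in the same flat layout."""
    a, w = unpack(theta, m, d)
    c = 1.0 / math.sqrt(m)
    g_a, g_w = [], []
    for i in range(m):
        z = sum(wij * xj for wij, xj in zip(w[i], x))
        t = math.tanh(z)
        g_a.append(c * t)                            # df / da_i
        dt = 1.0 - t * t                             # tanh'(z)
        g_w.extend(c * a[i] * dt * xj for xj in x)   # df / dw_ij
    return g_a + g_w

def f_nt(x, beta, theta0, m, d):
    """Neural tangent model: <beta, grad_theta f(x; theta0)>."""
    return sum(b * g for b, g in zip(beta, grad_f(x, theta0, m, d)))

def taylor_remainder(x, theta0, v, eps, m, d):
    """|f(theta0 + eps*v) - f(theta0) - f_NT(x; eps*v, theta0)|."""
    theta = [t + eps * vi for t, vi in zip(theta0, v)]
    beta = [eps * vi for vi in v]
    return abs(f(x, theta, m, d) - f(x, theta0, m, d) - f_nt(x, beta, theta0, m, d))

if __name__ == "__main__":
    rng = random.Random(0)
    m, d = 50, 3
    theta0 = [rng.gauss(0, 1) for _ in range(m + m * d)]
    v = [rng.gauss(0, 1) for _ in range(m + m * d)]  # fixed perturbation direction
    x = [rng.gauss(0, 1) for _ in range(d)]
    for eps in (1e-2, 1e-3, 1e-4):
        r = taylor_remainder(x, theta0, v, eps, m, d)
        print(f"eps={eps:g}  remainder={r:.3e}  remainder/eps={r / eps:.3e}")
```

Shrinking eps by a factor of 10 should shrink remainder/eps by roughly the same factor, which is what the $o(\|\theta - \theta_0\|_2)$ term predicts. If remainder/eps stays flat, the gradient is wrong.
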
How about generalization?
◮ [Arora, Du, Hu, Li, Salakhutdinov, Wang, 2019]: CIFAR-10 experiments. NT: 23% test error. NN: less than 5% test error.
◮ [Arora, Du, Li, Salakhutdinov, Wang, Yu, 2019]: On small datasets, NT sometimes generalizes better than NN.
◮ [Shankar, Fang, Guo, Fridovich-Keil, Schmidt, Ragan-Kelley, Recht, 2020], [Li, Wang, Yu, Du, Hu, Salakhutdinov, Arora, 2019]: A smaller gap between NT and NN on CIFAR-10 (10% test error for NT).
Sometimes the gap is large; sometimes it is small.

Focus of this talk
When is there a large performance gap between NN and NT?

Two-layer neural networks
Neural networks:
$\mathcal{F}_{\mathrm{NN},N} = \big\{ f_N(x;\Theta) = \sum_{i=1}^N a_i\,\sigma(\langle w_i, x\rangle) : a_i \in \mathbb{R},\ w_i \in \mathbb{R}^d \big\}$.
Linearization (with $\Theta_0 = (a_i^0, w_i^0)_{i\in[N]}$, $\Delta a_i = a_i - a_i^0$, $\Delta w_i = w_i - w_i^0$):
$f_N(x;\Theta) = f_N(x;\Theta_0) + \underbrace{\sum_{i=1}^N \Delta a_i\,\sigma(\langle w_i^0, x\rangle)}_{\text{Top layer linearization}} + \underbrace{\sum_{i=1}^N a_i^0\,\sigma'(\langle w_i^0, x\rangle)\,\langle \Delta w_i, x\rangle}_{\text{Bottom layer linearization}} + o(\Delta)$.
Linearized neural networks ($W = (w_i)_{i\in[N]} \sim_{\mathrm{iid}} \mathrm{Unif}(\mathbb{S}^{d-1})$):
$\mathcal{F}_{\mathrm{RF},N}(W) = \big\{ f = \sum_{i=1}^N a_i\,\sigma(\langle w_i, x\rangle) : a_i \in \mathbb{R},\ i \in [N] \big\}$,
$\mathcal{F}_{\mathrm{NT},N}(W) = \big\{ f = \sum_{i=1}^N \sigma'(\langle w_i, x\rangle)\,\langle b_i, x\rangle : b_i \in \mathbb{R}^d,\ i \in [N] \big\}$.

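Below is a minimal pure-Python sketch of the two linearized classes. It assumes $\sigma = \tanh$ (the slides keep $\sigma$ generic) and uses plain lists instead of a numerics library; the helper names are hypothetical.

- Each function in $\mathcal{F}_{\mathrm{RF},N}(W)$ is a linear combination of the $N$ features returned by `rf_features(x, W)`.
- Each function in $\mathcal{F}_{\mathrm{NT},N}(W)$ is a linear combination of the $Nd$ features returned by `nt_features(x, W)`.
- The residual check reads the top-layer term as an RF function with coefficients $\Delta a_i$, and the bottom-layer term as an NT function with $b_i = a_i^0 \Delta w_i$.
- The $w_i$ are drawn from $\mathrm{Unif}(\mathbb{S}^{d-1})$ by normalizing standard Gaussian vectors.

```python
import math
import random

# Assumption for illustration: sigma = tanh (the slides keep sigma generic).
def sigma(z):
    return math.tanh(z)

def sigma_prime(z):
    t = math.tanh(z)
    return 1.0 - t * t

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def uniform_sphere(d, rng):
    """Draw w ~ Unif(S^{d-1}) by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(g, g))
    return [gi / norm for gi in g]

def rf_features(x, W):
    """RF features: f in F_RF(W) is dot(a, rf_features(x, W)), N coefficients."""
    return [sigma(dot(w, x)) for w in W]

def nt_features(x, W):
    """NT features: f in F_NT(W) is dot(b, nt_features(x, W)), with
    b = (b_1, ..., b_N) flattened into N*d coefficients."""
    feats = []
    for w in W:
        s = sigma_prime(dot(w, x))
        feats.extend(s * xj for xj in x)
    return feats

def f_nn(x, a, W):
    """Two-layer network f_N(x; Theta) = sum_i a_i * sigma(<w_i, x>)."""
    return sum(ai * sigma(dot(w, x)) for ai, w in zip(a, W))

def linearization_residual(x, a0, W0, da, dW):
    """|f_N(Theta0 + Delta) - f_N(Theta0) - (top-layer term + bottom-layer term)|."""
    a1 = [ai + dai for ai, dai in zip(a0, da)]
    W1 = [[wj + dwj for wj, dwj in zip(w, dw)] for w, dw in zip(W0, dW)]
    exact = f_nn(x, a1, W1) - f_nn(x, a0, W0)
    # Top layer: sum_i Delta a_i sigma(<w_i^0, x>), an RF function with a_i = Delta a_i.
    top = dot(da, rf_features(x, W0))
    # Bottom layer: an NT function with b_i = a_i^0 * Delta w_i.
    b = [a0i * dwj for a0i, dw in zip(a0, dW) for dwj in dw]
    bottom = dot(b, nt_features(x, W0))
    return abs(exact - (top + bottom))

if __name__ == "__main__":
    rng = random.Random(1)
    N, d = 20, 4
    W0 = [uniform_sphere(d, rng) for _ in range(N)]
    a0 = [rng.gauss(0.0, 1.0) for _ in range(N)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Fixed perturbation direction; only its scale eps varies.
    da_dir = [rng.gauss(0.0, 1.0) for _ in range(N)]
    dW_dir = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
    for eps in (1e-2, 1e-3, 1e-4):
        da = [eps * v for v in da_dir]
        dW = [[eps * v for v in row] for row in dW_dir]
        r = linearization_residual(x, a0, W0, da, dW)
        print(f"eps={eps:g}  residual={r:.3e}  residual/eps={r / eps:.3e}")
```

The perturbation direction is fixed and only its scale eps changes. For a smooth $\sigma$, the $o(\Delta)$ term predicts that residual/eps shrinks roughly linearly with eps.
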
