The landscape of non-convex losses for statistical learning problems


  1. The landscape of non-convex losses for statistical learning problems Song Mei Stanford University October 19, 2017 Song Mei (Stanford University) The landscape of non-convex optimization October 19, 2017 1 / 32

  2. Deep learning

  4. Convolutional Neural Network

  5. Non-convex optimization

  6. Why do non-convex neural networks perform well?

  7. Why does some non-convex optimization perform well?
     ◮ Stochastic gradient descent escapes bad local minima.
     ◮ Good initialization escapes bad local minima.
     ◮ Globally, there are fewer bad local minima.
     ◮ ....

  9. Non-convex optimization: analysis of global geometry. Number and locations of saddle points and local minima.

  10. Let’s do it! The objective function: $\min_{W_i} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(W_k \cdots \sigma(W_2 \cdot \sigma(W_1 x_i))) \}^2$

  11. Let’s do it! The objective function: $\min_{W_i} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(W_2 \cdot \sigma(W_1 x_i)) \}^2$

  12. Let’s do it! The objective function: $\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(\langle \theta, x_i \rangle) \}^2$

  13. Binary linear classification. The model: $z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$.

  14. One node neural network. The model: $z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$.
      ◮ Convex logit loss ($\ell_c$ is cvx in $\theta$): $\ell_c(\theta; z) = y \langle x, \theta \rangle - \log\{ 1 + \exp(\langle x, \theta \rangle) \}$.
      ◮ Non-convex loss ($\ell$ is not cvx in $\theta$): $\ell(\theta; z) = \{ y - \sigma(\langle x, \theta \rangle) \}^2$, where $\sigma(t) = 1 / (1 + \exp(t))$.
      ◮ Empirical risk: $\widehat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; z_i) = \frac{1}{n} \sum_{i=1}^{n} \{ y_i - \sigma(\langle \theta, x_i \rangle) \}^2$.
      ◮ Empirical risk minimizer: $\hat{\theta}_n = \arg\min_{\theta \in \mathsf{B}^d(R)} \widehat{R}_n(\theta)$.
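
A minimal NumPy sketch of the empirical risk above and its gradient. The function names are hypothetical, the activation is taken literally as written on the slide, and the ball constraint on $\theta$ is left out for brevity.

```python
import numpy as np

def sigma(t):
    # Activation exactly as written on the slide: sigma(t) = 1 / (1 + exp(t)).
    return 1.0 / (1.0 + np.exp(t))

def empirical_risk(theta, X, y):
    # R_n(theta) = (1/n) * sum_i (y_i - sigma(<theta, x_i>))^2
    residual = y - sigma(X @ theta)
    return np.mean(residual ** 2)

def empirical_risk_grad(theta, X, y):
    # For this sigma, sigma'(t) = -sigma(t) * (1 - sigma(t)).
    s = sigma(X @ theta)
    ds = -s * (1.0 - s)
    # Chain rule: grad = (-2/n) * sum_i (y_i - s_i) * sigma'(<theta, x_i>) * x_i
    return (-2.0 / len(y)) * (X.T @ ((y - s) * ds))

# Usage on a tiny made-up dataset (illustrative values only).
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_demo = np.array([1.0, 0.0, 1.0])
theta_demo = np.array([0.3, -0.2])
risk_demo = empirical_risk(theta_demo, X_demo, y_demo)
grad_demo = empirical_risk_grad(theta_demo, X_demo, y_demo)
```

The gradient is what a plain gradient descent step on $\widehat{R}_n$ would use.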

  18. A negative theoretical result. Theorem (Auer et al., 1996). For the one node neural network, $\forall\, n, d > 0$, there exists a dataset $(x_i, y_i)_{i=1}^{n}$ such that the empirical risk $\widehat{R}_n(\theta)$ has $\lfloor n/d \rfloor^d$ distinct local minima. The landscape of $\widehat{R}_n(\theta)$ is very rough. Is this the end of the world of deep learning?
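
For a sense of scale, a small sketch that evaluates the count $\lfloor n/d \rfloor^d$ from the theorem. The function name is hypothetical. The theorem is about one specially constructed dataset, so the number is a worst-case count and says nothing about any particular data.

```python
def local_minima_bound(n: int, d: int) -> int:
    # floor(n / d) ** d, the number of distinct local minima in the theorem.
    return (n // d) ** d

# Example with the sample size and dimension quoted in the real data experiment.
bound = local_minima_bound(683, 11)
```

With $n = 683$ and $d = 11$, $\lfloor n/d \rfloor = 62$, so the count is $62^{11}$.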

  21. Real data experiment
      ◮ The "Australian" data set from Statlog: $d = 11$, $n = 683$.
      ◮ Random initialization $\theta(0) \sim N(0, I_d)$.
      ◮ Run gradient descent and track the path $\theta(k)$.
      ◮ Generate multiple paths with independent initializations.
      ◮ Plot standard deviation over paths $\mathrm{std}(\theta(k))$ versus $k$.
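
The experiment steps can be sketched as follows. This is an illustration under stated assumptions, not the author's code: synthetic Gaussian data stands in for the Statlog set, and the step size, number of paths, number of iterations, and the way the per-coordinate deviations are combined into one number are all choices made here.

```python
import numpy as np

def sigma(t):
    # Activation as written on the slides: sigma(t) = 1 / (1 + exp(t)).
    return 1.0 / (1.0 + np.exp(t))

def risk_grad(theta, X, y):
    # Gradient of (1/n) * sum_i (y_i - sigma(<theta, x_i>))^2.
    s = sigma(X @ theta)
    return (-2.0 / len(y)) * (X.T @ ((y - s) * (-s * (1.0 - s))))

def gd_paths(X, y, n_paths=10, n_steps=200, step=0.5, seed=0):
    # Run gradient descent from independent N(0, I_d) initializations.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    paths = np.empty((n_paths, n_steps + 1, d))
    for p in range(n_paths):
        theta = rng.standard_normal(d)  # theta(0) ~ N(0, I_d)
        paths[p, 0] = theta
        for k in range(1, n_steps + 1):
            theta = theta - step * risk_grad(theta, X, y)
            paths[p, k] = theta
    return paths

def std_over_paths(paths):
    # Assumed aggregate: square root of the summed per-coordinate
    # variance across paths, one value per iteration k.
    return np.sqrt(paths.var(axis=0).sum(axis=1))

# Synthetic stand-in with the same n and d as quoted on the slide.
rng = np.random.default_rng(1)
n, d = 683, 11
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)  # hypothetical ground truth
y = (rng.random(n) < sigma(X @ theta_true)).astype(float)

paths = gd_paths(X, y)
std_k = std_over_paths(paths)  # plot std_k against k = 0, ..., n_steps
```

If the paths from different initializations end up at the same point, `std_k` shrinks toward zero as `k` grows; if they settle in different local minima, it levels off at a positive value.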
