The landscape of non-convex losses for statistical learning problems Song Mei Stanford University October 19, 2017 Song Mei (Stanford University) The landscape of non-convex optimization October 19, 2017 1 / 32
Deep learning
Convolutional Neural Network
Non-convex optimization
Why do non-convex neural networks perform well?

Why does some non-convex optimization perform well?
◮ Stochastic gradient descent escapes bad local minima.
◮ Good initialization escapes bad local minima.
◮ Globally, there are fewer bad local minima.
◮ ...

Non-convex optimization: analysis of global geometry
Number and locations of saddle points and local minima.

Let's do it!
The objective function, for a $k$-layer network:
$$\min_{W_1, \dots, W_k} \frac{1}{n} \sum_{i=1}^n \{ y_i - \sigma(W_k \cdots \sigma(W_2 \cdot \sigma(W_1 x_i))) \}^2$$
For two layers:
$$\min_{W_1, W_2} \frac{1}{n} \sum_{i=1}^n \{ y_i - \sigma(W_2 \cdot \sigma(W_1 x_i)) \}^2$$
For a single node:
$$\min_{\theta} \frac{1}{n} \sum_{i=1}^n \{ y_i - \sigma(\langle \theta, x_i \rangle) \}^2$$

Binary linear classification
The model: $z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$.

One node neural network
The model: $z_i = (x_i, y_i)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$.
◮ Convex logit loss ($\ell_c$ is convex in $\theta$):
$$\ell_c(\theta; z) = y \langle x, \theta \rangle - \log\{1 + \exp(\langle x, \theta \rangle)\}.$$
◮ Non-convex loss ($\ell$ is not convex in $\theta$):
$$\ell(\theta; z) = \{y - \sigma(\langle x, \theta \rangle)\}^2, \quad \text{where } \sigma(t) = 1 / (1 + \exp(t)).$$
◮ Empirical risk:
$$\widehat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(\theta; z_i) = \frac{1}{n} \sum_{i=1}^n \{y_i - \sigma(\langle \theta, x_i \rangle)\}^2.$$
◮ Empirical risk minimizer:
$$\widehat{\theta}_n = \arg\min_{\theta \in B^d(R)} \widehat{R}_n(\theta).$$
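
The non-convex empirical risk and its gradient can be sketched in plain Python as follows. This is an illustrative sketch, not the speaker's code: it uses $\sigma(t) = 1/(1 + \exp(t))$ exactly as written on the slide, and the function names are hypothetical. The helper `project_to_ball` is one assumed way to enforce the constraint $\theta \in B^d(R)$; the slides do not say how the constraint is handled.

```python
import math


def sigma(t):
    """sigma(t) = 1 / (1 + exp(t)) as on the slide, evaluated without overflow."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def empirical_risk(theta, xs, ys):
    """(1/n) * sum_i (y_i - sigma(<theta, x_i>))^2."""
    return sum((y - sigma(dot(theta, x))) ** 2 for x, y in zip(xs, ys)) / len(xs)


def empirical_risk_gradient(theta, xs, ys):
    """Gradient of the empirical risk with respect to theta.

    Since sigma'(t) = -sigma(t) * (1 - sigma(t)) for this sigma, each sample
    contributes 2 * (y - s) * s * (1 - s) * x, where s = sigma(<theta, x>).
    """
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        s = sigma(dot(theta, x))
        coeff = 2.0 * (y - s) * s * (1.0 - s)
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return [g / len(xs) for g in grad]


def project_to_ball(theta, radius):
    """Assumed constraint handling: rescale theta onto the ball of the given radius."""
    norm = math.sqrt(dot(theta, theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]
```

At $\theta = 0$ every prediction equals 1/2, so the empirical risk is 1/4 for any labels in $\{0, 1\}$, which makes a convenient sanity check.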

A negative theoretical result
Theorem (Auer et al., 1996). For the one node neural network, $\forall n, d > 0$, there exists a dataset $(x_i, y_i)_{i=1}^n$ such that the empirical risk $\widehat{R}_n(\theta)$ has $\lfloor n/d \rfloor^d$ distinct local minima.
The landscape of $\widehat{R}_n(\theta)$ is very rough.
Is this the end of the world of deep learning?
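
To get a feel for how fast the count $\lfloor n/d \rfloor^d$ in the theorem grows, the short sketch below evaluates it for a few hypothetical values of $n$ and $d$. These values are chosen for illustration only and do not come from the talk; the count applies to the worst-case dataset in the theorem, not to any particular real dataset.

```python
def local_minima_count(n, d):
    """Number of distinct local minima in the theorem's statement: floor(n / d) ** d."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")
    return (n // d) ** d


# Hypothetical sample sizes and dimensions, for illustration only.
for n, d in [(100, 1), (100, 2), (100, 5), (100, 10)]:
    print(n, d, local_minima_count(n, d))
```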

Real data experiment
◮ The "Australian" data set from Statlog: $d = 11$, $n = 683$.
◮ Random initialization $\theta(0) \sim N(0, I_d)$.
◮ Run gradient descent and track the path $\theta(k)$.
◮ Generate multiple paths with independent initializations.
◮ Plot the standard deviation over paths, $\mathrm{std}(\theta(k))$, versus $k$.
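
The experimental protocol above can be sketched as follows. This is a hedged illustration rather than the speaker's setup. It does not load the Australian data set and uses small synthetic data as a stand-in. The step size, number of iterations, and number of paths are assumptions. The slide does not define $\mathrm{std}(\theta(k))$, so the sketch assumes the square root of the summed per-coordinate variances across paths.

```python
import math
import random


def sigma(t):
    """sigma(t) = 1 / (1 + exp(t)) as on the slides, evaluated without overflow."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def risk_gradient(theta, xs, ys):
    """Gradient of (1/n) * sum_i (y_i - sigma(<theta, x_i>))^2 in theta."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        s = sigma(sum(t * v for t, v in zip(theta, x)))
        coeff = 2.0 * (y - s) * s * (1.0 - s)
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return [g / len(xs) for g in grad]


def gradient_descent_path(theta0, xs, ys, step, n_iters):
    """Return the iterates theta(0), ..., theta(n_iters) of plain gradient descent."""
    theta = list(theta0)
    path = [theta]
    for _ in range(n_iters):
        grad = risk_gradient(theta, xs, ys)
        theta = [t - step * g for t, g in zip(theta, grad)]
        path.append(theta)
    return path


def spread_over_paths(paths):
    """For each k, sqrt of the summed per-coordinate variances of theta(k) across paths."""
    n_paths = len(paths)
    spread = []
    for iterates in zip(*paths):  # theta(k) taken from every path
        d = len(iterates[0])
        means = [sum(th[j] for th in iterates) / n_paths for j in range(d)]
        var = sum((th[j] - means[j]) ** 2 for th in iterates for j in range(d)) / n_paths
        spread.append(math.sqrt(var))
    return spread


def run_experiment(xs, ys, n_paths, step, n_iters, rng):
    """Run gradient descent from independent N(0, I_d) starts and return the spread per k."""
    d = len(xs[0])
    paths = []
    for _ in range(n_paths):
        theta0 = [rng.gauss(0.0, 1.0) for _ in range(d)]  # theta(0) ~ N(0, I_d)
        paths.append(gradient_descent_path(theta0, xs, ys, step, n_iters))
    return spread_over_paths(paths)


# Synthetic stand-in data (hypothetical): labels drawn from the one-node model.
rng = random.Random(0)
theta_star = [1.0, -1.0, 0.5]
xs = [[rng.gauss(0.0, 1.0) for _ in theta_star] for _ in range(200)]
ys = [1 if rng.random() < sigma(sum(t * v for t, v in zip(theta_star, x))) else 0 for x in xs]
spread = run_experiment(xs, ys, n_paths=5, step=0.5, n_iters=100, rng=rng)
print("spread at k=0:", spread[0], "spread at final k:", spread[-1])
```

Plotting `spread` against the iteration index gives the kind of curve the last bullet describes. A spread that shrinks toward zero would suggest that independent starts reach the same point.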