Escaping Saddle Points with Adaptive Gradient Methods

Presentation by Matthew Staib

Escaping Saddle Points with Adaptive Gradient Methods. Matthew Staib (1), Sashank Reddi (2), Satyen Kale (2), Sanjiv Kumar (2), Suvrit Sra (1). (1) MIT EECS; (2) Google Research, New York.


  1. Escaping Saddle Points with Adaptive Gradient Methods. Matthew Staib (1), Sashank Reddi (2), Satyen Kale (2), Sanjiv Kumar (2), Suvrit Sra (1). (1) MIT EECS; (2) Google Research, New York.

  2. Adam, RMSProp and friends

  3. Adam, RMSProp and friends
     • Empirically: good non-convex performance

  4. Adam, RMSProp and friends
     • Empirically: good non-convex performance
     • Limited theory, some non-convergence results [e.g. Reddi et al. '18]

  5. Adam, RMSProp and friends
     • Empirically: good non-convex performance
     • Limited theory, some non-convergence results [e.g. Reddi et al. '18]
     • Our take: adaptive methods escape saddles (in words: via isotropic noise) and reach second-order stationary points (SOSPs)

  6. Adam, RMSProp and friends
     • Empirically: good non-convex performance
     • Limited theory, some non-convergence results [e.g. Reddi et al. '18]
     • Our take: adaptive methods escape saddles (in words: via isotropic noise) and reach second-order stationary points (SOSPs)
     This paper: the first second-order rates for adaptive methods

  7. $x_{t+1} \leftarrow x_t - \eta g_t$

  8. $x_{t+1} \leftarrow x_t - \eta g_t$
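
The update on the last slides, together with the claim that noise helps escape saddles, can be tried out on a toy problem. The sketch below is an illustration under stated assumptions, not the method analyzed in the talk: the saddle function f(x) = 0.5 * (x[0]**2 - x[1]**2), the Gaussian noise model, the helper names (`grad`, `sgd_step`, `rmsprop_step`, `run`) and all hyperparameters are hypothetical choices made for this example, and `rmsprop_step` is the textbook RMSProp rule rather than any specific variant from the paper.

```python
import numpy as np

def grad(x):
    """Gradient of the toy saddle f(x) = 0.5 * (x[0]**2 - x[1]**2) (assumed example)."""
    return np.array([x[0], -x[1]])

def sgd_step(x, g, eta):
    """Plain update from the slides: x_{t+1} <- x_t - eta * g_t."""
    return x - eta * g

def rmsprop_step(x, g, v, eta, beta=0.9, eps=1e-8):
    """Textbook RMSProp: divide each coordinate by a running RMS of its gradients."""
    v = beta * v + (1.0 - beta) * g**2
    return x - eta * g / (np.sqrt(v) + eps), v

def run(method, steps=200, eta=0.01, noise=1e-3, seed=0):
    """Start exactly at the saddle (the origin) and iterate with noisy gradients."""
    rng = np.random.default_rng(seed)
    x, v = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        # Stochastic gradient: exact gradient plus isotropic Gaussian noise.
        g = grad(x) + noise * rng.standard_normal(2)
        if method == "sgd":
            x = sgd_step(x, g, eta)
        else:
            x, v = rmsprop_step(x, g, v, eta)
    return x

if __name__ == "__main__":
    print("SGD, no noise:   ", run("sgd", eta=0.1, noise=0.0))
    print("SGD, with noise: ", run("sgd", eta=0.1))
    print("RMSProp, noise:  ", run("rmsprop", eta=0.01))
```

Without noise the exact gradient at the origin is zero, so the iterate never moves. With noise, the component along the negative-curvature direction x[1] grows while the component along x[0] stays small, which is the escape behavior the slides refer to. The step sizes differ between the two methods only because RMSProp normalizes the gradient magnitude.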
