Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator

Alp Yurtsever (alp.yurtsever@epfl.ch)
Joint work with Suvrit Sra & Volkan Cevher
ICML 2019, Long Beach

Ecole Polytechnique Fédérale de Lausanne (EPFL)
Massachusetts Institute of Technology (MIT)

Conditional Gradient Method (CGM)
(Frank & Wolfe, 1956) (Hazan, 2008) (Jaggi, 2013)

$$\min_{x \in \mathcal{X}} f(x)$$

$\mathcal{X} \subset \mathbb{R}^d$ is a convex compact set.
$f : \mathcal{X} \to \mathbb{R}$ is a smooth function.

Algorithm 1: CGM for smooth minimization
  Input: $x_1 \in \mathcal{X}$
  for $k = 1, 2, \ldots$ do
    $\eta_k = 2 / (k + 1)$
    $s_k = \arg\min_{x \in \mathcal{X}} \langle \nabla f(x_k), x \rangle$
    $x_{k+1} = x_k + \eta_k (s_k - x_k)$
  end for

[Figure: one CGM step on the set $\mathcal{X}$, showing the iterate $x_k$, the level set $\{x : f(x) \le f(x_k)\}$, the direction $-\nabla f(x_k)$, the minimizer $s_k$, and the next iterate $x_{k+1}$.]
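
The CGM iteration above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the choice of the probability simplex as the feasible set, the quadratic objective, and the helper names `lmo_simplex` and `cgm` are all assumptions made for the example.

```python
import numpy as np

def lmo_simplex(grad):
    """Assumed feasible set: the probability simplex.
    A linear function over the simplex is minimized at a vertex e_i,
    with i the index of the smallest gradient entry."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def cgm(grad_f, lmo, x1, num_iters):
    """Conditional gradient method with step size eta_k = 2 / (k + 1)."""
    x = np.asarray(x1, dtype=float).copy()
    for k in range(1, num_iters + 1):
        eta = 2.0 / (k + 1)
        s = lmo(grad_f(x))       # s_k = argmin_{x in X} <grad f(x_k), x>
        x = x + eta * (s - x)    # convex combination, so x stays in X
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = np.array([0.2, 0.5, 0.3])
x_hat = cgm(lambda x: x - b, lmo_simplex, np.array([1.0, 0.0, 0.0]), 2000)
```

Because every update is a convex combination of two feasible points, the iterates never leave the feasible set, and no projection is needed.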

Stochastic Templates

$$\underset{x \in \mathcal{X}}{\text{minimize}} \quad F(x) := \begin{cases} \mathbb{E}_{\xi} f(x, \xi) & \text{(expectation)} \\ \frac{1}{n} \sum_{i=1}^{n} f_i(x) & \text{(finite-sum)} \end{cases}$$

$\mathcal{X} \subset \mathbb{R}^d$ is a convex compact set.
$f$ and $f_i$ are differentiable and possibly non-convex.
$\xi \sim \mathcal{P}$ is a random variable.
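
To make the two templates concrete, here is a small sketch in Python. The least-squares components, the data `A` and `b`, and all function names are assumptions chosen for illustration. Drawing the index uniformly at random turns the finite-sum template into an instance of the expectation template.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4
A = rng.standard_normal((n, d))   # hypothetical data
b = rng.standard_normal(n)

def f_i(x, i):
    """Assumed component function: f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return 0.5 * (A[i] @ x - b[i]) ** 2

def F(x):
    """Finite-sum objective: F(x) = (1/n) * sum_i f_i(x)."""
    return np.mean([f_i(x, i) for i in range(n)])

def grad_f_i(x, i):
    """Gradient of one component."""
    return (A[i] @ x - b[i]) * A[i]

def grad_F(x):
    """Full gradient of F."""
    return A.T @ (A @ x - b) / n

def sample_xi():
    """xi ~ P, with P taken here to be uniform over the indices 0..n-1."""
    return rng.integers(n)

# One stochastic gradient sample at a point x0.
x0 = np.ones(d)
g = grad_f_i(x0, sample_xi())
```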

Assumptions

Unbiased estimates:
$$\mathbb{E}\, \nabla f(x, \xi) = \nabla F(x)$$

Bounded variance:
$$\mathbb{E}\, \| \nabla f(x, \xi) - \nabla F(x) \|^2 \le \sigma^2 < +\infty, \quad \forall x \in \mathcal{X}$$

Averaged smoothness:
$$\mathbb{E}\, \| \nabla f(x, \xi) - \nabla f(y, \xi) \|^2 \le L \| x - y \|^2, \quad \forall (x, y) \in \mathcal{X}^2$$
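
The three assumptions can be checked numerically on a toy problem. The sketch below assumes least-squares components $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$ with $\xi$ uniform over the indices; the data, the names, and the constant `L_avg` are illustrative assumptions. For this choice, the Cauchy-Schwarz inequality shows that the averaged smoothness inequality holds with the constant $\frac{1}{n}\sum_i \|a_i\|^4$ multiplying $\|x - y\|^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.standard_normal((n, d))   # hypothetical data
b = rng.standard_normal(n)

def per_sample_grads(x):
    """Row i is grad f_i(x) for the assumed f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A @ x - b)[:, None] * A

def grad_F(x):
    """Gradient of F(x) = (1/n) * sum_i f_i(x)."""
    return A.T @ (A @ x - b) / n

x, y = rng.standard_normal(d), rng.standard_normal(d)
G_x, G_y = per_sample_grads(x), per_sample_grads(y)

# Unbiased estimates: the mean of the per-sample gradients equals grad F(x).
bias = np.linalg.norm(G_x.mean(axis=0) - grad_F(x))

# Bounded variance: E ||grad f(x, xi) - grad F(x)||^2, evaluated at this one x only.
variance_at_x = np.mean(np.sum((G_x - grad_F(x)) ** 2, axis=1))

# Averaged smoothness: E ||grad f(x, xi) - grad f(y, xi)||^2 <= L_avg * ||x - y||^2.
lhs = np.mean(np.sum((G_x - G_y) ** 2, axis=1))
L_avg = np.mean(np.sum(A ** 2, axis=1) ** 2)   # (1/n) * sum_i ||a_i||^4
rhs = L_avg * np.sum((x - y) ** 2)
```

The variance check covers a single point; the assumption itself requires a uniform bound $\sigma^2$ over all of $\mathcal{X}$.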