Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching

Peter Richtárik

Randomized Numerical Linear Algebra and Applications (Program: Foundations of Data Science)
Simons Institute for the Theory of Computing, UC Berkeley
September 24-27, 2018
Outline

1. Introduction
2. Jacobian Sketching
3. Controlled Stochastic Reformulations
4. JacSketch and SAGA
5. Iteration Complexity of JacSketch
6. Experiments
1. Introduction
Finite Sum Minimization Problem

$$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^n f_i(x)$$

Here $a_i$ is a data vector, $y_i$ a label, and $\lambda$ the L2 regularization parameter.

L2-regularized least squares (ridge regression): $f_i(x) = \tfrac{1}{2}(a_i^\top x - y_i)^2 + \tfrac{\lambda}{2}\|x\|_2^2$

L2-regularized logistic regression: $f_i(x) = \log\left(1 + e^{-y_i a_i^\top x}\right) + \tfrac{\lambda}{2}\|x\|_2^2$
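To make the finite-sum structure concrete, here is a minimal NumPy sketch of the ridge-regression components; the helper names `f_i`, `grad_f_i`, `grad_f` and the arrays `A`, `y`, `lam` are illustrative assumptions, not notation from the slides.

```python
import numpy as np

def f_i(x, a_i, y_i, lam):
    # L2-regularized least squares loss of one example (a_i, y_i)
    return 0.5 * (a_i @ x - y_i) ** 2 + 0.5 * lam * (x @ x)

def grad_f_i(x, a_i, y_i, lam):
    # Gradient of the i-th component: (a_i^T x - y_i) a_i + lam * x
    return (a_i @ x - y_i) * a_i + lam * x

def grad_f(x, A, y, lam):
    # Full gradient: average of the n component gradients (rows of A)
    return np.mean([grad_f_i(x, A[i], y[i], lam) for i in range(len(y))], axis=0)
```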
Stochastic Gradient Methods

$$x^{k+1} = x^k - \alpha g^k$$

where $x^k$ is the current iterate, $\alpha$ the stepsize, $x^{k+1}$ the next iterate, and $g^k$ an unbiased estimator of the gradient:

$$\mathbb{E}\left[g^k\right] = \nabla f(x^k)$$
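A minimal sketch of this iteration, reusing the hypothetical `grad_f_i` helper from the block above; sampling the index uniformly makes $g^k$ unbiased.

```python
def sgd_step(x, A, y, lam, alpha, rng):
    # Sample i uniformly at random: E[grad_f_i(x)] = grad_f(x),
    # so g is an unbiased estimator of the gradient.
    i = rng.integers(len(y))
    g = grad_f_i(x, A[i], y[i], lam)
    return x - alpha * g  # x^{k+1} = x^k - alpha * g^k

# Usage: rng = np.random.default_rng(0); x = sgd_step(x, A, y, lam, 0.1, rng)
```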
Variance Matters

$$\mathbb{V}\left[g^k\right] := \mathbb{E}\left[\|g^k - \nabla f(x^k)\|^2\right]$$

Gradient Descent (GD): $g^k = \nabla f(x^k)$, so $\mathbb{V}\left[g^k\right] = 0$.

Stochastic Gradient Descent (SGD): $g^k = \nabla f_i(x^k)$, so $\mathbb{V}\left[g^k\right]$ = BIG.
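For uniform single-example sampling this variance can be computed exactly by averaging over all $n$ component gradients; a sketch under the same illustrative ridge-regression assumptions as above:

```python
def sgd_variance(x, A, y, lam):
    # V[g] = E ||g - grad f(x)||^2 over the uniform choice of i.
    # For GD, g = grad f(x) itself and this quantity is zero.
    grads = np.array([grad_f_i(x, A[i], y[i], lam) for i in range(len(y))])
    full_grad = grads.mean(axis=0)
    return np.mean(np.sum((grads - full_grad) ** 2, axis=1))
```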
GD vs SGD

[Figure: two panels showing iterate trajectories from $x^0$ to $x^*$, one for Gradient Descent (GD) and one for Stochastic Gradient Descent (SGD).]
Variance Reduction

Four ways to reduce variance (a minimal control-variate sketch follows the list):

1. Decreasing stepsizes. How it works: scaling down the noise. Cons: slows down; the stepsize is hard to tune. Pros: still converges; widely known.
2. Mini-batching. How it works: more samples, less variance. Cons: more work per iteration. Pros: parallelizable.
3. Importance sampling. How it works: sample more important data (or parameters) more often. Cons: probabilities are hard to tune; might overfit to outliers. Pros: improved condition number.
4. Adjusting the direction. How it works: duality (SDCA) or control variates (SVRG, S2GD, SAGA). Cons: a bit (SVRG, S2GD) or a lot (SDCA, SAGA) more memory needed. Pros: improved dependence on epsilon.

All tricks can be combined!
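As an illustration of item 4, here is a hedged sketch of a SAGA-style control-variate step (one named instance of the "adjusting the direction" family; the gradient table `J` and the reuse of `grad_f_i` are assumptions carried over from the earlier blocks):

```python
def saga_step(x, J, A, y, lam, alpha, rng):
    # J is an n x d table storing the most recent gradient of each f_i.
    # The control-variate estimator g = grad_f_i(x) - J[i] + mean(J)
    # stays unbiased while its variance shrinks as the table converges.
    i = rng.integers(len(y))
    new_grad = grad_f_i(x, A[i], y[i], lam)
    g = new_grad - J[i] + J.mean(axis=0)
    J[i] = new_grad  # update the stored gradient table in place
    return x - alpha * g
```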
2. Jacobian Sketching (JacSketch as a Stochastic Quasi-Gradient Method)

Robert M. Gower, Peter Richtárik and Francis Bach
Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching
arXiv:1805.02632, 2018
Lift and Sketch
Lift and Sketch

LIFT: Define $F(x) = (f_1(x), f_2(x), \ldots, f_n(x)) \in \mathbb{R}^n$. Its Jacobian is

$$\nabla F(x) = [\nabla f_1(x), \nabla f_2(x), \ldots, \nabla f_n(x)] \in \mathbb{R}^{d \times n}.$$

SKETCH: With $e$ the vector of all ones and $e_i$ the $i$-th unit basis vector,

$$\frac{1}{n} \nabla F(x) e = \nabla f(x) \quad \text{(leads to Gradient Descent)}, \qquad \nabla F(x) e_i = \nabla f_i(x) \quad \text{(leads to Stochastic Gradient Descent)}.$$
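Numerically, the Jacobian stacks the component gradients as columns, and multiplying by $e/n$ or $e_i$ recovers the GD and SGD directions. A small sketch, again assuming the hypothetical `grad_f_i` helper:

```python
def jacobian(x, A, y, lam):
    # Columns are the component gradients, so nabla F(x) is d x n.
    return np.column_stack([grad_f_i(x, A[i], y[i], lam) for i in range(len(y))])

# With J_F = jacobian(x, A, y, lam) and n = len(y):
#   J_F @ (np.ones(n) / n)  -> grad f(x)     (gradient descent direction)
#   J_F @ np.eye(n)[:, i]   -> grad f_i(x)   (stochastic gradient direction)
```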
Introducing General Sketches

We would like to solve the linear matrix equation

$$J = \nabla F(x^k), \qquad J \in \mathbb{R}^{d \times n},$$

but this is too expensive to solve. Instead, solve a random linear matrix equation (the Jacobian sketch):

$$J S^k = \nabla F(x^k) S^k, \qquad S^k \sim \mathcal{D}, \quad S^k \in \mathbb{R}^{n \times q}.$$

It has many solutions: which solution to pick?
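One concrete choice of the distribution $\mathcal{D}$ (an illustrative assumption, not the only option) is a random column subset of the identity, so the sketched equation constrains only $q$ columns of $J$:

```python
def subset_sketch(n, q, rng):
    # One choice of D: S = columns of the identity indexed by a random
    # subset C of size q, so J S = nabla F(x) S touches only q columns.
    C = rng.choice(n, size=q, replace=False)
    return np.eye(n)[:, C]
```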
Sketch and Project
<latexit sha1_base64="UemL4bgYtijyr/1nKoT/RAr4keU=">AC2XicbVLbtQwFHXCq4TXFJZsLEZUBcQo6YKqagSEkKsymOmlcbT6NpxpiaJk9oO6siTBQsQYsufseMv+AScNKAy7ZUsHZ1zj67vsWmVC23C8JfnX7p85eq1tevBjZu3bt8ZrN+d6LJWjI9ZmZfqgILmuZB8bITJ+UGlOBQ05/s0e9nq+5+40qKUH8yi4rMC5lKkgoFxVDz4HbgilM+FtPxYglKweNwEltAUv2kObfYkavDzHbyxgQmoOSmEjG2nkm64VTxpXCcmQmJSgDmi1L5zxoQYUXCNZeO05arDGZ7iv0MysiQkaCcYfmKsrulHzgw2pXMe15DgC9wd9b6JM7zTyxJoDvhVs3lymD36JweEy+TsZvFgGI7CrvB5EPVgiPraiwc/SVKyuDSsBy0nkZhZWYWlBEs501Aas0rYBnM+dRBCW7rme0u2+CHjklwWip3pMEde9ZhodB6UVDX2WanV7WvEib1ibdnlkhq9pwyU4HpXuQsPtM+NEKJdhvnAmBLurpgdgQJm3GdoQ4hWVz4PJlujKBxFb7eGuy/6ONbQfQAbaIPUO76DXaQ2PEvIm39L54X/2p/9n/5n8/bfW93nMP/Vf+jz8AgOKY</latexit> <latexit sha1_base64="UemL4bgYtijyr/1nKoT/RAr4keU=">AC2XicbVLbtQwFHXCq4TXFJZsLEZUBcQo6YKqagSEkKsymOmlcbT6NpxpiaJk9oO6siTBQsQYsufseMv+AScNKAy7ZUsHZ1zj67vsWmVC23C8JfnX7p85eq1tevBjZu3bt8ZrN+d6LJWjI9ZmZfqgILmuZB8bITJ+UGlOBQ05/s0e9nq+5+40qKUH8yi4rMC5lKkgoFxVDz4HbgilM+FtPxYglKweNwEltAUv2kObfYkavDzHbyxgQmoOSmEjG2nkm64VTxpXCcmQmJSgDmi1L5zxoQYUXCNZeO05arDGZ7iv0MysiQkaCcYfmKsrulHzgw2pXMe15DgC9wd9b6JM7zTyxJoDvhVs3lymD36JweEy+TsZvFgGI7CrvB5EPVgiPraiwc/SVKyuDSsBy0nkZhZWYWlBEs501Aas0rYBnM+dRBCW7rme0u2+CHjklwWip3pMEde9ZhodB6UVDX2WanV7WvEib1ibdnlkhq9pwyU4HpXuQsPtM+NEKJdhvnAmBLurpgdgQJm3GdoQ4hWVz4PJlujKBxFb7eGuy/6ONbQfQAbaIPUO76DXaQ2PEvIm39L54X/2p/9n/5n8/bfW93nMP/Vf+jz8AgOKY</latexit> <latexit sha1_base64="UemL4bgYtijyr/1nKoT/RAr4keU=">AC2XicbVLbtQwFHXCq4TXFJZsLEZUBcQo6YKqagSEkKsymOmlcbT6NpxpiaJk9oO6siTBQsQYsufseMv+AScNKAy7ZUsHZ1zj67vsWmVC23C8JfnX7p85eq1tevBjZu3bt8ZrN+d6LJWjI9ZmZfqgILmuZB8bITJ+UGlOBQ05/s0e9nq+5+40qKUH8yi4rMC5lKkgoFxVDz4HbgilM+FtPxYglKweNwEltAUv2kObfYkavDzHbyxgQmoOSmEjG2nkm64VTxpXCcmQmJSgDmi1L5zxoQYUXCNZeO05arDGZ7iv0MysiQkaCcYfmKsrulHzgw2pXMe15DgC9wd9b6JM7zTyxJoDvhVs3lymD36JweEy+TsZvFgGI7CrvB5EPVgiPraiwc/SVKyuDSsBy0nkZhZWYWlBEs501Aas0rYBnM+dRBCW7rme0u2+CHjklwWip3pMEde9ZhodB6UVDX2WanV7WvEib1ibdnlkhq9pwyU4HpXuQsPtM+NEKJdhvnAmBLurpgdgQJm3GdoQ4hWVz4PJlujKBxFb7eGuy/6ONbQfQAbaIPUO76DXaQ2PEvIm39L54X/2p/9n/5n8/bfW93nMP/Vf+jz8AgOKY</latexit> <latexit sha1_base64="UemL4bgYtijyr/1nKoT/RAr4keU=">AC2XicbVLbtQwFHXCq4TXFJZsLEZUBcQo6YKqagSEkKsymOmlcbT6NpxpiaJk9oO6siTBQsQYsufseMv+AScNKAy7ZUsHZ1zj67vsWmVC23C8JfnX7p85eq1tevBjZu3bt8ZrN+d6LJWjI9ZmZfqgILmuZB8bITJ+UGlOBQ05/s0e9nq+5+40qKUH8yi4rMC5lKkgoFxVDz4HbgilM+FtPxYglKweNwEltAUv2kObfYkavDzHbyxgQmoOSmEjG2nkm64VTxpXCcmQmJSgDmi1L5zxoQYUXCNZeO05arDGZ7iv0MysiQkaCcYfmKsrulHzgw2pXMe15DgC9wd9b6JM7zTyxJoDvhVs3lymD36JweEy+TsZvFgGI7CrvB5EPVgiPraiwc/SVKyuDSsBy0nkZhZWYWlBEs501Aas0rYBnM+dRBCW7rme0u2+CHjklwWip3pMEde9ZhodB6UVDX2WanV7WvEib1ibdnlkhq9pwyU4HpXuQsPtM+NEKJdhvnAmBLurpgdgQJm3GdoQ4hWVz4PJlujKBxFb7eGuy/6ONbQfQAbaIPUO76DXaQ2PEvIm39L54X/2p/9n/5n8/bfW93nMP/Vf+jz8AgOKY</latexit> <latexit sha1_base64="UVca2UosYz+xlCjGmnLSjxHkKto=">ACbXicbVBNb9QwFHTSAiV8hSIOFIQsVoj2skp6oZdKFVw4LqLbVlqnkeO8ZK14sh+Qays3PiF3PgLXPgLJNtVbYdydJ4Zp6ePVmjpMUo+u35W9v37j/YeRg8evzk6bPw+e6Z1a0RMBVaXORcQtK1jBFiQouGgO8yhScZ4vPg3/+HYyVuj7FZQNJxctaFlJw7KU0/BkEbOZYVlA2kV3qVvRbly46yvQwCOgYwg90ORd547eh2hTEGB+9f3S4a6oTd8yows53hwyXJelmDoZpQlwYA0HEXjaAV6m8RrMiJrTNLwF8u1aCuoUShu7SyOGkwcNyiFgi5grYWGiwUvYdbTmldgE7dq6PveyWnhTb9qZGu1JsTjlfWLqusT1Yc53bTG8S7vFmLxVHiZN20CLW4WlS0iqKmQ/U0lwYEqmVPuDCyfysVc264wL7noYR48u3ydnhOI7G8dfD0cmndR075DV5R/ZJTD6SE/KFTMiUCPLHC71X3p7313/pv/HfXkV9bz3zgvwH/8M/bVq5HA=</latexit> <latexit 
sha1_base64="UVca2UosYz+xlCjGmnLSjxHkKto=">ACbXicbVBNb9QwFHTSAiV8hSIOFIQsVoj2skp6oZdKFVw4LqLbVlqnkeO8ZK14sh+Qays3PiF3PgLXPgLJNtVbYdydJ4Zp6ePVmjpMUo+u35W9v37j/YeRg8evzk6bPw+e6Z1a0RMBVaXORcQtK1jBFiQouGgO8yhScZ4vPg3/+HYyVuj7FZQNJxctaFlJw7KU0/BkEbOZYVlA2kV3qVvRbly46yvQwCOgYwg90ORd547eh2hTEGB+9f3S4a6oTd8yows53hwyXJelmDoZpQlwYA0HEXjaAV6m8RrMiJrTNLwF8u1aCuoUShu7SyOGkwcNyiFgi5grYWGiwUvYdbTmldgE7dq6PveyWnhTb9qZGu1JsTjlfWLqusT1Yc53bTG8S7vFmLxVHiZN20CLW4WlS0iqKmQ/U0lwYEqmVPuDCyfysVc264wL7noYR48u3ydnhOI7G8dfD0cmndR075DV5R/ZJTD6SE/KFTMiUCPLHC71X3p7313/pv/HfXkV9bz3zgvwH/8M/bVq5HA=</latexit> <latexit sha1_base64="UVca2UosYz+xlCjGmnLSjxHkKto=">ACbXicbVBNb9QwFHTSAiV8hSIOFIQsVoj2skp6oZdKFVw4LqLbVlqnkeO8ZK14sh+Qays3PiF3PgLXPgLJNtVbYdydJ4Zp6ePVmjpMUo+u35W9v37j/YeRg8evzk6bPw+e6Z1a0RMBVaXORcQtK1jBFiQouGgO8yhScZ4vPg3/+HYyVuj7FZQNJxctaFlJw7KU0/BkEbOZYVlA2kV3qVvRbly46yvQwCOgYwg90ORd547eh2hTEGB+9f3S4a6oTd8yows53hwyXJelmDoZpQlwYA0HEXjaAV6m8RrMiJrTNLwF8u1aCuoUShu7SyOGkwcNyiFgi5grYWGiwUvYdbTmldgE7dq6PveyWnhTb9qZGu1JsTjlfWLqusT1Yc53bTG8S7vFmLxVHiZN20CLW4WlS0iqKmQ/U0lwYEqmVPuDCyfysVc264wL7noYR48u3ydnhOI7G8dfD0cmndR075DV5R/ZJTD6SE/KFTMiUCPLHC71X3p7313/pv/HfXkV9bz3zgvwH/8M/bVq5HA=</latexit> <latexit sha1_base64="UVca2UosYz+xlCjGmnLSjxHkKto=">ACbXicbVBNb9QwFHTSAiV8hSIOFIQsVoj2skp6oZdKFVw4LqLbVlqnkeO8ZK14sh+Qays3PiF3PgLXPgLJNtVbYdydJ4Zp6ePVmjpMUo+u35W9v37j/YeRg8evzk6bPw+e6Z1a0RMBVaXORcQtK1jBFiQouGgO8yhScZ4vPg3/+HYyVuj7FZQNJxctaFlJw7KU0/BkEbOZYVlA2kV3qVvRbly46yvQwCOgYwg90ORd547eh2hTEGB+9f3S4a6oTd8yows53hwyXJelmDoZpQlwYA0HEXjaAV6m8RrMiJrTNLwF8u1aCuoUShu7SyOGkwcNyiFgi5grYWGiwUvYdbTmldgE7dq6PveyWnhTb9qZGu1JsTjlfWLqusT1Yc53bTG8S7vFmLxVHiZN20CLW4WlS0iqKmQ/U0lwYEqmVPuDCyfysVc264wL7noYR48u3ydnhOI7G8dfD0cmndR075DV5R/ZJTD6SE/KFTMiUCPLHC71X3p7313/pv/HfXkV9bz3zgvwH/8M/bVq5HA=</latexit> Sketch and Project New Jacobian Current Jacobian Frobenius norm estimate estimate J k +1 := J ∈ R d × n k J � J k k arg min JS k = r F ( x k ) S k subject to Solution: Random LME J k +1 = J k + ( r F ( x k ) � J k ) Π S k ensuring consistency with Jacobian sketch � † S > def S > � = S k k S k Π S k k
Sketch and Project I

Original sketch and project (2017 IMA Fox Prize (2nd Prize) in Numerical Analysis; most downloaded SIMAX paper of 2017):
Robert Mansel Gower and P.R. Randomized Iterative Methods for Linear Systems. SIAM J. Matrix Analysis and Applications 36(4):1660-1690, 2015.

Removal of full rank assumption + duality:
Robert Mansel Gower and P.R. Stochastic Dual Ascent for Solving Linear Systems. arXiv:1512.06890, 2015.

Inverting matrices & connection to quasi-Newton updates:
Robert Mansel Gower and P.R. Randomized Quasi-Newton Methods are Linearly Convergent Matrix Inversion Algorithms. SIAM J. Matrix Analysis and Applications 38(4):1380-1409, 2017.

Computing the pseudoinverse:
Robert Mansel Gower and P.R. Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse. arXiv:1612.06255, 2016.

Application to machine learning:
Robert Mansel Gower, Donald Goldfarb and P.R. Stochastic Block BFGS: Squeezing More Curvature out of Data. ICML 2016.

Sketch and project revisited: stochastic reformulations of linear systems:
P.R. and Martin Takáč. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory. arXiv:1706.01108, 2017.