  1. Optimization MS Maths Big Data
     Alexandre Gramfort
     alexandre.gramfort@telecom-paristech.fr
     Telecom ParisTech, M2 Maths Big Data

  2. Plan
     1. Notations
     2. Ridge regression and quadratic forms
     3. SVD
     4. Woodbury
     5. Dense Ridge
     6. Sparse Ridge

  3. Optimization problem
     Definition (Optimization problem (P)):
     $\min f(x), \; x \in C$
     where:
     - $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is called the objective function
     - $C = \{ x \in \mathbb{R}^n : g(x) \le 0 \text{ and } h(x) = 0 \}$ is the feasible set
     - $g(x) \le 0$ represents the inequality constraints, with $g(x) = (g_1(x), \ldots, g_p(x))$, so p constraints
     - $h(x) = 0$ represents the equality constraints, with $h(x) = (h_1(x), \ldots, h_q(x))$, so q constraints
     - an element $x \in C$ is said to be feasible
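
     As a concrete instance (an illustration not on the slide), minimizing the squared norm over a half-plane fits the template with p = 1 inequality constraint and q = 0 equality constraints:

```latex
% One-constraint instance of (P), for illustration only
\min_{x \in \mathbb{R}^2} \; x_1^2 + x_2^2
\quad \text{s.t.} \quad g_1(x) = 1 - x_1 \le 0
% feasible set: C = \{ x : x_1 \ge 1 \}; the solution is x^\star = (1, 0)
```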

  4. Taylor expansion at order 2
     Assuming f is twice differentiable, the order-2 Taylor expansion of f at x reads:
     $\forall h \in \mathbb{R}^n, \quad f(x + h) = f(x) + \nabla f(x)^T h + \frac{1}{2} h^T \nabla^2 f(x) h + o(\|h\|^2)$
     where $\nabla f(x) \in \mathbb{R}^n$ is the gradient and $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ is the Hessian matrix.
     Remark: this gives a local quadratic approximation of f.
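
     A minimal numeric check of the expansion (a sketch; the function f below is made up for the demo, with hand-computed gradient and Hessian):

```python
import numpy as np

# Check the order-2 Taylor expansion on f(x) = exp(x0) + x0 * x1^2.
def f(x):
    return np.exp(x[0]) + x[0] * x[1] ** 2

def grad(x):
    return np.array([np.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

def hess(x):
    return np.array([[np.exp(x[0]), 2 * x[1]],
                     [2 * x[1], 2 * x[0]]])

rng = np.random.default_rng(0)
x = np.array([0.3, -1.2])
h = 1e-3 * rng.standard_normal(2)
taylor = f(x) + grad(x) @ h + 0.5 * h @ hess(x) @ h
# The residual is o(||h||^2): much smaller than ||h||^2 ~ 1e-6
print(abs(f(x + h) - taylor))
```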

  5. Plan: section 2, Ridge regression and quadratic forms

  6. Ridge regression
     We consider problems with n samples (observations) and p features (variables).
     Definition (Ridge regression): let $y \in \mathbb{R}^n$ be the n targets to predict and $(x_i)_i$ the n samples in $\mathbb{R}^p$. Ridge regression consists in solving the following problem:
     $\min_{w, b} \; \frac{1}{2} \|y - Xw - b\|^2 + \frac{\lambda}{2} \|w\|^2, \quad \lambda > 0$
     where $w \in \mathbb{R}^p$ is called the weight vector, $b \in \mathbb{R}$ is the intercept (a.k.a. bias), and the i-th row of X is $x_i$.
     Remark: note that the intercept is not penalized by $\lambda$.

  7. Taking care of the intercept
     There are different ways to deal with the intercept.
     Option 1: center the target y and each feature column. After centering the problem reads:
     $\min_w \; \frac{1}{2} \|y - Xw\|^2 + \frac{\lambda}{2} \|w\|^2, \quad \lambda > 0$
     Option 2: add a column of ones to X and try not to penalize it (too much).
     Exercise: denote by $\bar{y} \in \mathbb{R}$ the mean of y and by $\bar{X} \in \mathbb{R}^p$ the mean of each column of X. Show that $\hat{b} = -\bar{X}^T \hat{w} + \bar{y}$.
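
     A sketch of Option 1 in NumPy (the data, sizes and variable names are made up for the demo):

```python
import numpy as np

# Option 1: center features and target, solve ridge, recover the intercept.
rng = np.random.default_rng(0)
n, p, lam = 50, 10, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean           # centered data

# Closed-form ridge on centered data: (Xc^T Xc + lam I) w = Xc^T y
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
b = y_mean - X_mean @ w                   # intercept as in the exercise
```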

  8. Ridge regression
     Definition (Quadratic form): a quadratic form reads
     $f(x) = \frac{1}{2} x^T A x + b^T x + c$
     where $x \in \mathbb{R}^p$, $A \in \mathbb{R}^{p \times p}$, $b \in \mathbb{R}^p$ and $c \in \mathbb{R}$.
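
     A short derivation, not spelled out on the slide, connects this definition to the linear-system view used next (assuming A is symmetric):

```latex
% Gradient and Hessian of the quadratic form, assuming A symmetric
\nabla f(x) = A x + b, \qquad \nabla^2 f(x) = A
% so any stationary point solves the linear system
A x^\star = -b
% and when A is positive definite the minimizer is unique: x^\star = -A^{-1} b
```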

  9. Ridge regression
     Questions:
     - Show that ridge regression boils down to the minimization of a quadratic form.
     - Propose a closed-form solution.
     - Show that the solution is obtained by solving a linear system.
     - Is the objective function strongly convex?
     - Assuming n < p, what is the value of the strong convexity constant?
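
     A numeric sanity check of one possible answer (a sketch with made-up data: here $A = X^T X + \lambda I$ and $b = -X^T y$, and for n < p the strong convexity constant is $\lambda$):

```python
import numpy as np

# Ridge as a quadratic form: A = X^T X + lam I, b = -X^T y.
rng = np.random.default_rng(1)
n, p, lam = 20, 40, 0.5                   # n < p on purpose
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

A = X.T @ X + lam * np.eye(p)
w_star = np.linalg.solve(A, X.T @ y)      # closed form via a linear system

# Strong convexity constant = smallest eigenvalue of A; with n < p,
# X^T X is rank-deficient, so it equals lam exactly.
print(np.linalg.eigvalsh(A).min(), lam)
```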

  10. Plan: section 3, SVD

  11. Singular value decomposition (SVD)
      The SVD is a factorization of a matrix (real here):
      $M = U \Sigma V^T$
      where $M \in \mathbb{R}^{n \times p}$, $U \in \mathbb{R}^{n \times n}$, $\Sigma \in \mathbb{R}^{n \times p}$, $V \in \mathbb{R}^{p \times p}$
      - $U^T U = U U^T = I_n$ (orthogonal matrix)
      - $V^T V = V V^T = I_p$ (orthogonal matrix)
      - $\Sigma$ is a diagonal matrix; the $\Sigma_{i,i}$ are called the singular values
      - the columns of U are the left-singular vectors, the columns of V the right-singular vectors

  12. Singular value decomposition (SVD)
      The factorization $M = U \Sigma V^T$ further satisfies:
      - U contains the eigenvectors of $M M^T$ associated with the eigenvalues $\Sigma_{i,i}^2$
      - V contains the eigenvectors of $M^T M$ associated with the eigenvalues $\Sigma_{i,i}^2$
      - we assume here $\Sigma_{i,i} = 0$ for $\min(n, p) \le i \le \max(n, p)$
      The SVD is particularly useful to find the rank, null space, image and pseudo-inverse of a matrix.
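
      These properties are easy to check numerically; a small NumPy illustration (not from the slides):

```python
import numpy as np

# Verify the SVD properties on a random 5 x 3 matrix.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(M)               # full SVD: U is 5x5, Vt is 3x3
Sigma = np.zeros_like(M)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(M, U @ Sigma @ Vt))     # M = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)))    # U orthogonal
# Eigenvalues of M^T M are the squared singular values
print(np.allclose(np.sort(np.linalg.eigvalsh(M.T @ M))[::-1], s ** 2))
```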

  13. Plan: section 4, Woodbury

  14. Matrix inversion lemma
      Proposition (Matrix inversion lemma): also known as the Sherman–Morrison–Woodbury formula, it states that
      $(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}$
      where $A \in \mathbb{R}^{n \times n}$, $U \in \mathbb{R}^{n \times k}$, $C \in \mathbb{R}^{k \times k}$, $V \in \mathbb{R}^{k \times n}$.
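
      The identity is easy to verify numerically (a sketch; the sizes are arbitrary):

```python
import numpy as np

# Numeric check of the Sherman-Morrison-Woodbury formula.
rng = np.random.default_rng(3)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # keep A well conditioned
U = rng.standard_normal((n, k))
C = rng.standard_normal((k, k)) + k * np.eye(k)
V = rng.standard_normal((k, n))

A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = A_inv - A_inv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U) @ V @ A_inv
print(np.allclose(lhs, rhs))                       # True
```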

  15. Matrix inversion lemma (proof)
      Just check that (A + UCV) times the right-hand side of the Woodbury identity gives the identity matrix:
      $(A + UCV)\left(A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}\right)$
      $= I + UCVA^{-1} - (U + UCVA^{-1}U)(C^{-1} + VA^{-1}U)^{-1} VA^{-1}$
      $= I + UCVA^{-1} - UC(C^{-1} + VA^{-1}U)(C^{-1} + VA^{-1}U)^{-1} VA^{-1}$
      $= I + UCVA^{-1} - UCVA^{-1} = I$
      Questions: using the matrix inversion lemma, show that if n < p the ridge regression problem can be solved by inverting a matrix of size n × n rather than p × p.

  16. Plan: section 5, Dense Ridge

  17. Primal and dual implementation
      The solution of the ridge regression problem (without intercept) is obtained by solving the problem in the primal form:
      $\hat{w} = (X^T X + \lambda I_p)^{-1} X^T y$
      or in the dual form:
      $\hat{w} = X^T (X X^T + \lambda I_n)^{-1} y$
      In the dual formulation the matrix to invert is in $\mathbb{R}^{n \times n}$.
      What if X is sparse, n is 1e5 and p is 1e6?
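
      Both forms give the same solution, which a small dense NumPy check confirms (sizes here are stand-ins, not the 1e5/1e6 scenario):

```python
import numpy as np

# Primal (p x p system) vs dual (n x n system) ridge solutions coincide.
rng = np.random.default_rng(4)
n, p, lam = 30, 100, 2.0                   # n < p: the dual system is cheaper
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

w_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
w_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
print(np.allclose(w_primal, w_dual))       # True
```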

  19. Plan: section 6, Sparse Ridge

  20. Conjugate gradient: solve $Ax = b$, $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$
      1:  $x_0 \in \mathbb{R}^n$, $g_0 = A x_0 - b$
      2:  for k = 0 to n do
      3:    if $g_k = 0$ then
      4:      break
      5:    end if
      6:    if k = 0 then
      7:      $w_k = g_0$
      8:    else
      9:      $\alpha_k = -\dfrac{\langle g_k, A w_{k-1} \rangle}{\langle w_{k-1}, A w_{k-1} \rangle}$
      10:     $w_k = g_k + \alpha_k w_{k-1}$
      11:   end if
      12:   $\rho_k = \dfrac{\langle g_k, w_k \rangle}{\langle w_k, A w_k \rangle}$
      13:   $x_{k+1} = x_k - \rho_k w_k$
      14:   $g_{k+1} = A x_{k+1} - b$
      15: end for
      16: return $x_{k+1}$
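
      A direct Python transcription of the pseudocode above (a sketch for symmetric positive definite A; the exact test $g_k = 0$ is replaced by a small tolerance, as any floating-point code would do):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Transcription of the pseudocode above for symmetric positive definite A."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.copy()
    g = A @ x - b                       # gradient of 1/2 x^T A x - b^T x
    w = g.copy()                        # at k = 0, w_0 = g_0
    for k in range(n + 1):
        if np.linalg.norm(g) <= tol:
            break
        if k > 0:
            alpha = -(g @ (A @ w)) / (w @ (A @ w))
            w = g + alpha * w           # new direction, A-conjugate to the old one
        rho = (g @ w) / (w @ (A @ w))
        x = x - rho * w                 # exact line search along w
        g = A @ x - b
    return x

# Usage on a random SPD system
rng = np.random.default_rng(5)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)
b = rng.standard_normal(8)
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))
```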

  21. Sparse ridge with CG
      cf. Notebook
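
      The course notebook is not reproduced here; a minimal sketch of what it might look like with scipy.sparse, solving the primal normal equations matrix-free (sizes are scaled down from the n = 1e5, p = 1e6 scenario):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, cg

# Sparse ridge with CG: never form the p x p matrix X^T X + lam I.
rng = np.random.default_rng(6)
n, p, lam = 500, 2000, 1.0
X = sparse.random(n, p, density=0.01, random_state=0, format="csr")
y = rng.standard_normal(n)

def matvec(w):
    # Apply A = X^T X + lam I using only sparse matrix-vector products
    return X.T @ (X @ w) + lam * w

A = LinearOperator((p, p), matvec=matvec)
w, info = cg(A, X.T @ y)
print(info)                               # 0 means convergence
```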

  22. Logistic regression with CG
      cf. Notebook
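
      Again the notebook itself is not included; one plausible sketch uses scipy.optimize's nonlinear CG on the l2-regularized logistic loss (data and names are made up for the demo):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# l2-regularized logistic regression minimized with nonlinear CG.
rng = np.random.default_rng(7)
n, p, lam = 200, 20, 1.0
X = rng.standard_normal((n, p))
y = np.sign(rng.standard_normal(n))       # labels in {-1, +1}

def loss(w):
    z = y * (X @ w)
    return np.logaddexp(0, -z).sum() + 0.5 * lam * w @ w

def grad(w):
    z = y * (X @ w)
    s = -y * expit(-z)                     # expit(-z) = 1 / (1 + exp(z))
    return X.T @ s + lam * w

res = minimize(loss, np.zeros(p), jac=grad, method="CG")
print(res.success, loss(res.x))
```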
