Conjugate Gradient (CG) Majid Lesani Alireza Masoum
Overview
• Backpropagation
• Gradient Descent
• Quadratic Forms
• Gradient Descent in Quadratic Forms
• Eigenvectors and Eigenvalues
• Gradient Descent Convergence
• Conjugate Gradient
Backpropagation
• Abstraction
• Generalization problem
  ◦ Heuristic features
  ◦ Small networks
  ◦ Early stopping
  ◦ Regularization
• Search
• Convergence problem
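Gradient Descent
• Also called Steepest Descent: step in the direction of the negative gradient, where $\nabla f(x, y) = \left( \frac{\partial f(x, y)}{\partial x}, \frac{\partial f(x, y)}{\partial y} \right)$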
Faster Training
• Gradient descent modifications
  ◦ Gradient Descent BP with Momentum
  ◦ Variable Learning Rate BP
• Numerical optimization techniques
  ◦ Conjugate Gradient BP
  ◦ Quasi-Newton BP
Gradient Descent
• The problem is choosing the step size: too small and progress is slow, too large and the iteration overshoots the minimum
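The step-size problem can be seen on a small example. The sketch below is only an illustration: the test function $f(x, y) = x^2 + 10y^2$, the starting point, and the three step sizes are assumptions chosen for the demonstration, not values from the slides.

```python
import numpy as np

def grad_f(p):
    """Gradient of the illustrative function f(x, y) = x**2 + 10*y**2."""
    x, y = p
    return np.array([2.0 * x, 20.0 * y])

def gradient_descent(start, step, iterations=50):
    """Fixed-step gradient descent: p <- p - step * grad f(p)."""
    p = np.array(start, dtype=float)
    for _ in range(iterations):
        p = p - step * grad_f(p)
    return p

start = (1.0, 1.0)                            # the minimum is at (0, 0)
small = gradient_descent(start, step=0.001)   # too small: slow progress
good = gradient_descent(start, step=0.09)     # reasonable: ends near (0, 0)
large = gradient_descent(start, step=0.11)    # too large: y-component diverges
```

With the small step the x-coordinate has barely moved after 50 iterations, while the large step makes the y-coordinate grow without bound.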
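Gradient Descent: Choosing the Best Step Size
• Choose $\alpha_i$ so that $f(x_{i+1})$ is minimal, where $x_{i+1} = x_i + \alpha_i r_i$ and $r_i = -\nabla f(x_i)$ is the descent direction:
  $$\frac{\partial f(x_{i+1})}{\partial \alpha_i} = 0$$
• By the chain rule:
  $$\frac{\partial f(x_i + \alpha_i r_i)}{\partial \alpha_i} = \nabla f(x_{i+1})^T r_i = 0 \;\Rightarrow\; r_{i+1}^T r_i = 0$$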
Gradient Descent Choosing Best Step Size
Quadratic Forms
• Our goal is to minimize the quadratic function:
  $$f(x) = \frac{1}{2} x^T A x - b^T x + c$$
Positive definite: $v^T A v > 0$ for every nonzero vector $v$
Quadratic Forms
• When $A$ is symmetric positive definite, $f(x) = \frac{1}{2} x^T A x - b^T x + c$ has a global minimum where the gradient is zero:
  $$\nabla f(x) = A x - b = 0$$
• Solving the equation $Ax = b$ is therefore equivalent to minimizing $f$
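The equivalence between solving $Ax = b$ and minimizing $f$ can be checked numerically. This is a minimal sketch: the matrix `A`, the vector `b`, and the perturbations are illustrative assumptions, not data from the slides.

```python
import numpy as np

# Illustrative symmetric positive-definite system (assumed values).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    """Gradient of f for symmetric A: A x - b."""
    return A @ x - b

# Solve the linear system directly.
x_star = np.linalg.solve(A, b)

# The gradient vanishes at the solution ...
gradient_at_solution = grad_f(x_star)

# ... and moving away from it in any direction increases f.
perturbations = [np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([-0.3, 0.2])]
increases = [f(x_star + p) > f(x_star) for p in perturbations]
```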
Gradient Descent for Quadratic Forms
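• For the quadratic form, with residual $r_i = b - A x_i = -\nabla f(x_i)$, steepest descent is:
  $$\alpha_i = \frac{r_i^T r_i}{r_i^T A r_i}, \qquad x_{i+1} = x_i + \alpha_i r_i$$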
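For a quadratic form the best step along the residual $r_i = b - A x_i$ has the closed form $\alpha_i = r_i^T r_i / (r_i^T A r_i)$, which follows from the condition $r_{i+1}^T r_i = 0$. The sketch below applies it; the function name, tolerance, iteration cap, and the example system are assumptions made for illustration.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize 1/2 x^T A x - b^T x for symmetric positive-definite A.

    Returns the final iterate and the number of steps taken.
    """
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        r = b - A @ x                      # residual = negative gradient
        if np.linalg.norm(r) < tol:        # gradient is (almost) zero: done
            return x, k
        alpha = (r @ r) / (r @ A @ r)      # exact line search along r
        x = x + alpha * r
    return x, max_iter

# Illustrative system (assumed values).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

x_min, iterations = steepest_descent(A, b, x0=[-2.0, -2.0])
```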
Eigenvectors and Eigenvalues
• An eigenvector $v$ of a matrix $A$ is a nonzero vector that does not rotate when $A$ is applied to it; it is only scaled by a constant, its eigenvalue $\lambda$: $Av = \lambda v$
• Every symmetric $n \times n$ matrix has $n$ orthogonal eigenvectors, each with its associated eigenvalue
Using Eigenvectors
• Think of a vector as a sum of other vectors whose behavior is understood: the eigenvectors of $A$
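Using Eigenvectors
• A symmetric matrix is positive definite exactly when all its eigenvalues are positive
• The eigenvectors are the axes of the rotated elliptical contours of $f$, and the radius along each axis is determined by the corresponding eigenvalue (a larger eigenvalue gives a shorter radius)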
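These eigenvector properties can be checked with NumPy. The sketch uses `np.linalg.eigh`, NumPy's solver for symmetric matrices; the matrix `A` and the vector `x` are illustrative assumptions.

```python
import numpy as np

# Illustrative symmetric matrix (an assumption, not taken from the slides).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# eigh returns eigenvalues in ascending order; eigenvectors are the columns of V.
eigenvalues, V = np.linalg.eigh(A)

# Each column v satisfies A v = lambda v.
for lam, v in zip(eigenvalues, V.T):
    assert np.allclose(A @ v, lam * v)

# The eigenvectors are orthonormal, so V^T V is the identity.
is_orthonormal = np.allclose(V.T @ V, np.eye(2))

# All eigenvalues positive means A is positive definite.
is_positive_definite = bool(np.all(eigenvalues > 0))

# Write an arbitrary vector as a sum of eigenvector components.
x = np.array([1.0, -4.0])
coefficients = V.T @ x            # component of x along each eigenvector
x_rebuilt = V @ coefficients      # sum of coefficient * eigenvector

# Applying A just scales each component by its eigenvalue.
Ax_from_components = V @ (eigenvalues * coefficients)
```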
General Convergence of Steepest Descent
• Depends on the relation between the eigenvalues of $A$
• Depends on the eigenvector components of the error
Fast Convergence
• When all eigenvalues are equal, convergence is fast: the contours are circular and the minimum is reached in one step
Poor Convergence
• Convergence is poor when the eigenvalues differ and the error has its larger components in the directions of the eigenvectors with the smaller eigenvalues
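The fast and poor cases can be compared by counting steepest-descent iterations. This is a sketch under assumptions: the diagonal matrices, starting points, tolerance, and the helper name `steepest_descent_iterations` are all chosen for illustration. With $b = 0$ the minimum is at the origin, so the starting point is also the initial error.

```python
import numpy as np

def steepest_descent_iterations(A, b, x0, tol=1e-8, max_iter=100_000):
    """Count steepest-descent steps until the residual norm drops below tol."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            return k
        alpha = (r @ r) / (r @ A @ r)   # exact line search for a quadratic form
        x = x + alpha * r
    return max_iter

b = np.zeros(2)

# Equal eigenvalues: circular contours, one step suffices.
equal = np.diag([2.0, 2.0])
fast_iters = steepest_descent_iterations(equal, b, x0=[1.0, 100.0])

# Eigenvalues 100 and 1, error mostly along the eigenvector of the smaller one.
spread = np.diag([100.0, 1.0])
slow_iters = steepest_descent_iterations(spread, b, x0=[1.0, 100.0])

# Same matrix, but the error lies along a single eigenvector: one step again.
single_iters = steepest_descent_iterations(spread, b, x0=[0.0, 100.0])
```

The last case shows that spread eigenvalues alone are not enough to slow the method down; the direction of the error matters as well.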
Conjugate Gradient Overview
• Orthogonal Directions
• Conjugate Vectors
• Conjugate Directions
• Gram-Schmidt Algorithm
• Gradient and Error Optimality
• Conjugate Gradient
Orthogonal Directions
• Steepest descent often steps in the same direction many times
• If we have $n$ orthogonal search directions and take the best step along each one, then after $n$ steps we are at the goal!
Orthogonal Directions
• After every step, the error must be orthogonal to the direction just searched: $d_i^T e_{i+1} = 0$
Conjugate vectors
Conjugate Vectors
• Two vectors $d_i$ and $d_j$ are $A$-orthogonal (or conjugate) if $d_i^T A d_j = 0$
• Being conjugate in the scaled space means being orthogonal in the unscaled space
Conjugate Directions
• If we have $n$ conjugate search directions and, as with orthogonal directions, take the best step along each one, then after $n$ steps we are at the goal!
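A small numerical check of conjugacy, as a sketch only. The matrix `A` and the vectors are illustrative assumptions. Reading the scaled-space remark as a change of variables by $A^{1/2}$ is also an assumption: under that map the elliptical contours of $x^T A x$ become circles, and conjugate vectors become orthogonal.

```python
import numpy as np

# Illustrative symmetric positive-definite matrix (assumed values).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

d0 = np.array([1.0, 0.0])
u = np.array([0.0, 1.0])

# Remove from u its A-projection onto d0, leaving a vector conjugate to d0.
d1 = u - (u @ A @ d0) / (d0 @ A @ d0) * d0

conjugacy = d0 @ A @ d1     # zero: d0 and d1 are A-orthogonal
plain_dot = d0 @ d1         # nonzero: they are not orthogonal in the usual sense

# Symmetric square root of A from its eigen decomposition.
eigenvalues, V = np.linalg.eigh(A)
sqrt_A = V @ np.diag(np.sqrt(eigenvalues)) @ V.T

# In the stretched coordinates the two vectors are orthogonal.
stretched_dot = (sqrt_A @ d0) @ (sqrt_A @ d1)
```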
Conjugate Directions
Orthogonal Directions
Conjugate Directions
• After every step, the error must be $A$-orthogonal to the direction just searched: $d_i^T A e_{i+1} = 0$
Conjugate Directions
$$e_i = x_i - x$$
$$A e_i = A x_i - A x = A x_i - b = -r_i$$
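Combining the $A$-orthogonality requirement with the identity $A e_i = -r_i$ gives a step size that can be computed without knowing the error. A short derivation, assuming the update $x_{i+1} = x_i + \alpha_i d_i$ (so that $e_{i+1} = e_i + \alpha_i d_i$), with $d_i$ denoting the search direction:

```latex
\begin{aligned}
d_i^T A e_{i+1} &= 0 \\
d_i^T A (e_i + \alpha_i d_i) &= 0 \\
\alpha_i &= -\frac{d_i^T A e_i}{d_i^T A d_i} = \frac{d_i^T r_i}{d_i^T A d_i}
\end{aligned}
```

Every quantity on the right-hand side is available during the iteration, which the plain orthogonal-directions condition does not offer.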
Gram-Schmidt Algorithm
• So it only remains to find $n$ conjugate directions
• The Gram-Schmidt algorithm does this: given $n$ linearly independent vectors, it produces $n$ conjugate directions
Gram-Schmidt algorithm
Gram-Schmidt algorithm
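The conjugation process and its use as a solver can be sketched as follows. The function names, the 3x3 system, and the choice of the axial unit vectors as the independent starting set are assumptions made for illustration. Each new vector has its $A$-projections onto the earlier directions removed, and the step along each direction uses $\alpha = d^T r / (d^T A d)$.

```python
import numpy as np

def conjugate_gram_schmidt(A, U):
    """Turn the linearly independent rows of U into mutually A-orthogonal rows."""
    directions = []
    for u in np.asarray(U, dtype=float):
        d = u.copy()
        for d_k in directions:
            # Subtract the A-projection of u onto the earlier direction d_k.
            beta = -(u @ A @ d_k) / (d_k @ A @ d_k)
            d = d + beta * d_k
        directions.append(d)
    return np.array(directions)

def conjugate_directions(A, b, x0, D):
    """Take one exact step along each conjugate direction (row of D)."""
    x = np.array(x0, dtype=float)
    for d in D:
        r = b - A @ x
        alpha = (d @ r) / (d @ A @ d)
        x = x + alpha * d
    return x

# Illustrative symmetric positive-definite system (assumed values).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

D = conjugate_gram_schmidt(A, np.eye(3))   # start from the axial unit vectors
gram = D @ A @ D.T                          # diagonal if the rows are conjugate
x = conjugate_directions(A, b, np.zeros(3), D)
```

After the three steps `x` agrees with the solution of $Ax = b$ up to rounding error.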
Conjugate Directions
• So the algorithm is complete
• But it is expensive: generating the full set of directions costs $O(n^3)$ operations!
• We already had the Gaussian elimination algorithm for that
Conjugate Directions with axial unit vectors
Gradient and Error Optimality
• For every
• We have
• It means
Conjugate Gradient
• Use the residuals $r_i$ as the vectors to conjugate
• This makes the equations very simple
• The cost per iteration drops from $O(n^2)$ to $O(m)$, where $m$ is the number of nonzero entries of $A$
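A minimal sketch of the resulting method in its common textbook form, where each new search direction is the current residual plus a multiple of the previous direction. The function name, tolerance, and example system are assumptions. The only operation involving $A$ is one matrix-vector product per iteration, which is where the $O(m)$ cost for sparse $A$ comes from; the dense array used here is just for brevity.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A.

    Returns the final iterate and the number of iterations performed.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x            # initial residual
    d = r.copy()             # first direction is the residual itself
    rs_old = r @ r
    steps = n if max_iter is None else max_iter
    for k in range(steps):
        if np.sqrt(rs_old) < tol:
            return x, k
        Ad = A @ d                      # the single matrix-vector product
        alpha = rs_old / (d @ Ad)       # exact step along d
        x = x + alpha * d
        r = r - alpha * Ad              # update the residual without recomputing A x
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d   # next direction, conjugate to the earlier ones
        rs_old = rs_new
    return x, steps

# Illustrative symmetric positive-definite system (assumed values).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x, iterations = conjugate_gradient(A, b)
```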
Line Search
• Find the best step size $\alpha_i$ along the search direction $d_i$:
  $$\alpha_i \in \arg\min_{\alpha \ge 0} f(x_i + \alpha \cdot d_i)$$
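When $f$ is not a quadratic form there is no closed-form step, so the minimization over $\alpha$ is done numerically. The sketch below uses golden-section search, one of several possible one-dimensional methods and not necessarily the one the authors had in mind. The search bracket $[0, 1]$, the tolerance, and the example system are assumptions. The quadratic form is used here only because its exact step is known and can serve as a check.

```python
import math
import numpy as np

def golden_section_search(phi, lo, hi, tol=1e-8):
    """Minimize a unimodal function phi on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    left, right = lo, hi
    c = right - inv_phi * (right - left)
    d = left + inv_phi * (right - left)
    while abs(right - left) > tol:
        if phi(c) < phi(d):
            right = d        # the minimum lies in [left, d]
        else:
            left = c         # the minimum lies in [c, right]
        c = right - inv_phi * (right - left)
        d = left + inv_phi * (right - left)
    return (left + right) / 2.0

# Illustrative quadratic form (assumed values).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

x_i = np.array([-2.0, -2.0])
d_i = b - A @ x_i                      # search direction: the residual

# Numerical line search along d_i, restricted to alpha in [0, 1].
alpha_numeric = golden_section_search(lambda a: f(x_i + a * d_i), 0.0, 1.0)

# Closed-form step for the quadratic case, for comparison.
alpha_exact = (d_i @ d_i) / (d_i @ A @ d_i)
```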
End
• Thanks for your patience!