Conjugate Gradient (CG)
Majid Lesani, Alireza Masoum
Overview
• Backpropagation
• Gradient Descent
• Quadratic Forms
• Gradient Descent in Quadratic Forms
• Eigenvectors and Eigenvalues
• Gradient Descent Convergence
• Conjugate Gradient
Backpropagation
• Abstraction
• Generalization problem
  • Heuristic features
  • Small networks
  • Early stopping
  • Regularization
• Search
• Convergence problem
Gradient Descent
• Also called Steepest Descent
• Moves along the negative gradient ∇f(x, y) = ( ∂f(x, y)/∂x , ∂f(x, y)/∂y )
Faster Training
• Gradient descent modifications
  • Gradient descent BP with momentum
  • Variable learning rate BP
• Numerical optimization techniques
  • Conjugate gradient BP
  • Quasi-Newton BP
Gradient Descent
• The problem is choosing the step size
Gradient Descent: Choosing the Best Step Size α
• Choose α_i where f(x_{i+1}) is minimized: ∂f(x_{i+1})/∂α_i = 0
• By the chain rule: ∂f(x_i + α_i r_i)/∂α_i = ∇f(x_{i+1})^T r_i = 0
• ⇒ r_{i+1}^T r_i = 0
Gradient Descent: Choosing the Best Step Size
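As a concrete illustration of the step-size choice, the sketch below minimizes f along the current search direction with a standard 1-D minimizer; the function, direction, and bounds are hypothetical examples, not taken from the slides.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def best_step_size(f, x, d):
    """Return a step size alpha >= 0 that minimizes f(x + alpha * d)."""
    res = minimize_scalar(lambda a: f(x + a * d), bounds=(0.0, 10.0), method="bounded")
    return res.x

# Hypothetical example: a simple quadratic bowl and the steepest-descent direction.
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])

x = np.array([1.0, 1.0])
d = -grad(x)                       # steepest-descent direction r = -grad f(x)
alpha = best_step_size(f, x, d)
print(alpha, f(x + alpha * d))     # f decreases along d at the chosen alpha
```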
Quadratic Forms
• Our goal is to minimize the quadratic function: f(x) = (1/2) x^T A x − b^T x + c
• Positive definite: v^T A v > 0 for every nonzero vector v
Quadratic Forms
• If A is a symmetric positive-definite matrix, f has a global minimum where the gradient is zero:
  f(x) = (1/2) x^T A x − b^T x + c
  ∇f(x) = Ax − b = 0
• Solving Ax = b is therefore equivalent to minimizing f
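A minimal numerical check of this equivalence (the matrix and vector values below are made up for illustration): the gradient of f is Ax − b, and it vanishes at the solution of Ax = b.

```python
import numpy as np

# Hypothetical symmetric positive-definite A and right-hand side b.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

f = lambda x: 0.5 * x @ A @ x - b @ x + c
grad = lambda x: A @ x - b          # gradient of the quadratic form (A symmetric)

x_star = np.linalg.solve(A, b)      # solution of Ax = b
print(grad(x_star))                 # ~ [0, 0]: the gradient vanishes at x*
print(f(x_star) <= f(x_star + np.array([0.1, -0.2])))  # True: x* is the minimizer
```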
Gradient Descent for Quadratic Forms
• Steepest descent for the quadratic form: r_i = b − A x_i = −∇f(x_i), then x_{i+1} = x_i + α_i r_i with α_i = r_i^T r_i / (r_i^T A r_i)
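A sketch of the resulting iteration, assuming the closed-form step size above; the test matrix, tolerance, and iteration cap are arbitrary illustrative choices.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive-definite A."""
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                      # residual = negative gradient
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ A @ r)      # exact line search: alpha_i = r^T r / (r^T A r)
        x = x + alpha * r                  # step along the steepest-descent direction
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = steepest_descent(A, b, np.zeros(2))
print(x, np.linalg.solve(A, b))            # the two should agree closely
```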
Eigenvectors and Eigenvalues
• An eigenvector of a matrix A is a nonzero vector that does not rotate when A is applied to it; it is only scaled by a constant (its eigenvalue)
• Every symmetric n×n matrix has n orthogonal eigenvectors, each with its corresponding eigenvalue
Using Eigenvectors
• Think of a vector as a sum of other vectors whose behavior is understood
Using Eigenvectors
• A positive-definite matrix is one whose eigenvalues are all positive
• The eigenvectors are the axes of the rotated elliptical contours of f, and each radius relates to the corresponding eigenvalue
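A short check of these statements with a hypothetical symmetric matrix: numpy's eigh returns real eigenvalues and orthonormal eigenvectors, and positive definiteness shows up as all eigenvalues being positive.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])                 # hypothetical symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)       # eigh is for symmetric (Hermitian) matrices
print(eigvals)                             # all positive => A is positive definite

v = eigvecs[:, 0]                          # first eigenvector (columns of eigvecs)
print(np.allclose(A @ v, eigvals[0] * v))  # True: A only scales v, it does not rotate it
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))  # True: the eigenvectors are orthonormal
```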
General Convergence of Steepest Descent
• Depends on the relation (ratio) between the eigenvalues of A
• Depends on the eigenvector components of the error
Fast Convergence
• When the eigenvalues are all the same, convergence is fast: the contours are spherical and the negative gradient points straight at the minimum
Poor Convergence
• Convergence is poor when the eigenvalues differ widely and the error has large components along the eigenvectors with small eigenvalues
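The standard bound that makes this precise (a supplementary note; it is not written out on the slides): with condition number κ = λ_max/λ_min, steepest descent shrinks the error in the energy norm by at most a κ-dependent factor per step.

```latex
\[
\|e_i\|_A \le \left(\frac{\kappa - 1}{\kappa + 1}\right)^{i} \|e_0\|_A,
\qquad \kappa = \frac{\lambda_{\max}}{\lambda_{\min}},
\qquad \|e\|_A = \sqrt{e^{T} A e}
\]
```

A large κ (a long, thin ellipse) makes the factor close to 1, which matches the zig-zagging behavior described above.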
Conjugate Gradient Overview
• Orthogonal directions
• Conjugate vectors
• Conjugate directions
• Gram-Schmidt algorithm
• Gradient and error optimality
• Conjugate Gradient
Orthogonal Directions
• Steepest descent often steps in the same direction many times
• If instead we had n orthogonal search directions and took the best step along each, we would reach the goal after n steps
Orthogonal Directions
• We need the error after each step to be orthogonal to the previous search direction
• Enforcing this would require knowing the error itself, which is exactly what we do not have
Conjugate vectors
Conjugate Vectors
• Two vectors u and v are A-orthogonal (or conjugate) if u^T A v = 0
• Vectors that are conjugate in the original space are orthogonal in the space stretched by A, where the elliptical contours become circles
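One way to see the "stretched space" picture numerically (matrix and vectors below are hypothetical): if A = L L^T is a Cholesky factorization, then u^T A v = (L^T u)^T (L^T v), so A-orthogonality of u and v is ordinary orthogonality of L^T u and L^T v.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])            # hypothetical symmetric positive-definite matrix
L = np.linalg.cholesky(A)             # A = L @ L.T

u = np.array([1.0, 0.0])
# Build v to be A-orthogonal to u by removing u's A-component from a second vector.
w = np.array([0.0, 1.0])
v = w - (u @ A @ w) / (u @ A @ u) * u

print(np.isclose(u @ A @ v, 0.0))              # True: u and v are conjugate
print(np.isclose((L.T @ u) @ (L.T @ v), 0.0))  # True: orthogonal in the stretched space
```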
Conjugate Directions
• If we have n conjugate (A-orthogonal) search directions and, as with orthogonal directions, take the best step along each, we reach the goal after n steps
Conjugate Directions
Orthogonal Directions
Conjugate Directions
• We need the error after each step to be A-orthogonal to the previous search direction
• Unlike plain orthogonality, this condition can be enforced without knowing the error, because A e_i is computable (next slide)
Conjugate Directions
• Error: e_i = x_i − x  (x is the exact solution)
• A e_i = A x_i − A x = A x_i − b = −r_i, so the residual r_i tells us A e_i
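Making the requirement concrete (a supplementary derivation, not spelled out on the slides): asking that the new error be A-orthogonal to the current direction d_i fixes the step size.

```latex
\[
\begin{aligned}
e_{i+1} &= e_i + \alpha_i d_i \\
0 = d_i^{T} A e_{i+1} &= d_i^{T} A e_i + \alpha_i\, d_i^{T} A d_i
                       = -d_i^{T} r_i + \alpha_i\, d_i^{T} A d_i \\
\Rightarrow \quad \alpha_i &= \frac{d_i^{T} r_i}{d_i^{T} A d_i}
\end{aligned}
\]
```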
Gram-Schmidt Algorithm
• So all that remains is to find n conjugate directions
• The Gram-Schmidt (conjugation) process does this: given n linearly independent vectors, it produces n conjugate directions
Gram-Schmidt algorithm
Gram-Schmidt algorithm
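A sketch of Gram-Schmidt conjugation, assuming the usual formulation: each new direction is the next independent vector minus its A-projections onto the directions already built (the matrix and starting vectors are hypothetical).

```python
import numpy as np

def gram_schmidt_conjugation(A, U):
    """Turn linearly independent columns of U into pairwise A-orthogonal directions."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for j in range(i):
            # Subtract the A-projection of u_i onto the already-built direction d_j.
            beta = (U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d -= beta * D[:, j]
        D[:, i] = d
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])
U = np.eye(2)                     # axial unit vectors as the independent starting set
D = gram_schmidt_conjugation(A, U)
print(np.isclose(D[:, 0] @ A @ D[:, 1], 0.0))   # True: the directions are conjugate
```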
Conjugate Directions
• So the algorithm is complete
• But it is expensive: Gram-Schmidt conjugation keeps and reuses all previous directions, costing O(n^2) storage and O(n^3) work overall
• We already had an algorithm in that cost range: Gaussian elimination
Conjugate Directions with axial unit vectors
Gradient and Error Optimality
• For every earlier search direction d_j (j < i) we have d_j^T A e_i = 0, i.e. d_j^T r_i = 0
• It means the residual (negative gradient) at step i is orthogonal to all previous search directions, so the error is already optimal over the subspace searched so far
Conjugate Gradient
• Use the residuals r_i as the independent vectors in the Gram-Schmidt conjugation
• This makes the equations very simple: all but one of the Gram-Schmidt coefficients vanish
• The cost per iteration drops from O(n^2) to O(m), where m is the number of nonzero entries of A
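A compact sketch of the resulting method, assuming the standard conjugate gradient recurrences (β computed from successive residual norms); the test matrix is hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve Ax = b (A symmetric positive definite) by the conjugate gradient method."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                     # initial residual
    d = r.copy()                      # first search direction is the residual
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d                    # the single matrix-vector product per iteration
        alpha = (r @ r) / (d @ Ad)    # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * Ad        # updated residual without recomputing b - A x
        beta = (r_new @ r_new) / (r @ r)   # the one Gram-Schmidt coefficient that survives
        d = r_new + beta * d          # new direction, conjugate to the previous ones
        r = r_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b))       # matches np.linalg.solve(A, b)
```

In exact arithmetic this terminates after at most n steps, in line with the conjugate-directions argument above.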
Line Search
• Finding the step size: compute the best step size α_i ∈ arg min_{α ≥ 0} f(x_i + α d_i)
End
Thanks for your patience!