Notes: The Power Method (cs542g-term1-2007)


The Power Method
- Assignment 1 due tonight (email me by tomorrow morning)
- Start with some random vector v, ||v||_2 = 1
- Iterate v = Av / ||Av||
- The eigenvector with the largest-magnitude eigenvalue tends to dominate
- How fast? Linear convergence, slowed down by close eigenvalues

Shift and Invert (Rayleigh Iteration)
- Say the eigenvalue we want is approximately λ_k
- The matrix (A - λ_k I)^-1 has the same eigenvectors as A, but its eigenvalues are 1 / (λ - λ_k)
- Use this in the power method instead
- Even better, update the guess at the eigenvalue each iteration: λ_{k+1} = v_{k+1}^T A v_{k+1}
- Gives cubic convergence! (triples the number of significant digits each iteration when converging)

Maximality and Orthogonality
- Unit eigenvectors v_1 of the maximum-magnitude eigenvalue satisfy ||A v_1||_2 = max { ||A u||_2 : ||u||_2 = 1 }
- Unit eigenvectors v_k of the k-th eigenvalue satisfy ||A v_k||_2 = max { ||A u||_2 : ||u||_2 = 1, u^T v_i = 0 for i < k }
- Can pick them off one by one, or...

Orthogonal Iteration
- Solve for lots (or all) of the eigenvectors simultaneously
- Start with an initial guess V
- For k = 1, 2, ...
  - Z = AV
  - VR = Z (QR decomposition: orthogonalize Z)
- Easy, but slow (linear convergence; nearby eigenvalues slow things down a lot)

Rayleigh-Ritz
- Aside: find a subset of the eigenpairs, e.g. the largest k or the smallest k
- Orthogonal estimate V (n x k) of the eigenvectors
- Simple Rayleigh estimate of the eigenvalues: diag(V^T A V)
- Rayleigh-Ritz approach: solve the k x k eigenproblem V^T A V; use those eigenvalues (Ritz values) and the associated orthogonal combinations of the columns of V
- Note: another instance of "assume the solution lies in the span of a few basis vectors, solve a reduced-dimension problem"
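The two iterations above can be sketched in NumPy. This is an illustrative implementation, not code from the notes; the function names, tolerances, and the diagonal test matrix are my own choices:

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000, seed=0):
    """Power iteration: v <- Av / ||Av||_2.  Converges (linearly) to an
    eigenvector of the largest-magnitude eigenvalue of symmetric A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(max_iter):
        w = A @ v
        v = w / np.linalg.norm(w)
        lam_new = v @ A @ v              # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, v

def rayleigh_iteration(A, lam0, tol=1e-12, max_iter=50, seed=0):
    """Shift-and-invert with the shift updated each step:
    solve (A - lam I) w = v, normalize, set lam = v^T A v.
    Cubically convergent for symmetric A when it converges."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = lam0
    for _ in range(max_iter):
        try:
            w = np.linalg.solve(A - lam * np.eye(A.shape[0]), v)
        except np.linalg.LinAlgError:
            break                        # shift hit an eigenvalue exactly
        v = w / np.linalg.norm(w)
        lam_new = v @ A @ v
        if abs(lam_new - lam) < tol:
            return lam_new, v
        lam = lam_new
    return lam, v

A = np.diag([4.0, 2.0, 1.0])
lam, v = power_method(A)
print(lam)  # ≈ 4, the largest eigenvalue
```

Starting `rayleigh_iteration(A, 1.9)` near the middle eigenvalue pulls out λ = 2: the shift-and-invert step amplifies whichever eigenvector lies closest to the current shift, which is why it can target interior eigenvalues the plain power method cannot.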
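Orthogonal iteration plus the Rayleigh-Ritz step can be sketched the same way; again a minimal illustration with my own names and test matrix, assuming symmetric A so the small eigenproblem can use a symmetric solver:

```python
import numpy as np

def orthogonal_iteration(A, k, iters=500, seed=0):
    """Orthogonal (subspace) iteration: Z = A V, then re-orthogonalize V
    by a QR factorization (VR = Z), repeated.  The columns of V converge
    linearly to the invariant subspace of the k largest-magnitude
    eigenvalues of symmetric A."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):
        Z = A @ V
        V, _ = np.linalg.qr(Z)
    # Rayleigh-Ritz: solve the small k-by-k eigenproblem V^T A V and take
    # the corresponding orthogonal combinations of the columns of V.
    ritz_vals, S = np.linalg.eigh(V.T @ A @ V)
    return ritz_vals, V @ S              # Ritz values and Ritz vectors

A = np.diag([5.0, 3.0, 1.0, 0.5])
vals, vecs = orthogonal_iteration(A, 2)
print(vals)  # ascending Ritz values, ≈ [3, 5]
```

Note `np.linalg.eigh` returns eigenvalues in ascending order, so the largest Ritz value comes last.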

Solving the Full Problem
- Orthogonal iteration works, but it's slow
- First speed-up: make A tridiagonal
  - Sequence of symmetric Householder reflections
  - Then Z = AV runs in O(n^2) instead of O(n^3)
- Other ingredients:
  - Shifting: if we shift A by an exact eigenvalue, A - λI, we get an exact eigenvector out of QR (the last column); improves on linear convergence
  - Division: once an off-diagonal is almost zero, the problem separates into decoupled blocks

Nonlinear Optimization
- Switch gears a little: we've already seen plenty of instances of minimizing, with linear least-squares
- What about nonlinear problems?
- Find x = argmin_x f(x)
- f(x) is called the objective
- This is an unconstrained problem, since there are no limits on x

Classes of Methods
- Only evaluate f:
  - Stochastic search, pattern search, cyclic coordinate descent (Gauss-Seidel), genetic algorithms, etc.
- Also evaluate ∂f/∂x (gradient vector):
  - Steepest descent and relatives
  - Quasi-Newton methods
- Also evaluate ∂²f/∂x² (Hessian matrix):
  - Newton's method and relatives

Steepest Descent
- The gradient is the direction of fastest change
  - Locally, f(x + dx) is smallest when dx is in the direction of the negative gradient -∇f
- The algorithm:
  - Start with a guess x^(0)
  - Until converged:
    - Find the direction d^(k) = -∇f(x^(k))
    - Choose a step size α^(k)
    - Next guess is x^(k+1) = x^(k) + α^(k) d^(k)

Convergence?
- At the global minimum, the gradient is zero:
  - Can test if the gradient is smaller than some threshold for convergence
  - Note the scaling problem: min A*f(B*x) + C
- However, the gradient is also zero at
  - Every local minimum
  - Every local maximum
  - Every saddle point

Convexity
- A function is convex if f(θx + (1-θ)y) ≤ θ f(x) + (1-θ) f(y) for θ ∈ [0, 1]
- Eliminates the possibility of multiple strict local mins
- Strictly convex: at most one local min
- A very good property for a problem to have!
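The steepest descent loop above can be sketched with a fixed step size; a minimal illustration (the step size, tolerance, and the ill-conditioned quadratic are my own choices, not from the notes):

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent with a fixed step size alpha:
    d = -grad f(x), then x <- x + alpha * d, until ||grad f|| is small.
    Note the caveats from the slides: a small-gradient test also accepts
    maxima and saddle points, and the threshold suffers the scaling
    problem (min A*f(B*x) + C)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad_f(x)                   # direction of fastest decrease
        if np.linalg.norm(d) < tol:
            break
        x = x + alpha * d
    return x

# Convex quadratic f(x) = 0.5 x^T diag(1, 10) x: unique minimum at 0,
# but the eigenvalue spread already limits how large alpha can be.
grad = lambda x: np.array([1.0, 10.0]) * x
x_min = steepest_descent(grad, [1.0, 1.0], alpha=0.1)
print(x_min)  # ≈ [0, 0]
```

With `alpha=0.1` the update contracts both coordinates; much larger steps would diverge along the stiff (eigenvalue 10) direction, which is exactly the step-size-selection problem the next slide addresses.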

Selecting a Step Size
- Scaling problem again: the physical dimensions of x and the gradient may not match
- Choosing a step too large:
  - May end up further from the minimum
- Choosing a step too small:
  - Slow, maybe too slow to actually converge
- Line search: keep picking different step sizes until satisfied
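The notes leave line search at "keep picking step sizes until satisfied"; one standard concrete instance (not specified in the slides) is backtracking with the Armijo sufficient-decrease condition, sketched here with illustrative parameter choices:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo sufficient-decrease condition holds:
    f(x + alpha d) <= f(x) + c * alpha * grad_f(x)^T d.
    rho is the shrink factor, c a small slope tolerance."""
    fx, slope = f(x), grad_f(x) @ d      # slope < 0 for a descent direction
    alpha = alpha0
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= rho
        if alpha < 1e-16:                # safeguard: give up shrinking
            break
    return alpha

# On the stiff quadratic f(x) = 0.5 * 100 * x^2, a unit step along the
# negative gradient overshoots badly; backtracking shrinks it until f
# actually decreases.
f = lambda x: 0.5 * 100.0 * x[0] ** 2
grad = lambda x: np.array([100.0 * x[0]])
x = np.array([1.0])
d = -grad(x)
alpha = backtracking_line_search(f, grad, x, d)
assert f(x + alpha * d) < f(x)
```

This directly addresses the "too large" failure mode above; pairing it with a minimum-step cutoff guards against the "too small" one.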
