Conjugate Directions

• Powell's method is based on a model quadratic objective function and on directions in $R^n$ that are conjugate with respect to the Hessian of that quadratic objective.

• What does it mean for two vectors $u, v \in R^n$ to be conjugate?

Definition: given $u, v \in R^n$, $u$ and $v$ are said to be mutually orthogonal if $(u, v) = u^T v = 0$ (where $(u, v)$ is our notation for the scalar product). ∎

Definition: given $u, v \in R^n$, $u$ and $v$ are said to be mutually conjugate with respect to a symmetric positive definite matrix $A$ if $u$ and $Av$ are mutually orthogonal, i.e. $(u, Av) = u^T A v = 0$. ∎

• Note that if two vectors are mutually conjugate with respect to the identity matrix, that is $A = I$, then they are mutually orthogonal.

Eigenvectors

• $x_i$ is an eigenvector of the matrix $A$, with corresponding eigenvalue $\lambda_i$, if it satisfies the equation
$$A x_i = \lambda_i x_i, \qquad i = 1, \ldots, n,$$
and $\lambda_i$ is a solution of the characteristic equation $|A - \lambda_i I| = 0$.

• If $A \in R^{n \times n}$ is a symmetric positive definite matrix, then there exist $n$ eigenvectors $x_1, \ldots, x_n$ which are mutually orthogonal (i.e. $(x_i, x_j) = 0$ for $i \ne j$).

• Now, since $(x_i, A x_j) = (x_i, \lambda_j x_j) = \lambda_j (x_i, x_j) = 0$ for $i \ne j$, this implies that the eigenvectors $x_i$ are mutually conjugate with respect to the matrix $A$.
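To make these definitions concrete, here is a minimal numerical sketch (not part of the original notes; the matrix is an arbitrary illustrative choice): it computes the eigenvectors of a small symmetric positive definite matrix and verifies that they are both mutually orthogonal and mutually conjugate with respect to $A$.

```python
import numpy as np

# An illustrative symmetric positive definite matrix (not from the notes).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# For a symmetric A, eigh returns real eigenvalues and orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)
x1, x2 = eigvecs[:, 0], eigvecs[:, 1]

# Mutually orthogonal: (x1, x2) = x1^T x2 = 0.
print(np.isclose(x1 @ x2, 0.0))        # True

# Mutually conjugate with respect to A: (x1, A x2) = x1^T A x2 = 0.
print(np.isclose(x1 @ A @ x2, 0.0))    # True
```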
We Can Expand Any Vector In Terms Of A Set Of Conjugate Vectors

Theorem: A set of $n$ mutually conjugate vectors in $R^n$ spans the $R^n$ space and therefore constitutes a basis for $R^n$. ∎

Proof: let $u_i$, $i = 1, \ldots, n$, be mutually conjugate with respect to a symmetric positive definite matrix $A \in R^{n \times n}$. Consider a linear combination which is equal to zero:
$$\sum_{i=1}^{n} \alpha_i u_i = 0.$$
We pre-multiply by the matrix $A$,
$$A \sum_{i=1}^{n} \alpha_i u_i = \sum_{i=1}^{n} \alpha_i A u_i = 0,$$
and take the inner product with $u_k$:
$$\left( u_k, \sum_{i=1}^{n} \alpha_i A u_i \right) = \sum_{i=1}^{n} \alpha_i (u_k, A u_i) = \alpha_k (u_k, A u_k) = 0.$$
Now, since $A$ is positive definite, we have $(u_k, A u_k) > 0$ for all $u_k \ne 0$. Therefore it must be that $\alpha_k = 0$ for all $k$, which implies that the $u_i$, $i = 1, \ldots, n$, are linearly independent, and since there are $n$ of them, they form a basis for the $R^n$ space. ∎
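A quick numerical check of the theorem, reusing the illustrative setup from the previous sketch: stack $n$ mutually $A$-conjugate vectors into a matrix and confirm it has full rank, i.e. that the vectors are linearly independent.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigenvectors of a symmetric A are one example of a mutually
# A-conjugate set (as shown in the previous section).
_, U = np.linalg.eigh(A)

# Full rank <=> the columns are linearly independent and span R^n.
print(np.linalg.matrix_rank(U) == A.shape[0])   # True
```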
• What does it mean for a set of vectors to be linearly independent? Can you prove that a set of $n$ linearly independent vectors in $R^n$ forms a basis for the $R^n$ space?

Expansion of an Arbitrary Vector

Now consider an arbitrary vector $x \in R^n$. We can expand $x$ in our mutually conjugate basis as follows:
$$x = \sum_{i=1}^{n} \alpha_i u_i,$$
where the scalar values $\alpha_i$ are to be determined. We next take the inner product of $u_k$ with $A x$:
$$(u_k, A x) = \left( u_k, A \sum_{i=1}^{n} \alpha_i u_i \right) = \left( u_k, \sum_{i=1}^{n} \alpha_i A u_i \right) = \sum_{i=1}^{n} \alpha_i (u_k, A u_i) = \alpha_k (u_k, A u_k),$$
from which we can solve for the scalar coefficients as
$$\alpha_k = \frac{(u_k, A x)}{(u_k, A u_k)},$$
and we have that an arbitrary vector $x \in R^n$ can be expanded in terms of $n$ mutually conjugate vectors $u_k$, $k = 1, \ldots, n$, as
$$x = \sum_{k=1}^{n} \frac{(u_k, A x)}{(u_k, A u_k)} \, u_k.$$
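The coefficient formula above translates directly into code. A minimal sketch (illustrative matrix and vector, not from the notes) that expands an arbitrary $x$ in an $A$-conjugate basis and reconstructs it:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
x = np.array([2.0, -1.0])           # arbitrary vector to expand

# Eigenvectors of A serve as a mutually A-conjugate basis.
_, U = np.linalg.eigh(A)

# alpha_k = (u_k, A x) / (u_k, A u_k)
alphas = [(u @ A @ x) / (u @ A @ u) for u in U.T]

# Reconstruct x from the expansion and compare with the original.
x_rebuilt = sum(a * u for a, u in zip(alphas, U.T))
print(np.allclose(x, x_rebuilt))    # True
```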
Definition: If a minimization method always locates the minimum of a general quadratic function in no more than a predetermined number of steps directly related to the number of variables $n$, then the method is called quadratically convergent. ∎

Theorem: If a quadratic function $Q(x) = \frac{1}{2} x^T A x + b^T x + c$ is minimized sequentially, once along each direction of a set of $n$ linearly independent, $A$-conjugate directions, then the global minimum of $Q$ will be located at or before the $n$th step, regardless of the starting point. ∎

Proof: We know that
$$\nabla Q(x^*) = A x^* + b = 0 \qquad (1)$$
and, given $u_i$, $i = 1, \ldots, n$, to be $A$-conjugate vectors (or, in this case, directions of minimization), we know from the previous theorem that they are linearly independent. Let $x_1$ be the starting point of our search; then, expanding the minimum $x^*$ as
$$x^* = x_1 + \sum_{i=1}^{n} \alpha_i u_i, \qquad (2)$$
we have
$$A x^* + b = A \left( x_1 + \sum_{i=1}^{n} \alpha_i u_i \right) + b = A x_1 + \sum_{i=1}^{n} \alpha_i A u_i + b = 0.$$
Taking the inner product with $u_j$ (using the notation $(v, u) = v^T u$), we have
$$u_j^T A x_1 + u_j^T b + \sum_{i=1}^{n} \alpha_i \, u_j^T A u_i = 0,$$
which, since the $u_i$ vectors are mutually conjugate with respect to the matrix $A$, reduces to
$$u_j^T A x_1 + u_j^T b + \alpha_j \, u_j^T A u_j = 0,$$
which can be re-written as
$$u_j^T (A x_1 + b) + \alpha_j \, u_j^T A u_j = 0.$$
Solving for the coefficients, we have
$$\alpha_j = -\frac{u_j^T (A x_1 + b)}{u_j^T A u_j}. \qquad (3)$$
Now, in an iterative scheme where we determine successive approximations along the $u_i$ directions by minimization, we have
$$x_{i+1} = x_i + \lambda_i^* u_i, \qquad i = 1, \ldots, N, \qquad (4)$$
where the $\lambda_i^*$ are found by minimizing $Q(x_i + \lambda_i u_i)$ with respect to the variable $\lambda_i$, and $N$ is possibly greater than $n$. Therefore, letting $y_i = x_{i+1} = x_i + \lambda_i u_i$, we set the derivative of $Q(y_i(\lambda_i)) = Q(x_i + \lambda_i u_i)$ with respect to $\lambda_i$ equal to 0 using the chain rule of differentiation:
$$\left. \frac{d Q(x_{i+1})}{d \lambda_i} \right|_{\lambda_i^*} = \sum_{j=1}^{n} \frac{\partial Q}{\partial y_i^{(j)}} \frac{\partial y_i^{(j)}}{\partial \lambda_i} = u_i^T \nabla Q(x_{i+1}) = 0,$$
but $\nabla Q(x_{i+1}) = A x_{i+1} + b$, and therefore
$$u_i^T \bigl( A (x_i + \lambda_i^* u_i) + b \bigr) = 0,$$
from which we get that the $\lambda_i^*$ are given by
$$\lambda_i^* = -\frac{u_i^T (A x_i + b)}{u_i^T A u_i} = -\frac{x_i^T A u_i + b^T u_i}{u_i^T A u_i}. \qquad (5)$$
From (4), we can write
$$x_{i+1} = x_i + \lambda_i^* u_i = x_1 + \sum_{j=1}^{i} \lambda_j^* u_j, \qquad x_i = x_1 + \sum_{j=1}^{i-1} \lambda_j^* u_j.$$
Forming the product $x_i^T A u_i$ in (5), we get
$$x_i^T A u_i = \left( x_1 + \sum_{j=1}^{i-1} \lambda_j^* u_j \right)^T A u_i = x_1^T A u_i,$$
because $u_j^T A u_i = 0$ for $j \ne i$. Therefore, the $\lambda_i^*$ can be written as
$$\lambda_i^* = -\frac{u_i^T (A x_1 + b)}{u_i^T A u_i}, \qquad (6)$$
but comparing this with (3) we see that $\lambda_i^* = \alpha_i$, and therefore
$$x^* = x_1 + \sum_{j=1}^{n} \lambda_j^* u_j, \qquad (7)$$
which says that starting at $x_1$ we take $n$ steps of "length" $\lambda_j^*$, given by (6), in the $u_j$ directions and we get the minimum. Therefore $x^*$ is reached in $n$ steps, or fewer if some $\lambda_j^* = 0$. ∎
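As a numerical illustration of the theorem (a sketch with an arbitrarily chosen quadratic, not from the notes), the loop below performs one exact line minimization along each of $n$ $A$-conjugate directions using the step length (6) and lands exactly on the solution of $A x^* + b = 0$:

```python
import numpy as np

# Q(x) = 0.5 x^T A x + b^T x + c, with an illustrative SPD matrix A.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, -2.0])

# Eigenvectors of A form a mutually A-conjugate set of directions.
_, U = np.linalg.eigh(A)

x = np.zeros(2)                     # starting point x_1
for u in U.T:
    # Exact line minimization along u, equations (5)/(6):
    lam = -(u @ (A @ x + b)) / (u @ A @ u)
    x = x + lam * u

# After n steps, x equals the global minimizer x* = -A^{-1} b.
print(np.allclose(x, np.linalg.solve(A, -b)))   # True
```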
Example: consider the quadratic function of two variables given as
$$f(x) = 1 + x_1 - x_2 + x_1^2 + 2 x_2^2.$$
Use the previous theorem to find the minimum, starting at the origin and minimizing successively along the two directions given by the unit vectors $u_1 = [1 \;\; 0]^T$ and $u_2 = [0 \;\; 1]^T$. (First show that these vectors are mutually conjugate with respect to the Hessian matrix of the function.)

Solution: first write the function in matrix form as
$$f(x) = c + b^T x + \frac{1}{2} x^T A x = 1 + \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$
where we can clearly see the Hessian matrix $A$. We can now check that the two directions given are mutually conjugate with respect to $A$:
$$u_1^T A u_2 = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0, \quad u_1^T A u_1 = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2, \quad u_2^T A u_2 = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 4.$$
Starting from $x_1 = [0 \;\; 0]^T$, we find the two step lengths $\lambda_1^*$ and $\lambda_2^*$ from (6) as
$$\lambda_1^* = -\frac{u_1^T (A x_1 + b)}{u_1^T A u_1} = -\frac{1}{2}, \qquad \lambda_2^* = -\frac{u_2^T (A x_1 + b)}{u_2^T A u_2} = -\frac{-1}{4} = \frac{1}{4},$$
and therefore, from (7), the minimum is found as
$$x^* = x_1 + \lambda_1^* u_1 + \lambda_2^* u_2 = \begin{bmatrix} -1/2 \\ 1/4 \end{bmatrix}.$$
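A quick check of the worked example in code (same function, starting point, and directions as above):

```python
import numpy as np

# Hessian and linear term of f(x) = 1 + x1 - x2 + x1^2 + 2*x2^2.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([1.0, -1.0])

x = np.zeros(2)                         # start at the origin
for u in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    lam = -(u @ (A @ x + b)) / (u @ A @ u)
    x = x + lam * u

print(x)                                # [-0.5  0.25], matching x* above
```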