Best rank-one approximation
  1. Best rank-one approximation. Definition: The first left singular vector of A is defined to be the vector u_1 such that σ_1 u_1 = A v_1, where σ_1 and v_1 are, respectively, the first singular value and the first right singular vector. Theorem: The best rank-one approximation to A is σ_1 u_1 v_1^T, where σ_1 is the first singular value, u_1 is the first left singular vector, and v_1 is the first right singular vector of A.
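(Not part of the original slides.) A minimal numpy sketch of the theorem's formula, assuming numpy's SVD convention A = U S V^T with singular values in non-increasing order; the function name best_rank_one is my own:

    import numpy as np

    def best_rank_one(A):
        # numpy returns U, the singular values in non-increasing order, and V^T
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        # sigma_1 u_1 v_1^T: the first singular value times the outer product
        # of the first left and first right singular vectors
        return S[0] * np.outer(U[:, 0], Vt[0, :])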

  2. Best rank-one approximation: example. For the matrix A = [[1, 4], [5, 2]], the first right singular vector is v_1 ≈ [.78, .63]^T and the first singular value σ_1 is about 6.1. The first left singular vector is u_1 ≈ [.54, .84]^T, meaning σ_1 u_1 = A v_1. We then have
     Ã = σ_1 u_1 v_1^T ≈ 6.1 · [.54, .84]^T [.78, .63] ≈ [[2.6, 2.1], [4.0, 3.2]]
so
     A − Ã ≈ [[1, 4], [5, 2]] − [[2.6, 2.1], [4.0, 3.2]] ≈ [[−1.56, 1.93], [1.00, −1.23]]
and the squared Frobenius norm of A − Ã is 1.56² + 1.93² + 1.00² + 1.23² ≈ 8.7. Check: ||A − Ã||²_F = ||A||²_F − σ_1² ≈ 8.7. ✓
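Here is a quick numpy check of the numbers on this slide (my own verification, not from the deck):

    import numpy as np

    A = np.array([[1., 4.], [5., 2.]])
    U, S, Vt = np.linalg.svd(A)
    A_tilde = S[0] * np.outer(U[:, 0], Vt[0, :])      # sigma_1 u_1 v_1^T
    print(np.round(A_tilde, 1))                       # [[2.6 2.1], [4.  3.2]]
    print(np.linalg.norm(A - A_tilde, 'fro') ** 2)    # about 8.7
    print(np.linalg.norm(A, 'fro') ** 2 - S[0] ** 2)  # the same value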

  3. The closest one-dimensional affine space. In the trolley-line problem, the line must go through the origin: it is the closest one-dimensional vector space. Perhaps a line not through the origin is much closer. An arbitrary line (one not necessarily passing through the origin) is a one-dimensional affine space. Given points a_1, . . . , a_m, I choose a point ā and translate each of the input points by subtracting ā: a_1 − ā, . . . , a_m − ā. I find the one-dimensional vector space closest to these translated points, and then translate that vector space by adding back ā. The best choice of ā is the centroid of the input points, the vector ā = (1/m)(a_1 + · · · + a_m). (The proof is lovely; maybe we'll see it later.) Translating the points by subtracting off the centroid is called centering the points.
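A hedged sketch of the recipe on this slide: center the points, find the closest one-dimensional vector space for the centered points, and translate back. The function name closest_affine_line is my own:

    import numpy as np

    def closest_affine_line(points):
        A = np.asarray(points, dtype=float)  # one input point per row
        centroid = A.mean(axis=0)            # abar = (1/m)(a_1 + ... + a_m)
        centered = A - centroid              # "centering the points"
        _, _, Vt = np.linalg.svd(centered)
        v1 = Vt[0]                           # closest 1-dim vector space is Span{v1}
        return centroid, v1                  # the line is {centroid + t v1 : t real}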


  4. Politics revisited. We center the voting data, and find the closest one-dimensional vector space Span{v_1}. Now projection along v_1 gives a better spread. Look at the coordinate representation in terms of v_1: which of the senators to the left of the origin are Republican?
>>> {r for r in senators if is_neg[r] and is_Repub[r]}
{'Collins', 'Snowe', 'Chafee'}
Similarly, only three of the senators to the right of the origin are Democrats.
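For concreteness, a sketch of how is_neg could be computed from the centered voting data; senators and voting_vector (a mapping from each senator to a vector of votes) are assumed from the course materials, and the variable names here are illustrative:

    import numpy as np

    # senators and voting_vector are assumed data, not defined on the slides
    names = sorted(senators)
    A = np.array([voting_vector[r] for r in names])
    centered = A - A.mean(axis=0)           # center the voting data
    v1 = np.linalg.svd(centered)[2][0]      # first right singular vector
    coords = centered @ v1                  # coordinate of each senator along v1
    is_neg = {r: c < 0 for r, c in zip(names, coords)}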

  5. Visualization revisited. We can now turn a bunch of high-dimensional vectors into a bunch of numbers, and plot the numbers on a number line. Dimension reduction: what about turning a bunch of high-dimensional vectors into vectors in R^2 or R^3 or R^10?
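One standard answer, sketched in numpy (my own illustration): represent each vector by its coordinates along the top k right singular vectors:

    import numpy as np

    def reduce_dimension(A, k):
        # rows of A are the high-dimensional vectors; the result's rows are
        # their coordinates in terms of the first k right singular vectors
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        return A @ Vt[:k].T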

  6. Closest 1-dimensional vector space (trolley-line-location problem):
     input: vectors a_1, . . . , a_m
     output: orthonormal basis {v_1} for the dim-1 vector space V_1 that minimizes Σ_i (distance from a_i to V_1)²
We saw: Σ_i (distance from a_i to Span{v_1})² = ||A||²_F − ||A v_1||²
Therefore: the best vector v_1 is the unit vector that maximizes ||A v_1||.
Closest k-dimensional vector space:
     input: vectors a_1, . . . , a_m, integer k
     output: orthonormal basis {v_1, . . . , v_k} for the dim-k vector space V_k that minimizes Σ_i (distance from a_i to V_k)²
Let v_1, . . . , v_k be an orthonormal basis for a subspace V. By the Pythagorean Theorem, each point decomposes as a_i = a_i^{‖V} + a_i^{⊥V} with ||a_i||² = ||a_i^{‖V}||² + ||a_i^{⊥V}||², for i = 1, . . . , m.
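A small numeric check of this Pythagorean decomposition (my own illustration), projecting a random point onto a random 2-dimensional subspace:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal(5)
    V = np.linalg.qr(rng.standard_normal((5, 2)))[0]  # orthonormal basis, columns
    a_par = V @ (V.T @ a)                             # component of a in V
    a_perp = a - a_par                                # component orthogonal to V
    print(np.allclose(a @ a, a_par @ a_par + a_perp @ a_perp))  # True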


  7. For an orthonormal basis v_1, . . . , v_k of V,
     Σ_i (dist from a_i to V)² = ||A||²_F − (||A v_1||² + · · · + ||A v_k||²)
Therefore choosing a k-dimensional space V minimizing the sum of squared distances to V is equivalent to choosing k orthonormal vectors v_1, . . . , v_k that maximize ||A v_1||² + · · · + ||A v_k||². How to choose such vectors? A greedy algorithm.
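A numeric check of this identity (mine, not from the deck), taking V to be the span of the top k right singular vectors:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 4))
    k = 2
    Vk = np.linalg.svd(A, full_matrices=False)[2][:k].T  # orthonormal basis of V
    dists_sq = ((A - A @ Vk @ Vk.T) ** 2).sum()          # sum of squared distances
    print(np.allclose(dists_sq, (A ** 2).sum() - ((A @ Vk) ** 2).sum()))  # True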

  8. Closest dimension-k vector space. Computational problem: closest low-dimensional subspace:
     input: vectors a_1, . . . , a_m and positive integer k
     output: basis for the dim-k vector space V_k that minimizes Σ_i (distance from a_i to V_k)²
Algorithm for one dimension: choose the unit-norm vector v that maximizes ||A v||. There is a natural generalization of this algorithm in which an orthonormal basis is sought. Algorithm: in the i-th iteration, select the unit vector v that maximizes ||A v|| among those vectors orthogonal to all previously selected vectors:
• v_1 = norm-one vector v maximizing ||A v||,
• v_2 = norm-one vector v orthogonal to v_1 that maximizes ||A v||,
• v_3 = norm-one vector v orthogonal to v_1 and v_2 that maximizes ||A v||,
and so on. In pseudocode:
    def find_right_singular_vectors(A):
        for i = 1, 2, . . . , min{m, n}:
            v_i = arg max {||A v|| : ||v|| = 1, v orthogonal to v_1, v_2, . . . , v_{i−1}}
            stop when A v = 0 for every vector v orthogonal to v_1, . . . , v_i
        define σ_i = ||A v_i||
        return [v_1, v_2, . . . , v_r], where r = the number of iterations
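The pseudocode above is not directly executable; here is a minimal numpy sketch of the same greedy scheme, using power iteration on A^T A with re-orthogonalization against the already-chosen vectors. The tolerance and step count are arbitrary choices of mine, not part of the slides:

    import numpy as np

    def find_right_singular_vectors(A, tol=1e-10, steps=500):
        m, n = A.shape
        vs, sigmas = [], []
        for _ in range(min(m, n)):
            v = np.random.randn(n)
            for _ in range(steps):
                for u in vs:                # keep v orthogonal to v_1, ..., v_{i-1}
                    v = v - (u @ v) * u
                norm = np.linalg.norm(v)
                if norm < tol:              # A v = 0 on the remaining space: stop
                    return vs, sigmas
                v = v / norm
                v = A.T @ (A @ v)           # power-iteration step on A^T A
            for u in vs:                    # final re-orthogonalization
                v = v - (u @ v) * u
            norm = np.linalg.norm(v)
            if norm < tol:
                break
            v = v / norm
            sigmas.append(np.linalg.norm(A @ v))  # sigma_i = ||A v_i||
            vs.append(v)
        return vs, sigmas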

  9. Closest dimension-k vector space. Computational problem: closest low-dimensional subspace:
     input: vectors a_1, . . . , a_m and positive integer k
     output: basis for the dim-k vector space V_k that minimizes Σ_i (distance from a_i to V_k)²
Algorithm: in the i-th iteration, select the vector v that maximizes ||A v|| among those vectors orthogonal to all previously selected vectors ⟹ v_1, . . . , v_k.
Theorem: For each k ≥ 0, the first k right singular vectors span the k-dimensional space V_k that minimizes Σ_i (distance from a_i to V_k)².
Proof: by induction on k. The case k = 0 is trivial. Assume the theorem holds for k = q − 1. We prove it for k = q. Suppose W is a q-dimensional space. Let w_q be a unit vector in W that is orthogonal to v_1, . . . , v_{q−1}. (Why is there such a vector?) Let w_1, . . . , w_{q−1} be vectors such that w_1, . . . , w_q form an orthonormal basis for W. (Why are there such vectors?) By the choice of v_q, ||A v_q|| ≥ ||A w_q||. By the induction hypothesis, Span{v_1, . . . , v_{q−1}} is the (q − 1)-dimensional space minimizing the sum of squared distances, so ||A v_1||² + · · · + ||A v_{q−1}||² ≥ ||A w_1||² + · · · + ||A w_{q−1}||². Adding the two inequalities gives ||A v_1||² + · · · + ||A v_q||² ≥ ||A w_1||² + · · · + ||A w_q||², so Span{v_1, . . . , v_q} does at least as well as W, which proves the case k = q.
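A small numeric sanity check of the theorem (my own illustration): the span of the first q right singular vectors gives a sum of squared distances no larger than a random q-dimensional subspace does:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 5))
    q = 2
    Vq = np.linalg.svd(A, full_matrices=False)[2][:q].T  # first q right sing. vecs
    W = np.linalg.qr(rng.standard_normal((5, q)))[0]     # random q-dim subspace

    def sum_sq_dist(A, B):
        # sum over rows of A of the squared distance to the column space of B
        return ((A - A @ B @ B.T) ** 2).sum()

    print(sum_sq_dist(A, Vq) <= sum_sq_dist(A, W) + 1e-12)  # True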
