Low Rank Approximation
Lecture 9

Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
Manifold optimization

General setting: Aim at solving the optimization problem
$$\min_{X \in \mathcal{M}_r} f(X),$$
where $\mathcal{M}_r$ is a manifold of rank-$r$ matrices or tensors.

Goal: Modify classical optimization algorithms (line search, Newton, quasi-Newton, ...) to produce iterates that stay on $\mathcal{M}_r$.

Advantages over ALS:
◮ No need to solve subproblems, at least for first-order methods;
◮ Can draw on concepts from classical smooth optimization (line search strategies, convergence analysis, ...).

Two valuable resources:
◮ Absil/Mahony/Sepulchre'2008: Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008. Available from https://press.princeton.edu/absil
◮ Manopt, a Matlab toolbox for optimization on manifolds. Available from https://manopt.org/
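Manifolds

For open sets $U \subset M$, $V \subset \mathbb{R}^d$, a chart is a bijective function $\varphi: U \to V$.

An atlas of $M$ into $\mathbb{R}^d$ is a collection of charts $(U_\alpha, \varphi_\alpha)$ such that:
◮ $\bigcup_\alpha U_\alpha = M$;
◮ for any $\alpha, \beta$ with $U_\alpha \cap U_\beta \neq \emptyset$, the change of coordinates
$$\varphi_\beta \circ \varphi_\alpha^{-1}: \mathbb{R}^d \to \mathbb{R}^d$$
is smooth ($C^\infty$) on its domain $\varphi_\alpha(U_\alpha \cap U_\beta)$.

[Illustration taken from Wikipedia.]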
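To make the chart and atlas definitions concrete, here is a minimal sketch using an example of our own choosing (not from the slides): the unit circle $S^1 \subset \mathbb{R}^2$ with two stereographic charts, $\varphi_N(x, y) = x/(1-y)$ on $U_N = S^1 \setminus \{(0,1)\}$ and $\varphi_S(x, y) = x/(1+y)$ on $U_S = S^1 \setminus \{(0,-1)\}$. Together they cover $S^1$, and on $\varphi_N(U_N \cap U_S) = \mathbb{R} \setminus \{0\}$ the change of coordinates is $t \mapsto 1/t$, which is smooth there. All function names are hypothetical.

```python
import numpy as np

def phi_north(p):
    """Chart on U_N = S^1 minus (0, 1): stereographic projection from the north pole."""
    x, y = p
    return x / (1.0 - y)

def phi_north_inv(t):
    """Inverse of phi_north: maps t in R back to a point of S^1."""
    return np.array([2.0 * t, t**2 - 1.0]) / (t**2 + 1.0)

def phi_south(p):
    """Chart on U_S = S^1 minus (0, -1): stereographic projection from the south pole."""
    x, y = p
    return x / (1.0 + y)

def phi_south_inv(s):
    """Inverse of phi_south."""
    return np.array([2.0 * s, 1.0 - s**2]) / (s**2 + 1.0)

def change_of_coordinates(t):
    """phi_south o phi_north^{-1}, defined on phi_north(U_N cap U_S) = R without 0."""
    return phi_south(phi_north_inv(t))

# Sample points of S^1 away from both poles (angles pi/2 and 3*pi/2 are avoided).
for theta in [0.3, 1.0, 2.0, 2.5, 3.5, 4.0, 5.5]:
    p = np.array([np.cos(theta), np.sin(theta)])
    t = phi_north(p)
    assert np.allclose(phi_north_inv(t), p)                # chart is bijective onto its image
    assert np.allclose(phi_south_inv(phi_south(p)), p)
    assert np.isclose(change_of_coordinates(t), 1.0 / t)   # transition map is t -> 1/t
```

Stereographic charts are chosen only because their transition map has a closed form; any other pair of smooth charts covering $S^1$ would serve equally well.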
Manifolds

In the following, we assume that the atlas is maximal. A proper definition of a smooth manifold $M$ needs further properties (the topology induced by the maximal atlas is Hausdorff and second-countable). See [Lee'2003] and [Absil et al.'2008].

Properties of $M$:
◮ finite-dimensional vector spaces are always manifolds;
◮ $d$ = dimension of $M$;
◮ $M$ does not need to be connected (in the context of smooth optimization, it makes sense to consider connected manifolds only);
◮ a function $f: M \to \mathbb{R}$ is differentiable at a point $x \in M$ if and only if $f \circ \varphi^{-1}: \varphi(U) \subset \mathbb{R}^d \to \mathbb{R}$ is differentiable at $\varphi(x)$ for some chart $(U, \varphi)$ with $x \in U$.
Manifolds: First examples

Lemma. Let $M$ be a smooth manifold and $N \subset M$ an open subset. Then $N$ is a smooth manifold (of equal dimension).

Proof: Given an atlas for $M$, obtain an atlas for $N$ by selecting the charts $(U, \varphi)$ with $U \subset N$. (These cover $N$ because the atlas of $M$ is maximal and therefore contains the restrictions of its charts to open subsets.)

Example: $GL(n, \mathbb{R})$, the set of real invertible $n \times n$ matrices, is a smooth manifold, since it is the open subset $\{A : \det(A) \neq 0\}$ of $\mathbb{R}^{n \times n}$ ($\det$ is continuous).

EFY. Show that $\mathbb{R}^{m \times n}_*$, the set of real $m \times n$ matrices of full rank $\min\{m, n\}$, is a smooth manifold.

EFY. Show that the set of $n \times n$ symmetric positive definite matrices is a smooth manifold.

Two main classes of matrix manifolds:
◮ embedded submanifolds of $\mathbb{R}^{m \times n}$; Example: Stiefel manifold of orthonormal bases.
◮ quotient manifolds; Example: Grassmann manifold $\mathbb{R}^{m \times n}_* / GL(n, \mathbb{R})$.

We will focus on embedded submanifolds (much easier to work with).
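A numerical illustration (not a proof) of why $\mathbb{R}^{m \times n}_*$ is an open subset of $\mathbb{R}^{m \times n}$: by Weyl's inequality for singular values, $\sigma_{\min}(A + E) \ge \sigma_{\min}(A) - \|E\|_2$, so every perturbation with $\|E\|_2 < \sigma_{\min}(A)$ keeps full rank. The sketch below samples such perturbations; the sizes, seed and helper name are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))                  # full rank min(m, n) with probability 1
sigma_min = np.linalg.svd(A, compute_uv=False)[-1]

def random_perturbation(radius):
    """Hypothetical helper: random E with spectral norm ||E||_2 equal to `radius`."""
    E = rng.standard_normal((m, n))
    return E * (radius / np.linalg.norm(E, 2))

# Every E with ||E||_2 < sigma_min(A) keeps A + E in R^{m x n}_*,
# since sigma_min(A + E) >= sigma_min(A) - ||E||_2 > 0.
for _ in range(100):
    E = random_perturbation(0.9 * sigma_min)
    s = np.linalg.svd(A + E, compute_uv=False)[-1]
    assert s >= 0.1 * sigma_min - 1e-12
    assert np.linalg.matrix_rank(A + E) == min(m, n)
```

The ball $\{A + E : \|E\|_2 < \sigma_{\min}(A)\}$ is thus an open neighborhood of $A$ contained in $\mathbb{R}^{m \times n}_*$.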
Immersions and submersions

Let $M_1, M_2$ be smooth manifolds of dimensions $d_1, d_2$ and let $F: M_1 \to M_2$. Let $x \in M_1$ and $y = F(x) \in M_2$. Choose charts $\varphi_1, \varphi_2$ around $x, y$. Then the coordinate representation of $F$ is given by
$$\hat F := \varphi_2 \circ F \circ \varphi_1^{-1}: \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}.$$
◮ $F$ is called smooth if $\hat F$ is smooth (that is, $C^\infty$).
◮ The rank of $F$ at $x \in M_1$ is defined as the rank of $D\hat F(\varphi_1(x))$ (the Jacobian of $\hat F$ at $\varphi_1(x)$).
◮ $F$ is called an immersion if its rank equals $d_1$ at every $x \in M_1$.
◮ $F$ is called a submersion if its rank equals $d_2$ at every $x \in M_1$.
Embedded submanifolds

A subset $N \subset M$ is called an embedded submanifold of dimension $k$ in $M$ if for each point $p \in N$ there is a chart $(U, \varphi)$ in $M$ such that all elements of $U \cap N$ are obtained by varying the first $k$ coordinates only. (See Chapter 5 of [Lee'2003] for more details.)

Theorem. Let $M, N$ be smooth manifolds and let $F: M \to N$ be a smooth map with constant rank $\ell$. Then each level set
$$F^{-1}(y) := \{x \in M : F(x) = y\}$$
is a closed embedded submanifold of codimension $\ell$ in $M$.

Corollaries:
◮ If $F: M \to N$ is a submersion, then each level set is a closed embedded submanifold of codimension equal to the dimension of $N$.
◮ In fact, by the open submanifold lemma, one only needs to check the full rank condition of a submersion at points in the level set (replace $M$ by the open set on which $F$ has full rank).
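As a small sketch of the submersion corollary (the example is ours, not from the slides), take $F: \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$, $F(x) = x^Tx$. Its Jacobian $DF(x) = 2x^T$ has rank $1 = \dim \mathbb{R}$ at every point, so the unit sphere $S^{n-1} = F^{-1}(1)$ is a closed embedded submanifold of codimension 1. The code approximates the Jacobian by central differences (the helper `numerical_jacobian` is hypothetical, written for this illustration) and checks its rank at a point of the level set.

```python
import numpy as np

def F(x):
    """F : R^n -> R^1, F(x) = ||x||^2, returned as a length-1 array."""
    return np.array([x @ x])

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    fx = func(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (func(x + e) - func(x - e)) / (2 * h)
    return J

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                    # a point on the level set F^{-1}(1) = S^{n-1}
J = numerical_jacobian(F, x)

# DF(x) = 2 x^T; its rank equals the dimension of the codomain, so F is a submersion at x.
print(np.allclose(J, 2 * x[None, :], atol=1e-6), np.linalg.matrix_rank(J))
```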
The Stiefel manifold

For $m \ge n$, consider the set of all $m \times n$ matrices with orthonormal columns:
$$St(m, n) := \{X \in \mathbb{R}^{m \times n} : X^T X = I_n\}.$$

Corollary. $St(m, n)$ is an embedded submanifold of $\mathbb{R}^{m \times n}$.

Proof: Define $F: \mathbb{R}^{m \times n} \to \mathrm{symm}(n)$ as $F: X \mapsto X^T X$, where $\mathrm{symm}(n)$ denotes the set of $n \times n$ symmetric matrices. At $X \in St(m, n)$, consider the Jacobian
$$DF(X): H \mapsto X^T H + H^T X.$$
Given symmetric $Y \in \mathbb{R}^{n \times n}$, set $H = XY/2$. Then $DF(X)[H] = \tfrac12 X^TXY + \tfrac12 Y^TX^TX = Y$; thus $DF(X)$ is surjective. Since $St(m, n) = F^{-1}(I_n)$, the corollary on submersions applies.

EFY. What is the dimension of the Stiefel manifold?
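The surjectivity argument can be checked numerically. In this sketch (sizes and seed are our own choices), a point $X \in St(m, n)$ is obtained from a reduced QR factorization. We first confirm by central differences that $DF(X)[H] = X^TH + H^TX$ is the derivative of $F(X) = X^TX$, and then that $H = XY/2$ is mapped to an arbitrary symmetric $Y$.

```python
import numpy as np

def F(X):
    return X.T @ X

def DF(X, H):
    """Jacobian of F(X) = X^T X at X, applied to the direction H."""
    return X.T @ H + H.T @ X

rng = np.random.default_rng(2)
m, n = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((m, n)))   # reduced QR: X has orthonormal columns

# DF(X) is the derivative of F: central differences are exact for the quadratic F up to rounding.
H0 = rng.standard_normal((m, n))
t = 1e-6
fd = (F(X + t * H0) - F(X - t * H0)) / (2 * t)
print(np.allclose(fd, DF(X, H0), atol=1e-6))

# Surjectivity onto symm(n): H = X Y / 2 is a preimage of any symmetric Y.
B = rng.standard_normal((n, n))
Y = B + B.T
H = X @ Y / 2
print(np.allclose(DF(X, H), Y))
```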
The manifold of rank-$k$ matrices

Locality of the definition of embedded submanifolds implies the following lemma (Lemma 5.5 in [Lee'2003]).

Lemma. Let $N$ be a subset of a smooth manifold $M$. Suppose every point $p \in N$ has a neighborhood $U \subset M$ such that $U \cap N$ is an embedded submanifold of $U$. Then $N$ is an embedded submanifold of $M$.

Theorem. Given $m \ge n$, the set
$$\mathcal{M}_k = \{A \in \mathbb{R}^{m \times n} : \mathrm{rank}(A) = k\}$$
is an embedded submanifold of $\mathbb{R}^{m \times n}$ for every $0 \le k \le n$.
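The manifold of rank-$k$ matrices

Choose an arbitrary $A_0 \in \mathcal{M}_k$. After a suitable permutation of rows and columns, we may assume w.l.o.g. that
$$A_0 = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{k \times k} \text{ is invertible}.$$
This property remains true in an open neighborhood $U \subset \mathbb{R}^{m \times n}$ of $A_0$. Factorize $A \in U$ as
$$A = \begin{bmatrix} I & 0 \\ A_{21}A_{11}^{-1} & I \end{bmatrix} \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{bmatrix} \begin{bmatrix} I & A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix}.$$
Define $F: U \to \mathbb{R}^{(m-k) \times (n-k)}$ as
$$F: A \mapsto A_{22} - A_{21}A_{11}^{-1}A_{12}.$$
Since the outer factors are invertible, $\mathrm{rank}(A) = k + \mathrm{rank}(F(A))$. Hence $F^{-1}(0) = U \cap \mathcal{M}_k$.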
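The block factorization in the proof can be verified numerically. This sketch (the helper `block_factors`, the sizes and the seed are our own choices) assumes the leading $k \times k$ block is invertible, which holds generically, so no permutation is needed. It rebuilds $A$ from the three factors and compares $\mathrm{rank}(A)$ with $k + \mathrm{rank}(F(A))$ for a rank-$k$ and a full-rank matrix, using an absolute tolerance for the numerical rank.

```python
import numpy as np

def block_factors(A, k):
    """Three factors of the block factorization of A; A11 = A[:k, :k] is assumed invertible."""
    m, n = A.shape
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    A11_inv_A12 = np.linalg.solve(A11, A12)             # A11^{-1} A12
    A21_A11_inv = np.linalg.solve(A11.T, A21.T).T       # A21 A11^{-1}
    S = A22 - A21 @ A11_inv_A12                         # F(A), the Schur complement
    L = np.block([[np.eye(k), np.zeros((k, m - k))],
                  [A21_A11_inv, np.eye(m - k)]])
    D = np.block([[A11, np.zeros((k, n - k))],
                  [np.zeros((m - k, k)), S]])
    R = np.block([[np.eye(k), A11_inv_A12],
                  [np.zeros((n - k, k)), np.eye(n - k)]])
    return L, D, R, S

rng = np.random.default_rng(3)
m, n, k = 6, 5, 2
A_rank_k = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # generic point of M_k
A_generic = rng.standard_normal((m, n))                                # generic full-rank matrix

for A in (A_rank_k, A_generic):
    L, D, R, S = block_factors(A, k)
    assert np.allclose(L @ D @ R, A)                  # the factorization reproduces A
    tol = 1e-10 * np.linalg.norm(A)                   # absolute tolerance for numerical rank
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    rank_S = np.linalg.matrix_rank(S, tol=tol)
    print(rank_A, k + rank_S)                         # rank(A) = k + rank(F(A))
```

An absolute tolerance is used for `S` because its entries are at rounding level when $A \in \mathcal{M}_k$, and a purely relative tolerance would then report a spurious full rank.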
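The manifold of rank-$k$ matrices

For arbitrary $Y \in \mathbb{R}^{(m-k) \times (n-k)}$, we obtain that
$$DF(A)\left[\begin{bmatrix} 0 & 0 \\ 0 & Y \end{bmatrix}\right] = Y,$$
because this direction only changes $A_{22}$, which enters $F$ linearly with identity coefficient. Thus, $F$ is a submersion. In turn, $U \cap \mathcal{M}_k$ is an embedded submanifold of $U$. By the lemma, $\mathcal{M}_k$ is an embedded submanifold of $\mathbb{R}^{m \times n}$.

EFY. What is the dimension of $\mathcal{M}_k$?

EFY. Is $\mathcal{M}_k$ connected?

EFY. Prove that the set of symmetric rank-$k$ matrices is an embedded submanifold of $\mathbb{R}^{n \times n}$. Is this manifold connected?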
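A quick finite-difference check of the identity $DF(A)[\,\cdot\,] = Y$ for the direction with $Y$ in the $(2,2)$ block, at a random point of $\mathcal{M}_k$. The function `F` below implements the local submersion from the proof under the assumption that $A_{11}$ is invertible (true generically for the random data); sizes and seed are our own choices.

```python
import numpy as np

def F(A, k):
    """Local submersion F(A) = A22 - A21 A11^{-1} A12; A11 = A[:k, :k] is assumed invertible."""
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    return A22 - A21 @ np.linalg.solve(A11, A12)

rng = np.random.default_rng(4)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # a point of M_k
Y = rng.standard_normal((m - k, n - k))
H = np.zeros((m, n))
H[k:, k:] = Y                                   # direction [[0, 0], [0, Y]]

# F(A + tH) = F(A) + tY exactly, so the central difference recovers Y up to rounding.
t = 1e-6
directional_derivative = (F(A + t * H, k) - F(A - t * H, k)) / (2 * t)
print(np.allclose(directional_derivative, Y, atol=1e-6))
```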
Tangent space

In the following, much of the discussion is restricted to submanifolds $M$ embedded in a vector space $V$ with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$.

Given a smooth curve $\gamma: \mathbb{R} \to M$ with $x = \gamma(0)$, we call $\gamma'(0) \in V$ a tangent vector at $x$. The tangent space $T_x M \subset V$ is the set of all tangent vectors at $x$.

Lemma. $T_x M$ is a subspace of $V$.

Proof. If $v_1, v_2$ are tangent vectors, then there are smooth curves $\gamma_1, \gamma_2$ such that $x = \gamma_1(0) = \gamma_2(0)$ and $\gamma_1'(0) = v_1$, $\gamma_2'(0) = v_2$. To show that $\alpha v_1 + \beta v_2$ for $\alpha, \beta \in \mathbb{R}$ is again a tangent vector, consider a chart $(U, \varphi)$ around $x$ such that $\varphi(x) = 0$. Define
$$\gamma(t) = \varphi^{-1}\bigl(\alpha \varphi(\gamma_1(t)) + \beta \varphi(\gamma_2(t))\bigr)$$
for $t$ sufficiently close to 0. Then $\gamma(0) = x$ and, by the chain rule, $\gamma'(0) = \alpha v_1 + \beta v_2$.

EFY. Prove that the dimension of $T_x M$ equals the dimension of $M$ using a coordinate chart.
Tangent space

Application of the definition to the Stiefel manifold. Let $\gamma(t) = X + tY + O(t^2)$ be a smooth curve with $X \in St(m, n)$. To ensure that $\gamma(t) \in St(m, n)$, we require
$$I_n = \gamma(t)^T\gamma(t) = (X + tY)^T(X + tY) + O(t^2) = I_n + t(X^TY + Y^TX) + O(t^2).$$
Thus, $X^TY + Y^TX = 0$ characterizes the tangent space:
$$T_X St(m, n) = \{Y \in \mathbb{R}^{m \times n} : X^TY = -Y^TX\} = \{XW + X_\perp W_\perp : W \in \mathbb{R}^{n \times n},\ W = -W^T,\ W_\perp \in \mathbb{R}^{(m-n) \times n}\},$$
where the columns of $X_\perp$ form a basis of $\mathrm{span}(X)^\perp$.
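A numerical sketch of this characterization (sizes and seed are our own choices): $X$ and $X_\perp$ are taken from the orthogonal factor of a full QR factorization, $Y = XW + X_\perp W_\perp$ is built from a random skew-symmetric $W$ and a random $W_\perp$, and we check $X^TY + Y^TX = 0$. Along the straight line $\gamma(t) = X + tY$ the constraint is violated only at second order, since $\gamma(t)^T\gamma(t) - I_n = t^2\, Y^TY$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 3
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
X, X_perp = Q[:, :n], Q[:, n:]          # X in St(m, n); columns of X_perp span span(X)^perp

B = rng.standard_normal((n, n))
W = B - B.T                              # skew-symmetric: W = -W^T
W_perp = rng.standard_normal((m - n, n))
Y = X @ W + X_perp @ W_perp              # candidate tangent vector

print(np.allclose(X.T @ Y + Y.T @ X, 0))  # tangent space condition

# The ratio ||gamma(t)^T gamma(t) - I_n||_F / t^2 stays (approximately) equal to ||Y^T Y||_F.
for t in (1e-1, 1e-2, 1e-3):
    gamma = X + t * Y
    print(t, np.linalg.norm(gamma.T @ gamma - np.eye(n)) / t**2)
```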
Tangent space

When $M$ is defined (at least locally) as a level set of a constant rank function $F: V \to \mathbb{R}^N$, we have
$$T_x M = \ker(DF(x)).$$

Proof. Let $v \in T_x M$, that is, there is a curve $\gamma: \mathbb{R} \to M$ such that $\gamma(0) = x$ and $\gamma'(0) = v$. Then, by the chain rule,
$$DF(x)[v] = DF(x)[\gamma'(0)] = \frac{\partial}{\partial t} F(\gamma(t))\Big|_{t=0} = 0,$$
because $F$ is constant on $M$. Thus, $T_x M \subset \ker(DF(x))$, which completes the proof by counting dimensions (both spaces have dimension $\dim V - \ell$, where $\ell$ is the constant rank of $F$).
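The dimension count can be checked numerically for the Stiefel manifold. In this sketch (our own construction), we assemble the matrix of $DF(X): H \mapsto X^TH + H^TX$ with values in $\mathrm{symm}(n)$, stored via its upper triangle, compute its kernel from an SVD, and verify that every kernel element $Y$ satisfies $X^TY + Y^TX = 0$. The helper `jacobian_matrix` is hypothetical.

```python
import numpy as np

def jacobian_matrix(X):
    """Matrix of H -> X^T H + H^T X, with output stored via the upper triangle of symm(n)."""
    m, n = X.shape
    iu = np.triu_indices(n)
    J = np.zeros((len(iu[0]), m * n))
    for j in range(m * n):
        H = np.zeros(m * n)
        H[j] = 1.0
        H = H.reshape(m, n)               # j-th unit matrix (row-major ordering)
        J[:, j] = (X.T @ H + H.T @ X)[iu]
    return J

rng = np.random.default_rng(6)
m, n = 5, 3
X, _ = np.linalg.qr(rng.standard_normal((m, n)))
J = jacobian_matrix(X)

rank = np.linalg.matrix_rank(J)
_, s, Vt = np.linalg.svd(J)               # full_matrices=True: Vt is (mn) x (mn)
kernel = Vt[rank:]                        # rows span ker(DF(X)), vectorized row-major
print(rank, n * (n + 1) // 2)             # full rank onto symm(n): F is a submersion at X
print(kernel.shape[0], m * n - rank)      # dim ker(DF(X)) = mn - dim symm(n)

for v in kernel:
    Y = v.reshape(m, n)
    assert np.allclose(X.T @ Y + Y.T @ X, 0)
```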
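Tangent space of $\mathcal{M}_k$

Recall that $\mathcal{M}_k$ was obtained as a level set of the local submersion $F: A \mapsto A_{22} - A_{21}A_{11}^{-1}A_{12}$. Given $A \in \mathcal{M}_k$, consider the SVD
$$A = \begin{bmatrix} U & U_\perp \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_\perp \end{bmatrix}^T.$$
Since $\mathcal{M}_k$ is invariant under multiplication by orthogonal matrices from the left and the right, it suffices to consider the middle factor, for which $A_{12} = 0$ and $A_{21} = 0$. We have
$$DF\left(\begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}\right)[H] = H_{22}.$$
Thus, $H$ is in the kernel if and only if $H_{22} = 0$. In terms of $A$, this implies
$$T_A\mathcal{M}_k = \begin{bmatrix} U & U_\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{k \times k} & \mathbb{R}^{k \times (n-k)} \\ \mathbb{R}^{(m-k) \times k} & 0 \end{bmatrix} \begin{bmatrix} V & V_\perp \end{bmatrix}^T = \{UMV^T + U_pV^T + UV_p^T : M \in \mathbb{R}^{k \times k},\ U_p \in \mathbb{R}^{m \times k},\ V_p \in \mathbb{R}^{n \times k},\ U_p^TU = V_p^TV = 0\}.$$

EFY. Compute the tangent space for the embedded submanifold of rank-$k$ symmetric matrices.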
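A numerical sketch of this tangent space (sizes, seed and the particular curve are our own choices). We build $\xi = UMV^T + U_pV^T + UV_p^T$ from random data, check that its $(2,2)$ block in the coordinates $[U\ U_\perp]$, $[V\ V_\perp]$ vanishes, and confirm that $\xi$ is the velocity at $t = 0$ of the curve $\gamma(t) = (U + tU_p\Sigma^{-1})(\Sigma + tM)(V + tV_p\Sigma^{-1})^T$, which stays in $\mathcal{M}_k$ for small $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))     # A in M_k
U_full, s, Vt_full = np.linalg.svd(A)
U, U_perp = U_full[:, :k], U_full[:, k:]
V, V_perp = Vt_full[:k].T, Vt_full[k:].T
Sigma = np.diag(s[:k])

# Random tangent vector U M V^T + U_p V^T + U V_p^T with U_p^T U = V_p^T V = 0.
M = rng.standard_normal((k, k))
U_p = U_perp @ rng.standard_normal((m - k, k))
V_p = V_perp @ rng.standard_normal((n - k, k))
xi = U @ M @ V.T + U_p @ V.T + U @ V_p.T

# In the coordinates [U U_perp], [V V_perp] the (2,2) block vanishes, i.e. H_22 = 0.
print(np.allclose(U_perp.T @ xi @ V_perp, 0))

# xi is the velocity at t = 0 of a curve of rank-k matrices through A.
Sigma_inv = np.diag(1.0 / s[:k])
def gamma(t):
    return (U + t * U_p @ Sigma_inv) @ (Sigma + t * M) @ (V + t * V_p @ Sigma_inv).T

h = 1e-6
velocity = (gamma(h) - gamma(-h)) / (2 * h)
print(np.allclose(velocity, xi, atol=1e-6))
print(np.linalg.matrix_rank(gamma(0.1), tol=1e-8))  # numerical rank with absolute tolerance
```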
Riemannian manifold and gradient

For a submanifold $M$ embedded in a vector space $V$: the inner product $\langle \cdot, \cdot \rangle$ on $V$ induces an inner product on $T_x M$. This turns $M$ into a Riemannian manifold.¹

The (Riemannian) gradient of a smooth $f: M \to \mathbb{R}$ at $x \in M$ is defined as the unique element $\mathrm{grad} f(x) \in T_x M$ that satisfies
$$\langle \mathrm{grad} f(x), \xi \rangle = Df(x)[\xi], \quad \forall \xi \in T_x M.$$

EFY. Prove that the Riemannian gradient satisfies the steepest ascent property (for $\mathrm{grad} f(x) \neq 0$)
$$\frac{\mathrm{grad} f(x)}{\|\mathrm{grad} f(x)\|} = \arg\max_{\xi \in T_x M,\ \|\xi\| = 1} Df(x)[\xi].$$

¹ In general, for a Riemannian manifold one needs to have an inner product on $T_x M$ that varies smoothly with respect to $x$.
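For an embedded submanifold with the induced inner product, a standard fact (not stated on this slide) is that $\mathrm{grad} f(x)$ is the orthogonal projection of the Euclidean gradient of a smooth extension of $f$ onto $T_x M$. The sketch below assumes the unit sphere $S^{n-1}$, with $T_x S^{n-1} = \{\xi : x^T\xi = 0\}$, and the cost $f(x) = x^TAx$ for a random symmetric $A$; both choices are ours. It checks the defining identity for random tangent vectors and spot-checks the steepest ascent property.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
B = rng.standard_normal((n, n))
A = B + B.T                                    # symmetric; cost f(x) = x^T A x (our choice)

def egrad(x):
    """Euclidean gradient of f(x) = x^T A x in R^n."""
    return 2 * A @ x

def rgrad(x):
    """Riemannian gradient on S^{n-1}: project egrad onto T_x S^{n-1} = {xi : x^T xi = 0}."""
    g = egrad(x)
    return g - (x @ g) * x

def random_tangent(x):
    xi = rng.standard_normal(n)
    return xi - (x @ xi) * x                   # remove the normal component

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                         # a point on the unit sphere
g = rgrad(x)

# Defining property: <grad f(x), xi> = Df(x)[xi] = <egrad f(x), xi> for all tangent xi.
for _ in range(5):
    xi = random_tangent(x)
    assert np.isclose(g @ xi, egrad(x) @ xi)

# Steepest ascent (numerical spot check, not a proof): no unit tangent direction beats g / ||g||.
best = egrad(x) @ (g / np.linalg.norm(g))
for _ in range(1000):
    xi = random_tangent(x)
    xi /= np.linalg.norm(xi)
    assert egrad(x) @ xi <= best + 1e-12
```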