Optimizing a homogeneous polynomial on the unit sphere
Rima Khouja
Advisors: Bernard Mourrain, Houssam Khalil
JNCF 2020
We consider $\mathbb{K} = \mathbb{C}$ or $\mathbb{R}$. Let $P$ be a homogeneous polynomial in $\mathbb{K}[x_1, \ldots, x_n]_d := \mathbb{K}[x]_d$. The spectral norm of $P$, denoted by $\|P\|_{\sigma,\mathbb{K}}$, is by definition
$$\|P\|_{\sigma,\mathbb{K}} = \max_{v \in S^{n-1}} |P(v)|, \qquad (1)$$
where $S^{n-1}$ is the unit sphere in $\mathbb{K}^n$. For $\mathbb{K} = \mathbb{C}$, problem (1) is equivalent to maximizing $\mathrm{Re}(P(v))$ on the unit sphere.
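As a toy illustration (not part of the talk's method), the spectral norm over the real sphere can be lower-bounded by sampling random unit vectors; the polynomial below is an arbitrary example.

```python
import numpy as np

# Toy illustration: estimate ||P||_{sigma,R} = max_{v in S^{n-1}} |P(v)| of a real
# homogeneous cubic by sampling random points on the unit sphere.

def P(v):
    x1, x2, x3 = v
    return x1**3 + 2*x1*x2*x3 - x3**3       # homogeneous of degree d = 3

rng = np.random.default_rng(0)
best = 0.0
for _ in range(100_000):
    v = rng.standard_normal(3)
    v /= np.linalg.norm(v)                  # project onto the unit sphere S^2
    best = max(best, abs(P(v)))

print(f"sampled lower bound on ||P||_sigma: {best:.4f}")
```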
Approximating $P$ by the $d$th power of a linear form $(v_1 x_1 + \cdots + v_n x_n)^d := (v^t x)^d$ amounts to solving the following minimization problem:
$$\mathrm{dist}_1(P) := \min_{(w,v) \in \mathbb{K} \times S^{n-1}} \|P - w\,(v^t x)^d\|_d, \qquad (2)$$
where $\|\cdot\|_d$ is the apolar norm on $\mathbb{K}[x]_d$, such that for $P = \sum_{|\alpha| = d} \binom{d}{\alpha} p_\alpha x^\alpha$,
$$\|P\|_d = \sqrt{\langle P, P \rangle_d} = \sqrt{\sum_{|\alpha| = d} \binom{d}{\alpha} p_\alpha \bar{p}_\alpha}.$$
The two problems (1) and (2) are equivalent since
$$\mathrm{dist}_1(P)^2 = \|P\|_d^2 - \|P\|_{\sigma,\mathbb{K}}^2.$$
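A minimal sketch of the apolar norm in this convention, assuming the polynomial is stored as a map $\alpha \mapsto p_\alpha$ (an assumed representation, not the talk's software); the example checks it against the identity $\|(v^t x)^d\|_d = \|v\|^d$.

```python
import numpy as np
from math import factorial

# Assumed representation: P = sum_{|alpha|=d} binom(d, alpha) * p_alpha * x^alpha,
# stored as a dict  alpha (tuple of exponents) -> p_alpha.

def multinomial(alpha):
    c = factorial(sum(alpha))
    for a in alpha:
        c //= factorial(a)
    return c

def apolar_norm(coeffs):
    # ||P||_d = sqrt( sum_alpha binom(d, alpha) * |p_alpha|^2 )
    return np.sqrt(sum(multinomial(a) * abs(p)**2 for a, p in coeffs.items()))

# Example: P = (x1 + 2 x2)^2 = x1^2 + 4 x1 x2 + 4 x2^2, i.e. p_(2,0)=1, p_(1,1)=2, p_(0,2)=4.
P = {(2, 0): 1.0, (1, 1): 2.0, (0, 2): 4.0}
print(apolar_norm(P))   # sqrt(1 + 2*4 + 16) = 5.0, consistent with ||(v^t x)^d||_d = ||v||^d for v = (1, 2)
```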
The symmetric tensor decomposition

$\mathbf{P}$ is a symmetric tensor of order $d$ and dimension $n$, i.e. $\mathbf{P} \in S^d(\mathbb{C}^n)$:
$$\mathbf{P} = [a_{i_1 \ldots i_d}] \in \mathbb{C}^{\overbrace{n \times \cdots \times n}^{d \text{ times}}} \quad \text{such that} \quad a_{i_1 \ldots i_d} = a_{i_{\sigma(1)} \ldots i_{\sigma(d)}}, \ \forall \sigma \in G_d \text{ (the group of permutations of } \{1,\ldots,d\}).$$
The symmetric tensor decomposition of $\mathbf{P}$ consists in writing it as a sum of rank-one symmetric tensors
$$\mathbf{P} = \sum_{i=1}^{r} \lambda_i \underbrace{a_i \otimes \cdots \otimes a_i}_{d \text{ times}}, \qquad \lambda_i \in \mathbb{C}, \ a_i \in \mathbb{C}^n. \qquad (3)$$
Through the correspondence of $S^d(\mathbb{C}^n)$ with $\mathbb{C}[x]_d$, (3) is equivalent to writing the homogeneous polynomial $P$ associated to $\mathbf{P}$ as a sum of $d$th powers of linear forms:
$$P = \sum_{i=1}^{r} \lambda_i (a_{i,1} x_1 + \cdots + a_{i,n} x_n)^d := \sum_{i=1}^{r} \lambda_i (a_i^t x)^d.$$
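This correspondence can be seen numerically: the sketch below (generic NumPy, not the talk's implementation) builds a sum of rank-one symmetric tensors of order $d = 3$ and checks that contracting it with $(x, x, x)$ gives the associated sum of $d$th powers of linear forms.

```python
import numpy as np

# Rank-one symmetric tensors vs. powers of linear forms: contracting
# sum_i lam_i * a_i (x) a_i (x) a_i  with (x, x, x) equals  sum_i lam_i (a_i^t x)^3.

rng = np.random.default_rng(1)
n, d, r = 4, 3, 2
lam = rng.standard_normal(r)
A = rng.standard_normal((r, n))          # rows a_i

# Build P = sum_i lam_i * a_i ⊗ a_i ⊗ a_i  (a symmetric n x n x n tensor).
P = sum(l * np.einsum('i,j,k->ijk', a, a, a) for l, a in zip(lam, A))

# Evaluate the associated polynomial at a point x in two ways.
x = rng.standard_normal(n)
via_tensor = np.einsum('ijk,i,j,k->', P, x, x, x)
via_powers = sum(l * (a @ x)**d for l, a in zip(lam, A))
print(np.isclose(via_tensor, via_powers))   # True
```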
Symmetric rank and applications

The symmetric rank of $\mathbf{P}$, denoted by $\mathrm{rank}_s(\mathbf{P})$, is the smallest $r$ such that the decomposition (3) exists. Applications of this decomposition are numerous and of practical importance [1-6].

1. M. Janzamin, R. Ge, J. Kossaifi, and A. Anandkumar, "Spectral learning on matrices and tensors," Foundations and Trends in Machine Learning, 12 (2019), pp. 393-536.
2. P. Comon, "Tensor decompositions," in Mathematics in Signal Processing V, J. G. McWhirter and I. K. Proudler, Eds., pp. 1-24, Clarendon Press, Oxford, UK, 2002.
3. P. Comon and M. Rajih, "Blind identification of under-determined mixtures based on the characteristic function," Signal Process., 86 (2006), no. 9, pp. 2271-2281.
4. L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., 21 (2000), no. 4, pp. 1253-1278.
5. N. D. Sidiropoulos, R. Bro, and G. B. Giannakis, "Parallel factor analysis in sensor array processing," IEEE Trans. Signal Process., 48 (2000), no. 8, pp. 2377-2388.
6. A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis, John Wiley, West Sussex, UK, 2004.
The symmetric tensor approximation problem (STA)

In practice the input tensor $\mathbf{P}$ is known with some errors on its coefficients. For this reason, computing an approximate decomposition of low symmetric rank is usually more interesting than computing the exact symmetric decomposition of $\mathbf{P}$:
$$(\mathrm{STA}) \qquad \min_{Q \in \sigma_r} \frac{1}{2} \|Q - P\|_d^2, \qquad (4)$$
where $\sigma_r = \{ Q \in \mathbb{C}[x]_d \mid \mathrm{rank}_s(Q) \leq r \}$. We develop a Riemannian Newton iteration (RNS) to solve (4) locally when $r$ is strictly lower than the generic symmetric rank given by the Alexander-Hirschowitz theorem [7].

7. J. Alexander and A. Hirschowitz, "Polynomial interpolation in several variables," J. Algebraic Geom., 4 (1995), no. 2, pp. 201-222.
The Riemannian Newton algorithm

Let $f : \mathcal{M} \to \mathbb{R}_+$, $p \mapsto \frac{1}{2}\|F(p)\|^2$ with $F : \mathcal{M} \to \mathcal{E}$, where $(\mathcal{M}, \langle \cdot, \cdot \rangle)$ is a Riemannian manifold and $\mathcal{E}$ is a Euclidean space such that $\dim(\mathcal{M}) \leq \dim(\mathcal{E})$. Newton's method for a Riemannian least squares problem generates, from an initial point $x_0$, a sequence of iterates $x_1, x_2, \ldots$ which converges to a local minimum. The method can be summarized in two steps (see the sketch after this list):
1. Solve the Newton equation $\mathrm{Hess}\, f(x_k)[\eta_k] = -\mathrm{grad}\, f(x_k)$ for the unknown $\eta_k \in T_{x_k}\mathcal{M}$, where $T_{x_k}\mathcal{M}$ is the tangent space to $\mathcal{M}$ at the point $x_k$;
2. Apply a retraction operator $R_{x_k} : T_{x_k}\mathcal{M} \to \mathcal{M}$ to define the new point $x_{k+1}$ on $\mathcal{M}$ by $x_{k+1} := R_{x_k}(\eta_k)$.
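For concreteness, here is a minimal sketch of these two steps on a simpler manifold, the unit sphere $S^{n-1}$, for the toy least-squares cost $f(x) = \frac{1}{2}\|Ax - b\|^2$ (not the STA problem; the names `A`, `b`, `newton_step` are illustrative only).

```python
import numpy as np

# One Riemannian Newton step on the unit sphere S^{n-1} for f(x) = 1/2 ||A x - b||^2.
# Newton is only locally convergent; a good initial point matters.

def newton_step(A, b, x):
    n = x.size
    r = A @ x - b
    egrad = A.T @ r                        # Euclidean gradient of the extension of f
    ehess = A.T @ A                        # Euclidean Hessian of the extension of f
    Px = np.eye(n) - np.outer(x, x)        # projector onto the tangent space T_x S^{n-1}
    grad = Px @ egrad                                   # Riemannian gradient
    Hess = Px @ ehess @ Px - (x @ egrad) * Px           # Riemannian Hessian (as operator on T_x)
    # Step 1: solve Hess[eta] = -grad in an orthonormal basis B of T_x S^{n-1}.
    B = np.linalg.svd(Px)[0][:, :n - 1]
    eta = B @ np.linalg.solve(B.T @ Hess @ B, -B.T @ grad)
    # Step 2: retract the update back onto the sphere.
    return (x + eta) / np.linalg.norm(x + eta)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((6, 4)), rng.standard_normal(6)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
for _ in range(8):
    x = newton_step(A, b, x)
print("Riemannian gradient norm:",
      np.linalg.norm((np.eye(4) - np.outer(x, x)) @ (A.T @ (A @ x - b))))
```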
Retraction operator

Definition. Let $\mathcal{M}$ be a manifold and $p \in \mathcal{M}$. A retraction $R_p$ is a map $T_p\mathcal{M} \to \mathcal{M}$ that satisfies the following properties:
1. $R_p(0_p) = p$;
2. there exists an open neighborhood $U \subset T_p\mathcal{M}$ of $0_p$ such that the restriction of $R_p$ to $U$ is well defined and smooth;
3. $R_p$ satisfies the local rigidity condition $DR_p(0_p) = \mathrm{id}_{T_p\mathcal{M}}$, where $\mathrm{id}_{T_p\mathcal{M}}$ denotes the identity mapping on $T_p\mathcal{M}$.
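A standard example of a retraction on the unit sphere is normalization, $R_p(\eta) = (p+\eta)/\|p+\eta\|$; the sketch below checks properties 1 and 3 numerically (the point and tangent vector are arbitrary, and this is illustrative only).

```python
import numpy as np

# Check that normalization R_p(eta) = (p + eta) / ||p + eta|| behaves like a retraction
# on the unit sphere: R_p(0_p) = p and D R_p(0_p) = identity on T_p S^{n-1}.

def R(p, eta):
    v = p + eta
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
p = rng.standard_normal(5); p /= np.linalg.norm(p)
eta = rng.standard_normal(5); eta -= (p @ eta) * p          # project into T_p S^{n-1}

print(np.allclose(R(p, np.zeros(5)), p))                    # property 1: R_p(0_p) = p
t = 1e-6
deriv = (R(p, t * eta) - R(p, -t * eta)) / (2 * t)           # D R_p(0_p)[eta] by central differences
print(np.allclose(deriv, eta, atol=1e-6))                    # property 3: local rigidity
```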
Formulation of the STA problem as a Riemannian optimization problem

Definition. The Veronese manifold in $\mathbb{C}[x]_d$, denoted by $\mathcal{V}_n^d$, is the set of $d$th powers of linear forms in $\mathbb{C}[x]_d - \{0\}$.

Let
$$\Sigma : (\mathcal{V}_n^d)^{\times r} := \mathcal{V}_n^d \times \cdots \times \mathcal{V}_n^d \longrightarrow \mathbb{C}[x]_d, \qquad (w_i (v_i^t x)^d)_{1 \leq i \leq r} \longmapsto \Sigma\big((w_i (v_i^t x)^d)_{1 \leq i \leq r}\big) = \sum_{i=1}^{r} w_i (v_i^t x)^d.$$
Therefore, we formulate the Riemannian least squares problem for the symmetric tensor approximation problem by using the Cartesian product of the Veronese manifold as follows:
$$(\mathrm{STA}^*) \qquad \min_{y \in \mathcal{M}} f(y),$$
where $\mathcal{M} = (\mathcal{V}_n^d)^{\times r}$, $y = (w_i (v_i^t x)^d)_{1 \leq i \leq r}$, $f(y) = \frac{1}{2}\|F(y)\|_d^2$ and $F(y) = \Sigma(y) - P$.
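The map $\Sigma$ can be made concrete in coordinates: expanding each $(v_i^t x)^d$ with the multinomial theorem gives the coefficients of $\Sigma(y)$ in the monomial basis, and $F(y) = \Sigma(y) - P$ is then a coefficient-wise difference. The sketch below is generic Python (not the talk's implementation).

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement
from collections import Counter

# Assemble the monomial coefficients of Sigma(y) = sum_i w_i (v_i^t x)^d.

def exponents(n, d):
    # all exponent vectors alpha with |alpha| = d, as tuples of length n
    for comb in combinations_with_replacement(range(n), d):
        c = Counter(comb)
        yield tuple(c.get(j, 0) for j in range(n))

def multinomial(alpha):
    c = factorial(sum(alpha))
    for a in alpha:
        c //= factorial(a)
    return c

def sigma(w, V, d):
    # dict alpha -> coefficient of x^alpha in sum_i w_i (v_i^t x)^d (rows of V are the v_i)
    n = V.shape[1]
    return {alpha: sum(wi * multinomial(alpha) * np.prod(vi**np.array(alpha))
                       for wi, vi in zip(w, V))
            for alpha in exponents(n, d)}

# Example with r = 2 terms in n = 3 variables, degree d = 4.
w = np.array([1.5, -0.5])
V = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]])
coeffs = sigma(w, V, d=4)
print(coeffs[(2, 0, 2)])   # coefficient of x1^2 x3^2: 1.5 * binom(4; 2,0,2) * 1 = 9.0
```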
Explicit formula of the Riemannian Newton iteration for the STA problem

We call the vector $(w_i)_{1 \leq i \leq r} \in \mathbb{C}^r$ "the weight vector", and we denote it by $W$. We represent the polynomials $(v_i^t x)^d$ by the matrix $V = [v_i]_{1 \leq i \leq r} \in \mathbb{C}^{n \times r}$.

Parameterization: we consider $f$ as a function on
$$\mathcal{N} = \big\{ (W, \mathrm{Re}(V), \mathrm{Im}(V)) \ \big|\ W \in (\mathbb{R}_+^*)^r,\ V \in \mathbb{C}^{n \times r},\ \|v_i\|_2 = 1 \big\}.$$
Let $p \in \mathcal{N}$, and let $B$ be an orthogonal projection onto $T_p\mathcal{N}$. We denote by $G_p$ (resp. $H_p$) the gradient (resp. the Hessian) of $f$ at $p$. Then the Newton equation is $H_p[\eta] = -G_p$, with
$$G_p = B^T G_p^R, \qquad H_p = B^T H_p^R B.$$
Explicit formula of the Riemannian Newton iteration for the STA problem

$G_p^R$ is given by the vector $\begin{pmatrix} \mathrm{Re}(G_1) \\ 2\,\mathrm{Re}(G_2) \\ -2\,\mathrm{Im}(G_2) \end{pmatrix}$, such that:
$$G_1 = \Big( \sum_{i=1}^{r} w_i (v_j^* v_i)^d - P(\bar{v}_j) \Big)_{1 \leq j \leq r},$$
$$G_2 = \frac{1}{2} \Big( d \sum_{i=1}^{r} w_i w_j (v_i^* v_j)^{d-1} \bar{v}_i - w_j \nabla \bar{P}(v_j) \Big)_{1 \leq j \leq r}.$$
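These formulas can be sanity-checked numerically. In the sketch below (not the talk's code) we assume, purely for convenience, that $P$ is itself a sum of $d$th powers of linear forms, so that all apolar inner products reduce to the closed form $\langle (a^t x)^d, (b^t x)^d \rangle_d = (b^* a)^d$; a finite difference in $w_j$ then matches $\mathrm{Re}(G_1)_j$.

```python
import numpy as np

# Finite-difference check of the gradient block G_1, with P = sum_k c_k (u_k^t x)^d.

rng = np.random.default_rng(3)
n, d, r, s = 3, 4, 2, 3
W = rng.random(r) + 0.5                                         # positive weights w_i
V = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
V /= np.linalg.norm(V, axis=0)                                  # unit vectors v_i (columns)
C = rng.standard_normal(s) + 1j * rng.standard_normal(s)        # coefficients c_k of P
U = rng.standard_normal((n, s)) + 1j * rng.standard_normal((n, s))

def f(W, V):
    Gvv = (V.conj().T @ V) ** d        # entry (l, i) = (v_l^* v_i)^d
    Gvu = (V.conj().T @ U) ** d        # entry (i, k) = (v_i^* u_k)^d
    Guu = (U.conj().T @ U) ** d        # entry (m, k) = (u_m^* u_k)^d
    return 0.5 * ((W @ Gvv @ W).real - 2 * (W @ Gvu @ C).real + (C.conj() @ Guu @ C).real)

# G_1[j] = sum_i w_i (v_j^* v_i)^d - P(conj(v_j)),  with  P(conj(v_j)) = sum_k c_k (v_j^* u_k)^d
G1 = (V.conj().T @ V) ** d @ W - (V.conj().T @ U) ** d @ C

j, h = 1, 1e-6
Wh = W.copy(); Wh[j] += h
print((f(Wh, V) - f(W, V)) / h, G1[j].real)    # the two values should agree: df/dw_j = Re(G_1)_j
```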
Explicit formula of the Riemannian Newton iteration for the STA problem

$H_p^R$ is given by the following block matrix:
$$H_p^R = \begin{pmatrix} \mathrm{Re}(A) & 2\,\mathrm{Re}(B)^T & -2\,\mathrm{Im}(B)^T \\ 2\,\mathrm{Re}(B) & 2(\mathrm{Re}(C) + \mathrm{Re}(D)) & -2(\mathrm{Im}(C) + \mathrm{Im}(D)) \\ -2\,\mathrm{Im}(B) & 2(\mathrm{Im}(D) - \mathrm{Im}(C)) & 2(\mathrm{Re}(D) - \mathrm{Re}(C)) \end{pmatrix},$$
such that:
$$A = \big[ (v_i^* v_j)^d \big]_{1 \leq i,j \leq r},$$
$$B = \tfrac{1}{2} \Big[ d\, w_i (v_j^* v_i)^{d-1} \bar{v}_j + \delta_{i,j} \Big( d \sum_{l=1}^{r} w_l (v_l^* v_i)^{d-1} \bar{v}_l - \nabla \bar{P}(v_j) \Big) \Big]_{1 \leq i,j \leq r},$$
$$C = \tfrac{1}{2}\, \mathrm{diag}\Big[ d(d-1) \Big[ \sum_{i=1}^{r} w_i w_j \bar{v}_{i,k} \bar{v}_{i,l} (v_i^* v_j)^{d-2} \Big]_{1 \leq k,l \leq n} - w_j \Delta \bar{P}(v_j) \Big]_{1 \leq j \leq r},$$
$$D = \tfrac{1}{2} \Big[ d\, w_i w_j (v_i^* v_j)^{d-2} \big( (v_i^* v_j) I_n + (d-1) v_j v_i^* \big) \Big]_{1 \leq i,j \leq r}.$$
Retraction on the Veronese manifold

The Hankel matrix of degree $(k, d-k)$ associated to a polynomial $P$ in $\mathbb{C}[x]_d$ is given by
$$H_P^{k,d-k} = \big( \langle P, x^{\alpha+\beta} \rangle_d \big)_{|\alpha| = k,\, |\beta| = d-k}.$$
Let
$$\Pi : \mathbb{C}[x]_d \to \mathcal{V}_n^d, \quad P \mapsto \Pi(P) = P(\bar{q})\, (q^t x)^d,$$
where $q$ is the first left singular vector of the SVD of $H_P^{1,d-1}$. Using the map $\Pi$, we find that the map $R_P : T_P\mathcal{V}_n^d \to \mathcal{V}_n^d$, $Q \mapsto R_P(Q) = \Pi(P + Q)$, satisfies the properties of a retraction operator. We have the following property:
$$\|P - \Pi(P)\|_d \leq \sqrt{d}\, \|P - P_{\mathrm{best}}\|_d,$$
where $P_{\mathrm{best}} \in \mathcal{V}_n^d$ is any best approximation of $P$ in the norm $\|\cdot\|_d$.
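A small sketch of the projection $\Pi$ for real polynomials, using the same assumed coefficient representation as in the earlier sketches: with the convention $P = \sum_{|\alpha|=d} \binom{d}{\alpha} p_\alpha x^\alpha$ one has $\langle P, x^{\alpha+\beta} \rangle_d = p_{\alpha+\beta}$, so $H_P^{1,d-1}$ simply collects coefficients. This is illustrative only, not the talk's implementation.

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement
from collections import Counter

def exponents(n, d):
    for comb in combinations_with_replacement(range(n), d):
        c = Counter(comb)
        yield tuple(c.get(j, 0) for j in range(n))

def multinomial(alpha):
    c = factorial(sum(alpha))
    for a in alpha:
        c //= factorial(a)
    return c

def rank_one(w, v, d):
    # p-coefficients of w (v^t x)^d are w * v^alpha
    return {a: w * np.prod(v ** np.array(a)) for a in exponents(len(v), d)}

def pi(P, n, d):
    cols = list(exponents(n, d - 1))                    # exponents beta with |beta| = d - 1
    H = np.array([[P.get(tuple((np.eye(n, dtype=int)[i] + np.array(b)).tolist()), 0.0)
                   for b in cols] for i in range(n)])   # H_P^{1,d-1}: entry (i, beta) = p_(e_i + beta)
    q = np.linalg.svd(H)[0][:, 0]                       # first left singular vector
    w = sum(multinomial(a) * p * np.prod(q ** np.array(a)) for a, p in P.items())   # P(q)
    return rank_one(w, q, d)

def apolar_dist(P, Q):
    keys = set(P) | set(Q)
    return np.sqrt(sum(multinomial(a) * (P.get(a, 0.0) - Q.get(a, 0.0))**2 for a in keys))

# A nearly rank-one cubic in 3 variables: Pi(P) returns a nearby point of the Veronese manifold.
n, d = 3, 3
v = np.array([1.0, -2.0, 0.5]); v /= np.linalg.norm(v)
P = rank_one(2.0, v, d)
P[(1, 1, 1)] += 0.01                                    # small perturbation of one coefficient
print(apolar_dist(P, pi(P, n, d)))                      # small
```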
Examples

Comparison with some existing state-of-the-art methods.

Example 1: Let $\mathbf{P} \in S^4(\mathbb{C}^7)$ with entries $(\mathbf{P})_{i_1,i_2,i_3,i_4} = (\mathrm{i})^{i_1} + \cdots + (\mathrm{i})^{i_4}$.

Notation: $d_{rns}$ (resp. $d_{ccpd}$, $d_{sdf}$) denotes the distance between $\mathbf{P}$ and the approximation given by RNS [8] (resp. CCPD-NLS, SDF-NLS [9]). $N_{rns}$ (resp. $N_{ccpd}$, $N_{sdf}$) denotes the number of iterations of RNS (resp. CCPD-NLS, SDF-NLS). $t_{rns}$ (resp. $t_{ccpd}$, $t_{sdf}$) denotes the time in seconds consumed by RNS (resp. CCPD-NLS, SDF-NLS).

8. https://gitlab.inria.fr/AlgebraicGeometricModeling/TensorDec.jl
9. N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab 3.0, Mar. 2016, https://www.tensorlab.net.
Examples

Table 1: Computational results for Example 1 with random initial point uniformly distributed.

              r=3          r=5          r=10
  d_0         211.7385     487.5316     301.8507
  d_rns       0.09780803   8.3285e-5    2.5257e-5
  d_ccpd      0.8689672    0.66874246   2.444e8
  d_sdf       0.21156654   0.03242244   2.4439e8
  N_rns       130          40           89
  N_ccpd      500          466          200
  N_sdf       64           13           200
  t_rns       3.420527     1.882588     9.254812
  t_ccpd      1.766338     2.209960     1.319475
  t_sdf       1.986222     0.777425     4.819483

In the next two examples we choose the initial point for the RNS by the structured low-rank decomposition of multivariate Hankel matrices algorithm [HKM17], denoted SHD.