SLIDE 1

1/20

Optimizing a homogeneous polynomial on the unit sphere

Rima Khouja

Advisors: Bernard Mourrain, Houssam Khalil

JNCF 2020

SLIDE 2

2/20

We consider K = C or R. Let P be a homogeneous polynomial in K[x1, ..., xn]_d := K[x]_d. The spectral norm of P, denoted ||P||_{σ,K}, is by definition:

||P||_{σ,K} = max_{v ∈ S^{n−1}} |P(v)|,   (1)

where S^{n−1} is the unit sphere in K^n. For K = C, (1) is also equivalent to maximizing Re(P(v)) on the unit sphere.
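As an illustration of (1) — my own toy example in the real case K = R, not one from the slides — the spectral norm can be estimated crudely by sampling the unit sphere:

```python
import numpy as np

# Toy real example: estimate ||P||_{sigma,R} for the homogeneous cubic
# P(x) = x1^3 + 3*x1*x2^2 by sampling the unit circle S^1.
def P(v):
    return v[0] ** 3 + 3 * v[0] * v[1] ** 2

rng = np.random.default_rng(0)
best = 0.0
for _ in range(20000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)            # random point on the unit sphere
    best = max(best, abs(P(v)))

# On the circle, P(cos t, sin t) = 3cos t - 2cos^3 t, maximized at
# cos t = 1/sqrt(2) with value sqrt(2), so best ≈ 1.4142.
print(best)
```

Sampling only gives a lower bound on the maximum; the methods presented in the following slides replace it by proper optimization.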

SLIDE 3

3/20

Approximating P by a scaled dth power of a linear form, (v1x1 + ··· + vnxn)^d := (vᵗx)^d, amounts to solving the following minimization problem:

dist1(P) := min_{(w,v) ∈ K × S^{n−1}} ||P − w(vᵗx)^d||_d,   (2)

where ||·||_d is the apolar norm on K[x]_d: for P = Σ_{|α|=d} C(d,α) p_α x^α,

||P||_d = √⟨P, P⟩_d = ( Σ_{|α|=d} C(d,α) p̄_α p_α )^{1/2}.

The two problems (1) and (2) are equivalent since:

dist1(P)² = ||P||_d² − ||P||_{σ,K}².
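A small sketch of the apolar norm computation — my own helper, assuming coefficients are stored in the scaled basis C(d,α)·x^α as above. It also checks the classical identity ||(vᵗx)^d||_d = ||v||^d, which follows from the multinomial theorem:

```python
import numpy as np
from math import factorial, prod
from itertools import product

def multinomial(d, alpha):
    return factorial(d) // prod(factorial(a) for a in alpha)

def apolar_norm(p, d):
    # p: dict mapping exponent tuples alpha (|alpha| = d) to p_alpha,
    # for P = sum_alpha multinomial(d, alpha) * p_alpha * x^alpha.
    return sum(multinomial(d, a) * abs(c) ** 2 for a, c in p.items()) ** 0.5

# For P = (v^t x)^d one has p_alpha = v^alpha, so ||P||_d = ||v||^d.
v, d = np.array([1.0, 2.0, -1.0]), 3
n = len(v)
p = {a: prod(v[i] ** a[i] for i in range(n))
     for a in product(range(d + 1), repeat=n) if sum(a) == d}
print(apolar_norm(p, d))   # equals (1 + 4 + 1)^(3/2) = 6^1.5
```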

SLIDE 4

4/20

The symmetric tensor decomposition

P is a symmetric tensor of order d and dimension n, i.e. P ∈ S^d(C^n):

P = [a_{i1...id}] ∈ C^{n × ··· × n} (d times), with a_{i1...id} = a_{iσ(1)...iσ(d)} for every permutation σ ∈ 𝔖_d.

The symmetric tensor decomposition of P consists in writing it as a sum of rank-one symmetric tensors:

P = Σ_{i=1}^{r} λ_i a_i ⊗ ··· ⊗ a_i (d times),  λ_i ∈ C, a_i ∈ C^n.   (3)

By the correspondence between S^d(C^n) and C[x]_d, (3) is equivalent to writing the homogeneous polynomial P associated to P as a sum of dth powers of linear forms:

P = Σ_{i=1}^{r} λ_i (a_{i,1}x1 + ··· + a_{i,n}xn)^d := Σ_{i=1}^{r} λ_i (a_iᵗx)^d.
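Decomposition (3) can be illustrated numerically. The following sketch (my own, using numpy; the sizes are arbitrary) assembles a sum of rank-one symmetric terms and checks symmetry under index permutations:

```python
import numpy as np
from itertools import permutations

# Assemble P = sum_i lambda_i * a_i^{⊗d} for d = 3, r = 2, in R^4.
def rank_one(a, d):
    t = a
    for _ in range(d - 1):
        t = np.multiply.outer(t, a)   # a ⊗ ... ⊗ a, d factors
    return t

rng = np.random.default_rng(1)
lam = np.array([2.0, -1.0])
A = rng.normal(size=(2, 4))           # rows are the vectors a_i
P = sum(l * rank_one(a, 3) for l, a in zip(lam, A))

# Symmetry: entries are invariant under any permutation of the indices.
for s in permutations(range(3)):
    assert np.allclose(P, P.transpose(s))
print(P.shape)                        # (4, 4, 4)
```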

SLIDE 5

5/20

Symmetric rank and applications

The symmetric rank of P, denoted ranks(P), is the smallest r such that decomposition (3) exists. Applications of this decomposition are numerous and of practical importance [1–6].

1. M. Janzamin, R. Ge, J. Kossaifi, and A. Anandkumar, "Spectral learning on matrices and tensors," Foundations and Trends in Machine Learning, 12 (2019), pp. 393–536.
2. P. Comon, "Tensor decompositions," in Mathematics in Signal Processing V, J. G. McWhirter and I. K. Proudler, Eds., pp. 1–24, Clarendon Press, Oxford, UK, 2002.
3. P. Comon and M. Rajih, "Blind identification of under-determined mixtures based on the characteristic function," Signal Process., 86 (2006), no. 9, pp. 2271–2281.
4. L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., 21 (2000), no. 4, pp. 1253–1278.
5. N. D. Sidiropoulos, R. Bro, and G. B. Giannakis, "Parallel factor analysis in sensor array processing," IEEE Trans. Signal Process., 48 (2000), no. 8, pp. 2377–2388.
6. A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis, John Wiley, West Sussex, UK, 2004.
SLIDE 6

6/20

The symmetric tensor approximation problem (STA)

In practice the input tensor P is known with some errors in its coefficients. For this reason, computing an approximate decomposition of low symmetric rank is usually more interesting than computing the exact symmetric decomposition of P:

(STA)  min_{Q ∈ σ_r} ½ ||Q − P||_d²,   (4)

where σ_r = {Q ∈ C[x]_d | ranks(Q) ≤ r}. We develop a Riemannian Newton iteration (RNS) to solve (4) locally when r is strictly lower than the generic symmetric rank given by the Alexander–Hirschowitz theorem.⁷

7. J. Alexander and A. Hirschowitz, "Polynomial interpolation in several variables," J. Algebraic Geom., 4 (1995), no. 2, pp. 201–222.
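For reference, the generic symmetric rank appearing in the condition above can be computed directly from the Alexander–Hirschowitz theorem. A small helper (my own, encoding the standard formula and its list of defective cases) for degree-d forms in n variables:

```python
from math import comb, ceil

def generic_rank(d, n):
    if d == 2:
        return n                     # quadratic forms: generic rank n
    r = ceil(comb(n + d - 1, d) / n)
    # the four defective cases of the theorem, as (degree, variables)
    return r + 1 if (d, n) in {(3, 5), (4, 3), (4, 4), (4, 5)} else r

print(generic_rank(4, 7))            # S^4(C^7), the setting of Example 1: 30
```

In Example 1 below (S⁴(C⁷), generic rank 30), the tested ranks r = 3, 5, 10 are all strictly below this bound.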

SLIDE 7

7/20

The Riemannian Newton algorithm

Let f : M → R₊, p ↦ ½||F(p)||², with F : M → E, where (M, ⟨·,·⟩) is a Riemannian manifold and E is a Euclidean space such that dim(M) ≤ dim(E). Newton's method for a Riemannian least-squares problem generates, from an initial point x0, a sequence of iterates x1, x2, ... which converges to a local minimum.

We can summarize the method in two steps:

1. Solve the Newton equation Hess f(xk)[ηk] = −grad f(xk) for the unknown ηk ∈ T_{xk}M, where T_{xk}M is the tangent space of M at the point xk;

2. Apply a retraction operator R_{xk} : T_{xk}M → M to define the new point xk+1 on M by xk+1 := R_{xk}(ηk).
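The two-step scheme above can be sketched numerically on a toy least-squares instance of my own choosing (not the paper's STA objective): F(v) = v − b on the sphere S^{n−1}, so f(v) = ½||v − b||² and the minimizer is the closest point b/||b||. The projection-based gradient/Hessian formulas and the normalization retraction are standard choices for the sphere, assumed here purely for illustration:

```python
import numpy as np

b = np.array([3.0, -1.0, 2.0])
v = np.array([1.0, 0.0, 0.0])                 # initial point x0 on S^2

for _ in range(5):
    Pt = np.eye(3) - np.outer(v, v)           # projector onto T_v S^2
    grad = Pt @ (v - b)                       # Riemannian gradient of f
    # Riemannian Hessian on T_v: projected Euclidean Hessian (here the
    # identity) plus the sphere's curvature term -<v, v - b> * Id.
    hess = Pt - (v @ (v - b)) * Pt
    eta = -np.linalg.pinv(hess) @ grad        # step 1: Newton equation
    v = (v + eta) / np.linalg.norm(v + eta)   # step 2: retraction

print(np.allclose(v, b / np.linalg.norm(b)))  # True: closest point found
```

The pseudoinverse solves the Newton equation restricted to the tangent space, since the Hessian matrix is singular in the normal direction.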

SLIDE 8

8/20

Retraction operator

Definition. Let M be a manifold and p ∈ M. A retraction R_p is a map T_pM → M that satisfies the following properties:

1. R_p(0_p) = p;
2. there exists an open neighborhood U ⊂ T_pM of 0_p such that the restriction of R_p to U is well defined and smooth;
3. R_p satisfies the local rigidity condition DR_p(0_p) = id_{T_pM}, where id_{T_pM} denotes the identity mapping on T_pM.
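Properties 1 and 3 can be checked numerically for the simplest retraction on the sphere, R_p(η) = (p + η)/||p + η|| (my own example, not taken from the slides):

```python
import numpy as np

p = np.array([0.6, 0.8, 0.0])                  # a point on S^2
R = lambda eta: (p + eta) / np.linalg.norm(p + eta)

# Property 1: R_p(0_p) = p.
print(np.allclose(R(np.zeros(3)), p))          # True

# Property 3 (local rigidity): DR_p(0_p) = id on T_p S^2, checked by a
# finite difference along the tangent vector eta = (0, 0, 1).
eta, t = np.array([0.0, 0.0, 1.0]), 1e-6
d = (R(t * eta) - R(np.zeros(3))) / t
print(np.allclose(d, eta, atol=1e-5))          # True
```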

SLIDE 9

9/20

Formulation of the STA problem as a Riemannian optimization problem

Definition. The Veronese manifold in C[x]_d, denoted V_n^d, is the set of dth powers of linear forms in C[x]_d − {0}.

Let Σ : (V_n^d)^{×r} := V_n^d × ··· × V_n^d → C[x]_d, (w_i(v_iᵗx)^d)_{1≤i≤r} ↦ Σ((w_i(v_iᵗx)^d)_{1≤i≤r}) = Σ_{i=1}^{r} w_i(v_iᵗx)^d.

Therefore, we formulate the Riemannian least-squares problem for the symmetric tensor approximation problem on the Cartesian product of Veronese manifolds as follows:

(STA*)  min_{y ∈ M} f(y),

where M = (V_n^d)^{×r}, y = (w_i(v_iᵗx)^d)_{1≤i≤r}, f(y) = ½||F(y)||_d², and F(y) = Σ(y) − P.

SLIDE 10

10/20

Explicit formula of the Riemannian Newton iteration for the STA problem

We call the vector W = (w_i)_{1≤i≤r} ∈ C^{×r} the weight vector, and we represent the polynomials (v_iᵗx)^d by the matrix V = [v_i]_{1≤i≤r} ∈ C^{n×r}.

Parameterization: we consider f as a function on

N = { (W, Re(V), Im(V)) | W ∈ (R₊*)^r, V ∈ C^{n×r}, ||v_i||₂ = 1 }.

Let p ∈ N, and let B be an orthogonal projection onto T_pN. Denoting by G_p (resp. H_p) the gradient (resp. the Hessian) of f at p, the Newton equation reads H_p[η] = −G_p, with:

G_p = Bᵀ G_p^R,  H_p = Bᵀ H_p^R B.

SLIDE 11

11/20

Explicit formula of the Riemannian Newton iteration for the STA problem

G_p^R is given by the vector

[ Re(G1); 2Re(G2); −2Im(G2) ],

such that:

G1 = ( Σ_{i=1}^{r} w_i (v_j* v_i)^d − P(v̄_j) )_{1≤j≤r},
G2 = ½ ( d Σ_{i=1}^{r} w_i w_j (v_i* v_j)^{d−1} v̄_i − w_j ∇P̄(v_j) )_{1≤j≤r}.

SLIDE 12

12/20

Explicit formula of the Riemannian Newton iteration for the STA problem

H_p^R is given by the following block matrix:

[  Re(A)      2Re(B)ᵀ             −2Im(B)ᵀ
   2Re(B)     2(Re(C) + Re(D))    −2(Im(C) + Im(D))
  −2Im(B)     2(Im(D) − Im(C))     2(Re(D) − Re(C)) ]

such that:

A = [ (v_i* v_j)^d ]_{1≤i,j≤r},
B = ½ [ d w_i (v_j* v_i)^{d−1} v̄_j + δ_{i,j} ( d Σ_{l=1}^{r} w_l (v_l* v_i)^{d−1} v̄_l − ∇P̄(v_j) ) ]_{1≤i,j≤r},
C = ½ diag[ d(d−1) [ Σ_{i=1}^{r} w_i w_j v_{i,k} v_{i,l} (v_i* v_j)^{d−2} ]_{1≤k,l≤n} − w_j ΔP̄(v_j) ]_{1≤j≤r},
D = ½ [ d w_i w_j (v_i* v_j)^{d−2} ( (v_i* v_j) I_n + (d−1) v_j v_i* ) ]_{1≤i,j≤r}.

SLIDE 13

13/20

Retraction on the Veronese manifold

The Hankel matrix of degree (k, d−k) associated to a polynomial P ∈ C[x]_d is given by:

H_P^{k,d−k} = ( ⟨P, x^{α+β}⟩_d )_{|α|=k, |β|=d−k}.

Let Π : C[x]_d → V_n^d, P ↦ Π(P) = P(q̄)(qᵗx)^d, where q is the first left singular vector from the SVD of H_P^{1,d−1}. By using the map Π, we find that the map R_P : T_P V_n^d → V_n^d, Q ↦ R_P(Q) = Π(P + Q), verifies the properties of a retraction operator. We have the following property:

||P − Π(P)||_d ≤ √d ||P − P_best||_d,

where P_best ∈ V_n^d is any best approximation of P in the norm ||·||_d.
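A numerical analogue of the map Π can be sketched for real symmetric tensors, using the SVD of the (1, d−1) flattening of the tensor in place of the Hankel matrix; the function name and the numpy setup below are my own illustration, not the paper's code:

```python
import numpy as np

def pi_rank_one(T):
    # q: first left singular vector of the n x n^(d-1) flattening of T
    # (the numerical analogue of the Hankel matrix H^{1,d-1}_P).
    d, n = T.ndim, T.shape[0]
    q = np.linalg.svd(T.reshape(n, -1))[0][:, 0]
    w = T
    for _ in range(d):
        w = w @ q                                # w = T(q, ..., q) = P(q)
    out = np.array(w)
    for _ in range(d):
        out = np.multiply.outer(out, q)          # w * q^{⊗d}
    return float(w), out

# On an exactly rank-one symmetric tensor, Pi recovers it.
a = np.array([1.0, 2.0, 2.0]); a /= np.linalg.norm(a)
T = 5.0 * np.multiply.outer(np.outer(a, a), a)   # 5 * a^{⊗3}
w, T1 = pi_rank_one(T)
print(np.allclose(T1, T))                        # True, with |w| = 5
```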

SLIDE 14

14/20

Examples

Comparison with some existing state-of-the-art methods.

Example 1: Let P ∈ S⁴(C⁷) with entries (P)_{i1,i2,i3,i4} = (i)^{i1}/i1 + ··· + (i)^{i4}/i4.

Notation:
d_rns (resp. d_ccpd, d_sdf) denotes the distance between P and the approximation given by RNS⁸ (resp. CCPD-NLS, SDF-NLS⁹).
N_rns (resp. N_ccpd, N_sdf) denotes the number of iterations of RNS (resp. CCPD-NLS, SDF-NLS).
t_rns (resp. t_ccpd, t_sdf) denotes the time in seconds consumed by RNS (resp. CCPD-NLS, SDF-NLS).

8. https://gitlab.inria.fr/AlgebraicGeometricModeling/TensorDec.jl
9. N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab 3.0, Mar. 2016, https://www.tensorlab.net.

SLIDE 15

15/20

Examples

Table 1: Computational results for Example 1 with a random, uniformly distributed initial point.

          r = 3        r = 5        r = 10
d0        211.7385     487.5316     301.8507
d_rns     0.09780803   8.3285e-5    2.5257e-5
d_ccpd    0.8689672    0.66874246   2.444e8
d_sdf     0.21156654   0.03242244   2.4439e8
N_rns     130          40           89
N_ccpd    500          466          200
N_sdf     64           13           200
t_rns     3.420527     1.882588     9.254812
t_ccpd    1.766338     2.209960     1.319475
t_sdf     1.986222     0.777425     4.819483

In the two following examples we choose the initial point for RNS by the structured low-rank decomposition of multivariate Hankel matrices algorithm [HKM, 17], denoted SHD.

SLIDE 16

16/20

Examples

Example 2: P ∈ S³(Rⁿ) with entries (P)_{i1,i2,i3} = (−1)^{i1}/i1 + (−1)^{i2}/i2 + (−1)^{i3}/i3.

Table 2: Computational results for Example 2 with r = 2; the initial point of RNS is chosen by the SHD algorithm.

          n = 5        n = 10      n = 15
d_rns     4.5169e-7    4.59e-7     1.0048e-6
d_ccpd    0.01348249   67.5542     0.20721736
d_sdf     0.0230521    36.2859     0.20246552
N_rns     1            1           1
N_ccpd    200          200         500
N_sdf     200          200         500
t_rns     0.009038     0.011488    0.062949
t_ccpd    2.654101     2.877974    5.126160
t_sdf     2.667050     3.452365    6.435196

SLIDE 17

17/20

Examples

Real rank-1 approximation of a real symmetric tensor.

Example 3: Consider the tensor P ∈ S⁵(Rⁿ) given by (P)_{i1,...,i5} = (−1)^{i1} ln(i1) + ··· + (−1)^{i5} ln(i5).

Notation:
w_rns(v_rnsᵗx)^d (resp. w_sdp(v_sdpᵗx)^d) denotes the rank-1 approximation of P obtained by RNS (resp. by the SDP method¹⁰).
N denotes the number of iterations of the RNS method.
t_rns (resp. t_sdp) denotes the time spent by the RNS method in seconds (resp. by the SDP method in hr:mn:sc format).
d0 (resp. d*) denotes the distance between P and the initial point (resp. the local minimizer) of the RNS method.

10. J. Nie and L. Wang, "Semidefinite relaxations for best rank-1 tensor approximations," SIAM Journal on Matrix Analysis and Applications, 35 (2014), pp. 1155–1179.

SLIDE 18

18/20

Examples

Recall that if v is a global maximizer of max_{v∈S^{n−1}} |P(v)| and w = P(v), then w v^{⊗d} is a best rank-1 approximation of P. Hence a rank-1 approximation w v^{⊗d}, with w = P(v) and ||v|| = 1, is better when |w| is larger.

Table 3: Computational results for Example 3; the initial point of RNS is chosen by the SHD algorithm.

n    |w_rns|    d0           d*           N   t_rns     |w_sdp|    t_sdp
5    1.100e+2   526.1339     477.4742     5   0.0582    1.100e+2   0:00:01
10   8.833e+2   6558.5283    6095.9062    5   0.50071   8.833e+2   0:00:22
15   2.697e+3   26317.7883   24642.9612   6   3.8200    2.697e+3   0:01:18
20   6.237e+3   64267.4747   60435.2823   6   18.2790   6.237e+3   0:22:30

SLIDE 19

19/20

Main results and Perspective

Main results

- The first Riemannian optimization method for the symmetric tensor rank approximation problem over the real and the complex field.
- Exact computation of the Hessian matrix, hence fast local convergence.
- A new method to approximate a homogeneous polynomial by powers of linear forms, with a quasi-optimality property.
- A non-random method for choosing the initial point.

Perspective

- A Weierstrass-like¹¹ block decomposition of the Newton iteration for the symmetric tensor rank approximation problem.
- A better starting point.

11. B. Mourrain and O. Ruatta, "Relations between roots and coefficients, interpolation and application to system solving," Journal of Symbolic Computation, 33 (2002), pp. 679–699.

SLIDE 20

20/20

For more details concerning this work: R. Khouja, H. Khalil, and B. Mourrain, "A Riemannian Newton optimization framework for the symmetric tensor rank approximation problem," https://hal.archives-ouvertes.fr/hal-02494172

Thank you for your attention. Questions?