Nonlinear Signal Processing 2007-2008 Course Overview Instituto Superior Técnico, Lisbon, Portugal João Xavier jxavier@isr.ist.utl.pt
Introduction
• This course is about applications of differential geometry in signal processing
• What is differential geometry?
− generalization of differential calculus to manifolds
• What is a manifold?
− smooth curved set
− no vector space structure, no canonical coordinate system
− looks locally like a Euclidean space, but not globally
Introduction
• General idea (figure: two examples of manifolds and one set that is not a manifold)
Introduction
• Example: graph of f(x, y) = 1 − x^2 − y^2, the subset of R^3 given by { (x, y, z) : z = f(x, y) }
Introduction
• Example: n × n orthogonal matrices, the subset of R^{n×n} given by { X : X⊤X = I_n }
Introduction
• Example: n × m matrices with rank r, the subset of R^{n×m} given by { X : rank X = r }
• Note: the set of n × m matrices with rank ≤ r is not a manifold
Introduction
• Example: n × m matrices with prescribed singular values s_i, the subset of R^{n×m} given by { X : σ_i(X) = s_i }
Introduction
• Example: n × n symmetric matrices s.t. λ_max has multiplicity k, the subset of R^{n×n} given by { X : X = X⊤, λ_1(X) = ··· = λ_k(X) > λ_{k+1}(X) }
Introduction
• Not all manifolds are “naturally” embedded in a Euclidean space
• Example: set of k-dimensional subspaces in R^n (Grassmann manifold)
Introduction
• How is differential geometry useful?
− systematic framework for nonlinear problems (generalizes linear algebra)
− elegant geometric re-interpretations of existing solutions
  • Karmarkar’s algorithm for linear programming
  • Sequential Quadratic Programming methods in optimization
  • Rao distance between pdfs in parametric statistical families
  • Jeffreys’ noninformative prior in Bayesian setups
  • Cramér-Rao bound for parametric estimation with ambiguities
  • ... many more
− suggests new powerful solutions
Introduction
• Where has differential geometry been applied?
− Optimization on manifolds
− Kendall’s theory of shapes
− Random matrix theory
− Information geometry
− Geometrical interpretation of Jeffreys’ prior
− Performance bounds for estimation problems posed on manifolds
− Doing statistics on manifolds (generalized PCA)
− ... a lot more (signal processing, econometrics, control, etc.)
Application: optimization on manifolds
• Unconstrained problem: min_{x ∈ R^n} f(x)
• Line-search algorithm: x_{k+1} = x_k + α_k d_k
• d_k = −∇f(x_k) [gradient], d_k = −∇^2 f(x_k)^{-1} ∇f(x_k) [Newton], others ...
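As a concrete illustration of this Euclidean line-search iteration, here is a minimal Python/NumPy sketch (not from the course; the quadratic test function, fixed step size and iteration count are illustrative assumptions):

    import numpy as np

    def line_search_descent(grad, hess, x0, alpha=0.1, newton=False, iters=50):
        """Iterate x_{k+1} = x_k + alpha_k d_k with a gradient or Newton direction d_k."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = grad(x)
            if newton:
                d = -np.linalg.solve(hess(x), g)   # d_k = -(grad^2 f(x_k))^{-1} grad f(x_k)
            else:
                d = -g                             # d_k = -grad f(x_k)
            x = x + alpha * d                      # fixed step size alpha_k = alpha, for simplicity
        return x

    # Illustrative objective: f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    grad = lambda x: A @ x - b
    hess = lambda x: A

    x_gd = line_search_descent(grad, hess, np.zeros(2))                          # gradient steps
    x_nt = line_search_descent(grad, hess, np.zeros(2), alpha=1.0, newton=True)  # Newton steps
    print(x_gd, x_nt, np.linalg.solve(A, b))   # both should approach A^{-1} b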
Application: optimization on manifolds
• Constrained problem: min_{x ∈ M} f(x)
• Re-interpreted as an unconstrained problem on the manifold M
• Geodesic-search algorithm: x_{k+1} = exp_{x_k}(α_k d_k)
Application: optimization on manifolds
• Works for abstract spaces (e.g. Grassmann manifold)
• Theory provides generalization of gradient, Newton direction (not obvious)
• Closed-form solutions for important manifolds (e.g. orthogonal matrices); see the sketch below
• Geodesic-search is not the only possibility:
− optimization in local coordinates
− generalization of trust-region methods
• Numerous applications:
− blind source separation, image processing, rank-reduced Wiener filter, ...
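For the orthogonal group, geodesics through Q in a tangent direction QA (A skew-symmetric) have the closed form t ↦ Q expm(tA), so the geodesic-search step can be written explicitly. Below is a minimal sketch of gradient geodesic descent on O(n) (my own illustration, not the course's code; the objective ||Q − B||_F^2 and all numerical values are assumptions, chosen so the answer can be checked against the orthogonal polar factor of B):

    import numpy as np
    from scipy.linalg import expm

    def skew(M):
        return 0.5 * (M - M.T)

    def geodesic_gradient_descent(B, alpha=0.2, iters=300):
        """Minimize f(Q) = ||Q - B||_F^2 over orthogonal Q by stepping along
        geodesics Q_{k+1} = Q_k expm(-alpha A_k), with A_k skew-symmetric."""
        n = B.shape[0]
        Q = np.eye(n)                       # start in the identity component
        for _ in range(iters):
            A = -2.0 * skew(Q.T @ B)        # Riemannian gradient of f at Q is Q A
            Q = Q @ expm(-alpha * A)        # geodesic step; Q stays exactly orthogonal
        return Q

    # Illustrative data: B is a small perturbation of a rotation, so the
    # minimizer lies in the same connected component as the identity.
    rng = np.random.default_rng(0)
    R = expm(skew(rng.standard_normal((4, 4))))
    B = R + 0.1 * rng.standard_normal((4, 4))

    Q_hat = geodesic_gradient_descent(B)

    # Known closed-form minimizer: the orthogonal polar factor U V^T of B
    U, _, Vt = np.linalg.svd(B)
    print(np.linalg.norm(Q_hat - U @ Vt))   # should be close to zero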
Application: optimization on manifolds
• Example: Signal model y[t] = Q x[t] + w[t], t = 1, 2, ..., T
− Q: unknown orthogonal matrix (Q⊤Q = I_N)
− x[t]: known landmarks
− w[t] iid ~ N(0, Σ)
• Maximum-Likelihood estimate: Q* = argmax_{Q ∈ O(N)} p(Y; Q)
− O(N) = group of N × N orthogonal matrices
− Y = matrix of observations [ y[1] y[2] ··· y[T] ]
− X = matrix of landmarks [ x[1] x[2] ··· x[T] ]
Application: optimization on manifolds
• Optimization problem: Orthogonal Procrustes rotation
Q* = argmin_{Q ∈ O(N)} || Y − QX ||^2_{Σ^{-1}}
   = argmin_{Q ∈ O(N)} tr( Q⊤ Σ^{-1} Q R̂_xx ) − tr( Q⊤ Σ^{-1} R̂_yx )
where R̂_yx = (1/T) ∑_{t=1}^{T} y[t] x[t]⊤ and R̂_xx = (1/T) ∑_{t=1}^{T} x[t] x[t]⊤
• The eigenstructure of Σ controls the Hessian of the objective:
κ(Σ^{-1}) = λ_max(Σ^{-1}) / λ_min(Σ^{-1}) is the condition number of Σ^{-1}
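For the special case Σ = I_N (white noise), the weighted problem reduces to the classical orthogonal Procrustes problem, which has a well-known closed-form solution via an SVD of R̂_yx. A minimal sketch on synthetic data (dimensions and noise level are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    N, T = 5, 100

    # Synthetic data: y[t] = Q_true x[t] + w[t], with white noise (Sigma = I_N)
    Q_true, _ = np.linalg.qr(rng.standard_normal((N, N)))
    X = rng.standard_normal((N, T))
    Y = Q_true @ X + 0.05 * rng.standard_normal((N, T))

    # ML estimate for Sigma = I_N: maximize tr(Q^T R_yx) over O(N),
    # solved in closed form by the SVD of R_yx = (1/T) sum_t y[t] x[t]^T
    R_yx = (Y @ X.T) / T
    U, _, Vt = np.linalg.svd(R_yx)
    Q_hat = U @ Vt

    print(np.linalg.norm(Q_hat - Q_true))   # should be small for low noise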
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(1, 1, 1, 1, 1), κ(Σ^{-1}) = 1
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(0.2, 0.4, 0.6, 0.8, 1), κ(Σ^{-1}) = 5
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(0.02, 0.05, 0.14, 0.37, 1), κ(Σ^{-1}) = 50
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: Kendall’s theory of shapes
(figure: shape space as a manifold, obtained as a quotient space)
• Applications:
− morph one shape into another, statistics (“mean” shape), clustering, ...
Application: random matrix theory
• Basic statistics: transformation of random objects in Euclidean spaces
x is a random vector in R^n, F : R^n → R^n smooth, bijective, y = F(x)
x ~ p_X(x)  ⇒  y ~ p_Y(y) = p_X(F^{-1}(y)) J(y),   J(y) = 1 / |det DF(F^{-1}(y))|
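A quick numerical sanity check of this change-of-variables formula (my own illustrative example, not from the course): take x ~ N(0, 1) and F(x) = e^x, so p_Y(y) = p_X(log y) / y, and compare against a histogram of transformed samples.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(200_000)
    y = np.exp(x)                                  # y = F(x), F(x) = e^x

    # Density predicted by the formula: p_Y(y) = p_X(F^{-1}(y)) / |F'(F^{-1}(y))|
    p_X = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
    p_Y = lambda y: p_X(np.log(y)) / y             # here J(y) = 1 / y

    # Compare a histogram of samples with the predicted density on a grid
    hist, edges = np.histogram(y, bins=200, range=(0.05, 6.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - p_Y(centers))))     # small discrepancy expected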
Application: random matrix theory
• Generalization: transformation of random objects in manifolds M, N
x is a random point in M, F : M → N smooth, bijective, y = F(x)
x ~ Ω_X (exterior form)  ⇒  y ~ Ω_Y = ...
• The answer is provided by the calculus of exterior differential forms
Application: random matrix theory
• Example: decoupling a random vector in amplitude and direction
F(x) = ( ||x||, x / ||x|| ),   M = R^n − {0},   N = R_{++} × S^{n−1} = { (R, u) : R > 0, ||u|| = 1 }
• Answer: x ~ p_X(x)  ⇒  p(R, u) = p_X(Ru) R^{n−1}
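As an illustration (my own check, not in the slides): for x ~ N(0, I_n), integrating p(R, u) = p_X(Ru) R^{n−1} over the unit sphere predicts that the amplitude R follows a chi distribution with n degrees of freedom, which is easy to confirm numerically.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 5
    x = rng.standard_normal((100_000, n))
    R = np.linalg.norm(x, axis=1)                  # amplitude R = ||x||

    # Predicted marginal of R for x ~ N(0, I_n): chi distribution with n dof
    hist, edges = np.histogram(R, bins=100, range=(0.0, 5.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - stats.chi.pdf(centers, df=n))))   # should be small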
Application: random matrix theory
• Example: decoupling a random matrix by the polar decomposition X = PQ
M = GL(n) = { X ∈ R^{n×n} : |X| ≠ 0 },   N = S^n_{++} × O(n) = { (P, Q) : P ≻ 0, Q⊤Q = I_n }
• Answer: X ~ p_X(X)  ⇒  p(P, Q) = ... (known)
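For concreteness, here is how the decomposition X = PQ can be computed for a sample matrix (a sketch; scipy.linalg.polar is used merely as a convenient numerical routine — the slide's point is the induced density p(P, Q), which the code does not derive):

    import numpy as np
    from scipy.linalg import polar

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 4))          # almost surely invertible, i.e. X in GL(4)

    # Left polar decomposition X = P Q with P symmetric positive definite, Q orthogonal
    Q, P = polar(X, side='left')             # scipy returns (u, p) with X = p @ u

    print(np.allclose(X, P @ Q))             # True
    print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
    print(np.all(np.linalg.eigvalsh(P) > 0)) # True: P is positive definite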
Application: random matrix theory
• Example: decoupling a random symmetric matrix by the eigendecomposition X = Q Λ Q⊤
M = S^n = { X ∈ R^{n×n} : X = X⊤ },   N = O(n) × D(n) = { (Q, Λ) : Q⊤Q = I_n, Λ diagonal }
• Answer: X ~ p_X(X)  ⇒  p(Q, Λ) = ... (known)
• Technicality: in fact, the range of F is a quotient of an open subset of N
Application: random matrix theory
• Many more examples:
− Cholesky decomposition (e.g., leads to the Wishart distribution)
− LU
− QR
− SVD
Application of RMT: coherent capacity of multi-antenna systems
• Scenario: point-to-point single-user communication with multiple Tx antennas
(figure: Tx with inputs x_1, ..., x_{N_t}, Rx with outputs y_1, ..., y_{N_r}, channel gains h_{ij} between antenna pairs)
Application of RMT: coherent capacity of multi-antenna systems
• Data model: y = Hx + n, with y, n ∈ C^{N_r}, H ∈ C^{N_r × N_t}, x ∈ C^{N_t}
− N_t = number of Tx antennas
− N_r = number of Rx antennas
− Assumption: n_i iid ~ CN(0, 1)
• Decoupled data model:
− SVD: H = U Σ V^H, with U ∈ U(N_r), V ∈ U(N_t), Σ = Diag(σ_1, ..., σ_f, 0), (σ_1, ..., σ_f) = nonzero singular values of H, f = min{N_r, N_t}
− Transform the data: ỹ = U^H y, x̃ = V^H x and ñ = U^H n
− Equivalent diagonal model: ỹ = Σ x̃ + ñ
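A small numerical sketch of this decoupling step (the dimensions and the Rayleigh-fading draw of H are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    Nr, Nt = 4, 3
    f = min(Nr, Nt)

    # Random channel, input, and unit-variance complex Gaussian noise
    H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    x = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)
    n = (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr)) / np.sqrt(2)
    y = H @ x + n

    # SVD H = U S V^H and the transformed (decoupled) model
    U, s, Vh = np.linalg.svd(H)              # s holds the f nonzero singular values
    y_t = U.conj().T @ y                     # y~ = U^H y
    x_t = Vh @ x                             # x~ = V^H x
    n_t = U.conj().T @ n                     # n~ = U^H n

    S = np.zeros((Nr, Nt)); S[:f, :f] = np.diag(s)
    print(np.allclose(y_t, S @ x_t + n_t))   # True: y~ = Sigma x~ + n~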
Application of RMT: coherent capacity of multi-antenna systems
• Interpretation: the matrix channel H is equivalent to f parallel scalar channels
(figure: f scalar channels ỹ_i = σ_i x̃_i + ñ_i, i = 1, ..., f)
Application of RMT: coherent capacity of multi-antenna systems
• Assumption: channel matrix H is random and known only at the Rx
• Channel capacity: C = max_{p(x), E{||x||^2} ≤ P} I(x; (y, H)),   I = mutual information
• Solution:
C = E_H { ∑_{i=1}^{f} log( 1 + (P/N_t) σ_i^2 ) }
Recall: (σ_1, ..., σ_f) = random singular values of H, f = min{N_r, N_t}
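A Monte Carlo sketch of this expectation for the i.i.d. CN(0,1) channel of the next slide (antenna numbers, SNR and sample size are illustrative assumptions; the result is in nats since natural logs are used):

    import numpy as np

    rng = np.random.default_rng(0)
    Nr, Nt, P = 4, 4, 10.0                    # illustrative values
    trials = 20_000

    caps = np.empty(trials)
    for k in range(trials):
        H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
        sigma = np.linalg.svd(H, compute_uv=False)            # the f singular values of H
        caps[k] = np.sum(np.log(1.0 + (P / Nt) * sigma**2))   # sum over the f parallel channels

    print(caps.mean())   # Monte Carlo estimate of C (nats per channel use)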
Application of RMT: coherent capacity of multi-antenna systems
• H is random and H = U Σ V^H (SVD): the SVD carries the density p(H) on C^{N_r × N_t} to a density p(U, Σ, V) on U(N_r) × D(f) × U(N_t)
• Capacity: when H_{ij} iid ~ CN(0, 1),
C = ∑_{k=0}^{f−1} [ k! / (k + g − f)! ] ∫_0^∞ log( 1 + (P/N_t) λ ) ( L_k^{g−f}(λ) )^2 λ^{g−f} e^{−λ} dλ
where g = max{N_r, N_t} and L_j^i are the (generalized) Laguerre polynomials
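The formula is straightforward to evaluate numerically; the sketch below (my own, using generalized Laguerre polynomials from scipy and simple quadrature) should agree with the Monte Carlo estimate from the previous slide for the same N_r, N_t, P.

    import numpy as np
    from scipy.special import genlaguerre, factorial
    from scipy.integrate import quad

    def coherent_capacity(Nr, Nt, P):
        """Ergodic capacity (nats/channel use) for H with iid CN(0,1) entries."""
        f, g = min(Nr, Nt), max(Nr, Nt)
        C = 0.0
        for k in range(f):
            Lk = genlaguerre(k, g - f)                       # L_k^{g-f}
            w = factorial(k) / factorial(k + g - f)
            integrand = lambda lam: (np.log(1.0 + (P / Nt) * lam)
                                     * w * Lk(lam) ** 2
                                     * lam ** (g - f) * np.exp(-lam))
            C += quad(integrand, 0.0, np.inf)[0]
        return C

    print(coherent_capacity(4, 4, 10.0))   # compare with the Monte Carlo value above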
Application: information geometry
• Problem: given a parametric statistical family F = { p(x; θ) : θ ∈ Θ }, assign a distance function d : F × F → R
• Example: F = { N(θ, Σ) : θ ∈ Θ = R^n } (covariance Σ is fixed)
• Naive choice: d : Θ × Θ → R,  d(θ, η) = || θ − η ||
• This choice does not produce an “intrinsic” (parametrization-invariant) distance
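To make the objection concrete, here is a small sketch (my own illustration; it uses the known fact that for a Gaussian family with fixed covariance the Fisher information metric is the constant matrix Σ^{-1}, so the Rao distance reduces to a Mahalanobis distance): under a reparametrization ψ = Aθ the naive Euclidean distance between parameters changes, while the Fisher-Rao distance does not.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    Sigma = np.diag([1.0, 2.0, 0.5])         # fixed covariance of the family
    Sigma_inv = np.linalg.inv(Sigma)

    theta, eta = rng.standard_normal(n), rng.standard_normal(n)
    A = rng.standard_normal((n, n))          # invertible reparametrization psi = A theta

    def euclidean(a, b):
        return np.linalg.norm(a - b)

    def rao(a, b, fisher):
        # Rao distance for a constant Fisher metric: a Mahalanobis distance
        d = a - b
        return np.sqrt(d @ fisher @ d)

    # Original parametrization (Fisher metric Sigma^{-1}) vs. psi = A theta
    # (Fisher metric A^{-T} Sigma^{-1} A^{-1})
    fisher_psi = np.linalg.inv(A).T @ Sigma_inv @ np.linalg.inv(A)
    print(euclidean(theta, eta), euclidean(A @ theta, A @ eta))             # differ
    print(rao(theta, eta, Sigma_inv), rao(A @ theta, A @ eta, fisher_psi))  # agree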