Nonlinear Signal Processing 2007-2008 Course Overview Instituto Superior Técnico, Lisbon, Portugal João Xavier jxavier@isr.ist.utl.pt
Introduction
• This course is about applications of differential geometry in signal processing
• What is differential geometry?
− generalization of differential calculus to manifolds
• What is a manifold?
− smooth curved set
− no vector space structure, no canonical coordinate system
− looks locally like a Euclidean space, but not globally
Introduction
• General idea (figure: two examples of manifolds and one set that is not a manifold)
Introduction
• Example: graph of f(x, y) = 1 − x^2 − y^2, the subset of R^3 given by { (x, y, z) : z = f(x, y) }
Introduction
• Example: n × n orthogonal matrices, the subset of R^{n×n} given by { X : X⊤X = I_n }
Introduction
• Example: n × m matrices with rank r, the subset of R^{n×m} given by { X : rank X = r }
• Note: the set of n × m matrices with rank ≤ r is not a manifold
Introduction
• Example: n × m matrices with prescribed singular values s_i, the subset of R^{n×m} given by { X : σ_i(X) = s_i }
Introduction
• Example: n × n symmetric matrices s.t. λ_max has multiplicity k, the subset of R^{n×n} given by { X : X = X⊤, λ_1(X) = ··· = λ_k(X) > λ_{k+1}(X) }
Introduction
• Not all manifolds are “naturally” embedded in a Euclidean space
• Example: set of k-dimensional subspaces in R^n (Grassmann manifold)
Introduction
• How is differential geometry useful?
− systematic framework for nonlinear problems (generalizes linear algebra)
− elegant geometric re-interpretations of existing solutions
  • Karmarkar’s algorithm for linear programming
  • Sequential Quadratic Programming methods in optimization
  • Rao distance between pdfs in parametric statistical families
  • Jeffreys’ noninformative prior in Bayesian setups
  • Cramér-Rao bound for parametric estimation with ambiguities
  • ... many more
− suggests new powerful solutions
Introduction
• Where has differential geometry been applied?
− Optimization on manifolds
− Kendall’s theory of shapes
− Random matrix theory
− Information geometry
− Geometrical interpretation of Jeffreys’ prior
− Performance bounds for estimation problems posed on manifolds
− Doing statistics on manifolds (generalized PCA)
− ... a lot more (signal processing, econometrics, control, etc.)
Application: optimization on manifolds
• Unconstrained problem: min_{x ∈ R^n} f(x)
• Line-search algorithm: x_{k+1} = x_k + α_k d_k
• d_k = −∇f(x_k) [gradient], d_k = −∇^2 f(x_k)^{-1} ∇f(x_k) [Newton], others ...
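As a concrete illustration of this Euclidean line-search iteration, here is a minimal Python/NumPy sketch (not from the course; the quadratic test function, fixed step size and iteration count are illustrative assumptions):

    import numpy as np

    def line_search_descent(grad, hess, x0, alpha=0.1, newton=False, iters=50):
        """Iterate x_{k+1} = x_k + alpha_k d_k with a gradient or Newton direction d_k."""
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            g = grad(x)
            if newton:
                d = -np.linalg.solve(hess(x), g)   # d_k = -(grad^2 f(x_k))^{-1} grad f(x_k)
            else:
                d = -g                             # d_k = -grad f(x_k)
            x = x + alpha * d                      # fixed step size alpha_k = alpha, for simplicity
        return x

    # Illustrative objective: f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    grad = lambda x: A @ x - b
    hess = lambda x: A

    x_gd = line_search_descent(grad, hess, np.zeros(2))                          # gradient steps
    x_nt = line_search_descent(grad, hess, np.zeros(2), alpha=1.0, newton=True)  # Newton steps
    print(x_gd, x_nt, np.linalg.solve(A, b))   # both should approach A^{-1} b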
Application: optimization on manifolds
• Constrained problem: min_{x ∈ M} f(x)
• Re-interpreted as an unconstrained problem on the manifold M
• Geodesic-search algorithm: x_{k+1} = exp_{x_k}(α_k d_k)
Application: optimization on manifolds
• Works for abstract spaces (e.g. Grassmann manifold)
• Theory provides generalization of gradient, Newton direction (not obvious)
• Closed-form solutions for important manifolds (e.g. orthogonal matrices); see the sketch below
• Geodesic-search is not the only possibility:
− optimization in local coordinates
− generalization of trust-region methods
• Numerous applications:
− blind source separation, image processing, rank-reduced Wiener filter, ...
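For the orthogonal group, geodesics through Q in a tangent direction QA (A skew-symmetric) have the closed form t ↦ Q expm(tA), so the geodesic-search step can be written explicitly. Below is a minimal sketch of gradient geodesic descent on O(n) (my own illustration, not the course's code; the objective ||Q − B||_F^2 and all numerical values are assumptions, chosen so the answer can be checked against the orthogonal polar factor of B):

    import numpy as np
    from scipy.linalg import expm

    def skew(M):
        return 0.5 * (M - M.T)

    def geodesic_gradient_descent(B, alpha=0.2, iters=300):
        """Minimize f(Q) = ||Q - B||_F^2 over orthogonal Q by stepping along
        geodesics Q_{k+1} = Q_k expm(-alpha A_k), with A_k skew-symmetric."""
        n = B.shape[0]
        Q = np.eye(n)                       # start in the identity component
        for _ in range(iters):
            A = -2.0 * skew(Q.T @ B)        # Riemannian gradient of f at Q is Q A
            Q = Q @ expm(-alpha * A)        # geodesic step; Q stays exactly orthogonal
        return Q

    # Illustrative data: B is a small perturbation of a rotation, so the
    # minimizer lies in the same connected component as the identity.
    rng = np.random.default_rng(0)
    R = expm(skew(rng.standard_normal((4, 4))))
    B = R + 0.1 * rng.standard_normal((4, 4))

    Q_hat = geodesic_gradient_descent(B)

    # Known closed-form minimizer: the orthogonal polar factor U V^T of B
    U, _, Vt = np.linalg.svd(B)
    print(np.linalg.norm(Q_hat - U @ Vt))   # should be close to zero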
Application: optimization on manifolds
• Example: Signal model y[t] = Q x[t] + w[t], t = 1, 2, ..., T
− Q: unknown orthogonal matrix (Q⊤Q = I_N)
− x[t]: known landmarks
− w[t] iid ~ N(0, Σ)
• Maximum-Likelihood estimate: Q* = argmax_{Q ∈ O(N)} p(Y; Q)
− O(N) = group of N × N orthogonal matrices
− Y = matrix of observations [ y[1] y[2] ··· y[T] ]
− X = matrix of landmarks [ x[1] x[2] ··· x[T] ]
Application: optimization on manifolds
• Optimization problem: Orthogonal Procrustes rotation
Q* = argmin_{Q ∈ O(N)} || Y − QX ||^2_{Σ^{-1}}
   = argmin_{Q ∈ O(N)} tr( Q⊤ Σ^{-1} Q R̂_xx ) − tr( Q⊤ Σ^{-1} R̂_yx )
where R̂_yx = (1/T) ∑_{t=1}^{T} y[t] x[t]⊤ and R̂_xx = (1/T) ∑_{t=1}^{T} x[t] x[t]⊤
• The eigenstructure of Σ controls the Hessian of the objective:
κ(Σ^{-1}) = λ_max(Σ^{-1}) / λ_min(Σ^{-1}) is the condition number of Σ^{-1}
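For the special case Σ = I_N (white noise), the weighted problem reduces to the classical orthogonal Procrustes problem, which has a well-known closed-form solution via an SVD of R̂_yx. A minimal sketch on synthetic data (dimensions and noise level are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    N, T = 5, 100

    # Synthetic data: y[t] = Q_true x[t] + w[t], with white noise (Sigma = I_N)
    Q_true, _ = np.linalg.qr(rng.standard_normal((N, N)))
    X = rng.standard_normal((N, T))
    Y = Q_true @ X + 0.05 * rng.standard_normal((N, T))

    # ML estimate for Sigma = I_N: maximize tr(Q^T R_yx) over O(N),
    # solved in closed form by the SVD of R_yx = (1/T) sum_t y[t] x[t]^T
    R_yx = (Y @ X.T) / T
    U, _, Vt = np.linalg.svd(R_yx)
    Q_hat = U @ Vt

    print(np.linalg.norm(Q_hat - Q_true))   # should be small for low noise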
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(1, 1, 1, 1, 1), κ(Σ^{-1}) = 1
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(0.2, 0.4, 0.6, 0.8, 1), κ(Σ^{-1}) = 5
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: optimization on manifolds
• Example: N = 5, T = 100, Σ = diag(0.02, 0.05, 0.14, 0.37, 1), κ(Σ^{-1}) = 50
(figure: convergence plot, error vs. iteration 0-30 on a log scale; curves: projected gradient, gradient geodesic descent, Newton geodesic descent)
Application: Kendall’s theory of shapes
(figure: shape space as a manifold, obtained as a quotient space)
• Applications:
− morph one shape into another, statistics (“mean” shape), clustering, ...
Application: random matrix theory
• Basic statistics: transformation of random objects in Euclidean spaces
x is a random vector in R^n, F : R^n → R^n smooth, bijective, y = F(x)
x ~ p_X(x)  ⇒  y ~ p_Y(y) = p_X(F^{-1}(y)) J(y),   J(y) = 1 / |det DF(F^{-1}(y))|
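A quick numerical sanity check of this change-of-variables formula (my own illustrative example, not from the course): take x ~ N(0, 1) and F(x) = e^x, so p_Y(y) = p_X(log y) / y, and compare against a histogram of transformed samples.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(200_000)
    y = np.exp(x)                                  # y = F(x), F(x) = e^x

    # Density predicted by the formula: p_Y(y) = p_X(F^{-1}(y)) / |F'(F^{-1}(y))|
    p_X = lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)
    p_Y = lambda y: p_X(np.log(y)) / y             # here J(y) = 1 / y

    # Compare a histogram of samples with the predicted density on a grid
    hist, edges = np.histogram(y, bins=200, range=(0.05, 6.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - p_Y(centers))))     # small discrepancy expected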
Application: random matrix theory
• Generalization: transformation of random objects in manifolds M, N
x is a random point in M, F : M → N smooth, bijective, y = F(x)
x ~ Ω_X (exterior form)  ⇒  y ~ Ω_Y = ...
• The answer is provided by the calculus of exterior differential forms
Application: random matrix theory
• Example: decoupling a random vector in amplitude and direction
F(x) = ( ||x||, x / ||x|| ),   M = R^n − {0},   N = R_{++} × S^{n−1} = { (R, u) : R > 0, ||u|| = 1 }
• Answer: x ~ p_X(x)  ⇒  p(R, u) = p_X(Ru) R^{n−1}
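As an illustration (my own check, not in the slides): for x ~ N(0, I_n), integrating p(R, u) = p_X(Ru) R^{n−1} over the unit sphere predicts that the amplitude R follows a chi distribution with n degrees of freedom, which is easy to confirm numerically.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 5
    x = rng.standard_normal((100_000, n))
    R = np.linalg.norm(x, axis=1)                  # amplitude R = ||x||

    # Predicted marginal of R for x ~ N(0, I_n): chi distribution with n dof
    hist, edges = np.histogram(R, bins=100, range=(0.0, 5.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - stats.chi.pdf(centers, df=n))))   # should be small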
Application: random matrix theory
• Example: decoupling a random matrix by the polar decomposition X = PQ
M = GL(n) = { X ∈ R^{n×n} : |X| ≠ 0 },   N = S^n_{++} × O(n) = { (P, Q) : P ≻ 0, Q⊤Q = I_n }
• Answer: X ~ p_X(X)  ⇒  p(P, Q) = ... (known)
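For concreteness, here is how the decomposition X = PQ can be computed for a sample matrix (a sketch; scipy.linalg.polar is used merely as a convenient numerical routine — the slide's point is the induced density p(P, Q), which the code does not derive):

    import numpy as np
    from scipy.linalg import polar

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 4))          # almost surely invertible, i.e. X in GL(4)

    # Left polar decomposition X = P Q with P symmetric positive definite, Q orthogonal
    Q, P = polar(X, side='left')             # scipy returns (u, p) with X = p @ u

    print(np.allclose(X, P @ Q))             # True
    print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
    print(np.all(np.linalg.eigvalsh(P) > 0)) # True: P is positive definite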
Application: random matrix theory
• Example: decoupling a random symmetric matrix by the eigendecomposition X = Q Λ Q⊤
M = S^n = { X ∈ R^{n×n} : X = X⊤ },   N = O(n) × D(n) = { (Q, Λ) : Q⊤Q = I_n, Λ diagonal }
• Answer: X ~ p_X(X)  ⇒  p(Q, Λ) = ... (known)
• Technicality: in fact, the range of F is a quotient of an open subset of N
Application: random matrix theory
• Many more examples:
− Cholesky decomposition (e.g., leads to the Wishart distribution)
− LU
− QR
− SVD
Application of RMT: coherent capacity of multi-antenna systems
• Scenario: point-to-point single-user communication with multiple Tx antennas
(figure: Tx with inputs x_1, ..., x_{N_t}, Rx with outputs y_1, ..., y_{N_r}, channel gains h_{ij} between antenna pairs)
Application of RMT: coherent capacity of multi-antenna systems
• Data model: y = Hx + n, with y, n ∈ C^{N_r}, H ∈ C^{N_r × N_t}, x ∈ C^{N_t}
− N_t = number of Tx antennas
− N_r = number of Rx antennas
− Assumption: n_i iid ~ CN(0, 1)
• Decoupled data model:
− SVD: H = U Σ V^H, with U ∈ U(N_r), V ∈ U(N_t), Σ = Diag(σ_1, ..., σ_f, 0), (σ_1, ..., σ_f) = nonzero singular values of H, f = min{N_r, N_t}
− Transform the data: ỹ = U^H y, x̃ = V^H x and ñ = U^H n
− Equivalent diagonal model: ỹ = Σ x̃ + ñ
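A small numerical sketch of this decoupling step (the dimensions and the Rayleigh-fading draw of H are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    Nr, Nt = 4, 3
    f = min(Nr, Nt)

    # Random channel, input, and unit-variance complex Gaussian noise
    H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    x = rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)
    n = (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr)) / np.sqrt(2)
    y = H @ x + n

    # SVD H = U S V^H and the transformed (decoupled) model
    U, s, Vh = np.linalg.svd(H)              # s holds the f nonzero singular values
    y_t = U.conj().T @ y                     # y~ = U^H y
    x_t = Vh @ x                             # x~ = V^H x
    n_t = U.conj().T @ n                     # n~ = U^H n

    S = np.zeros((Nr, Nt)); S[:f, :f] = np.diag(s)
    print(np.allclose(y_t, S @ x_t + n_t))   # True: y~ = Sigma x~ + n~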
Application of RMT: coherent capacity of multi-antenna systems
• Interpretation: the matrix channel H is equivalent to f parallel scalar channels
(figure: f scalar channels ỹ_i = σ_i x̃_i + ñ_i, i = 1, ..., f)
Application of RMT: coherent capacity of multi-antenna systems
• Assumption: channel matrix H is random and known only at the Rx
• Channel capacity: C = max_{p(x), E{||x||^2} ≤ P} I(x; (y, H)),   I = mutual information
• Solution:
C = E_H { ∑_{i=1}^{f} log( 1 + (P/N_t) σ_i^2 ) }
Recall: (σ_1, ..., σ_f) = random singular values of H, f = min{N_r, N_t}
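A Monte Carlo sketch of this expectation for the i.i.d. CN(0,1) channel of the next slide (antenna numbers, SNR and sample size are illustrative assumptions; the result is in nats since natural logs are used):

    import numpy as np

    rng = np.random.default_rng(0)
    Nr, Nt, P = 4, 4, 10.0                    # illustrative values
    trials = 20_000

    caps = np.empty(trials)
    for k in range(trials):
        H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
        sigma = np.linalg.svd(H, compute_uv=False)            # the f singular values of H
        caps[k] = np.sum(np.log(1.0 + (P / Nt) * sigma**2))   # sum over the f parallel channels

    print(caps.mean())   # Monte Carlo estimate of C (nats per channel use)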
Application of RMT: coherent capacity of multi-antenna systems
• H is random and H = U Σ V^H (SVD): the SVD carries the density p(H) on C^{N_r × N_t} to a density p(U, Σ, V) on U(N_r) × D(f) × U(N_t)
• Capacity: when H_{ij} iid ~ CN(0, 1),
C = ∑_{k=0}^{f−1} [ k! / (k + g − f)! ] ∫_0^∞ log( 1 + (P/N_t) λ ) ( L_k^{g−f}(λ) )^2 λ^{g−f} e^{−λ} dλ
where g = max{N_r, N_t} and L_j^i are the (generalized) Laguerre polynomials
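The formula is straightforward to evaluate numerically; the sketch below (my own, using generalized Laguerre polynomials from scipy and simple quadrature) should agree with the Monte Carlo estimate from the previous slide for the same N_r, N_t, P.

    import numpy as np
    from scipy.special import genlaguerre, factorial
    from scipy.integrate import quad

    def coherent_capacity(Nr, Nt, P):
        """Ergodic capacity (nats/channel use) for H with iid CN(0,1) entries."""
        f, g = min(Nr, Nt), max(Nr, Nt)
        C = 0.0
        for k in range(f):
            Lk = genlaguerre(k, g - f)                       # L_k^{g-f}
            w = factorial(k) / factorial(k + g - f)
            integrand = lambda lam: (np.log(1.0 + (P / Nt) * lam)
                                     * w * Lk(lam) ** 2
                                     * lam ** (g - f) * np.exp(-lam))
            C += quad(integrand, 0.0, np.inf)[0]
        return C

    print(coherent_capacity(4, 4, 10.0))   # compare with the Monte Carlo value above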
Application: information geometry
• Problem: given a parametric statistical family F = { p(x; θ) : θ ∈ Θ }, assign a distance function d : F × F → R
• Example: F = { N(θ, Σ) : θ ∈ Θ = R^n } (covariance Σ is fixed)
• Naive choice: d : Θ × Θ → R,  d(θ, η) = || θ − η ||
• This choice does not produce an “intrinsic” (parametrization-invariant) distance
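To make the objection concrete, here is a small sketch (my own illustration; it uses the known fact that for a Gaussian family with fixed covariance the Fisher information metric is the constant matrix Σ^{-1}, so the Rao distance reduces to a Mahalanobis distance): under a reparametrization ψ = Aθ the naive Euclidean distance between parameters changes, while the Fisher-Rao distance does not.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    Sigma = np.diag([1.0, 2.0, 0.5])         # fixed covariance of the family
    Sigma_inv = np.linalg.inv(Sigma)

    theta, eta = rng.standard_normal(n), rng.standard_normal(n)
    A = rng.standard_normal((n, n))          # invertible reparametrization psi = A theta

    def euclidean(a, b):
        return np.linalg.norm(a - b)

    def rao(a, b, fisher):
        # Rao distance for a constant Fisher metric: a Mahalanobis distance
        d = a - b
        return np.sqrt(d @ fisher @ d)

    # Original parametrization (Fisher metric Sigma^{-1}) vs. psi = A theta
    # (Fisher metric A^{-T} Sigma^{-1} A^{-1})
    fisher_psi = np.linalg.inv(A).T @ Sigma_inv @ np.linalg.inv(A)
    print(euclidean(theta, eta), euclidean(A @ theta, A @ eta))             # differ
    print(rao(theta, eta, Sigma_inv), rao(A @ theta, A @ eta, fisher_psi))  # agree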