Polynomial Spectral Decomposition of Conditional Expectation Operators
Anuran Makur and Lizhong Zheng
EECS Department, Massachusetts Institute of Technology
Allerton Conference 2016

Outline
1 Introduction
   Motivation: Regression and Maximal Correlation
   Preliminaries
   Spectral Characterization of Maximal Correlation
2 Polynomial Decompositions of Compact Operators
3 Illustrations of Polynomial SVDs

Motivation: Regression and Maximal Correlation
Fix a joint distribution $P_{X,Y}$ on $\mathcal{X} \times \mathcal{Y}$.
Regression [Breiman and Friedman, 1985]: Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that minimize the mean squared error:
$$\inf_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}\left[(f(X) - g(Y))^2\right]$$
where we minimize over:
$$\mathcal{F} \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f(X)] = 0,\ \mathbb{E}\left[f^2(X)\right] = 1 \right\}$$
$$\mathcal{G} \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g(Y)] = 0,\ \mathbb{E}\left[g^2(Y)\right] = 1 \right\}$$
Maximal Correlation [Rényi, 1959]: Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that maximize the correlation:
$$\rho(X;Y) \triangleq \sup_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}[f(X)\, g(Y)]$$
Equivalence: $\mathbb{E}\left[(f(X) - g(Y))^2\right] = 2 - 2\,\mathbb{E}[f(X)\, g(Y)]$
Maximal correlation is a singular value of an operator!
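
The equivalence between the regression and maximal correlation problems comes from expanding the square and applying the unit second-moment constraints that define $\mathcal{F}$ and $\mathcal{G}$. A short derivation, written out here as a sketch for any $f \in \mathcal{F}$ and $g \in \mathcal{G}$:

```latex
\begin{align*}
\mathbb{E}\left[(f(X) - g(Y))^2\right]
  &= \mathbb{E}\left[f^2(X)\right] - 2\,\mathbb{E}[f(X)\,g(Y)] + \mathbb{E}\left[g^2(Y)\right] \\
  &= 1 - 2\,\mathbb{E}[f(X)\,g(Y)] + 1 \\
  &= 2 - 2\,\mathbb{E}[f(X)\,g(Y)]
\end{align*}
```

Taking the infimum over $\mathcal{F} \times \mathcal{G}$ on the left therefore corresponds to taking the supremum of the correlation on the right, so the optimal mean squared error equals $2 - 2\rho(X;Y)$.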

Preliminaries
Source random variable $X \in \mathcal{X} \subseteq \mathbb{R}$ with probability density $P_X$ on the measure space $(\mathcal{X}, \mathcal{B}(\mathcal{X}), \lambda)$.
Output random variable $Y \in \mathcal{Y} \subseteq \mathbb{R}$.
Channel: conditional probability densities $\left\{ P_{Y|X=x} : x \in \mathcal{X} \right\}$ on the measure space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}), \mu)$.
Marginal probability laws: $\mathbb{P}_X$ and $\mathbb{P}_Y$.

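Preliminaries
Hilbert spaces:
$$\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \triangleq \left\{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}\left[f^2(X)\right] < +\infty \right\}$$
$$\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \triangleq \left\{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}\left[g^2(Y)\right] < +\infty \right\}$$
Correlation as inner products:
$$\langle f_1, f_2 \rangle_{\mathbb{P}_X} \triangleq \mathbb{E}[f_1(X)\, f_2(X)]$$
$$\langle g_1, g_2 \rangle_{\mathbb{P}_Y} \triangleq \mathbb{E}[g_1(Y)\, g_2(Y)]$$
[Figure: the spaces $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ and $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$, each drawn with its zero element and two vectors, $f_1, f_2$ and $g_1, g_2$.]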

Preliminaries
Conditional Expectation Operators:
$C : \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X) \to \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$: $(C(f))(y) \triangleq \mathbb{E}[f(X) \mid Y = y]$
$C^* : \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y) \to \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$: $(C^*(g))(x) \triangleq \mathbb{E}[g(Y) \mid X = x]$
[Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ into $\mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$, and $C^*$ maps $g \in \mathcal{L}^2(\mathcal{Y}, \mathbb{P}_Y)$ back into $\mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$.]

Preliminaries
Proposition (Conditional Expectation Operators): $C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.
Operator norm:
$$\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)} \frac{\|C(f)\|_{\mathbb{P}_Y}}{\|f\|_{\mathbb{P}_X}}$$
$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality:
$$\|C(f)\|_{\mathbb{P}_Y}^2 = \mathbb{E}\left[\mathbb{E}[f(X) \mid Y]^2\right] \leq \mathbb{E}\left[\mathbb{E}\left[f^2(X) \mid Y\right]\right] = \|f\|_{\mathbb{P}_X}^2$$
[Figure: $C$ maps $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$ and $f$ to $C(f)$.]

Preliminaries
$\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality.
Let $\mathbf{1}_S : S \to \mathbb{R}$ denote the everywhere unity function: $\mathbf{1}_S(x) = 1$.
$C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$ and $\|\mathbf{1}_{\mathcal{X}}\|_{\mathbb{P}_X}^2 = \|\mathbf{1}_{\mathcal{Y}}\|_{\mathbb{P}_Y}^2 = 1 \ \Rightarrow\ \|C\|_{\mathrm{op}} = 1$.
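
The proposition can be checked numerically in a finite-alphabet analogue, where the conditional expectation operators become matrices. The sketch below is only an illustration: the joint pmf `P` is a made-up example, and the slides themselves work with densities on subsets of $\mathbb{R}$ rather than finite alphabets. It verifies the adjoint relation $\langle C(f), g \rangle_{\mathbb{P}_Y} = \langle f, C^*(g) \rangle_{\mathbb{P}_X}$, the bound $\|C(f)\|_{\mathbb{P}_Y} \leq \|f\|_{\mathbb{P}_X}$, and $C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an illustrative assumption, not from the slides.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px = P.sum(axis=1)  # marginal law of X
py = P.sum(axis=0)  # marginal law of Y


def C(f):
    """(C f)(y) = E[f(X) | Y = y], with f a vector indexed by x."""
    return (P.T @ f) / py


def C_star(g):
    """(C* g)(x) = E[g(Y) | X = x], with g a vector indexed by y."""
    return (P @ g) / px


def inner_x(f1, f2):
    """Inner product <f1, f2> = E[f1(X) f2(X)]."""
    return float(np.sum(px * f1 * f2))


def inner_y(g1, g2):
    """Inner product <g1, g2> = E[g1(Y) g2(Y)]."""
    return float(np.sum(py * g1 * g2))


rng = np.random.default_rng(0)
f = rng.normal(size=3)
g = rng.normal(size=2)

# Adjoint relation: <C f, g>_Y should equal <f, C* g>_X.
adjoint_gap = abs(inner_y(C(f), g) - inner_x(f, C_star(g)))

# Jensen bound: ||C f|| / ||f|| should not exceed 1.
norm_ratio = np.sqrt(inner_y(C(f), C(f)) / inner_x(f, f))

# The unity function on X is mapped to the unity function on Y.
ones_mapped = C(np.ones(3))

print(adjoint_gap, norm_ratio, ones_mapped)
```

Representing functions as vectors and weighting the inner products by the marginals keeps the code close to the definitions on the slides, at the cost of not using the more compact normalized-matrix form.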

Spectral Characterization of Maximal Correlation
Proposition (Spectral Characterization of Maximal Correlation) [Rényi, 1959]: For random variables $X$ and $Y$ as defined earlier:
$$\rho(X;Y) = \sup_{\substack{f \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X): \\ \mathbb{E}[f(X)] = 0}} \frac{\|C(f)\|_{\mathbb{P}_Y}}{\|f\|_{\mathbb{P}_X}}$$
where the supremum is achieved by some $f^\star \in \mathcal{L}^2(\mathcal{X}, \mathbb{P}_X)$ if $C$ is compact.
$C$ has largest singular value $\|C\|_{\mathrm{op}} = 1$: $C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$, $C^*(\mathbf{1}_{\mathcal{Y}}) = \mathbf{1}_{\mathcal{X}}$.
[Figure: $C$ maps $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$ and $f^\star$ to $C(f^\star)$; $C^*/\rho^2$ maps $C(f^\star)$ back to $f^\star$.]
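
For finite alphabets, this characterization reduces to an ordinary matrix SVD: in orthonormal coordinates, $C$ is the matrix with entries $P_{X,Y}(x,y)/\sqrt{P_X(x) P_Y(y)}$, its largest singular value is 1 (from the unity functions), and the second-largest singular value is $\rho(X;Y)$. The following is a minimal sketch under that finite-alphabet assumption, using a made-up joint pmf `P`; it is not the polynomial construction the talk develops for continuous variables.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an illustrative assumption, not from the slides.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px = P.sum(axis=1)  # marginal law of X
py = P.sum(axis=0)  # marginal law of Y

# Matrix of C in orthonormal coordinates: B[y, x] = P[x, y] / sqrt(px[x] * py[y]).
B = (P / np.sqrt(np.outer(px, py))).T

U, s, Vt = np.linalg.svd(B)

# s[0] is 1, attained by the unity functions; s[1] is the maximal correlation.
rho = s[1]

# Map the second singular vectors back to functions on the alphabets.
f_star = Vt[1] / np.sqrt(px)    # zero mean and unit second moment under px
g_star = U[:, 1] / np.sqrt(py)  # zero mean and unit second moment under py

# E[f*(X) g*(Y)] should reproduce rho.
corr = float(f_star @ P @ g_star)

print(rho, corr)
```

Dividing the singular vectors by the square roots of the marginals undoes the change of coordinates, so `f_star` and `g_star` satisfy the constraints defining $\mathcal{F}$ and $\mathcal{G}$.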
