Polynomial Spectral Decomposition of Conditional Expectation - PowerPoint PPT Presentation


  1. Polynomial Spectral Decomposition of Conditional Expectation Operators
     Anuran Makur and Lizhong Zheng, EECS Department, Massachusetts Institute of Technology
     Allerton Conference 2016
     Slide footer: A. Makur & L. Zheng (MIT), Polynomial Spectral Decomposition, Allerton Conference 2016, 1 / 25

  2. Outline
     1. Introduction
        - Motivation: Regression and Maximal Correlation
        - Preliminaries
        - Spectral Characterization of Maximal Correlation
     2. Polynomial Decompositions of Compact Operators
     3. Illustrations of Polynomial SVDs

  6. Motivation: Regression and Maximal Correlation
     Fix a joint distribution $P_{X,Y}$ on $\mathcal{X} \times \mathcal{Y}$.
     Regression [Breiman and Friedman, 1985]: Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that minimize the mean squared error:
     $\inf_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}\left[(f(X) - g(Y))^2\right]$
     where we minimize over:
     $\mathcal{F} \triangleq \left\{f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f(X)] = 0,\ \mathbb{E}[f^2(X)] = 1\right\}$
     $\mathcal{G} \triangleq \left\{g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g(Y)] = 0,\ \mathbb{E}[g^2(Y)] = 1\right\}$
     Maximal Correlation [Rényi, 1959]: Find $f^\star \in \mathcal{F}$ and $g^\star \in \mathcal{G}$ that maximize the correlation:
     $\rho(X;Y) \triangleq \sup_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}[f(X)\, g(Y)]$
     Equivalence: $\mathbb{E}\left[(f(X) - g(Y))^2\right] = 2 - 2\,\mathbb{E}[f(X)\, g(Y)]$
     Maximal correlation is a singular value of an operator!
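The equivalence on this slide can be checked numerically. The sketch below is only an illustration: it assumes finite alphabets and a made-up joint pmf `P` (the slides work with densities on subsets of $\mathbb{R}$), and the helper `standardize` and the particular `f` and `g` are hypothetical choices, not taken from the talk.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an assumption for illustration only.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px, py = P.sum(axis=1), P.sum(axis=0)  # marginals of X and Y

def standardize(values, weights):
    """Shift and scale a function so it has mean 0 and second moment 1."""
    centered = values - weights @ values
    return centered / np.sqrt(weights @ centered**2)

# Arbitrary members of the constraint sets F and G.
f = standardize(np.array([1.0, -2.0, 0.5]), px)
g = standardize(np.array([3.0, -1.0]), py)

mse = np.sum(P * (f[:, None] - g[None, :])**2)  # E[(f(X) - g(Y))^2]
corr = f @ P @ g                                # E[f(X) g(Y)]

print(mse, 2 - 2 * corr)  # the two values should agree up to rounding
```

Expanding the square gives $\mathbb{E}[f^2(X)] + \mathbb{E}[g^2(Y)] - 2\,\mathbb{E}[f(X) g(Y)]$, and the two unit second-moment constraints supply the constant 2, so minimizing the mean squared error and maximizing the correlation select the same pair.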

  10. Preliminaries
      - Source random variable $X \in \mathcal{X} \subseteq \mathbb{R}$ with probability density $P_X$ on the measure space $(\mathcal{X}, \mathcal{B}(\mathcal{X}), \lambda)$
      - Output random variable $Y \in \mathcal{Y} \subseteq \mathbb{R}$
      - Channel: conditional probability densities $\{P_{Y|X=x} : x \in \mathcal{X}\}$ on the measure space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}), \mu)$
      - Marginal probability laws: $P_X$ and $P_Y$

  12. Preliminaries
      Hilbert spaces:
      $\mathcal{L}^2(\mathcal{X}, P_X) \triangleq \left\{f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f^2(X)] < +\infty\right\}$
      $\mathcal{L}^2(\mathcal{Y}, P_Y) \triangleq \left\{g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g^2(Y)] < +\infty\right\}$
      Correlation as inner products:
      $\langle f_1, f_2 \rangle_{P_X} \triangleq \mathbb{E}[f_1(X)\, f_2(X)]$
      $\langle g_1, g_2 \rangle_{P_Y} \triangleq \mathbb{E}[g_1(Y)\, g_2(Y)]$
      [Figure: vectors $f_1, f_2$ in $\mathcal{L}^2(\mathcal{X}, P_X)$ and $g_1, g_2$ in $\mathcal{L}^2(\mathcal{Y}, P_Y)$.]

  13. Preliminaries
      Hilbert spaces:
      $\mathcal{L}^2(\mathcal{X}, P_X) \triangleq \left\{f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f^2(X)] < +\infty\right\}$
      $\mathcal{L}^2(\mathcal{Y}, P_Y) \triangleq \left\{g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g^2(Y)] < +\infty\right\}$
      Conditional Expectation Operators:
      $C : \mathcal{L}^2(\mathcal{X}, P_X) \to \mathcal{L}^2(\mathcal{Y}, P_Y)$: $(C(f))(y) \triangleq \mathbb{E}[f(X) \mid Y = y]$
      $C^* : \mathcal{L}^2(\mathcal{Y}, P_Y) \to \mathcal{L}^2(\mathcal{X}, P_X)$: $(C^*(g))(x) \triangleq \mathbb{E}[g(Y) \mid X = x]$
      [Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, P_X)$ into $\mathcal{L}^2(\mathcal{Y}, P_Y)$; $C^*$ maps $g$ back.]
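For intuition, here is a minimal sketch of the two operators in the finite-alphabet case, where they reduce to matrices of conditional probabilities. The finite setting, the pmf `P`, and the names `C` and `C_adj` are assumptions made for illustration; the slides define the operators for densities on subsets of $\mathbb{R}$.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an assumption for illustration only.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px, py = P.sum(axis=1), P.sum(axis=0)  # marginals of X and Y

# (C f)(y) = E[f(X) | Y = y] = sum_x P(x | y) f(x)
C = P.T / py[:, None]      # shape (|Y|, |X|), each row sums to 1
# (C* g)(x) = E[g(Y) | X = x] = sum_y P(y | x) g(y)
C_adj = P / px[:, None]    # shape (|X|, |Y|), each row sums to 1

f = np.array([1.0, -2.0, 0.5])   # arbitrary function of X
g = np.array([3.0, -1.0])        # arbitrary function of Y

lhs = py @ ((C @ f) * g)         # <C(f), g> weighted by P_Y
rhs = px @ (f * (C_adj @ g))     # <f, C*(g)> weighted by P_X

print(lhs, rhs)  # both equal E[f(X) g(Y)]
```

Both inner products collapse to $\mathbb{E}[f(X)\, g(Y)]$, which is the tower property of conditional expectation written in matrix form.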

  16. Preliminaries
      Proposition (Conditional Expectation Operators): $C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.
      Operator Norm: $\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, P_X)} \frac{\|C(f)\|_{P_Y}}{\|f\|_{P_X}}$
      $\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality:
      $\|C(f)\|_{P_Y}^2 = \mathbb{E}\left[\mathbb{E}[f(X) \mid Y]^2\right] \leq \mathbb{E}\left[\mathbb{E}[f^2(X) \mid Y]\right] = \|f\|_{P_X}^2$
      [Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, P_X)$ to $C(f) \in \mathcal{L}^2(\mathcal{Y}, P_Y)$ and $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$.]

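  17. Preliminaries
      Proposition (Conditional Expectation Operators): $C$ and $C^*$ are bounded linear operators with operator norms $\|C\|_{\mathrm{op}} = \|C^*\|_{\mathrm{op}} = 1$. Moreover, $C^*$ is the adjoint operator of $C$.
      Operator Norm: $\|C\|_{\mathrm{op}} \triangleq \sup_{f \in \mathcal{L}^2(\mathcal{X}, P_X)} \frac{\|C(f)\|_{P_Y}}{\|f\|_{P_X}}$
      $\|C\|_{\mathrm{op}} \leq 1$ by Jensen's inequality.
      Let $\mathbf{1}_{\mathcal{S}} : \mathcal{S} \to \mathbb{R}$ denote the everywhere unity function: $\mathbf{1}_{\mathcal{S}}(x) = 1$.
      $C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$ and $\|\mathbf{1}_{\mathcal{X}}\|_{P_X}^2 = \|\mathbf{1}_{\mathcal{Y}}\|_{P_Y}^2 = 1 \Rightarrow \|C\|_{\mathrm{op}} = 1$.
      [Figure: $C$ maps $f \in \mathcal{L}^2(\mathcal{X}, P_X)$ to $C(f) \in \mathcal{L}^2(\mathcal{Y}, P_Y)$ and $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$.]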
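The two halves of the norm argument (the Jensen bound and the unity function attaining it) can be spot-checked numerically. This is a sketch under the assumption of finite alphabets with a made-up pmf `P`; the helper `weighted_norm` and the random test functions are hypothetical, and sampling only probes the bound rather than proving it.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an assumption for illustration only.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px, py = P.sum(axis=1), P.sum(axis=0)
C = P.T / py[:, None]  # finite-alphabet conditional expectation operator

def weighted_norm(values, weights):
    """L2 norm of a function under the given marginal."""
    return np.sqrt(weights @ values**2)

# Jensen bound: ||C(f)|| / ||f|| should never exceed 1.
rng = np.random.default_rng(0)
ratios = [weighted_norm(C @ f, py) / weighted_norm(f, px)
          for f in rng.normal(size=(1000, 3))]
max_ratio = max(ratios)

# The unity function is mapped to the unity function, attaining the bound.
ones_x = np.ones(3)
image_of_ones = C @ ones_x
unity_ratio = weighted_norm(image_of_ones, py) / weighted_norm(ones_x, px)

print(max_ratio, image_of_ones, unity_ratio)
```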

  19. Spectral Characterization of Maximal Correlation
      Prop (Spectral Characterization of Maximal Correlation) [Rényi, 1959]: For random variables $X$ and $Y$ as defined earlier:
      $\rho(X;Y) = \sup_{f \in \mathcal{L}^2(\mathcal{X}, P_X) :\, \mathbb{E}[f(X)] = 0} \frac{\|C(f)\|_{P_Y}}{\|f\|_{P_X}}$
      where the supremum is achieved by some $f^\star \in \mathcal{L}^2(\mathcal{X}, P_X)$ if $C$ is compact.
      $C$ has largest singular value $\|C\|_{\mathrm{op}} = 1$: $C(\mathbf{1}_{\mathcal{X}}) = \mathbf{1}_{\mathcal{Y}}$, $C^*(\mathbf{1}_{\mathcal{Y}}) = \mathbf{1}_{\mathcal{X}}$.
      [Figure: $C$ maps $f^\star \in \mathcal{L}^2(\mathcal{X}, P_X)$ to $C(f^\star) \in \mathcal{L}^2(\mathcal{Y}, P_Y)$ and $\mathbf{1}_{\mathcal{X}}$ to $\mathbf{1}_{\mathcal{Y}}$; $C^*$ maps back.]
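In the finite-alphabet case this characterization turns into an ordinary matrix SVD, sketched below. The finite setting, the pmf `P`, and the rescaled matrix `B` are assumptions for illustration: `B` represents $C$ in coordinates where both weighted inner products become the standard dot product, so its singular values are those of $C$. The sketch also assumes the top singular value 1 is simple, which holds here because every entry of `P` is positive.

```python
import numpy as np

# Hypothetical 3x2 joint pmf P[x, y]; an assumption for illustration only.
P = np.array([[0.20, 0.10],
              [0.05, 0.25],
              [0.15, 0.25]])
px, py = P.sum(axis=1), P.sum(axis=0)
C = P.T / py[:, None]  # finite-alphabet conditional expectation operator

# B[y, x] = P(x, y) / sqrt(P_X(x) P_Y(y)): C in orthonormal coordinates.
B = P.T / np.sqrt(np.outer(py, px))
U, s, Vt = np.linalg.svd(B)  # singular values in descending order

rho = s[1]                    # second-largest singular value
f_star = Vt[1] / np.sqrt(px)  # maximizing zero-mean function of X

mean_f = px @ f_star          # E[f*(X)], zero up to rounding
ratio = np.sqrt(py @ (C @ f_star)**2) / np.sqrt(px @ f_star**2)

print(s[0], rho, mean_f, ratio)
```

The top singular pair corresponds to the unity functions, so restricting the supremum to zero-mean $f$ removes it and leaves the second singular value as the maximal correlation.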
