On Estimation of Modal Decompositions
Anuran Makur, Gregory W. Wornell, and Lizhong Zheng
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
IEEE International Symposium on Information Theory (ISIT), 21-26 June 2020
Outline
1 Introduction
    A Brief History of Modal Decompositions
    Formal Definitions
    Motivation: Embedding of Categorical Data into Euclidean Space
2 Characterization of Operators
3 Sample Complexity Analysis
4 Conclusion
A Brief History of Modal Decompositions
Dimensionality reduction: Principal component analysis (PCA) [Pea01], [Hot33], canonical correlation analysis (CCA) [Hot36]
    Can we extend these techniques to categorical data?
Modal decompositions: [Hir35]
Maximal correlation: [Geb41], [Rén59], [Wit75]
Strong data processing inequalities and related directions: $\chi^2$-divergence [Sar58], KL divergence [AG76], and recent work on hypercontractivity [AGKN13], contraction coefficients [MZ15], [PW17], [MZ20], functional inequalities [Rag16], estimation theory, security, and privacy [CMM+17], ...
Lancaster distributions: Mehler's decomposition [Meh66], Lancaster decompositions [Lan58], [Lan69], orthogonal polynomials [Eag64], [Gri69], [Kou96], [Kou98], and recent work [AZ12], [MZ17], ...
Correspondence analysis: Data visualization [Ben73], [Gre84], [GH87], and recent work on neural networks [HMWZ19], [HSC19], ...
Non-parametric regression: Alternating conditional expectations (ACE) algorithm [BF85], [Buj85], feature extraction [MKHZ15], [HMZW17], [HMWZ19]
Formal Definitions
Finite alphabets $\mathcal{X}$ and $\mathcal{Y}$, and random variables $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$
Bivariate distribution $P_{X,Y}$ with marginals $P_X, P_Y > 0$
Hilbert spaces:
    Input space: $L^2(\mathcal{X}, P_X) \triangleq \{ f : \mathcal{X} \to \mathbb{R} \mid \mathbb{E}[f(X)^2] < +\infty \}$
    with inner product:
        $\forall f_1, f_2 \in L^2(\mathcal{X}, P_X), \quad \langle f_1, f_2 \rangle_{P_X} \triangleq \mathbb{E}[f_1(X) f_2(X)] = \sum_{x \in \mathcal{X}} P_X(x) f_1(x) f_2(x)$,
    and induced $L^2$-norm:
        $\forall f \in L^2(\mathcal{X}, P_X), \quad \| f \|_{P_X}^2 = \mathbb{E}[f(X)^2]$.
    Output space: $L^2(\mathcal{Y}, P_Y) \triangleq \{ g : \mathcal{Y} \to \mathbb{R} \mid \mathbb{E}[g(Y)^2] < +\infty \}$
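On a finite alphabet, the weighted inner product and the induced norm defined on this slide reduce to weighted sums, which can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3-letter alphabet; the pmf values and the names `p_x`, `inner_product`, and `sq_norm` are illustrative and do not come from the talk.

```python
import numpy as np

# Hypothetical marginal pmf P_X on a 3-letter alphabet (illustrative values only).
p_x = np.array([0.5, 0.3, 0.2])

def inner_product(f1, f2, p):
    """Weighted inner product <f1, f2>_P = E[f1(X) f2(X)] = sum_x P(x) f1(x) f2(x)."""
    return float(np.sum(p * f1 * f2))

def sq_norm(f, p):
    """Squared induced norm ||f||_P^2 = E[f(X)^2]."""
    return inner_product(f, f, p)

# Functions on the alphabet are stored as vectors of their values.
f1 = np.array([1.0, -1.0, 0.0])
f2 = np.array([2.0, 0.0, 1.0])

ip = inner_product(f1, f2, p_x)  # 0.5*1*2 + 0.3*(-1)*0 + 0.2*0*1
n2 = sq_norm(f1, p_x)            # 0.5*1 + 0.3*1 + 0.2*0
```

Storing each function as a vector of its values makes $L^2(\mathcal{X}, P_X)$ a copy of $\mathbb{R}^{|\mathcal{X}|}$ with a reweighted dot product. The requirement $P_X > 0$ is what makes this a genuine inner product rather than a semi-inner product.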
Formal Definitions: Two Equivalent Representations of $P_{X,Y}$
Definition (Conditional Expectation Operator)
    $P_{X|Y} : L^2(\mathcal{X}, P_X) \to L^2(\mathcal{Y}, P_Y)$ maps any $f \in L^2(\mathcal{X}, P_X)$ to $P_{X|Y} f \in L^2(\mathcal{Y}, P_Y)$:
        $\forall y \in \mathcal{Y}, \quad (P_{X|Y} f)(y) \triangleq \mathbb{E}[f(X) \mid Y = y]$.
Definition (Divergence Transition Matrix)
    The divergence transition matrix (DTM), denoted $B \in \mathbb{R}^{|\mathcal{Y}| \times |\mathcal{X}|}$, has $(y, x)$th entry given by:
        $\forall x \in \mathcal{X}, \forall y \in \mathcal{Y}, \quad B(x, y) \triangleq \frac{P_{X,Y}(x, y)}{\sqrt{P_X(x) P_Y(y)}}$.
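Both representations can be computed directly from a joint pmf. The sketch below is a minimal illustration, assuming a hypothetical joint pmf stored as a NumPy array with rows indexed by $x$ and columns by $y$; the values and the names `p_xy`, `cond_exp`, and `dtm` are illustrative and not taken from the talk. The matrix `B` is stored with rows indexed by $y$, matching the stated $|\mathcal{Y}| \times |\mathcal{X}|$ shape.

```python
import numpy as np

# Hypothetical joint pmf P_{X,Y}: rows indexed by x, columns by y (illustrative values).
p_xy = np.array([[0.2, 0.1],
                 [0.1, 0.3],
                 [0.1, 0.2]])
p_x = p_xy.sum(axis=1)  # marginal of X
p_y = p_xy.sum(axis=0)  # marginal of Y

def cond_exp(f, p_xy):
    """Conditional expectation operator: (P_{X|Y} f)(y) = sum_x P(x|y) f(x)."""
    return (p_xy.T @ f) / p_xy.sum(axis=0)

def dtm(p_xy):
    """DTM with B[y, x] = P(x, y) / sqrt(P_X(x) P_Y(y)), shape |Y| x |X|."""
    px = p_xy.sum(axis=1)
    py = p_xy.sum(axis=0)
    return p_xy.T / np.sqrt(np.outer(py, px))

B = dtm(p_xy)
f = np.array([1.0, 0.0, -1.0])
g = cond_exp(f, p_xy)

# The same output recovered from B by rescaling with the square-root marginals.
g_via_B = (B @ (np.sqrt(p_x) * f)) / np.sqrt(p_y)

# Sanity check: B maps sqrt(P_X) to sqrt(P_Y).
top = B @ np.sqrt(p_x)
```

The rescaling in `g_via_B` shows one sense in which the two objects carry the same information: the DTM is the conditional expectation operator expressed in coordinates where both weighted inner products become ordinary dot products.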