A CLT for Wishart Tensors
Dan Mikulincer
Weizmann Institute of Science
Wishart Tensors
Let $\{X_i\}_{i=1}^d$ be i.i.d. copies of an isotropic random vector $X\sim\mu$ in $\mathbb{R}^n$. Denote by $W^p_{n,d}(\mu)$ the law of
$$\frac{1}{\sqrt{d}}\sum_{i=1}^d\left(X_i^{\otimes p}-\mathbb{E}\left[X_i^{\otimes p}\right]\right).$$
We are interested in the behavior as $d\to\infty$. Specifically, when is it true that $W^p_{n,d}(\mu)$ is approximately Gaussian?
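To make the definition concrete, here is a minimal sketch that draws one sample of $W^p_{n,d}(\mu)$ entry by entry. It assumes, purely for illustration, that $\mu$ is the uniform measure on $\{-1,1\}^n$ (i.i.d. Rademacher coordinates). This measure is isotropic, and $\mathbb{E}[X_{i_1}\cdots X_{i_p}]$ equals 1 when every index appears an even number of times and 0 otherwise. The function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def rademacher_moment(index):
    """E[X_{i_1} ... X_{i_p}] for X with i.i.d. uniform +-1 coordinates:
    1 if every index occurs an even number of times, else 0."""
    return 1.0 if all(c % 2 == 0 for c in Counter(index).values()) else 0.0


def sample_wishart_tensor(n, d, p, rng):
    """One draw from W^p_{n,d}(mu) with mu uniform on {-1, 1}^n (assumed),
    returned as a dict mapping 0-based index tuples (i_1, ..., i_p) to entries."""
    # d i.i.d. copies X_1, ..., X_d of X
    xs = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(d)]
    tensor = {}
    for index in itertools.product(range(n), repeat=p):
        # sum over i of the (i_1, ..., i_p) entry of X_i^{⊗p}
        total = sum(math.prod(x[j] for j in index) for x in xs)
        # centre by d * E[X^{⊗p}] and normalize by sqrt(d)
        tensor[index] = (total - d * rademacher_moment(index)) / math.sqrt(d)
    return tensor
```

For example, `sample_wishart_tensor(3, 50, 2, random.Random(0))` returns a dictionary with $3^2 = 9$ entries. Because $X_k^2 = 1$ for this choice of $\mu$, the diagonal entries of a $p = 2$ sample vanish identically; this is a quirk of the Rademacher example, not of $W^p_{n,d}$ in general.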
Technicalities
$W^p_{n,d}(\mu)$ is a measure on the tensor space $(\mathbb{R}^n)^{\otimes p}$, which we identify with $\mathbb{R}^{n^p}$ through the basis
$$\{e_{i_1}\otimes\cdots\otimes e_{i_p} \mid 1\le i_1,\dots,i_p\le n\}.$$
For simplicity, we will focus on the subspace of 'principal' tensors, with basis
$$\{e_{i_1}\otimes\cdots\otimes e_{i_p} \mid 1\le i_1<\cdots<i_p\le n\}.$$
The projection of $W^p_{n,d}(\mu)$ onto this subspace will be denoted by $\widetilde{W}^p_{n,d}(\mu)$.
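The principal subspace has dimension $\binom{n}{p}$, compared to $n^p$ for the full tensor space. A small sketch, with hypothetical helper names and 0-based indices, that enumerates the principal multi-indices and projects a tensor stored as a dictionary onto them:

```python
import itertools
import math


def principal_indices(n, p):
    """0-based multi-indices i_1 < ... < i_p: the basis of principal tensors,
    in lexicographic order."""
    return list(itertools.combinations(range(n), p))


def principal_dimension(n, p):
    """Dimension of the principal subspace, binom(n, p)."""
    return math.comb(n, p)


def project_principal(tensor, n, p):
    """Keep only the principal coordinates of a tensor stored as
    {index tuple: entry}, listed in lexicographic order."""
    return [tensor[idx] for idx in principal_indices(n, p)]
```

For instance, with $n = 10$ and $p = 2$, `principal_dimension(10, 2)` is 45, whereas the full tensor has $10^2 = 100$ coordinates.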
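Wishart Matrices
When $p = 2$ and $X\sim\mu$ is isotropic, so that $\mathbb{E}\left[X^{\otimes 2}\right] = \mathbb{E}\left[XX^T\right] = \mathrm{Id}$, $W^2_{n,d}(\mu)$ can be realized as the law of
$$\frac{\mathbf{X}\mathbf{X}^T - d\cdot\mathrm{Id}}{\sqrt{d}}.$$
Here, $\mathbf{X}$ is an $n\times d$ matrix whose columns are i.i.d. copies of $X$. In this case, $\widetilde{W}^2_{n,d}(\mu)$ is the law of the strictly upper-triangular part (the entries $(i_1,i_2)$ with $i_1<i_2$).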
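The matrix realization can be sketched directly. The following assumes, for illustration only, that $\mu$ is the standard Gaussian on $\mathbb{R}^n$; `wishart_matrix_sample` and `strict_upper` are hypothetical names.

```python
import math
import random


def wishart_matrix_sample(n, d, rng):
    """(X X^T - d Id) / sqrt(d), where X is n x d with i.i.d. standard
    Gaussian columns (an assumed isotropic choice of mu)."""
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    # (X X^T)[a][b] = sum over columns c of c[a] * c[b]
    m = [[sum(c[a] * c[b] for c in cols) for b in range(n)] for a in range(n)]
    for a in range(n):
        m[a][a] -= d  # subtract d * Id
    return [[v / math.sqrt(d) for v in row] for row in m]


def strict_upper(m):
    """Entries m[a][b] with a < b: the principal (p = 2) coordinates."""
    n = len(m)
    return [m[a][b] for a in range(n) for b in range(a + 1, n)]
```

The output of `strict_upper` has $\binom{n}{2}$ entries, matching the principal subspace for $p = 2$.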
Some Observations
Let us restrict our attention to the case $p = 2$.
- For fixed $n$, by the central limit theorem, $W^2_{n,d}(\mu)\to\mathcal{N}(0,\Sigma)$ as $d\to\infty$.
- If $n = d$, then, as $d\to\infty$, the spectral measure of $\frac{1}{d}\mathbf{X}\mathbf{X}^T$ converges to the Marchenko-Pastur distribution. In particular, $W^2_{n,d}(\mu)$ is not close to Gaussian.

Question: How should $n$ depend on $d$ so that $W^p_{n,d}(\mu)$ is approximately Gaussian?
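The Marchenko-Pastur law has an explicit density. The sketch below evaluates it for an aspect ratio $y = n/d \in (0,1]$ (the case $n = d$ is $y = 1$, with support $[0,4]$) and checks numerically that it has total mass 1, mean 1 and second moment $1 + y$. It assumes the standard normalization $\frac{1}{d}\mathbf{X}\mathbf{X}^T$ with mean-zero, unit-variance entries; the function names are hypothetical.

```python
import math


def mp_support(ratio):
    """Support [(1 - sqrt(y))^2, (1 + sqrt(y))^2] of the MP law, y = n/d."""
    return (1 - math.sqrt(ratio)) ** 2, (1 + math.sqrt(ratio)) ** 2


def marchenko_pastur_density(x, ratio):
    """Density of the Marchenko-Pastur law with ratio y = n/d in (0, 1]."""
    lo, hi = mp_support(ratio)
    if x <= lo or x >= hi:
        return 0.0
    return math.sqrt((hi - x) * (x - lo)) / (2 * math.pi * ratio * x)


def mp_moment(k, ratio, steps=100_000):
    """Midpoint-rule approximation of the k-th moment of the MP law."""
    lo, hi = mp_support(ratio)
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * h
        total += x ** k * marchenko_pastur_density(x, ratio)
    return total * h
```

For $y = 1/2$, `mp_moment(0, 0.5)`, `mp_moment(1, 0.5)` and `mp_moment(2, 0.5)` come out close to 1, 1 and 1.5 respectively.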
Random Geometric Graphs
From now on, let $\gamma$ stand for the standard Gaussian in the appropriate dimension. In (Bubeck, Ding, Eldan, Rácz '15), and independently in (Jiang, Li '15), it was shown that:
- If $\frac{n^3}{d}\to 0$, then $\mathrm{TV}\left(\widetilde{W}^2_{n,d}(\gamma),\gamma\right)\to 0$.

This is tight, in the sense that:
- If $\frac{n^3}{d}\to\infty$, then $\mathrm{TV}\left(\widetilde{W}^2_{n,d}(\gamma),\gamma\right)\to 1$.

(Rácz, Richey '16) shows that the phase transition is smooth.
Extensions
(Bubeck, Ganguly '15) extended the result to any log-concave product measure. That is, the entries $X_{i,j}$ are i.i.d. with law $e^{-\varphi(x)}\,dx$ for some convex $\varphi$.
- The original motivation came from random geometric graphs.
- (Fang, Koike '20) removed the log-concavity assumption.
Extensions
(Nourdin, Zheng '18) gave the following results, answering questions raised in (Bubeck, Ganguly '15):
- If the rows of $\mathbf{X}$ are i.i.d. $\mathcal{N}(0,\Sigma)$ for some positive definite $\Sigma$, then $W_1\left(\widetilde{W}^2_{n,d},\gamma\right)\lesssim\sqrt{\frac{n^3}{d}}$. (See also (Eldan, M '16).)
- $W_1\left(\widetilde{W}^p_{n,d}(\gamma),\gamma\right)\lesssim\sqrt{\frac{n^{2p-1}}{d}}$.
Main Result
Today:

Theorem. If $\mu$ is a measure on $\mathbb{R}^n$ which is uniformly log-concave and unconditional, then
$$\mathrm{dist}\left(\widetilde{W}^p_{n,d}(\mu),\gamma\right)\lesssim\sqrt{\frac{n^{2p-1}}{d}}.$$
- $\mathrm{dist}$ stands for some notion of distance, to be introduced soon, but it could be replaced with $W_2$.
- The assumptions of uniform log-concavity and unconditionality may be relaxed.
- The result also holds for a large class of product measures.
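Ignoring the unspecified constants, the bound says that $\widetilde{W}^p_{n,d}(\mu)$ is close to Gaussian once $n^{2p-1}\ll d$; for $p = 2$ this is the $n^3\ll d$ regime of the random geometric graph results. The sketch below, with hypothetical names and all constants set to 1, computes the rate and the largest dimension $n$ allowed for a given $d$ and tolerance.

```python
import math


def rate_bound(n, d, p):
    """sqrt(n^(2p-1) / d), the rate in the theorem with constants set to 1."""
    return math.sqrt(n ** (2 * p - 1) / d)


def max_dimension(d, p, eps):
    """Largest integer n with n^(2p-1) <= eps * d,
    i.e. rate_bound(n, d, p) <= sqrt(eps)."""
    n = 0
    while (n + 1) ** (2 * p - 1) <= eps * d:
        n += 1
    return n
```

With $d = 10^6$ and tolerance 1, `max_dimension` allows $n = 100$ for $p = 2$ but only $n = 15$ for $p = 3$, showing how quickly the admissible dimension shrinks with the tensor order.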
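The Challenge
By viewing $\frac{1}{\sqrt{d}}\sum_{i=1}^d\left(X_i^{\otimes p}-\mathbb{E}\left[X_i^{\otimes p}\right]\right)$ as a normalized sum of i.i.d. random vectors, one may hope to apply a quantitative high-dimensional central limit theorem. Optimistically, such estimates give
$$\mathrm{dist}\left(\widetilde{W}^p_{n,d}(\mu),\gamma\right)\le\frac{\mathbb{E}\left[\|X^{\otimes p}\|^3\right]}{\sqrt{d}}.$$
However, $\|X^{\otimes p}\| = \|X\|^p$ and $\mathbb{E}\|X\|^2 = n$ for isotropic $X$, so by Jensen's inequality the right-hand side is at least $n^{3p/2}/\sqrt{d} = \sqrt{n^{3p}/d}$, much larger than $\sqrt{n^{2p-1}/d}$. Thus, to obtain optimal convergence rates, we need to exploit the low-dimensional structure of $\widetilde{W}^p_{n,d}(\mu)$.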
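A quick numerical check of the identity $\|x^{\otimes p}\| = \|x\|^p$ for the Euclidean norm on $(\mathbb{R}^n)^{\otimes p}$, which controls the size of $\mathbb{E}\|X^{\otimes p}\|^3$ in the estimate above. For isotropic $X$, Jensen's inequality gives $\mathbb{E}\|X\|^{3p}\ge n^{3p/2}$, so compared with $\sqrt{n^{2p-1}/d}$ the naive bound loses at least a factor $n^{(p+1)/2}$ (the $\sqrt{d}$ cancels). The helper names are hypothetical.

```python
import itertools
import math


def tensor_power_norm(x, p):
    """Euclidean (Hilbert-Schmidt) norm of x^{⊗p}, summing the squares
    of all n^p entries x[i_1] * ... * x[i_p]."""
    return math.sqrt(sum(math.prod(x[j] for j in idx) ** 2
                         for idx in itertools.product(range(len(x)), repeat=p)))


def naive_over_target(n, p):
    """Ratio of the best-case naive bound n^(3p/2)/sqrt(d) to the target
    rate sqrt(n^(2p-1)/d); the factor sqrt(d) cancels."""
    return n ** ((p + 1) / 2)
```

For example, `tensor_power_norm([3.0, 4.0], 2)` equals $\|(3,4)\|^2 = 25$, and for $n = 100$, $p = 2$ the naive bound is worse by a factor of about $1000$.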
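Stein's Method
Basic observation: if $G\sim\gamma$ on $\mathbb{R}^n$, then for any smooth test function $f:\mathbb{R}^n\to\mathbb{R}^n$,
$$\mathbb{E}\left[\langle G, f(G)\rangle\right]=\mathbb{E}\left[\operatorname{div} f(G)\right].$$
Moreover, the Gaussian is the only measure which satisfies this relation.

Stein's idea:
$$\mathbb{E}\left[\langle X, f(X)\rangle\right]\simeq\mathbb{E}\left[\operatorname{div} f(X)\right]\implies X\simeq G.$$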
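The identity can be checked by Monte Carlo. In this sketch (hypothetical names), $f(x) = (x_1^3,\dots,x_n^3)$, so $\langle x, f(x)\rangle = \sum_k x_k^4$ and $\operatorname{div} f(x) = 3\sum_k x_k^2$. For a standard Gaussian both expectations equal $3n$. For an isotropic vector with i.i.d. coordinates uniform on $[-\sqrt{3},\sqrt{3}]$, $\mathbb{E}x_k^4 = 9/5$, so the gap is $-1.2n$, illustrating that the relation singles out the Gaussian.

```python
import math
import random


def stein_gap(sample, f, div_f, num_samples, rng):
    """Monte Carlo estimate of E[<X, f(X)>] - E[div f(X)], X drawn by sample(rng)."""
    total = 0.0
    for _ in range(num_samples):
        x = sample(rng)
        total += sum(a * b for a, b in zip(x, f(x))) - div_f(x)
    return total / num_samples


def cube(x):
    # test function f(x) = (x_1^3, ..., x_n^3)
    return [t ** 3 for t in x]


def div_cube(x):
    # div f(x) = 3 * (x_1^2 + ... + x_n^2)
    return 3.0 * sum(t * t for t in x)


N_DIM = 3  # ambient dimension n (illustrative choice)


def gaussian(rng):
    # standard Gaussian vector in R^n
    return [rng.gauss(0.0, 1.0) for _ in range(N_DIM)]


def uniform(rng):
    # isotropic but non-Gaussian: i.i.d. coordinates uniform on [-sqrt(3), sqrt(3)]
    s = math.sqrt(3.0)
    return [rng.uniform(-s, s) for _ in range(N_DIM)]
```

With $10^5$ samples, `stein_gap(gaussian, cube, div_cube, 100_000, random.Random(0))` should be near 0, while the same call with `uniform` should be near $-3.6$ for $n = 3$.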