  1. Random matrices: Distribution of the least singular value (via Property Testing)
     Van H. Vu, Department of Mathematics, Rutgers, vanvu@math.rutgers.edu
     (joint work with T. Tao, UCLA)

  2. Let ξ be a real or complex-valued random variable and M_n(ξ) denote the random n × n matrix whose entries are i.i.d. copies of ξ:
     • (R-normalization) ξ is real-valued with E ξ = 0 and E ξ^2 = 1.
     • (C-normalization) ξ is complex-valued with E ξ = 0, E ℜ(ξ)^2 = E ℑ(ξ)^2 = 1/2, and E ℜ(ξ)ℑ(ξ) = 0.
     In both cases ξ has mean zero and variance one.
     Examples: real Gaussian, complex Gaussian, Bernoulli (±1 with probability 1/2 each).
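
A minimal numerical sketch of the three example models above (not part of the slides; numpy, the size n = 500, and the fixed seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# R-normalized examples: real Gaussian and Bernoulli (+-1 with probability 1/2 each).
M_gauss = rng.standard_normal((n, n))
M_bern = rng.choice([-1.0, 1.0], size=(n, n))

# C-normalized example: complex Gaussian, with independent real and imaginary parts
# of variance 1/2 each, so that E xi = 0 and E |xi|^2 = 1.
M_cgauss = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

for name, M in [("real Gaussian", M_gauss), ("Bernoulli", M_bern), ("complex Gaussian", M_cgauss)]:
    print(name, " empirical mean:", np.round(M.mean(), 3),
          " empirical E|xi|^2:", np.round((np.abs(M) ** 2).mean(), 3))
```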

  3. Numerical Algebra. von Neumann-Goldstine (1940s): What are the condition number and the least singular value of a random matrix?
     Prediction: with high probability, σ_1 = Θ(√n), σ_n = Θ(n^{-1/2}), and so κ = σ_1/σ_n = Θ(n).
     Smale (1980s), Demmel (1980s): typical complexity of a numerical problem.
     Spielman-Teng (2000s): smoothed analysis.
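
A rough numerical check of this prediction (illustrative only; real Gaussian entries, a handful of sizes, and a fixed seed are assumptions made here, not part of the talk). The printed ratios fluctuate from sample to sample, since they have limiting distributions rather than limits, but they stay of order one:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [100, 200, 400, 800]:
    M = rng.standard_normal((n, n))
    s = np.linalg.svd(M, compute_uv=False)          # singular values in decreasing order
    sigma_1, sigma_n = s[0], s[-1]
    print(f"n={n:4d}  sigma_1/sqrt(n)={sigma_1 / np.sqrt(n):.2f}  "
          f"sigma_n*sqrt(n)={sigma_n * np.sqrt(n):.2f}  kappa/n={(sigma_1 / sigma_n) / n:.2f}")
```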

  4. Probability/Mathematical Physics. A basic problem in Random Matrix Theory is to understand the distributions of the eigenvalues and singular values.
     • Limiting distribution of the whole spectrum (such as the Wigner semi-circle law).
     • Limiting distribution of extremal eigenvalues/singular values (such as the Tracy-Widom law).

  5. A special case: Gaussian models. Explicit formulae for the joint distributions of the eigenvalues of (1/√n) M_n:
     (Real Gaussian)      $c_1(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j| \exp\Big(-\sum_{i=1}^{n} \lambda_i^2 / 2\Big)$.   (1)
     (Complex Gaussian)   $c_2(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp\Big(-\sum_{i=1}^{n} \lambda_i^2 / 2\Big)$.   (2)

  6. Explicit formulae for the joint distributions of the eigenvalues of (1/n) M_n M_n^* (or the singular values of (1/√n) M_n):
     (Real Gaussian)      $c_3(n) \prod_{1 \le i < j \le n} (\lambda_i - \lambda_j) \prod_{i=1}^{n} \lambda_i^{-1/2} \exp\Big(-\sum_{i=1}^{n} \lambda_i / 2\Big)$.   (3)
     (Complex Gaussian)   $c_4(n) \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \exp\Big(-\sum_{i=1}^{n} \lambda_i / 2\Big)$.   (4)
     The limiting distributions for Gaussian matrices can be computed directly from these explicit formulae.

  7. Universality Principle. The same results should hold for general normalized random variables.
     Informally: the limiting distributions of the spectrum should not depend too much on the distribution of the entries.
     In the same spirit: the central limit theorem.

  8. Bulk Distributions.
     Circular Law. The limiting distribution of the eigenvalues of (1/√n) M_n is uniform on the unit disk. (Proved for complex Gaussian by Mehta in the 1960s, real Gaussian by Edelman in the 1980s; Girko, Bai, Götze-Tikhomirov, Pan-Zhou, Tao-Vu in the 2000s. Full generality: Tao-Vu 2008.)
     Marchenko-Pastur Law. The limiting distribution of the eigenvalues of (1/n) M_n M_n^* has distribution function
     $F(t) = \frac{1}{2\pi} \int_0^{\min(t, 4)} \sqrt{4/x - 1}\, dx$   (Marchenko-Pastur 1967).
     The singular values of M_n are often viewed as the square roots of the eigenvalues of M_n M_n^* (Wishart or sample covariance random matrices).
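
A quick empirical look at the Circular Law (illustrative, not from the slides; Bernoulli ±1 entries, n = 1000, and a fixed seed are assumptions made here). If the spectrum of M_n/√n is asymptotically uniform on the unit disk, the fraction of eigenvalues of modulus at most r should approach r^2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
M = rng.choice([-1.0, 1.0], size=(n, n))
eig = np.linalg.eigvals(M / np.sqrt(n))
for r in [0.25, 0.5, 0.75, 1.0]:
    frac = np.mean(np.abs(eig) <= r)
    print(f"r={r:.2f}  empirical fraction={frac:.3f}  uniform-on-disk prediction={r ** 2:.3f}")
```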

  9. Distributions of the extremal singular values.
     Distribution at the soft edge of the spectrum: the distribution of the largest singular value (or, more generally, the joint distribution of the k largest singular values).
     Johansson (2000), Johnstone (2000), Gaussian case:
     $\frac{\sigma_1^2 / n - 4}{2^{4/3}\, n^{-2/3}} \to TW.$
     Soshnikov (2008): the result holds for all ξ with exponential tail.

  10. Wigner's trace method. For all even k,
      $\sigma_1(M)^k + \cdots + \sigma_n(M)^k = \operatorname{Trace}\,(MM^*)^{k/2}.$
      Notice that if k is large, the left-hand side is dominated by the largest term σ_1(M)^k. Thus, if one can estimate E Trace (MM^*)^{k/2} for very large k, one can, in principle, get good control on σ_1(M). Expanding the trace,
      $\operatorname{Trace}\,(MM^*)^{k/2} = \sum_{i_1, \dots, i_k} m_{i_1 i_2}\, m^*_{i_2 i_3} \cdots m_{i_{k-1} i_k}\, m^*_{i_k i_1},$
      and, by the independence of the entries, $E\, m_{i_1 i_2} m^*_{i_2 i_3} \cdots m_{i_{k-1} i_k} m^*_{i_k i_1} = 0$ unless $i_1 \dots i_k i_1$ forms a special closed walk in K_n. (Füredi-Komlós, Soshnikov, V., Soshnikov-Péché, etc.)
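
A sanity check of the trace identity (illustrative, not from the slides; a real Gaussian matrix of size 50 and k = 6 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 6
M = rng.standard_normal((n, n))
s = np.linalg.svd(M, compute_uv=False)
lhs = np.sum(s ** k)                                             # sigma_1^k + ... + sigma_n^k
rhs = np.trace(np.linalg.matrix_power(M @ M.T, k // 2))          # Trace (M M^*)^(k/2), M real
print(lhs, rhs)                                                  # equal up to floating-point error
```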

  11. Distribution at the hard edge of the spectrum: the distribution of the least singular value (or, more generally, the joint distribution of the k smallest singular values).
      Edelman (1988), Gaussian case:
      (Real Gaussian)      $P(n\, \sigma_n(M_n(g_{\mathbb{R}}))^2 \le t) = 1 - e^{-t/2 - \sqrt{t}} + o(1).$
      (Complex Gaussian)   $P(n\, \sigma_n(M_n(g_{\mathbb{C}}))^2 \le t) = 1 - e^{-t}.$
      Forrester (1994): joint distribution of the k least singular values.
      Ben Arous-Péché (2007): Gaussian-divisible random variables.
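
An empirical comparison with the complex Gaussian formula above (illustrative, not from the slides; n = 100, 400 independent trials, and a fixed seed are assumptions made here):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 100, 400
samples = []
for _ in range(trials):
    M = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    sigma_n = np.linalg.svd(M, compute_uv=False)[-1]     # least singular value
    samples.append(n * sigma_n ** 2)
samples = np.array(samples)
for t in [0.5, 1.0, 2.0]:
    print(f"t={t}:  empirical P(n*sigma_n^2 <= t)={np.mean(samples <= t):.3f}"
          f"   1-exp(-t)={1 - np.exp(-t):.3f}")
```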

  12. What about general entries? The proofs for the Gaussian cases rely on special properties of the Gaussian distribution and cannot be extended.
      One can view σ_n(M)^{-1} as the largest singular value of M^{-1}. However, the trace method does not apply, since the entries of M^{-1} are not independent.

  13. Property testing. Given a large, complex structure S, we would like to study some parameter P of S. It has been observed that quite often one can obtain good estimates of P by looking only at a small substructure of S, sampled randomly.
      In our case, the large structure is the matrix S := M_n^{-1}, and the parameter in question is its largest singular value. It turns out that this largest singular value can be estimated quite precisely (and with high probability) by sampling a few rows (say s) from S and considering the submatrix S' formed by these rows.

  14. Sampling. Assume, for simplicity, that |ξ| is bounded and M_n is invertible with probability one.
      $P(n\, \sigma_n(M_n(\xi))^2 \le t) = P(\sigma_1(M_n(\xi)^{-1})^2 \ge n/t).$
      Let R_1(ξ), ..., R_n(ξ) denote the rows of M_n(ξ)^{-1}.
      Lemma [Random sampling]. Let 1 ≤ s ≤ n be integers and let A be an n × n real or complex matrix with rows R_1, ..., R_n. Let k_1, ..., k_s ∈ {1, ..., n} be selected independently and uniformly at random, and let B be the s × n matrix with rows R_{k_1}, ..., R_{k_s}. Then
      $E\, \Big\| A^*A - \frac{n}{s} B^*B \Big\|_F^2 \le \frac{n}{s} \sum_{k=1}^{n} |R_k|^4.$
      (A special case of Frieze-Kannan-Vempala.)
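
A small numerical illustration of the Random sampling lemma (illustrative, not from the slides; a real Gaussian test matrix A, n = 200, s = 20, 200 sampling trials, and a fixed seed are assumptions made here). The empirical average of the squared Frobenius error stays below the bound (n/s) Σ_k |R_k|^4:

```python
import numpy as np

rng = np.random.default_rng(5)
n, s, trials = 200, 20, 200
A = rng.standard_normal((n, n))
bound = (n / s) * np.sum(np.sum(A ** 2, axis=1) ** 2)      # (n/s) * sum_k |R_k|^4

errs = []
for _ in range(trials):
    idx = rng.integers(0, n, size=s)                       # rows sampled uniformly, with replacement
    B = A[idx, :]
    errs.append(np.linalg.norm(A.T @ A - (n / s) * B.T @ B, "fro") ** 2)
print("empirical E ||A*A - (n/s) B*B||_F^2 ≈", np.mean(errs))
print("lemma bound (n/s) sum_k |R_k|^4    =", bound)
```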

  15. Write R_i = (a_{i1}, ..., a_{in}). For 1 ≤ i, j ≤ n, the ij entry of A^*A − (n/s) B^*B is given by
      $\sum_{k=1}^{n} a_{ki} a_{kj} - \frac{n}{s} \sum_{l=1}^{s} a_{k_l i}\, a_{k_l j}.$   (5)
      For l = 1, ..., s, the random variables $a_{k_l i} a_{k_l j}$ are iid with mean $\frac{1}{n} \sum_{k=1}^{n} a_{ki} a_{kj}$ and variance
      $V_{ij} := \frac{1}{n} \sum_{k=1}^{n} |a_{ki}|^2 |a_{kj}|^2 - \Big| \frac{1}{n} \sum_{k=1}^{n} a_{ki} a_{kj} \Big|^2,$   (6)
      and so the random variable (5) has mean zero and variance $\frac{n^2}{s} V_{ij}$.

  16. Summing over i, j, we conclude that
      $E\, \Big\| A^*A - \frac{n}{s} B^*B \Big\|_F^2 = \frac{n^2}{s} \sum_{i=1}^{n} \sum_{j=1}^{n} V_{ij}.$
      Discarding the second term in V_{ij} (which is subtracted), we conclude
      $E\, \Big\| A^*A - \frac{n}{s} B^*B \Big\|_F^2 \le \frac{n}{s} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} |a_{ki}|^2 |a_{kj}|^2.$
      Performing the i, j summations, we obtain the claim, since $\sum_{i,j} \sum_{k} |a_{ki}|^2 |a_{kj}|^2 = \sum_{k=1}^{n} \Big( \sum_{i=1}^{n} |a_{ki}|^2 \Big)^2 = \sum_{k=1}^{n} |R_k|^4$.

  17. Bounding the error term. The expectation E |R_i(ξ)| is infinite. However, we have the following tail bound.
      Lemma [Tail bound on |R_i(ξ)|]. Let R_1, ..., R_n be the rows of M_n(ξ)^{-1}. Then
      $P\Big( \max_{1 \le i \le n} |R_i(\xi)| \ge n^{100/C_0} \Big) \ll n^{-1/C_0}.$

  18. Inverting and Projecting. One-dimensional case. Let A be an invertible matrix with columns X_1, ..., X_n, and let R_1, ..., R_n be the rows of A^{-1}.
      Fact. R_1 points in the normal direction of the hyperplane spanned by X_2, ..., X_n, and its length is the reciprocal of (the length of) the projection of X_1 onto that normal direction.
      Proof. Consider the identity A^{-1} A = I: it shows that R_1 is orthogonal to X_2, ..., X_n and that R_1 · X_1 = 1.
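
A quick numerical check of this fact (illustrative, not from the slides; a small real Gaussian matrix and a fixed seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
A = rng.standard_normal((n, n))            # columns X_1, ..., X_n
R1 = np.linalg.inv(A)[0, :]                # first row of A^{-1}
print("R_1 . X_1           =", R1 @ A[:, 0])                    # equals 1
print("max_{j>1} |R_1.X_j| =", np.max(np.abs(R1 @ A[:, 1:])))   # numerically ~ 0
```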

  19. Inverting and Projecting, continued. High-dimensional case.
      Lemma [Projection lemma]. Let V be the s-dimensional subspace formed as the orthogonal complement of the span of X_{s+1}, ..., X_n, which we identify with F^s (F is either R or C) via an orthonormal basis, and let π : F^n → F^s be the orthogonal projection onto V ≡ F^s. Let M be the s × s matrix with columns π(X_1), ..., π(X_s). Then M is invertible, and we have
      $BB^* = M^{-1} (M^{-1})^*.$
      In particular, we have $\sigma_j(B) = \sigma_{s-j+1}(M)^{-1}$ for all 1 ≤ j ≤ s.
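
A numerical verification of the Projection lemma (illustrative, not from the slides; for simplicity the sampled rows are taken to be the first s rows of A^{-1}, and a small real Gaussian A, n = 12, s = 4, and a fixed seed are assumptions made here):

```python
import numpy as np

rng = np.random.default_rng(7)
n, s = 12, 4
A = rng.standard_normal((n, n))            # columns X_1, ..., X_n
B = np.linalg.inv(A)[:s, :]                # s x n matrix whose rows are the first s rows of A^{-1}

# Orthonormal basis Q of V, the orthogonal complement of span(X_{s+1}, ..., X_n).
U, _, _ = np.linalg.svd(A[:, s:], full_matrices=True)
Q = U[:, n - s:]                           # n x s; columns form an orthonormal basis of V

M = Q.T @ A[:, :s]                         # s x s matrix with columns pi(X_1), ..., pi(X_s)
print(np.sort(np.linalg.svd(B, compute_uv=False)))        # singular values of B, increasing
print(np.sort(1 / np.linalg.svd(M, compute_uv=False)))    # reciprocals of those of M: they match
```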

  20. Most importantly, this means that the largest singular value of B is the reciprocal of the smallest singular value of M. Together with the Sampling lemma and the Tail bound lemma, this reduces the study of the smallest singular value of an n × n matrix to that of an s × s matrix.
      The key point of the argument is that orthogonal projection onto a low-dimensional subspace has an averaging effect that makes the image close to Gaussian.
      Similar in spirit is Dvoretzky's theorem: a low-dimensional random cross-section of the n-dimensional unit cube looks like a ball with high probability.

  21. One-dimensional Berry-Esseen central limit theorem. Let v_1, ..., v_n ∈ R be real numbers with v_1^2 + ... + v_n^2 = 1 and let ξ be an R-normalized random variable with finite third moment E |ξ|^3 < ∞. Let S ∈ R denote the random variable S = v_1 ξ_1 + ... + v_n ξ_n, where ξ_1, ..., ξ_n are iid copies of ξ. Then for any t ∈ R we have
      $P(S \le t) = P(g_{\mathbb{R}} \le t) + O\Big( \sum_{j=1}^{n} |v_j|^3 \Big),$
      where the implied constant depends on the third moment E |ξ|^3 of ξ. In particular, we have
      $P(S \le t) = P(g_{\mathbb{R}} \le t) + O\Big( \max_{1 \le j \le n} |v_j| \Big).$
      Moral: a sum of iid real random variables with non-degenerate coefficients is asymptotically Gaussian.
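
A small simulation of this statement (illustrative, not from the slides; Bernoulli ±1 summands, a random unit-norm coefficient vector with n = 200, 20000 samples, and a fixed seed are assumptions made here):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(8)
n, trials = 200, 20000
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                      # v_1^2 + ... + v_n^2 = 1
xi = rng.choice([-1.0, 1.0], size=(trials, n))
S = xi @ v                                  # samples of S = v_1 xi_1 + ... + v_n xi_n
for t in [-1.0, 0.0, 1.0]:
    gaussian_cdf = 0.5 * (1 + erf(t / sqrt(2)))
    print(f"t={t:+.1f}   P(S <= t) ≈ {np.mean(S <= t):.3f}   P(g_R <= t) = {gaussian_cdf:.3f}")
```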
