Numerical analysis and random matrix theory
Tom Trogdon (ttrogdon@math.uci.edu), UC Irvine
Acknowledgements

This is joint work with:
• Percy Deift
• Govind Menon
• Sheehan Olver
• Raj Rao
Numerical analysis and random matrix theory

• Using techniques from numerical analysis to analyze random matrices: Trotter (1984), Silverstein (1985), Edelman (1989), Dumitriu and Edelman (2002)
• Computing distributions from random matrix theory: Bornemann (2008); Witte, Bornemann and Forrester (2012); Olver and T (2014)
• Generating samples from random matrix distributions: Mezzadri (2006); Edelman and Rao (2005); Menon and Li (2014); Olver, Rao and T (2015)
• Using random matrices to analyze algorithms, statistically: Spielman and Teng (2009); Borgwardt (2012); Smale (1983, 1985); Deift and T (2016, 2017); Menon and T (2016); Feldheim, Paquette and Zeitouni (2014)
• Randomized linear algebra: Liberty, Woolfe, Martinsson, Rokhlin and Tygert (2007)
Statistical analysis of algorithms

• Smoothed analysis: Spielman and Teng (2009)
• Simplex algorithm: Borgwardt (2012), Smale (1985)
• Universality: Pfrang, Deift and Menon (2014)
Universality in numerical computation w/ Percy Deift, Govind Menon and others
Universality: A toy example

Consider the numerical solution of the system
$$H = H^*, \qquad (I - H)x = b, \qquad \|H\|_2 < 1.$$

The ensemble. Assume $H = U \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N) U^*$ where $(\lambda_j)_{1 \le j \le N}$ are iid with
$$\mathbb{P}(\lambda_j \in B) = \int_{B \cap [-1,1]} p(x)\,dx.$$

The method. A naïve iterative scheme is given by
$$x_n = H x_{n-1} + b, \qquad x_0 = 0.$$
Universality: A toy example

The halting time. To keep things simple, define $T = T(H, \varepsilon)$ to be the smallest integer $T$ such that
$$\left\| (I - H)^{-1} - \sum_{k=0}^{T-1} H^k \right\|_2 < \varepsilon.$$
The random variable $T$ is the halting time.

The question. What is the distribution of $T$? Is it universal as $N \to \infty$?
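The setup above can be simulated directly. The following is a minimal sketch (the sizes N, ε, the seed, and the choice of p uniform on [-1, 1] are all illustrative assumptions): sample $H = U \operatorname{diag}(\lambda) U^T$, then grow the partial Neumann sums $\sum_{k<T} H^k$ until the stopping criterion is met.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 50, 1e-6  # illustrative sizes (assumptions)

# Sample the ensemble: iid eigenvalues on [-1, 1], orthogonal eigenvectors
lam = rng.uniform(-1.0, 1.0, size=N)
U, _ = np.linalg.qr(rng.standard_normal((N, N)))
H = U @ np.diag(lam) @ U.T

# Halting time: smallest T with || (I - H)^{-1} - sum_{k<T} H^k ||_2 < eps
inv = np.linalg.inv(np.eye(N) - H)
S, Hk, T = np.zeros((N, N)), np.eye(N), 0
while np.linalg.norm(inv - S, 2) >= eps:
    S += Hk       # S = sum_{k<T} H^k after the update
    Hk = Hk @ H
    T += 1
```

Since the 2-norm of the error is governed by the extreme eigenvalues alone, the loop count can be cross-checked against the closed-form expression derived on the following slides.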
Define $\lambda_{\min} = \min \lambda_j$, $\lambda_{\max} = \max \lambda_j$. Then for fixed $t \ge 0$, as $N \to \infty$,
$$\mathbb{P}\left(1 - \tfrac{t}{N} \le \lambda_j \le 1\right) = p(1)\tfrac{t}{N} + o(N^{-1}), \qquad \mathbb{P}\left(-1 \le \lambda_j \le -1 + \tfrac{t}{N}\right) = p(-1)\tfrac{t}{N} + o(N^{-1}).$$

Assume $p(1) \ne 0$. Then as $N \to \infty$,
$$p(1)\, N (1 - \lambda_{\max}) \xrightarrow{\text{dist.}} \exp(1), \qquad N(1 + \lambda_{\min}) \xrightarrow{\text{dist.}} \exp(p(-1)).$$

Because $H$ is Hermitian,
$$\left\| (I - H)^{-1} - \sum_{k=0}^{n-1} H^k \right\|_2 = \max\left\{ \frac{|\lambda_{\max}|^n}{|1 - \lambda_{\max}|},\; \frac{|\lambda_{\min}|^n}{|1 - \lambda_{\min}|} \right\}.$$
For $\varepsilon > 0$, define $t = t(H, \varepsilon)$ by
$$\max\left\{ \frac{|\lambda_{\max}|^t}{|1 - \lambda_{\max}|},\; \frac{|\lambda_{\min}|^t}{|1 - \lambda_{\min}|} \right\} = \varepsilon.$$
Then $T(H, \varepsilon) = \max\{0, \lceil t(H, \varepsilon) \rceil\}$ and
$$t(H, \varepsilon) = \max\left\{ \frac{\log \varepsilon^{-1} |1 - \lambda_{\max}|^{-1}}{\log |\lambda_{\max}|^{-1}},\; \frac{\log \varepsilon^{-1} |1 - \lambda_{\min}|^{-1}}{\log |\lambda_{\min}|^{-1}} \right\}.$$
Define
$$t_{\max} = \frac{\log \varepsilon^{-1} |1 - \lambda_{\max}|^{-1}}{\log |\lambda_{\max}|^{-1}}, \qquad t_{\min} = \frac{\log \varepsilon^{-1} |1 - \lambda_{\min}|^{-1}}{\log |\lambda_{\min}|^{-1}}.$$
“Universality”

Two estimates are needed as $N \to \infty$:
$$\frac{t_{\max}}{N \log[N \varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{p(1)}{\xi}, \quad \xi \sim \exp(1), \qquad t_{\min} = O(N).$$
Because $t_{\min}$ is asymptotically smaller than $t_{\max}$, it does not affect the limit:
$$\frac{T(H, \varepsilon)}{N \log[N \varepsilon^{-1}]} \xrightarrow{\text{dist.}} \frac{p(1)}{\xi}, \quad \xi \sim \exp(1).$$
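Since $t(H, \varepsilon)$ depends only on $\lambda_{\max}$ and $\lambda_{\min}$, the limit law can be probed by Monte Carlo without forming any matrices. A rough sketch (all parameters are illustrative assumptions; with $p$ uniform on $[-1,1]$, $p(1) = 1/2$, and the limiting median is $p(1)/\log 2 \approx 0.72$; since convergence is only logarithmic and the $t_{\min}$ term still contributes at moderate $N$, the empirical median sits somewhat above that):

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, trials = 2000, 1e-8, 400  # illustrative (assumptions)

samples = []
for _ in range(trials):
    lam = rng.uniform(-1.0, 1.0, size=N)  # p(x) = 1/2 on [-1, 1]
    lmax, lmin = lam.max(), lam.min()
    # closed-form t_max, t_min; the halting time is essentially their maximum
    tmax = np.log(1.0/(eps*abs(1 - lmax))) / np.log(1.0/abs(lmax))
    tmin = np.log(1.0/(eps*abs(1 - lmin))) / np.log(1.0/abs(lmin))
    samples.append(max(tmax, tmin) / (N*np.log(N/eps)))
samples = np.array(samples)
```

The empirical distribution of `samples` can then be compared against that of $p(1)/\xi$, $\xi \sim \exp(1)$.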
This establishes universality for $T(H, \varepsilon)$ in this simple case. It required three main components:
• A formula to estimate the error.
• Universality for key statistics ($\lambda_{\max}$).
• Estimates for the rest ($\lambda_{\min}$).
Significant work goes into determining the scaling. If the scaling is not known but universality is to be investigated, a normalization must be inferred from the data. Given a collection of samples of the random variable $T = T(H, \varepsilon)$, define the empirical fluctuations
$$\tau(H, \varepsilon) = \frac{T(H, \varepsilon) - \langle T \rangle}{\sigma_T},$$
where $\langle T \rangle$ is the sample median and $\sigma_T$ is the median absolute deviation.

[Figure: histograms of $\tau(H, \varepsilon)$ for two choices of $p(x)$: $p(x) = 1$ and $p(x) = e^{-x}/(e - e^{-1})$.]
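The data-driven normalization above is straightforward to implement; here is a minimal sketch (the sample values are made up for illustration):

```python
import numpy as np

def fluctuations(T_samples):
    """Center by the sample median, scale by the median absolute deviation."""
    T_samples = np.asarray(T_samples, dtype=float)
    med = np.median(T_samples)
    mad = np.median(np.abs(T_samples - med))
    return (T_samples - med) / mad

# Hypothetical halting-time samples
tau = fluctuations([10, 12, 15, 11, 30, 14])
```

Median/MAD is preferred over mean/standard deviation here because halting times can be heavy-tailed, and the median-based statistics are insensitive to outliers.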
The (inverse) power method

The ensembles. A sample covariance matrix (SCM) is an $N \times N$ real symmetric ($\beta = 1$) or complex Hermitian ($\beta = 2$) matrix
$$H = X^* X / M, \qquad X = (X_{ij})_{1 \le i \le M,\, 1 \le j \le N} \text{ iid}, \qquad M \sim N/d, \quad 0 < d < 1,$$
$$\mathbb{E} X_{ij} = 0, \qquad \mathbb{E}|X_{ij}|^2 = 1.$$
For $\beta = 2$, $\mathbb{E} X_{ij}^2 = 0$ is imposed. We require uniform exponential tails.

The method. The power method is given by ($x_0$ specified)
$$\mu_k = \frac{x_0^* H^{2k-1} x_0}{x_0^* H^{2k-2} x_0} \to \lambda_N = \lambda_{\max}, \qquad k \to \infty.$$

The halting time. The halting time is given by
$$T(H, x_0, \varepsilon) = \min\{ n : |\mu_n - \mu_{n-1}| < \varepsilon^2 \}.$$

The question. What is the distribution of $T$? Is it universal as $N \to \infty$?
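A minimal sketch of this halting time for one complex ($\beta = 2$) SCM sample (N, d, ε, and the seed are illustrative assumptions). Writing $v_k = H^{k-1} x_0$, the ratio $\mu_k$ is the Rayleigh quotient $v_k^* H v_k / v_k^* v_k$, which lets us renormalize the iterate at each step to avoid overflow:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, eps = 100, 0.5, 1e-2  # illustrative (assumptions)
M = int(N / d)

# Complex (beta = 2) sample covariance matrix H = X* X / M
X = (rng.standard_normal((M, N)) + 1j*rng.standard_normal((M, N))) / np.sqrt(2)
H = X.conj().T @ X / M

x0 = rng.standard_normal(N) + 1j*rng.standard_normal(N)
x0 /= np.linalg.norm(x0)

def halting_time(H, x0, eps, max_iter=10**6):
    """Return (T, mu_T): first n with |mu_n - mu_{n-1}| < eps^2."""
    v, mu_prev = x0, None
    for n in range(1, max_iter):
        mu = np.real(np.vdot(v, H @ v) / np.vdot(v, v))  # Rayleigh quotient
        if mu_prev is not None and abs(mu - mu_prev) < eps**2:
            return n, mu
        mu_prev, v = mu, H @ v
        v = v / np.linalg.norm(v)  # renormalize the power iterate
    raise RuntimeError("did not halt")

T, mu_T = halting_time(H, x0, eps)
```

Note the stopping rule is a Cauchy criterion on the $\mu_k$, not a direct error bound, so $\mu_T$ is only an approximation to $\lambda_{\max}$.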
Observing universality

Using 4 complex SCMs, the fluctuations
$$\tau(H, \varepsilon) = \frac{T(H, \varepsilon) - \langle T \rangle}{\sigma_T}$$
appear universal.

[Figure: histograms of the fluctuations $\tau(H, \varepsilon)$ for 4 complex SCMs.]
A formula to estimate the error

Let $H = U \Lambda U^*$ and
$$U^* x_0 = [\beta_1, \beta_2, \ldots, \beta_N]^T, \qquad \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N), \qquad 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N.$$
To estimate the halting time for the power method, we must find the large $N$ and large $k$ asymptotics for
$$\mu_k = \frac{\displaystyle\sum_{j=1}^N |\beta_j|^2 \lambda_j^{2k-1}}{\displaystyle\sum_{j=1}^N |\beta_j|^2 \lambda_j^{2k-2}} = \lambda_N\, \frac{1 + \displaystyle\sum_{j=1}^{N-1} \left|\frac{\beta_j}{\beta_N}\right|^2 \left(\frac{\lambda_j}{\lambda_N}\right)^{2k-1}}{1 + \displaystyle\sum_{j=1}^{N-1} \left|\frac{\beta_j}{\beta_N}\right|^2 \left(\frac{\lambda_j}{\lambda_N}\right)^{2k-2}}.$$
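The eigen-expansion above can be sanity-checked numerically against the definition of $\mu_k$; a quick sketch with made-up small sizes (all parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, k = 8, 16, 5
X = rng.standard_normal((M, N))
H = X.T @ X / M                    # real SCM, H = U diag(lam) U^T

x0 = rng.standard_normal(N)
lam, U = np.linalg.eigh(H)         # ascending eigenvalues
beta = U.T @ x0                    # coefficients of x0 in the eigenbasis

# mu_k from the definition, via matrix powers...
mu_direct = (x0 @ np.linalg.matrix_power(H, 2*k - 1) @ x0) \
          / (x0 @ np.linalg.matrix_power(H, 2*k - 2) @ x0)
# ...and from the eigen-expansion
mu_eig = np.sum(beta**2 * lam**(2*k - 1)) / np.sum(beta**2 * lam**(2*k - 2))
```

The two expressions agree to rounding error, and both are bounded above by $\lambda_N$.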
A historical interlude

For eigenvalues:
• The seminal work of Geman (1980) showed that the largest eigenvalue of an SCM converges a.s.
• Silverstein (1985) established that the smallest eigenvalue converges a.s. to $\lambda_-$ for iid standard normal random variables.
• Johnstone (2001), Johansson (2000) and Forrester (1993) gave the first results on the fluctuations of the largest and smallest eigenvalues for (real or complex) standard normal distributions.
• Universality was obtained by Ben Arous and Péché (2005) (see also Tao and Vu (2012)).
• We reference Pillai and Yin (2014) and Bloemendal, Knowles, Yau and Yin (2016) for the most comprehensive results.

For eigenvectors:
• The limits of the eigenvectors have also been considered in various ways; see Silverstein (1986) and Bai, Miao and Pan (2007).
• We reference Bloemendal, Knowles, Yau and Yin (2016) for the generality needed to prove our theorems. We require exponential tails, which is stronger than the assumptions in Yin (1986) and Geman (1980).
Universality for key statistics

Theorem. For SCMs, $N^{1/2}(|\beta_N|, |\beta_{N-1}|, |\beta_{N-2}|)$ converges jointly in distribution to $(|X_1|, |X_2|, |X_3|)$, where $\{X_1, X_2, X_3\}$ are iid real ($\beta = 1$) or complex ($\beta = 2$) standard normal random variables. Additionally,
$$N^{2/3} \lambda_+^{-2/3} d^{1/2}\, (\lambda_+ - \lambda_N,\; \lambda_+ - \lambda_{N-1},\; \lambda_+ - \lambda_{N-2})$$
converges jointly in distribution to random variables $(\Lambda_{1,\beta}, \Lambda_{2,\beta}, \Lambda_{3,\beta})$, which are the smallest three eigenvalues of the so-called stochastic Airy operator.

This follows from Ramírez, Rider and Virág (2011); Pillai and Yin (2014); Bloemendal, Knowles, Yau and Yin (2016).
Estimates for the rest

Theorem (Pillai and Yin (2014)). For any $s > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left( |\lambda_n - \gamma_n| \le N^{-2/3 + s} \left(\max\{n, N - n + 1\}\right)^{-1/3} \text{ for all } n \right) = 1,$$
where $\gamma_n$ denotes the classical location of the $n$th eigenvalue (the $n/N$-quantile of the Marchenko–Pastur law).

Theorem (Bloemendal, Knowles, Yau and Yin (2016)). For any $s > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left( |\beta_n| \le N^{-1/2 + s} \text{ for all } n \right) = 1.$$

The first theorem is known as rigidity; it was first shown for Wigner ensembles by Erdős, Yau and Yin (2012) (see also Bourgade, Erdős and Yau (2014)). The second theorem is known as delocalization.
Universality for the (inverse) power method

The distribution function $F^{\mathrm{gap}}_\beta(t)$, supported on $t \ge 0$ for $\beta = 1, 2$, is defined by
$$F^{\mathrm{gap}}_\beta(t) := \lim_{N \to \infty} \mathbb{P}\left( \frac{1}{2^{-7/6} N^{2/3} \lambda_+^{-2/3} d^{-1/2} (\lambda_N - \lambda_{N-1})} \le t \right).$$

Theorem (Deift and T). Let $H$ be a real ($\beta = 1$) or complex ($\beta = 2$) SCM and let $x_0$ be a random unit vector independent of $H$. Assuming $\varepsilon \le N^{-5/3 - \sigma}$ for some $\sigma > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \left(\log \varepsilon^{-1} - \frac{2}{3}\log N\right)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$
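The rescaled statistic in the theorem can be computed for a single sample by running the power method of the earlier slides. A sketch (N, d, the seed, and ε are illustrative assumptions; ε is chosen near the theorem's regime $\varepsilon \le N^{-5/3-\sigma}$, and $\lambda_+ = (1 + \sqrt{d})^2$ is the right edge of the Marchenko–Pastur law):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 100, 0.5
M = int(N / d)
eps = 1e-4                     # roughly of size N^{-5/3} for N = 100

# One complex SCM sample and a random unit starting vector
X = (rng.standard_normal((M, N)) + 1j*rng.standard_normal((M, N))) / np.sqrt(2)
H = X.conj().T @ X / M
x0 = rng.standard_normal(N) + 1j*rng.standard_normal(N)
x0 /= np.linalg.norm(x0)

# Power method, halting when |mu_n - mu_{n-1}| < eps^2
v, mu_prev, T = x0, None, None
for n in range(1, 10**6):
    mu = np.real(np.vdot(v, H @ v) / np.vdot(v, v))
    if mu_prev is not None and abs(mu - mu_prev) < eps**2:
        T = n
        break
    mu_prev, v = mu, H @ v
    v = v / np.linalg.norm(v)

# Rescaling from the theorem
lam_plus = (1 + np.sqrt(d))**2
scale = 2**(-7/6) * lam_plus**(1/3) / np.sqrt(d) * N**(2/3) \
        * (np.log(1/eps) - (2/3)*np.log(N))
rescaled = T / scale
```

Collecting `rescaled` over many independent samples would produce a histogram to compare against $F^{\mathrm{gap}}_\beta$, keeping in mind the slow (logarithmic) convergence noted on the next slide.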
A note on the proof

In the proof we establish the approximation
$$\frac{T(H, x_0, \varepsilon)}{N^{2/3}} - \frac{\log \varepsilon^{-1} + \log\left|\frac{\beta_{N-1}}{\beta_N}\right| + \frac{1}{2}\log\left(\frac{(\lambda_N - \lambda_{N-1})^2 (\lambda_N + \lambda_{N-1})}{\lambda_N^2}\right)}{N^{2/3} \log\left(\frac{\lambda_N}{\lambda_{N-1}}\right)} \xrightarrow{\text{prob.}} 0.$$
From this we obtain
$$\frac{T(H, x_0, \varepsilon)}{N^{2/3} \log N} - \frac{\log \varepsilon^{-1} - \frac{2}{3}\log N}{N^{2/3} \log N \cdot \lambda_+^{-1} |\lambda_N - \lambda_{N-1}|} \xrightarrow{\text{prob.}} 0.$$
Convergence is at an (at best) logarithmic rate.
A demonstration

[Figure: histograms of the rescaled halting time ("Frequency" vs. "Rescaled halting time") for $\beta = 1$ and $\beta = 2$.]

Thanks to Folkmar Bornemann (TUM)!
$$\lim_{N \to \infty} \mathbb{P}\left( \frac{T(H, x_0, \varepsilon)}{2^{-7/6} \lambda_+^{1/3} d^{-1/2} N^{2/3} \left(\log \varepsilon^{-1} - \frac{2}{3}\log N + \gamma\right)} \le t \right) = F^{\mathrm{gap}}_\beta(t).$$
Other theorems

Similar techniques give universality theorems for:
• The QR eigenvalue algorithm.
• The Toda algorithm.

See Deift and T (2016) and Deift and T (2017).