Cleaning correlation matrices, Random Matrix Theory & HCIZ - PowerPoint PPT Presentation

Cleaning correlation matrices, Random Matrix Theory & HCIZ integrals J.P Bouchaud with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar http://www.cfm.fr

Portfolio theory: Basics • Portfolio weights w i , Asset returns X t i • If expected/predicted gains are g i then the expected gain of the portfolio is � G = w i g i i • Let risk be defined as: variance of the portfolio returns (maybe not a good definition !) � R 2 = w i σ i C ij σ j w j ij where σ 2 i is the variance of asset i , and C ij is the correlation matrix. J.-P. Bouchaud

Markowitz Optimization • Find the portfolio with maximum expected return for a given risk or equivalently, minimum risk for a given return ( G ) • In matrix notation: w C = G C − 1 g g T C − 1 g where all gains are measured with respect to the risk-free rate and σ i = 1 (absorbed in g i ). • Note: in the presence of non-linear contraints, e.g. � | w i | ≤ A i a “spin-glass” problem! (see [JPB,Galluccio,Potters]) J.-P. Bouchaud

Markowitz Optimization • More explicitly: � � λ − 1 ( λ − 1 w ∝ ( Ψ α · g ) Ψ α = g + − 1) ( Ψ α · g ) Ψ α α α α α • Compared to the naive allocation w ∝ g : – Eigenvectors with λ ≫ 1 are projected out – Eigenvectors with λ ≪ 1 are overallocated • Very important for “stat. arb.” strategies (for example) J.-P. Bouchaud

Empirical Correlation Matrix • Before inverting them, how should one estimate/clean correlation matrices? • Empirical Equal-Time Correlation Matrix E X t i X t E ij = 1 � j T σ i σ j t Order N 2 quantities estimated with NT datapoints. When T < N , E is not even invertible. Typically: N = 500 − 2000; T = 500 − 2500 days (10 years – Beware of high frequencies) − → q := N/T = O (1) J.-P. Bouchaud

Risk of Optimized Portfolios • “In-sample” risk (for G = 1): 1 R 2 in = w T E Ew E = g T E − 1 g • True minimal risk 1 R 2 true = w T C Cw C = g T C − 1 g • “Out-of-sample” risk E Cw E = g T E − 1 CE − 1 g R 2 out = w T ( g T E − 1 g ) 2 J.-P. Bouchaud

Risk of Optimized Portfolios • Let E be a noisy, unbiased estimator of C . Using convexity arguments, and for large matrices: R 2 in ≤ R 2 true ≤ R 2 out true (1 − q ) − 1 = R 2 • In fact, using RMT: R 2 out = R 2 in (1 − q ) − 2 , indep. of C ! (For large N ) • If C has some time dependence (beyond observation noise) one expects an even worse underestimation J.-P. Bouchaud

In Sample vs. Out of Sample 150 100 Return 50 Raw in-sample Cleaned in-sample Cleaned out-of-sample Raw out-of-sample 0 0 10 20 30 Risk J.-P. Bouchaud

Rotational invariance hypothesis (RIH) • In the absence of any cogent prior on the eigenvectors, one can assume that C is a member of a Rotationally Invariant Ensemble – “RIH” √ • Surely not true for the “market mode” � v 1 ≈ (1 , 1 , . . . , 1) / N , with λ 1 ≈ Nρ but OK in the bulk (see below) A more plausible assumption: factor model → hierarchical, block diagonal C ’s (“Parisi matrices”) • “Cleaning” E within RIH: keep the eigenvectors, play with eigenvalues → The simplest, classical scheme, shrinkage: C = (1 − α ) E + α I → � λ C = (1 − α ) λ E + α, α ∈ [0 , 1] J.-P. Bouchaud

RMT: from ρ C ( λ ) to ρ E ( λ ) • Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent G E ( z ) = N − 1 Tr( E − z I ) as: � 1 G E ( z ) = dλ ρ C ( λ ) z − λ (1 − q + qzG E ( z )) , Note: One should work from ρ C − → G E • Example 1: C = I (null hypothesis) → Marcenko-Pastur [67] � ( λ + − λ )( λ − λ − ) λ ∈ [(1 − √ q ) 2 , (1 + √ q ) 2 ] ρ E ( λ ) = , 2 πqλ • Suggests a second cleaning scheme (Eigenvalue clipping, [Laloux et al. 1997]): any eigenvalue beyond the Marcenko-Pastur edge can be trusted, the rest is noise. J.-P. Bouchaud

Eigenvalue clipping λ < λ + are replaced by a unique one, so as to preserve Tr C = N . J.-P. Bouchaud

RMT: from ρ C ( λ ) to ρ E ( λ ) • Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent G E ( z ) as: � 1 G E ( z ) = dλ ρ C ( λ ) z − λ (1 − q + qzG E ( z )) , Note: One should work from ρ C − → G E • Example 2: Power-law spectrum (motivated by data) µA ρ C ( λ ) = ( λ − λ 0 ) 1+ µ Θ( λ − λ min ) • Suggests a third cleaning scheme (Eigenvalue substitution, Potters et al. 2009, El Karoui 2010): λ E is replaced by the theoretical λ C with the same rank k J.-P. Bouchaud

Empirical Correlation Matrix 1.5 8 Data Dressed power law ( µ=2) 6 Raw power law ( µ=2) 4 Marcenko-Pastur κ 2 0 1 -2 0 250 500 ρ(λ) rank 0.5 0 0 1 2 3 4 5 λ MP and generalized MP fits of the spectrum J.-P. Bouchaud

Eigenvalue cleaning 3.5 Classical Shrinkage 3 Ledoit-Wolf Shrinkage Power Law Substitution Eigenvalue Clipping 2.5 2 2 R 1.5 1 0.5 0 0 0.2 0.4 0.6 0.8 1 α Out-of sample risk for different 1-parameter cleaning schemes J.-P. Bouchaud

A RIH Bayesian approach • All the above schemes lack a rigorous framework and are at best ad-hoc recipes • A Bayesian framework: suppose C belongs to a RIE, with P ( C ) and assume Gaussian returns. Then one needs: � D CC P ( C |{ X t � C �| X t i = i } ) with � � i } ) = Z − 1 exp P ( C |{ X t − N Tr V ( C , { X t i } ) ; where (Bayes): � log C + EC − 1 � i } ) = 1 V ( C , { X t + V 0 ( C ) 2 q J.-P. Bouchaud

A Bayesian approach: a fully soluble case • V 0 ( C ) = (1 + b ) ln C + b C − 1 , b > 0: “Inverse Wishart” � � ( λ + − λ )( λ − λ − ) (1 + b ) 2 − b 2 / 4) /b • ρ C ( λ ) ∝ ; λ ± = (1 + b ± λ 2 • In this case, the matrix integral can be done, leading exactly to the “Shrinkage” recipe, with α = f ( b, q ) • Note that b can be determined from the empirical spectrum of E , using the generalized MP formula J.-P. Bouchaud

The general case: HCIZ integrals • A Coulomb gas approach: integrate over the orthogonal group C = O Λ O † , where Λ is diagonal. � �� − N log Λ + EO † Λ − 1 O + 2 qV 0 (Λ) D O exp 2 q Tr • Can one obtain a large N estimate of the HCIZ integral � � � N N →∞ N − 2 ln 2 q Tr AO † BO F ( ρ A , ρ B ) = lim D O exp in terms of the spectrum of A and B ? J.-P. Bouchaud

The general case: HCIZ integrals • Can one obtain a large N estimate of the HCIZ integral � � � N N →∞ N − 2 ln 2 q Tr AO † BO F ( ρ A , ρ B ) = lim D O exp in terms of the spectrum of A and B ? • When A (or B ) is of finite rank, such a formula exists in terms of the R -transform of B [Marinari, Parisi & Ritort, 1995]. • When the rank of A , B are of order N , there is a formula due to Matytsin [94](in the unitary case), later shown rigorously by Zeitouni & Guionnet, but its derivation is quite obscure... J.-P. Bouchaud

An instanton approach to large N HCIZ • Consider Dyson’s Brownian motion matrices. The eigenvalues obey: � βN d W + 1 2 � 1 d x i = N d t , x i − x j j � = i • Constrain x i ( t = 0) = λ Ai and x i ( t = 1) = λ Bi . The proba- bility of such a path is given by a large deviation/instanton formula, with: d 2 x i dt 2 = − 2 � 1 ( x i − x ℓ ) 3 . N 2 ℓ � = i J.-P. Bouchaud

An instanton approach to large N HCIZ • Constrain x i ( t = 0) = λ Ai and x i ( t = 1) = λ Bi . The proba- bility of such a path is given by a large deviation/instanton formula, with: d 2 x i dt 2 = − 2 1 � ( x i − x ℓ ) 3 . N 2 ℓ � = i • This can be interpreted as the motion of particles interacting through an attractive two-body potential φ ( r ) = − ( Nr ) − 2 . Using the virial formula, one finally gets Matytsin’s equations: ∂ t v + v∂ x v = π 2 ρ∂ x ρ. ∂ t ρ + ∂ x [ ρv ] = 0 , J.-P. Bouchaud

An instanton approach to large N HCIZ • Finally, the “action” associated to these trajectories is: � � �� Z = B � v 2 + π 2 S ≈ 1 − 1 3 ρ 2 d xρ d x d yρ Z ( x ) ρ Z ( y ) ln | x − y | 2 2 Z = A • Now, the link with HCIZ comes from noticing that the prop- agator of the Brownian motion in matrix space is: P ( B | A ) ∝ exp − [ N 2 Tr( A − B ) 2 ] = exp − N 2 [Tr A 2 +Tr B 2 − 2Tr AOBO † ] Disregarding the eigenvectors of B (i.e. integrating over O ) leads to another expression for P ( λ Bi | λ Aj ) in terms of HCIZ that can be compared to the one using instantons • The final result for F ( ρ A , ρ B ) is exactly Matytsin’s expression, up to details (!) J.-P. Bouchaud

Back to eigenvalue cleaning... • Estimating HCIZ at large N is only the first step, but... • ...one still needs to apply it to B = C − 1 , A = E = X † C X and to compute also correlation functions such as � O 2 ij � E → C − 1 with the HCIZ weight • As we were working on this we discovered the work of Ledoit- P´ ech´ e that solves the problem exactly using tools from RMT... J.-P. Bouchaud

Cleaning correlation matrices, Random Matrix Theory & HCIZ - PowerPoint PPT Presentation

Cleaning correlation matrices, Random Matrix Theory & HCIZ integrals J.P Bouchaud with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar http://www.cfm.fr Portfolio theory: Basics Portfolio weights w i , Asset returns X t i If

Workflow basics, RMarkdown, git/Github Cleaning up Cleaning up Cleaning up Cleaning up

Floor Cleaning By Vacuum After vacuum Cleaning After vacuum Cleaning After vacuum Cleaning

MATHEMATICS 1 CONTENTS Matrices Special matrices Operations with matrices Matrix

Results for different matrices and comparisons Dense Matrices Rectangular Matrices

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

Random Vectors, Random Matrices, and Matrix Expected Value James H. Steiger Department of

Data Cleaning Nan Tang, QCRI Big Data Cleaning Nan Tang, QCRI Big Data Cleaning Nan Tang,

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Operator limits of random matrices I. Stochastic Airy Brian Rider Temple University Brian Rider

MODEL VIEW AND PERSPECTIVE MATRICES OUTLINE The Three Main Matrices Model Matrix View

Block and Triangular Matrices Block Matrices Defn. A partitioned matrix has the rows and columns

Equipment Cleaning for Drug Products www.gmpsop.com 1 Scope and General Types of Cleaning

Building Cleaning Services (BCS) Kirsty Thomas, Assistant Building Cleaning Finance officer

JUST THE MATHS SLIDES NUMBER 9.10 MATRICES 10 (Symmetric matrices & quadratic forms)

A Tale of Two Theories: A Tale of Two Theories: Reconciling Reconciling random matrix theory

Animesh Mukherjee Department of Computer Science & Engineering, Indian Institute of

Robust Bayesian Truth Serum Presentation by Mark Bun and Bo Waggoner 2012-12-03 Presentation by

Simultaneous Private Learning of Mul4ple Concepts January 16,

Demystifying Design or how I came to hate unicorns Design is a gift. Rubbish. Designers

BREAKFAST Breakfast Sandwich | one fresh egg, cheddar jack cheese | choice of ham, bacon, or

Principal Bundles and Reciprocity Laws Minhyong Kim Simons Symposium, May, 2017 Some background

Tangent categories are locally Cartesian differential categories J.R.B. Cockett Department of

Bridging Privacy Definitions: Differential Privacy and Concepts from Privacy Law & Policy

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Cleaning correlation matrices, Random Matrix Theory & HCIZ - PowerPoint PPT Presentation

Cleaning correlation matrices, Random Matrix Theory & HCIZ integrals J.P Bouchaud with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar http://www.cfm.fr Portfolio theory: Basics Portfolio weights w i , Asset returns X t i If

Workflow basics, RMarkdown, git/Github Cleaning up Cleaning up Cleaning up Cleaning up

Floor Cleaning By Vacuum After vacuum Cleaning After vacuum Cleaning After vacuum Cleaning

MATHEMATICS 1 CONTENTS Matrices Special matrices Operations with matrices Matrix

Results for different matrices and comparisons Dense Matrices Rectangular Matrices

Correlation Course Title Correlation Correlation coe ffi cient between -1 and 1 Sign

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

Random Vectors, Random Matrices, and Matrix Expected Value James H. Steiger Department of

Data Cleaning Nan Tang, QCRI Big Data Cleaning Nan Tang, QCRI Big Data Cleaning Nan Tang,

Theory of correlation transfer and correlation structure in recurrent networks Ruben Moreno-Bote

Operator limits of random matrices I. Stochastic Airy Brian Rider Temple University Brian Rider

MODEL VIEW AND PERSPECTIVE MATRICES OUTLINE The Three Main Matrices Model Matrix View

Block and Triangular Matrices Block Matrices Defn. A partitioned matrix has the rows and columns

Equipment Cleaning for Drug Products www.gmpsop.com 1 Scope and General Types of Cleaning

Building Cleaning Services (BCS) Kirsty Thomas, Assistant Building Cleaning Finance officer

JUST THE MATHS SLIDES NUMBER 9.10 MATRICES 10 (Symmetric matrices &amp; quadratic forms)

A Tale of Two Theories: A Tale of Two Theories: Reconciling Reconciling random matrix theory

Animesh Mukherjee Department of Computer Science &amp; Engineering, Indian Institute of

Robust Bayesian Truth Serum Presentation by Mark Bun and Bo Waggoner 2012-12-03 Presentation by

Simultaneous Private Learning of Mul4ple Concepts January 16,

Demystifying Design or how I came to hate unicorns Design is a gift. Rubbish. Designers

BREAKFAST Breakfast Sandwich | one fresh egg, cheddar jack cheese | choice of ham, bacon, or

Principal Bundles and Reciprocity Laws Minhyong Kim Simons Symposium, May, 2017 Some background

Tangent categories are locally Cartesian differential categories J.R.B. Cockett Department of

Bridging Privacy Definitions: Differential Privacy and Concepts from Privacy Law &amp; Policy

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

JUST THE MATHS SLIDES NUMBER 9.10 MATRICES 10 (Symmetric matrices & quadratic forms)

Animesh Mukherjee Department of Computer Science & Engineering, Indian Institute of

Bridging Privacy Definitions: Differential Privacy and Concepts from Privacy Law & Policy