  1. Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari

  2. Statistical learning motivations

  3. Data → Prediction • Online advertising: predict the probability of a click on an ad • Healthcare: predict the occurrence of diabetes • Finance: predict changes in stock prices

  4. Formulation • Patient record i: a vector of measurements together with an outcome • Given n such records • Posit a linear model relating outcomes to measurements • Goal: find a good estimate of the model coefficients (see the sketch below)
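
A minimal sketch of the setup in standard LASSO notation; the symbols y, A, x_0, w, sigma are my own labels, since the slide's displayed formulas did not survive extraction:

```latex
% Linear model posited on n records with p measurements each
y = A x_0 + w, \qquad y \in \mathbb{R}^n, \; A \in \mathbb{R}^{n \times p}, \;
x_0 \in \mathbb{R}^p, \; w \sim \mathsf{N}(0, \sigma^2 I_n).
% Row i of A is patient record i's measurement vector; y_i is its outcome.
% Goal: from (y, A), construct a good estimate of the coefficient vector x_0.
```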

  5. Massive amounts of measurements • Traditional clinical decision making is based on a few important measurements (small p): labs, MD exam, medications, nurse observation, radiology • Electronic health records add many cheap measurements (large p): location tracking, real-time monitoring of vital signs, smartphone apps, genomic data, testing for wellness, enabling personalized medicine

  6. How to use more measurements? • Standard least squares → many solutions when p is large relative to n – Most of these solutions predict future outcomes poorly (they fit the noise) • Main problem: for large p, find the few important measurements • That is, infer a sparse coefficient vector (why least squares alone fails is spelled out below)
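
A standard fact spelled out here, not taken from the slide: when p exceeds n the least-squares problem is underdetermined.

```latex
% With p > n, A has a nontrivial null space, so the least-squares cost is
% minimized on an entire affine space (A^{+} denotes the pseudoinverse):
\min_{x} \|y - A x\|_2^2 \quad \text{is attained by} \quad
\hat{x} = A^{+} y + v \qquad \text{for every } v \in \ker(A),
% all such solutions fit the training data equally well but generalize very differently.
```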

  7. Learning recipe • Define a loss function • Example: least squares, i.e. the negative log-likelihood under Gaussian noise • Estimate the coefficients by minimizing the loss

  8. NP-hard problem • Estimate by adding an L0 penalty (the number of nonzero coefficients) to the loss, which is computationally intractable (both problems are written out after slide 9 below)

  9. Convex relaxation (LASSO) • Estimate by replacing the L0 penalty with an L1 penalty – Tibshirani'96, Chen-Donoho'95 – Automatically selects the few important measurements
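
The two optimization problems of slides 8 and 9, written out under the same assumed notation (a reconstruction, not copied from the slides):

```latex
% Sparse estimation (NP-hard): penalize the number of nonzero coefficients
\hat{x}_{\ell_0} = \arg\min_{x} \; \tfrac{1}{2}\|y - A x\|_2^2 + \lambda \|x\|_0
% Convex relaxation (LASSO, Tibshirani '96; basis pursuit denoising, Chen-Donoho '95)
\hat{x}_{\lambda} = \arg\min_{x} \; \tfrac{1}{2}\|y - A x\|_2^2 + \lambda \|x\|_1
```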

  10. Model selection [Figure: model complexity, from high (small penalty) to low (large penalty)] Source: Elements of Statistical Learning, Hastie et al., 2009

  11. Mathematical questions

  12. Characteristics of the solution • What performance should we expect? – What is the MSE, or risk, for each value of the regularization parameter? – How do we choose the best regularization parameter?

  13. Growing theory on LASSO • Zhao, Yu (2006) • Candes, Romberg, Tao (2006) • Candes, Tao (2007) • Bunea, Tsybakov, Wegkamp (2007) • Bickel, Ritov, Tsybakov (2009) • Bühlmann, van de Geer (2009) • Zhang (2009) • Meinshausen, Yu (2009) • Wainwright (2009, 2011) • Talagrand (2010) • Belloni, Chernozhukov et al (2009-13) • Maleki et al (2011) • Bickel et al (2012) • There are many more but not listed due to space limitations.

  14. General random convex functions • Consider minimizing a random convex cost • Let A be a Gaussian matrix • Let the cost be strictly convex (or satisfy related regularity conditions) • Talagrand'10: finds generic properties of the minimizer, in particular the MSE can be calculated, when certain replica-symmetric equations have solutions • Chapter 3 of Mean Field Models for Spin Glasses, Vol. 1 • Does not apply to our case with the L1-penalized least-squares cost

  15. Some intuition: the scalar case (p = n = 1) • For a single noisy observation of a single coefficient • The LASSO estimate follows from simple calculus: soft thresholding of the observation at level λ • Then the MSE can be written in closed form – with the noise independent of the signal (see the sketch below)
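
A sketch of the scalar computation the slide alludes to, in my own notation:

```latex
% Scalar observation and scalar LASSO
y = x_0 + w, \qquad
\hat{x}(\lambda) = \arg\min_{x} \; \tfrac{1}{2}(y - x)^2 + \lambda |x| = \eta(y;\lambda),
\qquad \eta(y;\lambda) = \mathrm{sign}(y)\,(|y| - \lambda)_+ .
% With w ~ N(0, sigma^2) independent of x_0, the risk is an explicit Gaussian average:
\mathrm{MSE}(\lambda) = \mathbb{E}\Big[\big(\eta(x_0 + \sigma Z;\lambda) - x_0\big)^2\Big],
\qquad Z \sim \mathsf{N}(0,1).
```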

  16. Main result • Theorem (Bayati-Montanari): for n, p → ∞ with n/p fixed (and suitable conditions on x0), the asymptotic MSE of the LASSO has an explicit characterization – like the scalar case, but with an inflated effective noise level – where the effective noise is determined by a fixed-point equation (a sketch of the equations follows after slide 19)

  17. Main result • Theorem (Bayati-Montanari): under the same setting – In fact we prove: • The problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise – And we find a formula for the noise

  18. Main result (general case) • Theorem (Bayati-Montanari): under the same setting – In fact we prove: • The problem asymptotically decouples into p scalar sub-problems with increased Gaussian noise

  19. Main result (general case) • Theorem (Bayati-Montanari): under the same setting – Note: there is strong empirical and theoretical evidence that the Gaussian assumption on A is not necessary • Donoho-Maleki-Montanari'09 • Bayati-Lelarge-Montanari'12-13
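
For reference, the shape of the state-evolution characterization from the published Bayati-Montanari LASSO risk result, in my notation (the slides' own displays did not survive extraction): as n, p → ∞ with n/p → δ, the LASSO risk is governed by an effective scalar problem with noise level τ ≥ σ.

```latex
% State evolution fixed point (delta = n/p, Z ~ N(0,1) independent of X_0)
\tau^2 \;=\; \sigma^2 \;+\; \frac{1}{\delta}\,
  \mathbb{E}\Big[\big(\eta(X_0 + \tau Z;\,\theta) - X_0\big)^2\Big].
% Asymptotic LASSO risk: the p coordinates decouple into scalar soft-thresholding problems
\lim_{p \to \infty} \frac{1}{p}\,\|\hat{x}_{\lambda} - x_0\|_2^2
  \;=\; \mathbb{E}\Big[\big(\eta(X_0 + \tau_* Z;\,\theta_*) - X_0\big)^2\Big],
% where (tau_*, theta_*) solve the fixed point and theta_* is calibrated to lambda.
```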

  20. Algorithmic Analysis

  21. Proof strategy 1. Construct a sequence of iterates that converges to the LASSO solution 2. Show that the risk of each iterate is tracked exactly – with the tracking given by state evolution

  22. Start with belief propagation 1. Define a Gibbs measure associated with the LASSO objective 2. Write the cavity (belief propagation) equations 3. Each message is a probability distribution whose mean satisfies a recursion
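
A sketch of the standard construction behind slides 22-23, assuming the Gibbs measure is the exponential of the scaled LASSO cost (the inverse temperature β is my own notation; the explicit form of the messages is omitted here):

```latex
% Gibbs measure associated with the LASSO objective; as beta -> infinity the
% measure concentrates on the LASSO minimizer
\mu_{\beta}(\mathrm{d}x) \;\propto\;
  \exp\Big(-\beta \big(\tfrac{1}{2}\|y - A x\|_2^2 + \lambda \|x\|_1\big)\Big)\,\mathrm{d}x .
```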

  23. AMP algorithm: derivation 1. Gibbs measure 2. Message-passing (MP) algorithm 3. Look at first order approximation of the messages (AMP)

  24. AMP algorithm • Approximate message passing (AMP) – Donoho, Maleki, Montanari'09 – The update includes an Onsager reaction term
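
A minimal runnable sketch of the AMP iteration for the LASSO, following the standard updates in the AMP literature rather than the slides' exact pseudocode; the fixed threshold `theta`, the function names, and the assumption that A has i.i.d. N(0, 1/n) entries are my own simplifications (in practice the threshold is chosen adaptively and calibrated to lambda):

```python
import numpy as np


def soft_threshold(v, t):
    """Soft thresholding (the scalar LASSO denoiser), applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def amp_lasso(y, A, theta, n_iter=100):
    """Minimal AMP sketch for the LASSO (fixed threshold, for illustration).

    Iteration:
        x_{t+1} = eta(x_t + A^T z_t; theta)
        z_{t+1} = y - A x_{t+1} + (1/delta) * <eta'> * z_t   # Onsager reaction term
    where <eta'> is the fraction of coordinates surviving the threshold.
    """
    n, p = A.shape
    delta = n / p
    x = np.zeros(p)
    z = y.astype(float).copy()
    for _ in range(n_iter):
        pseudo_data = x + A.T @ z                        # effective scalar observations
        x = soft_threshold(pseudo_data, theta)
        onsager = np.mean(np.abs(pseudo_data) > theta) / delta
        z = y - A @ x + onsager * z                      # residual with Onsager correction
    return x
```

As a sanity check one can generate y = A x0 + w with a sparse x0 and compare the output against a generic LASSO solver; without the Onsager term the iteration reduces to plain iterative soft thresholding and the state-evolution description no longer holds.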

  25. AMP and compressed sensing • AMP was originally designed to solve the noiseless basis pursuit problem: minimize the L1 norm subject to Ax = y • This is equivalent to the LASSO solution in the limit λ → 0 • In this case the assumption is that x0 is the solution to the above optimization when the L1 norm is replaced with the L0 norm

  26. Phase transition line and algorithms Source: Arian Maleki’s PhD Thesis

  27. AMP algorithm • Approximate message passing (AMP) – Donoho, Maleki, Montanari'09 • For Gaussian A, as n, p → ∞ it converges, and the desired accuracy can be achieved after a number of iterations that does not grow with the dimension – Bayati, Montanari'12

  28. Main steps of the proof 1. We use a conditioning technique (due to Bolthausen) to prove that state evolution exactly tracks the AMP iterates 2. Show that the AMP iterates converge to the LASSO solution 3. Therefore the algorithm's estimate satisfies the main claim

  29. Recall the main result • Theorem (Bayati-Montanari): under the same setting • Problem: the right-hand side requires knowledge of the noise level and of the distribution of x0. What can be done when we do not have that?

  30. Objective • Recall the problem • Given the data and the LASSO estimate, construct estimators for the MSE and the noise level (*) • So far we used knowledge of the noise level and of the distribution of x0, which is not realistic • Next, we'll demonstrate how to solve (*)

  31. Recipe (columns of A are i.i.d.) 1. Rescale the LASSO residual 2. Define pseudo-data from the estimate and the rescaled residual 3. The estimators are explicit functions of the pseudo-data (a hedged reconstruction is sketched below)
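
The displayed formulas did not survive extraction; the following is a reconstruction of the flavor of such a recipe from the AMP/state-evolution picture, not a copy of the slides (the exact definitions are in Bayati-Erdogdu-Montanari '13):

```latex
% delta = n/p; hat{x} is the LASSO solution; ||.||_0 counts nonzero entries
z \;=\; \frac{y - A\hat{x}}{\,1 - \|\hat{x}\|_0 / n\,},
\qquad \text{pseudo-data: } \; \hat{x} + A^{\mathsf T} z \;\approx\; x_0 + \tau Z,
\qquad \hat{\tau}^2 = \frac{\|z\|_2^2}{n}.
% MSE estimated from the pseudo-data via SURE (next slide); the noise level then
% follows from the state-evolution relation tau^2 = sigma^2 + MSE / delta:
\hat{\sigma}^2 \;=\; \hat{\tau}^2 - \frac{\widehat{\mathrm{MSE}}}{\delta}.
```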

  32. Main result • Theorem (Bayati-Erdogdu-Montanari'13): in the same asymptotic setting, the estimators above converge to the true MSE and noise level • For correlated columns we have a similar (non-rigorous) formula that relies on a conjecture based on the replica method, due to Javanmard-Montanari'13

  33. Sketch of the proof – The pseudo-data behaves asymptotically like the true signal plus Gaussian noise of variance τ² – where τ is the state-evolution effective noise level • Using Stein's SURE estimate, the risk of the denoiser applied to the pseudo-data can be estimated without knowledge of x0
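
A small runnable sketch of the SURE step for soft thresholding, applied to pseudo-data whose entries behave like x0 + τ·N(0, 1); the function name is mine and this illustrates the general SURE idea named on the slide, not the slides' exact expressions:

```python
import numpy as np


def sure_soft_threshold(v, theta, tau):
    """Stein's unbiased risk estimate for the soft-thresholding denoiser.

    For pseudo-data v with entries ~ x0 + tau * N(0, 1), returns an unbiased
    estimate of (1/p) * E || eta(v; theta) - x0 ||^2 without access to x0
    (the classical Donoho-Johnstone SureShrink formula).
    """
    p = v.size
    survivors = np.sum(np.abs(v) > theta)              # d/dv eta(v) = 1 on survivors
    fit = np.sum(np.minimum(np.abs(v), theta) ** 2)    # || eta(v) - v ||^2
    return (-p * tau**2 + fit + 2 * tau**2 * survivors) / p
```

Plugging in the pseudo-data and the estimate of τ from the previous slide gives the MSE estimate, and the noise-level estimate then follows from the state-evolution relation.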

  34. MSE Estimation (iid Gaussian Data)

  35. MSE Estimation (Correlated Gaussian Data) Relies on a replica method conjecture

  36. Comparison with noise estimation methods • Belloni, Chernozhukov (2009) • Fan, Guo, Hao (2010) • Sun, Zhang (2010) • Zhang (2010) • Städler, Bühlmann, van de Geer (2010, 2012)

  37. Noise Estimation (iid Gaussian Data)

  38. Noise Estimation (Correlated Gaussian Data)

  39. Extensions to general random matrices

  40. Recall the AMP and MP algorithms 1. Gibbs measure 2. Message-passing algorithm 3. Look at the first-order approximation of the messages.

  41. General random matrices (i.n.i.d.) Theorem (Bayati-Lelarge-Montanari'12) 1) As n, p → ∞, finite marginals of the iterates are asymptotically insensitive to the distribution of the entries of A, provided it has sub-exponential tails. 2) The entries are asymptotically Gaussian with zero mean and a variance that can be calculated by a one-dimensional equation.

  42. Main steps of the proof • Step 1: AMP is asymptotically equivalent to its belief propagation (MP) counterpart (w.l.o.g. assume A is symmetric)

  43. Main steps of the proof • Step 2: MP messages are a summation over non-backtracking trees – Example: [diagram with nodes i, l, a, b]

  44. Main steps of the proof • Step 2: MP messages are a summation over non-backtracking trees – Example: [diagram with nodes i, l, a, b]

  45. Main steps of the proof Step 2: (continued)

  46. Main steps of the proof • Step 2 (continued): – each edge is repeated twice – the remainder converges to 0 as p grows – the first term is independent of the distribution and depends only on the second moments

  47. Extensions and open directions • Setting: general distributions on A, other cost functions/regularizers • Promising progress: – Rangan et al '10-12 – Schniter et al '10-12 – Donoho-Johnstone-Montanari'11 – Maleki et al '11 – Krzakala-Mézard-Sausset-Sun-Zdeborová '11-12 – Bean-Bickel-El Karoui-Yu'12 – Bayati-Lelarge-Montanari'12 – Javanmard-Montanari'12,13 – Kabashima et al '12-14 – Manoel-Krzakala-Tramel-Zdeborová '14 – Caltagirone-Krzakala-Zdeborová '14 – Schülke-Caltagirone-Zdeborová '14

  48. Thank you!
