  1. On the sample complexity of graph selection: Practical methods and fundamental limits
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
Based on joint work with: John Lafferty (CMU), Pradeep Ravikumar (UT Austin), Prasad Santhanam (Univ. Hawaii)
August 2009

  2. Introduction
Markov random fields (undirected graphical models) are central to many applications in science and engineering:
◮ communication, coding, information theory, networking
◮ machine learning and statistics
◮ computer vision; image processing
◮ statistical physics
◮ bioinformatics, computational biology, ...
Some core computational problems:
◮ counting/integrating: computing marginal distributions and data likelihoods
◮ optimization: computing most probable configurations (or top-$M$ configurations)
◮ model selection: fitting and selecting models on the basis of data

  3. What are graphical models?
Markov random field: random vector $(X_1, \dots, X_p)$ with distribution factoring according to a graph $G = (V, E)$.
[Figure: example graph on four nodes A, B, C, D.]
Hammersley-Clifford theorem: $(X_1, \dots, X_p)$ being Markov with respect to $G$ implies factorization over the cliques of $G$.
Studied/used in various fields: spatial statistics, language modeling, computational biology, computer vision, statistical physics, ...
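
As a concrete instance of the Hammersley-Clifford factorization, here is a minimal worked example. Since the slide's figure is not recoverable, the edge set (a chain A-B-C-D) is an illustrative assumption, not the talk's graph:

```latex
% Hypothetical 4-node chain A - B - C - D: the maximal cliques are the
% three edges themselves, so the Hammersley-Clifford factorization reads
\[
  \mathbb{P}(x_A, x_B, x_C, x_D)
  = \frac{1}{Z}\,\psi_{AB}(x_A, x_B)\,\psi_{BC}(x_B, x_C)\,\psi_{CD}(x_C, x_D),
  \qquad
  Z = \sum_{x}\,\psi_{AB}(x_A, x_B)\,\psi_{BC}(x_B, x_C)\,\psi_{CD}(x_C, x_D).
\]
```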

  4. Graphical model selection
Let $G = (V, E)$ be an undirected graph on $p = |V|$ vertices.
Pairwise Markov random field: family of probability distributions
\[
  \mathbb{P}(x_1, \dots, x_p; \theta)
  = \frac{1}{Z(\theta)} \exp\Big( \sum_{(s,t) \in E} \langle \theta_{st}, \phi_{st}(x_s, x_t) \rangle \Big).
\]
Problem of graph selection: given $n$ independent and identically distributed (i.i.d.) samples of $X = (X_1, \dots, X_p)$, identify the underlying graph structure.
Complexity constraint: restrict to the subset $\mathcal{G}_{d,p}$ of graphs with maximum degree $d$.
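
To make the pairwise family concrete, the following Python sketch instantiates its Ising special case, $\phi_{st}(x_s, x_t) = x_s x_t$ with scalar $\theta_{st}$, on a toy graph and evaluates the distribution by brute force; the graph and weights are illustrative choices, not examples from the talk.

```python
import itertools
import math

# Toy Ising model (pairwise MRF with phi_st(x_s, x_t) = x_s * x_t):
# illustrative 3-node graph with scalar edge weights theta_st.
edges = {(0, 1): 0.8, (1, 2): 0.8, (0, 2): -0.4}
p = 3

def unnormalized(x):
    """exp of the sum of edge potentials for a configuration x in {-1,+1}^p."""
    return math.exp(sum(th * x[s] * x[t] for (s, t), th in edges.items()))

# Brute-force partition function Z(theta): feasible only for tiny p,
# which is one reason graph structure must be learned without computing Z.
configs = list(itertools.product([-1, +1], repeat=p))
Z = sum(unnormalized(x) for x in configs)

for x in configs:
    print(x, unnormalized(x) / Z)
```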

  5. Illustration: Voting behavior of US senators
Graphical model fit to the voting records of US senators (Banerjee, El Ghaoui & d'Aspremont, 2008).

  6. Outline of remainder of talk
1 Background and past work
2 A practical scheme for graphical model selection
(a) $\ell_1$-regularized neighborhood regression
(b) High-dimensional analysis and phase transitions
3 Fundamental limits of graphical model selection
(a) An unorthodox channel coding problem
(b) Necessary conditions
(c) Sufficient conditions (optimal algorithms)
4 Various open questions ...

  7. Previous/ongoing work on graph selection
Methods for Gaussian MRFs:
◮ $\ell_1$-regularized neighborhood regression (e.g., Meinshausen & Buhlmann, 2005; Wainwright, 2006; Zhao, 2006)
◮ $\ell_1$-regularized log-determinant (e.g., Yuan & Lin, 2006; d'Aspremont et al., 2007; Friedman, 2008; Ravikumar et al., 2008)
Methods for discrete MRFs:
◮ exact solution for trees (Chow & Liu, 1967)
◮ local testing (e.g., Spirtes et al., 2000; Kalisch & Buhlmann, 2008)
◮ distribution fits by KL-divergence (Abbeel et al., 2005)
◮ $\ell_1$-regularized logistic regression (Ravikumar, Wainwright & Lafferty, 2006, 2008)
◮ approximate maximum-entropy approach and thinned graphical models (Johnson et al., 2007)
◮ neighborhood-based thresholding method (Bresler, Mossel & Sly, 2008)
Information-theoretic analysis:
◮ pseudolikelihood and BIC criterion (Csiszár & Talata, 2006)
◮ information-theoretic limitations (Santhanam & Wainwright, 2008)

  8. High-dimensional analysis
Classical analysis: dimension $p$ fixed, sample size $n \to +\infty$.
High-dimensional analysis: allow the dimension $p$, the sample size $n$, and the maximum degree $d$ to increase at arbitrary rates.
Take $n$ i.i.d. samples from the MRF defined by a graph $G \in \mathcal{G}_{d,p}$ and study the probability of success as a function of all three parameters:
\[
  \mathrm{Success}(n, p, d) = \mathbb{P}[\text{Method recovers graph } G \text{ from } n \text{ samples}].
\]
The theory is non-asymptotic: explicit probabilities for finite $(n, p, d)$.
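
Operationally, $\mathrm{Success}(n, p, d)$ can be estimated by Monte Carlo over repeated trials. The following self-contained Python sketch does this for a tiny chain ($p = 3$, $d = 2$), using simple correlation thresholding as the recovery step; the graph, edge weight, threshold, and trial count are illustrative choices, and correlation thresholding stands in for (and is much cruder than) the methods analyzed in this talk.

```python
import itertools, math, random

# Toy Success(n, p, d) estimate for a 3-node Ising chain (p = 3, d = 2),
# recovering the edge set by thresholding empirical correlations.
p, theta, thresh = 3, 0.6, 0.4
true_edges = {(0, 1), (1, 2)}

configs = list(itertools.product([-1, 1], repeat=p))
weights = [math.exp(sum(theta * x[s] * x[t] for (s, t) in true_edges))
           for x in configs]  # unnormalized probabilities; enumerable for tiny p

def recover(X):
    """Declare an edge wherever the empirical correlation exceeds thresh."""
    n = len(X)
    return {(s, t) for s in range(p) for t in range(s + 1, p)
            if abs(sum(x[s] * x[t] for x in X) / n) > thresh}

rng = random.Random(0)
trials = 200
for n in [10, 50, 200, 1000]:
    wins = sum(recover(rng.choices(configs, weights=weights, k=n)) == true_edges
               for _ in range(trials))
    print(f"n = {n:5d}   Success(n, p, d) ~ {wins / trials:.2f}")
```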

  9. Some challenges in distinguishing graphs
Clearly, a lower bound on the minimum edge weight is required:
\[
  \min_{(s,t) \in E} |\theta^*_{st}| \ge \theta_{\min},
\]
although $\theta_{\min}(p, d) = o(1)$ is allowed.
In contrast to other testing/detection problems, large $|\theta_{st}|$ is also problematic.
Toy example: graphs on $p = 3$ vertices with maximum degree $d = 2$.
[Figure: three two-edge graphs on three nodes, with every edge weighted $\theta$.]
As $\theta$ increases, all three Markov random fields become arbitrarily close to
\[
  \mathbb{P}(x_1, x_2, x_3) =
  \begin{cases}
    1/2 & \text{if } x \in \{(-1, -1, -1), (+1, +1, +1)\} \\
    0 & \text{otherwise.}
  \end{cases}
\]
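
This degeneracy is easy to verify numerically. A short sketch, assuming the Ising parameterization $\phi_{st}(x_s, x_t) = x_s x_t$ and taking the three graphs to be the three two-edge graphs on three nodes (consistent with the figure, but an assumption):

```python
import itertools, math

# For each two-edge graph on 3 nodes, compute the Ising MRF exactly and
# track the total mass on {(-1,-1,-1), (+1,+1,+1)} as theta grows.
graphs = [{(0, 1), (1, 2)}, {(0, 1), (0, 2)}, {(0, 2), (1, 2)}]
configs = list(itertools.product([-1, 1], repeat=3))

for theta in [0.5, 1.0, 2.0, 4.0]:
    masses = []
    for E in graphs:
        w = {x: math.exp(sum(theta * x[s] * x[t] for (s, t) in E))
             for x in configs}
        Z = sum(w.values())
        masses.append((w[(-1, -1, -1)] + w[(1, 1, 1)]) / Z)
    print(f"theta = {theta}: mass on all-equal configs =",
          ", ".join(f"{m:.3f}" for m in masses))
```

All three values approach 1 as $\theta$ grows, so the three models become statistically indistinguishable even though their edge sets differ.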

  10. Markov property and neighborhood structure
Markov properties encode neighborhood structure:
\[
  (X_s \mid X_{V \setminus s}) \stackrel{d}{=} (X_s \mid X_{N(s)}),
\]
where the left side conditions on the full graph and the right side conditions only on the Markov blanket $N(s)$.
[Figure: node $s$ with neighborhood $N(s) = \{t, u, v, w\}$.]
◮ basis of the pseudolikelihood method (Besag, 1974)
◮ used for Gaussian model selection (Meinshausen & Buhlmann, 2006)
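
For the Ising special case (binary $x_s \in \{-1, +1\}$ with potentials $\theta_{st} x_s x_t$), this conditional distribution takes a logistic form, which is what motivates the regression method on the next slide. A short derivation sketch under that assumption:

```latex
% Ising special case: conditioning X_s on everything else involves only
% its neighbors, and the conditional is exactly a logistic regression:
\[
  \mathbb{P}\bigl(X_s = 1 \mid x_{V \setminus s}\bigr)
  = \frac{\exp\bigl(\sum_{t \in N(s)} \theta_{st} x_t\bigr)}
         {\exp\bigl(\sum_{t \in N(s)} \theta_{st} x_t\bigr)
          + \exp\bigl(-\sum_{t \in N(s)} \theta_{st} x_t\bigr)}
  = \sigma\Bigl(2 \sum_{t \in N(s)} \theta_{st} x_t\Bigr),
\]
% where \sigma(z) = 1/(1 + e^{-z}) is the logistic function.
```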

  11. §2. Practical method via neighborhood regression
Observation: recovering the graph $G$ is equivalent to recovering the neighborhood set $N(s)$ for every $s \in V$.
Method: given $n$ i.i.d. samples $\{X^{(1)}, \dots, X^{(n)}\}$, estimate each neighborhood $\hat{N}(s)$ by logistic regression of node $X_s$ on $X_{\setminus s} := \{X_t, \; t \ne s\}$:
1. For each node $s \in V$, perform $\ell_1$-regularized logistic regression of $X_s$ on the remaining variables $X_{\setminus s}$:
\[
  \hat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p-1}}
  \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^{n} f(\theta; X^{(i)}_{\setminus s})}_{\text{logistic likelihood}}
  + \underbrace{\rho_n \|\theta\|_1}_{\text{regularization}} \Big\}.
\]
2. Estimate the local neighborhood $\hat{N}(s)$ as the support (non-zero entries) of the regression vector $\hat{\theta}[s]$.
3. Combine the neighborhood estimates in a consistent manner (AND or OR rule).
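
A minimal Python sketch of this loop, using scikit-learn's $\ell_1$-penalized logistic regression as a stand-in for the estimator above; the mapping $C = 1/(n \rho_n)$ between sklearn's parameterization and $\rho_n$, the support tolerance, and the choice of $\rho_n$ itself are illustrative assumptions, not the talk's prescriptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_graph(X, rho, rule="AND"):
    """Neighborhood selection via l1-regularized logistic regression.

    X: (n, p) array with entries in {-1, +1}; rho: regularization level
    rho_n; rule: "AND" or "OR" for combining the per-node estimates.
    """
    n, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for s in range(p):
        # Step 1: regress X_s on the remaining variables with an l1 penalty.
        # sklearn minimizes C * sum(loss) + ||theta||_1, so C = 1/(n * rho)
        # roughly matches (1/n) * sum(loss) + rho * ||theta||_1.
        clf = LogisticRegression(penalty="l1", C=1.0 / (n * rho),
                                 solver="liblinear")
        clf.fit(np.delete(X, s, axis=1), X[:, s])
        # Step 2: the estimated neighborhood is the support of theta-hat[s].
        coef = np.insert(clf.coef_.ravel(), s, 0.0)  # re-align to p columns
        support[s] = np.abs(coef) > 1e-8
    # Step 3: combine the per-node neighborhood estimates.
    adj = support & support.T if rule == "AND" else support | support.T
    np.fill_diagonal(adj, False)
    return adj
```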

  12. Empirical behavior: Unrescaled plots
[Plot: probability of success versus number of samples (0 to 600), for a star graph with a linear fraction of neighbors; curves for $p = 64$, $p = 100$, and $p = 225$.]

  13. Empirical behavior: Appropriately rescaled
[Plot: probability of success versus the control parameter $\theta(n, p, d)$, for a star graph with a linear fraction of neighbors; curves for $p = 64$, $p = 100$, and $p = 225$.]
Plots of success probability versus the control parameter $\theta(n, p, d)$.
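
The point of such rescaled plots is that if success depends on $(n, p, d)$ only through a single control parameter, the curves for different $p$ collapse onto one another, the empirical signature of a phase transition. A sketch of the rescaling step; the particular formula $n / (d \log p)$ is an illustrative guess at a control parameter, not the talk's definition of $\theta(n, p, d)$:

```python
import math
import matplotlib.pyplot as plt

def control_parameter(n, p, d):
    # Illustrative guess; the talk's theta(n, p, d) may differ, e.g. in
    # constants or in the power of d.
    return n / (d * math.log(p))

def rescaled_plot(results):
    """results: dict mapping (p, d) -> list of (n, success_prob) pairs."""
    for (p, d), curve in sorted(results.items()):
        xs = [control_parameter(n, p, d) for n, _ in curve]
        ys = [succ for _, succ in curve]
        plt.plot(xs, ys, marker="o", label=f"p = {p}")
    plt.xlabel("Control parameter")
    plt.ylabel("Prob. success")
    plt.legend()
    plt.show()
```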
