MCMC based machine learning
(Bayesian Model Averaging)

Nicos Angelopoulos, n.angelopoulos@ed.ac.uk
School of Biological Sciences, Biochemistry Group,
University of Edinburgh, Scotland, UK.

Collaborative work with James Cussens, University of York, jc@cs.york.ac.uk
MCMC Overview
A class of sampling algorithms that estimate a posterior distribution.
Markov chain: construct a chain of visited values, M_1, M_2, ..., M_n, by proposing M* from M_i with probability q(M_i, M*).
The prior p(M*) and the relative likelihood of the two values, p(D | M*) / p(D | M_i), decide whether the chain moves to M* or stays at M_i.
Monte Carlo: use the chain to approximate the posterior p(M | D).
Bayesian learning with MCMC
Given some data D and a class ℳ of statistical models (each M ∈ ℳ) that can express relations in the data, use MCMC to approximate the posterior of Bayes' theorem, whose normalisation factor (the sum over all candidate models) is typically intractable:

    p(M \mid D) = \frac{p(D \mid M)\, p(M)}{\sum_{M' \in \mathcal{M}} p(D \mid M')\, p(M')}

p(M)      is the prior probability of each model
p(D | M)  the likelihood (how well the model fits the data)
p(M | D)  the posterior
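As a toy illustration of the theorem with made-up numbers (an assumption for exposition, not from the talk): take just two models with equal priors p(M_1) = p(M_2) = 1/2 and likelihoods p(D | M_1) = 0.2, p(D | M_2) = 0.1. Then

    p(M_1 \mid D) = \frac{0.2 \times 0.5}{0.2 \times 0.5 + 0.1 \times 0.5} = \frac{2}{3},
    \qquad
    p(M_2 \mid D) = \frac{1}{3}.

For the model spaces on the following slides the sum in the denominator ranges over all candidate structures, which is why it is approximated by MCMC rather than computed exactly.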
Example: Data

              smoker   bronchitis   l_cancer
    person 1    y          y           n
    person 2    y          n           n
    person 3    y          y           y
    person 4    n          y           n
    person 5    n          n           n
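Since the priors later in the talk are written in Prolog, one possible encoding of this toy dataset is as facts; the person/4 representation below is an illustrative assumption, not taken from the authors' code.

    % person( Id, Smoker, Bronchitis, LungCancer ).
    person( 1, y, y, n ).
    person( 2, y, n, n ).
    person( 3, y, y, y ).
    person( 4, n, y, n ).
    person( 5, n, n, n ).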
Example: Models
The candidate networks over the nodes S (smoker), B (bronchitis) and L (l_cancer), each written as its parent-list term:

    B_1:  [b-[], l-[], s-[]]
    B_2:  [b-[s], l-[], s-[]]
    ...
    B_24: [b-[s], l-[b,s], s-[]]

(figure: each B_x is also drawn as a small directed graph over S, B and L)
Example: Objective
Estimate the posterior probability p(B_x) of every candidate network B_1, B_2, ..., B_24
(figure: bar chart of p(B_x) against B_1, B_2, B_3, B_4, ..., B_24), where

    \sum_{B_x} p(B_x) = 1
Metropolis-Hastings (M-H) MCMC
0. Set i = 0 and find M_0 using the prior.
1. From M_i produce a candidate model M*. Let q(M_i, M*) be the probability of proposing M* from M_i.
2. Let

    \alpha(M_i, M^*) = \min\left( \frac{q(M^*, M_i)\, P(D \mid M^*)\, P(M^*)}{q(M_i, M^*)\, P(D \mid M_i)\, P(M_i)},\; 1 \right)

    M_{i+1} = \begin{cases} M^* & \text{with probability } \alpha(M_i, M^*) \\ M_i & \text{with probability } 1 - \alpha(M_i, M^*) \end{cases}

3. If i has reached the iteration limit then terminate; otherwise set i = i + 1 and repeat from step 1.
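A minimal sketch of the accept/reject decision in step 2, written in plain Prolog (SWI-Prolog). The predicates mh_step/4 and accept/1, and the packing of the six quantities into a scores/6 term, are illustrative assumptions rather than the authors' implementation.

    % accept(+Ratio): succeed with probability min(Ratio, 1).
    accept( Ratio ) :-
        Alpha is min( Ratio, 1.0 ),
        random( U ),                        % U is uniform on [0,1)
        U < Alpha.

    % mh_step(+Mi, +Mstar, +Scores, -Mnext)
    % Scores = scores( q(M*,Mi), q(Mi,M*), P(D|M*), P(M*), P(D|Mi), P(Mi) )
    mh_step( Mi, Mstar, scores(QRev,QFwd,LStar,PriStar,LCur,PriCur), Mnext ) :-
        Ratio is (QRev * LStar * PriStar) / (QFwd * LCur * PriCur),
        (   accept( Ratio )
        ->  Mnext = Mstar                   % move to the candidate
        ;   Mnext = Mi                      % stay at the current model
        ).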
Example: MCMC
Markov chain:  M_1, M_2, M_3,  M_4, M_5,  ...
               B_3, B_3, B_10, B_3, B_24, ...

Monte Carlo:

    p(B_k) = \frac{\#(B_k)}{\sum_{B_x} \#(B_x)}

where #(B_k) is the number of times B_k occurs in the chain.
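A small sketch of the Monte Carlo step in plain Prolog, assuming the chain is kept as a list of visited models; posterior_estimate/3 is a hypothetical helper, not part of the authors' system.

    % posterior_estimate(+Chain, +Bk, -PBk): fraction of chain states equal to Bk.
    posterior_estimate( Chain, Bk, PBk ) :-
        include( ==(Bk), Chain, Visits ),
        length( Visits, Nk ),
        length( Chain, N ),
        PBk is Nk / N.

    % With the chain above:
    % ?- posterior_estimate( [b3,b3,b10,b3,b24], b3, P ).
    % P = 0.6.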
SLP defined model space
?- bn( [1,2,3], Bn ).
(figure: the derivation tree of the query, with the top goal G_0, an intermediate goal G_i, and the models M_i and M* at its leaves)
From M_i identify G_i, then sample forward to M*.
q(M_i, M*) is the probability of proposing M* when M_i is the current model.
BN Prior

bn( OrdNodes, Bn ) :-
    bn( OrdNodes, [], Bn ).

bn( [], _PotPar, [] ).
bn( [H|T], PotPar, [H-SelParOfH|RemBn] ) :-
    select_parents( PotPar, SelParOfH ),
    bn( T, [H|PotPar], RemBn ).

select_parents( [], [] ).
select_parents( [H|T], Pa ) :-
    include_element( H, Pa, RemPa ),
    select_parents( T, RemPa ).

1/2 : include_element( H, [H|RemPa], RemPa ).
1/2 : include_element( _H, RemPa, RemPa ).
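To make the prior concrete, here is a plain-Prolog rendering that enumerates the same model space and accumulates the probability each structure receives when every include_element/3 choice is labelled 1/2. It is an assumed translation for illustration, not the authors' SLP machinery; with the node ordering fixed, [1,2,3] yields 2^3 = 8 structures, each with prior 1/8.

    bn_with_prior( OrdNodes, Bn, Prior ) :-
        bn_p( OrdNodes, [], Bn, 1.0, Prior ).

    bn_p( [], _PotPar, [], P, P ).
    bn_p( [H|T], PotPar, [H-Pa|RemBn], P0, P ) :-
        parents_p( PotPar, Pa, P0, P1 ),
        bn_p( T, [H|PotPar], RemBn, P1, P ).

    parents_p( [], [], P, P ).
    parents_p( [H|T], Pa, P0, P ) :-
        P1 is P0 * 0.5,              % each include/exclude decision has probability 1/2
        (   Pa = [H|Rest]            % include H as a parent
        ;   Pa = Rest                % or skip it
        ),
        parents_p( T, Rest, P1, P ).

    % ?- bn_with_prior( [1,2,3], Bn, P ).
    % Bn = [1-[], 2-[1], 3-[2,1]], P = 0.125 ;
    % ...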
Example BN (Asia)
For example:
?- bn( [1,2,3,4,5,6,7,8], M ).
M = [1-[], 2-[1], 3-[2,5], 4-[], 5-[4], 6-[4], 7-[3], 8-[3,6]].
Visits and stays
(figure omitted)
Edges recovery
With the topological ordering constraint and a maximum of 2 parents per node, the algorithm recovers most of the BN arcs in 0.5M iterations.
For example, with a 0.99 cut-off we have:
Missing:     2 → 3 (0.84),  3 → 7 (0.47)
Superfluous: 5 → 7
CART priors
?- cart( M ).
M = node( b, 1, node(a,0,leaf,leaf), leaf )
(figure: the sampled tree drawn as a diagram; the root splits on x2 at 1 (=< 1 / 1 <) and its left child on x1 at 0 (=< 0 / 0 <))

Split probability at a node η of depth d_η:

    P_{split}(\eta) = \alpha\, (1 + d_\eta)^{-\beta}

1 - Sp: [Sp]: cart( Data, D, A/B, leaf(Data) ).
    Sp: [Sp]: cart( Data, D, A/B, node(F,V,L,R) ) :-
                branch( Data, F, V, LData, RData ),
                D1 is D + 1,
                NxtSp is A * ((1 + D1) ^ -B),
                [NxtSp]: cart( LData, D1, A/B, L ),
                [NxtSp]: cart( RData, D1, A/B, R ).
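For reference, a one-clause helper that evaluates the split probability α(1 + d)^(−β) at a given depth. It is an illustrative addition, not part of the prior above.

    split_prob( Alpha, Beta, Depth, P ) :-
        P is Alpha * (1 + Depth) ** (-Beta).

    % With the parameters used in the experiment on the next slide (alpha = 0.95, beta = 0.8):
    % ?- split_prob( 0.95, 0.8, 1, P ).
    % P = 0.5456...   % split probability falls with depth, so deeper nodes tend to become leaves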
Experiment
Pima Indians Diabetes Database: 768 complete entries of 8 variables.
Denison et al. ran 250,000 iterations of local perturbations. Their best model's log-likelihood: -343.056.
Our experiment ran for 250,000 iterations with the branch-replacing proposal.
Parameters: uniform choice proposal, α = 0.95, β = 0.8.
Our best model's log-likelihood: -347.651.
Likelihoods trace
(figure: log-likelihood trace over 250,000 iterations, y-axis from about -420 to -340, plotted from 'tr_uc_rm_pima_idsd_a0_95b0_8_i250K__s776.llhoods')
β = 0.8, α = 0.95, proposal = uniform choice
Best likelihood
(figure: the best tree found, drawn as a binary diagram with split thresholds at the internal nodes and class counts at the leaves; header: best_llhood = -347.61529077520584, vst(37), msclf(145))
In Kyoto
Models: HMRFs for clustering.
Likelihood: design and implement a likelihood-ratio function for HMRFs.
Proposal: implement function(s) for reaching a proposal model.
Application: to real data.
SLPs: for more complex priors.