Deterministic Approximations

Bayesian logistic regression: already covered in lectures on classification.
Laplace and variational approximations. I will review Murphy pp. 256–259 on the
board. Similar material by MacKay, Ch. 41, pp. 492–503 (§41.4 uses
non-examinable MCMC methods).
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Iain Murray, http://iainmurray.net/

Posterior distributions

$$p(\theta \mid \mathcal{D}, \mathcal{M}) = \frac{P(\mathcal{D} \mid \theta)\, p(\theta)}{P(\mathcal{D} \mid \mathcal{M})}$$

E.g., logistic regression:
$$p(\theta = \mathbf{w}) = \mathcal{N}(\mathbf{w};\, \mathbf{0},\, \sigma^2 I), \qquad
P(\mathcal{D} \mid \theta = \mathbf{w}) = \prod_n \sigma\!\big(z^{(n)} \mathbf{w}^\top \mathbf{x}^{(n)}\big), \quad \text{labels } z^{(n)} \in \pm 1.$$

Integrate a large product of non-linear functions. Goals: summarize the
posterior in simple form, estimate the model evidence $P(\mathcal{D} \mid \mathcal{M})$.

Non-Gaussian example

$$p(w) \propto \mathcal{N}(w;\, 0, 1), \qquad p(w \mid \mathcal{D}) \propto \mathcal{N}(w;\, 0, 1)\, \sigma(10 - 20w).$$

[Figure: prior and posterior densities plotted over $w \in [-4, 4]$.]
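The 1-D example above is easy to check numerically. The sketch below (an illustration added here, not from the slides) evaluates the unnormalized posterior $\mathcal{N}(w; 0, 1)\,\sigma(10 - 20w)$ on a grid and normalizes it by numerical integration; the grid range and spacing are arbitrary choices.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.linspace(-4, 4, 1001)                      # grid covering the plotted range
dw = w[1] - w[0]
log_prior = -0.5*np.log(2*np.pi) - 0.5*w**2       # log N(w; 0, 1)
unnorm = np.exp(log_prior) * sigmoid(10 - 20*w)   # prior times likelihood factor
post = unnorm / (unnorm.sum() * dw)               # normalize by numerical integration

print("posterior mean ~", (w * post).sum() * dw)  # shifted below zero
```

The result is visibly skewed, which is why any single-Gaussian summary of this posterior can only be approximate.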
Posterior after 500 datapoints

$N = 500$ labels generated with $w = 1$ at $x^{(n)} \sim \mathcal{N}(0, 10^2)$:
$$p(w) \propto \mathcal{N}(w;\, 0, 1), \qquad
p(w \mid \mathcal{D}) \propto \mathcal{N}(w;\, 0, 1) \prod_{n=1}^{500} \sigma\!\big(w x^{(n)} z^{(n)}\big).$$

[Figure: posterior over $w \in [-4, 4]$, with a Gaussian fit overlaid.]

Gaussian approximations

Finite parameter vector $\theta$. $P(\theta \mid \text{lots of data})$ is often
nearly Gaussian around the mode. We need to identify which Gaussian it is:
mean and covariance.

Laplace Approximation

MAP estimate:
$$\theta^* = \arg\max_\theta \big[\log P(\mathcal{D} \mid \theta) + \log P(\theta)\big].$$

Define an 'energy':
$$E(\theta) = -\log P(\theta \mid \mathcal{D}) = -\log P(\mathcal{D} \mid \theta) - \log P(\theta) + \log P(\mathcal{D}).$$

The log posterior doesn't need to be normalized: constants disappear from the
derivatives and second derivatives.

Laplace details

The matrix of second derivatives is called the Hessian:
$$H_{ij} = -\left.\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log P(\theta \mid \mathcal{D})\right|_{\theta = \theta^*}$$

Find the posterior mode (MAP estimate) $\theta^*$ using your favourite
gradient-based optimizer. Because $\nabla_\theta E$ is zero at $\theta^*$ (a
turning point), a Taylor expansion gives
$$E(\theta^* + \delta) \approx E(\theta^*) + \tfrac{1}{2}\, \delta^\top H\, \delta.$$

Do the same thing to a Gaussian around its mean, and identify the Laplace
approximation:
$$P(\theta \mid \mathcal{D}) \approx \mathcal{N}(\theta;\, \theta^*,\, H^{-1}).$$
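A minimal numerical sketch of this recipe for the 1-D logistic-regression posterior above: find the mode of the energy with a standard optimizer, then take the second derivative there. The data are regenerated to match the slide's description ($w = 1$, $x \sim \mathcal{N}(0, 10^2)$, $N = 500$); the random seed, the finite-difference Hessian, and the variable names are illustrative choices, not course code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 500
x = rng.normal(0.0, 10.0, N)                       # inputs x ~ N(0, 10^2)
p1 = 1.0 / (1.0 + np.exp(-1.0 * x))                # P(z = +1 | x) with w_true = 1
z = np.where(rng.random(N) < p1, 1, -1)            # labels in {-1, +1}

def energy(w):
    # E(w) = -log p(w) - log P(D|w), with constants dropped
    log_prior = -0.5 * w**2                        # N(0, 1) prior, unnormalized
    log_lik = -np.logaddexp(0.0, -z * w * x).sum() # sum_n log sigma(z w x)
    return -(log_prior + log_lik)

res = minimize(lambda wv: energy(wv[0]), x0=[0.0]) # gradient-based (BFGS by default)
w_map = res.x[0]

eps = 1e-4                                         # finite-difference Hessian (scalar in 1-D)
H = (energy(w_map + eps) - 2*energy(w_map) + energy(w_map - eps)) / eps**2

print("Laplace approximation: N(w; %.3f, %.2g)" % (w_map, 1.0 / H))
```

In 1-D the Hessian is a single number, so the 'covariance' of the Laplace fit is just $1/H$; in higher dimensions you would invert (or factorize) the full Hessian matrix.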
Laplace picture

We only locally match one mode. Curvature and mode match. We can normalize the
Gaussian; the height at the mode won't match exactly!

Used to approximate the model likelihood (AKA 'evidence', 'marginal
likelihood'):
$$P(\mathcal{D}) = \frac{P(\mathcal{D} \mid \theta)\, P(\theta)}{P(\theta \mid \mathcal{D})}
\approx \frac{P(\mathcal{D} \mid \theta^*)\, P(\theta^*)}{\mathcal{N}(\theta^*;\, \theta^*,\, H^{-1})}
= P(\mathcal{D} \mid \theta^*)\, P(\theta^*)\, |2\pi H^{-1}|^{\frac{1}{2}}.$$
(A numerical continuation of the 1-D sketch appears below.)

Laplace problems

Weird densities won't work well. The mode may not have much mass, or may have
misleading curvature. High dimensions: the mode may be flat in some direction
→ ill-conditioned Hessian.

Other Gaussian approximations

Can match a Gaussian in other ways than derivatives at the mode. An accurate
approximation with a Gaussian may not be possible, but capturing the posterior
width is better than only fitting a point estimate.

Variational methods

Goal: fit a target distribution (e.g., a parameter posterior).
Define:
— a family of possible distributions $q(\theta)$
— a 'variational objective' (says 'how well does $q$ match?')
Optimize the objective: fit the parameters of $q(\theta)$ — e.g., the mean and
covariance of a Gaussian.
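Continuing the 1-D sketch above (this reuses `energy`, `w_map` and `H` from that block), the Laplace evidence estimate is a one-liner once the prior's dropped normalizing constant is restored. Again this is an illustration under those assumptions, not code from the course.

```python
import numpy as np

# Restore the dropped constant: log N(w; 0, 1) = -0.5*w^2 - 0.5*log(2*pi)
log_joint_at_mode = -energy(w_map) - 0.5*np.log(2*np.pi)
# |2*pi*H^{-1}|^{1/2} is sqrt(2*pi / H) in one dimension
log_evidence = log_joint_at_mode + 0.5*np.log(2*np.pi / H)
print("Laplace log-evidence estimate:", log_evidence)
```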
Kullback–Leibler Divergence

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(\theta) \log \frac{p(\theta)}{q(\theta)}\, \mathrm{d}\theta$$

$D_{\mathrm{KL}}(p \,\|\, q) \ge 0$. Minimized by $q(\theta) = p(\theta)$.

Information theory (non-examinable for MLPR): the KL divergence is the average
storage wasted by a compression system using model $q$ instead of the true
distribution $p$.

Minimizing $D_{\mathrm{KL}}(p \,\|\, q)$

Select a family: $q(\theta) = \mathcal{N}(\theta;\, \mu, \Sigma)$.
Minimizing $D_{\mathrm{KL}}(p \,\|\, q)$ matches the mean and covariance of $p$.

[Figure: example density and its Gaussian fit, plotted over $\theta \in [-4, 4]$.]

Optimizing $D_{\mathrm{KL}}(p \,\|\, q)$ tends to be hard. Even for Gaussian
$q$: how do we get the mean and covariance of $p$? MCMC? The answer may not be
what you want: Murphy Fig. 21.1.

Considering $D_{\mathrm{KL}}(q \,\|\, p)$

$$D_{\mathrm{KL}}(q \,\|\, p) = -\int q(\theta) \log p(\theta \mid \mathcal{D})\, \mathrm{d}\theta
+ \underbrace{\int q(\theta) \log q(\theta)\, \mathrm{d}\theta}_{\text{neg. entropy, } -H(q)}$$

1. "Don't put probability mass on implausible parameters."
2. Want to be spread out: high entropy.

$H$ is the standard symbol for entropy. Nothing to do with a Hessian, also $H$;
sorry!
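The two directions of the KL divergence can be compared numerically on the 1-D non-Gaussian example from the start of this section (this reuses the `w` grid, `dw` and `post` from that sketch). Within the Gaussian family, the minimizer of $D_{\mathrm{KL}}(p\|q)$ is the moment-matched Gaussian, so the sketch below fits that and evaluates both divergences by simple grid integration; the helper function and candidate fit are illustrative assumptions.

```python
import numpy as np

def kl(a, b, dw):
    # KL(a||b) = integral of a log(a/b); both densities are strictly positive on this grid
    return np.sum(a * np.log(a / b)) * dw

mu = (w * post).sum() * dw                         # posterior mean
var = ((w - mu)**2 * post).sum() * dw              # posterior variance
q_moment = np.exp(-0.5*(w - mu)**2 / var) / np.sqrt(2*np.pi*var)

print("KL(p||q_moment) =", kl(post, q_moment, dw))
print("KL(q_moment||p) =", kl(q_moment, post, dw))
```

KL($q\|p$) should come out larger here, because the moment-matched $q$ puts some mass where the posterior is essentially zero, which is exactly the behaviour point 1 above penalizes.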
Usual variational methods

Most variational methods in Machine Learning minimize $D_{\mathrm{KL}}(q \,\|\, p)$:
— All parameters are plausible.
— We know how to do it!
(There are other variational principles.)

$D_{\mathrm{KL}}(q \,\|\, p)$: fitting the posterior

Fit $q$ to $p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$.
Substitute into the KL divergence and get a spray of terms:
$$D_{\mathrm{KL}}(q \,\|\, p) = \mathrm{E}_q[\log q(\theta)] - \mathrm{E}_q[\log p(\mathcal{D} \mid \theta)] - \mathrm{E}_q[\log p(\theta)] + \log p(\mathcal{D}).$$

First three terms: minimize their sum, $J(q)$.
$\log p(\mathcal{D})$: model evidence. Usually intractable, but:
$$D_{\mathrm{KL}}(q \,\|\, p) \ge 0 \;\Rightarrow\; \log p(\mathcal{D}) \ge -J(q).$$
We optimize a lower bound on the log marginal likelihood.

$D_{\mathrm{KL}}(q \,\|\, p)$: optimization

The literature is full of clever (non-examinable) iterative ways to optimize
$D_{\mathrm{KL}}(q \,\|\, p)$; $q$ is not always Gaussian.
Use standard optimizers? The hardest term to evaluate is
$$\mathrm{E}_q[\log p(\mathcal{D} \mid \theta)] = \sum_{n=1}^{N} \mathrm{E}_q\big[\log p(\mathbf{x}^{(n)} \mid \theta)\big],$$
a sum of possibly simple integrals. Stochastic gradient descent is an option
(a sketch of such a fit appears after the summary below).

Summary

Laplace approximation:
— Straightforward to apply
— 2nd derivatives → certainty of parameters
— Incremental improvement on the MAP estimate

Variational methods:
— Fit variational parameters of $q$ (not $\theta$!)
— Usually $\mathrm{KL}(q \,\|\, p)$; compare to $\mathrm{KL}(p \,\|\, q)$
— Bound the marginal/model likelihood ('the evidence')
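As a concrete (illustrative) instance of the optimization discussed above, the sketch below fits $q(w) = \mathcal{N}(w;\, m, s^2)$ to the 500-datapoint logistic-regression posterior by minimizing a Monte Carlo estimate of $J(q)$. It reuses `x`, `z`, `w_map` and `H` from the Laplace sketch. Instead of stochastic gradient descent it fixes the standard-normal samples once, so that an off-the-shelf optimizer can be applied to a deterministic objective; the sample count, optimizer, warm start and variance parameterization are all assumptions, not part of the course material.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
eps = rng.standard_normal(256)                  # fixed reparameterization noise

def neg_elbo(params):
    # J(q) = E_q[log q(w)] - E_q[log p(w)] - E_q[log P(D|w)], Monte Carlo estimate
    m, log_s = params
    s = np.exp(log_s)
    ws = m + s * eps                            # samples w ~ q(w) via w = m + s*eps
    log_q = -0.5*np.log(2*np.pi) - log_s - 0.5*eps**2
    log_prior = -0.5*np.log(2*np.pi) - 0.5*ws**2
    log_lik = -np.logaddexp(0.0, -np.outer(ws, z * x)).sum(axis=1)
    return np.mean(log_q - log_prior - log_lik)

x0 = [w_map, 0.5*np.log(1.0 / H)]               # warm-start at the Laplace fit
res = minimize(neg_elbo, x0=x0, method="Nelder-Mead")
m_fit, log_s_fit = res.x

print("variational fit: N(w; %.3f, %.2g)" % (m_fit, np.exp(2*log_s_fit)))
print("lower bound on log P(D):", -neg_elbo(res.x))
```

The final value of $-J(q)$ is a lower bound on $\log P(\mathcal{D})$, which can be compared with the Laplace evidence estimate computed earlier.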