Fundamentals of Bayesian statistics. Course of Machine Learning

Master Degree in Computer Science, University of Rome "Tor Vergata"
Giorgio Gambosi
Academic year 2018-2019

Bayesian statistics

Classical (frequentist) statistics
• Probability is interpreted as the frequency of an event over a sufficiently long sequence of reproducible experiments.
• Parameters are seen as constants to be determined.

Bayesian statistics
• Probability is interpreted as the degree of belief that an event may occur.
• Parameters are seen as random variables.

Bayes' rule

The cornerstone of Bayesian statistics is Bayes' rule:

$$p(X = x \mid \Theta = \theta) = \frac{p(\Theta = \theta \mid X = x)\, p(X = x)}{p(\Theta = \theta)}$$

Given two random variables $X$ and $\Theta$, it relates the conditional probabilities $p(X = x \mid \Theta = \theta)$ and $p(\Theta = \theta \mid X = x)$.
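As a quick numerical check of the rule, the sketch below builds a small joint distribution over two binary variables and compares the conditional obtained directly from the joint with the one obtained through Bayes' rule. The joint table and all function names are illustrative assumptions, not part of the original slides.

```python
# Hypothetical joint distribution p(X = x, Theta = theta); keys are (x, theta).
joint = {(0, 0): 0.30, (0, 1): 0.10,
         (1, 0): 0.20, (1, 1): 0.40}

def p_x(x):
    """Marginal p(X = x), summing the joint over theta."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_theta(theta):
    """Marginal p(Theta = theta), summing the joint over x."""
    return sum(p for (_, ti), p in joint.items() if ti == theta)

def p_x_given_theta(x, theta):
    """Conditional computed directly from the joint."""
    return joint[(x, theta)] / p_theta(theta)

def p_theta_given_x(theta, x):
    """Conditional computed directly from the joint."""
    return joint[(x, theta)] / p_x(x)

def bayes_x_given_theta(x, theta):
    """Same conditional, obtained via Bayes' rule."""
    return p_theta_given_x(theta, x) * p_x(x) / p_theta(theta)

for x in (0, 1):
    for theta in (0, 1):
        print(x, theta, p_x_given_theta(x, theta), bayes_x_given_theta(x, theta))
```

The two printed columns should agree up to floating-point rounding for every pair of values.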

Bayesian inference

Given an observed dataset $X$ and a family of probability distributions $p(x \mid \Theta)$ with parameter $\Theta$ (a probabilistic model), we wish to find the parameter value that best describes $X$ through the model. In the Bayesian framework, we deal with the probability distribution $p(\Theta)$ of the parameter $\Theta$, considered here as a random variable. Bayes' rule states that

$$p(\Theta \mid X) = \frac{p(X \mid \Theta)\, p(\Theta)}{p(X)}$$

Bayesian inference

Interpretation
• $p(\Theta)$ stands for the knowledge available about $\Theta$ before $X$ is observed (a.k.a. prior distribution).
• $p(\Theta \mid X)$ stands for the knowledge available about $\Theta$ after $X$ is observed (a.k.a. posterior distribution).
• $p(X \mid \Theta)$ measures how consistent the observed data are with the model, assuming a certain value $\Theta$ of the parameter (a.k.a. likelihood).
• $p(X) = \sum_{\Theta'} p(X \mid \Theta')\, p(\Theta')$ is the probability that $X$ is observed, obtained as a mean over all possible values of $\Theta$ weighted by the prior (a.k.a. evidence).
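The four quantities above can be made concrete with a parameter that takes only a few values. The following is a minimal sketch, assuming a coin whose head probability is one of three hypothetical candidates and a made-up observation of 3 heads and 1 tail in a fixed order; the function name `posterior` is also just illustrative.

```python
def posterior(prior, likelihood):
    """Return (posterior, evidence) for a discrete parameter.

    prior:      dict mapping theta -> p(theta)
    likelihood: dict mapping theta -> p(X | theta) for the observed X
    """
    # Evidence: p(X) = sum over theta' of p(X | theta') p(theta')
    evidence = sum(likelihood[t] * prior[t] for t in prior)
    # Posterior: p(theta | X) = p(X | theta) p(theta) / p(X)
    post = {t: likelihood[t] * prior[t] / evidence for t in prior}
    return post, evidence

# Hypothetical setup: three candidate head probabilities, uniform prior.
thetas = [0.25, 0.5, 0.75]
prior = {t: 1.0 / len(thetas) for t in thetas}

# Hypothetical data: one specific sequence with 3 heads and 1 tail.
heads, tails = 3, 1
likelihood = {t: t**heads * (1 - t)**tails for t in thetas}

post, evidence = posterior(prior, likelihood)
print(post, evidence)
```

With these numbers the posterior mass shifts toward the candidate 0.75, which is the value most consistent with mostly heads.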

Conjugate distributions

Definition
Given a likelihood function $p(y \mid x)$, a (prior) distribution $p(x)$ is conjugate to $p(y \mid x)$ if the posterior distribution $p(x \mid y)$ is of the same type as $p(x)$.

Consequence
If we regard $p(x)$ as our knowledge of the random variable $x$ before knowing $y$, and $p(x \mid y)$ as our knowledge once $y$ is known, the new knowledge can be expressed in the same form as the old one.

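Examples of conjugate distributions: beta-bernoulli

The Beta distribution is conjugate to the Bernoulli distribution. In fact, given $\phi \in [0, 1]$ and $x \in \{0, 1\}$, if

$$p(\phi \mid \alpha, \beta) = \mathrm{Beta}(\phi \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}$$

$$p(x \mid \phi) = \phi^{x} (1 - \phi)^{1 - x}$$

then

$$p(\phi \mid x) = \frac{1}{Z}\, \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}\, \phi^{x} (1 - \phi)^{1 - x} = \frac{1}{Z}\, \phi^{\alpha + x - 1} (1 - \phi)^{\beta - x} = \mathrm{Beta}(\phi \mid \alpha + x, \beta - x + 1)$$

where $Z$ is the normalization coefficient

$$Z = \int_0^1 \phi^{\alpha + x - 1} (1 - \phi)^{\beta - x}\, d\phi = \frac{\Gamma(\alpha + x)\Gamma(\beta - x + 1)}{\Gamma(\alpha + \beta + 1)}$$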
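The conjugacy claim can be checked numerically: normalizing prior times likelihood by brute-force integration should give the same density as a Beta distribution with parameters $\alpha + x$ and $\beta - x + 1$. The sketch below does this with a simple midpoint rule; the helper names, the prior parameters (2, 3), and the evaluation point 0.3 are arbitrary assumptions chosen for illustration.

```python
import math

def beta_pdf(phi, a, b):
    """Density of Beta(phi | a, b) for 0 < phi < 1, computed via log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(phi) + (b - 1) * math.log(1 - phi))

def bernoulli_update(a, b, x):
    """Closed-form conjugate update after one Bernoulli observation x in {0, 1}."""
    return a + x, b + 1 - x

def numeric_posterior(phi, a, b, x, steps=20000):
    """Posterior density at phi from prior * likelihood / Z, with Z by the midpoint rule."""
    def unnorm(p):
        return beta_pdf(p, a, b) * p**x * (1 - p)**(1 - x)
    h = 1.0 / steps
    z = sum(unnorm((i + 0.5) * h) for i in range(steps)) * h
    return unnorm(phi) / z

# Hypothetical prior Beta(2, 3) and a single observation x = 1.
a, b, x = 2.0, 3.0, 1
a_post, b_post = bernoulli_update(a, b, x)
print(beta_pdf(0.3, a_post, b_post), numeric_posterior(0.3, a, b, x))
```

Both printed values should agree to several decimal places, up to the integration error of the midpoint rule.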

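Examples of conjugate distributions: beta-binomial

The Beta distribution is also conjugate to the Binomial distribution. In fact, given $\phi \in [0, 1]$ and $k \in \{0, 1, \dots, N\}$, if

$$p(\phi \mid \alpha, \beta) = \mathrm{Beta}(\phi \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}$$

$$p(k \mid \phi, N) = \binom{N}{k} \phi^{k} (1 - \phi)^{N - k} = \frac{N!}{(N - k)!\, k!}\, \phi^{k} (1 - \phi)^{N - k}$$

then

$$p(\phi \mid k, N, \alpha, \beta) = \frac{1}{Z}\, \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}\, \phi^{k} (1 - \phi)^{N - k} = \frac{1}{Z}\, \phi^{\alpha + k - 1} (1 - \phi)^{\beta + N - k - 1} = \mathrm{Beta}(\phi \mid \alpha + k, \beta + N - k)$$

with the normalization coefficient

$$Z = \int_0^1 \phi^{\alpha + k - 1} (1 - \phi)^{\beta + N - k - 1}\, d\phi = \frac{\Gamma(\alpha + k)\Gamma(\beta + N - k)}{\Gamma(\alpha + \beta + N)}$$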
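In practice, conjugacy means the posterior is obtained by adding counts to the prior parameters. The sketch below, under the assumption of a hypothetical Beta(2, 2) prior and a made-up sequence of binary outcomes, shows that one Binomial update with $k$ successes in $N$ trials gives the same parameters as $N$ successive Bernoulli updates. The function names are illustrative.

```python
def bernoulli_update(a, b, x):
    """Beta parameters after one Bernoulli observation x in {0, 1}."""
    return a + x, b + 1 - x

def binomial_update(a, b, k, n):
    """Beta parameters after observing k successes in n Binomial trials."""
    return a + k, b + n - k

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical data: 5 successes in 7 trials.
outcomes = [1, 0, 1, 1, 0, 1, 1]
a, b = 2.0, 2.0  # hypothetical prior parameters

# Sequential route: the posterior after each outcome is the prior for the next.
seq = (a, b)
for x in outcomes:
    seq = bernoulli_update(*seq, x)

# Batch route: a single Binomial update with the aggregated counts.
batch = binomial_update(a, b, sum(outcomes), len(outcomes))

print(seq, batch, beta_mean(*batch))
```

Both routes end at the same Beta parameters, so the order in which the outcomes arrive does not affect the final posterior.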
