Machine Learning: Probability Distributions
Sargur N. Srihari
Distributions: Landscape
• Discrete, binary: Bernoulli, Binomial, Beta
• Discrete, multi-valued: Multinomial, Dirichlet
• Continuous: Gaussian, Student's-t, Gamma, Wishart, Exponential
• Angular: Von Mises
• Uniform
Distributions: Relationships
• Discrete, binary
  – Bernoulli: single binary variable x ∈ {0,1}
  – Binomial: N samples of a Bernoulli (Bernoulli is the N=1 case)
  – Beta: conjugate prior; a continuous variable in [0,1]
• Discrete, multi-valued
  – Multinomial: one of K values, represented as a K-dimensional binary vector (Binomial is the K=2 case)
  – Dirichlet: conjugate prior; K random variables in [0,1]
• Continuous
  – Gaussian
  – Student's-t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
  – Gamma: conjugate prior of the univariate Gaussian precision
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Exponential: special case of the Gamma
  – Gaussian-Gamma: conjugate prior of the univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of the multivariate Gaussian with unknown mean and precision matrix
• Angular: Von Mises
• Uniform
Binary Variables
Bernoulli, Binomial and Beta
Bernoulli Distribution
• Expresses the distribution of a single binary-valued random variable x ∈ {0,1}
• Probability of x=1 is denoted by parameter µ, i.e., p(x=1|µ) = µ
  – Therefore p(x=0|µ) = 1 − µ
• Probability distribution has the form Bern(x|µ) = µ^x (1 − µ)^{1−x}
• Mean is E[x] = µ and variance is var[x] = µ(1 − µ)
• Likelihood of N observations D = {x_1,...,x_N} independently drawn from p(x|µ) is
  p(D|µ) = ∏_{n=1}^N µ^{x_n} (1 − µ)^{1−x_n}
• Log-likelihood is
  ln p(D|µ) = Σ_{n=1}^N { x_n ln µ + (1 − x_n) ln(1 − µ) }
• Maximum likelihood estimator, obtained by setting the derivative of ln p(D|µ) wrt µ to zero, is
  µ_ML = (1/N) Σ_{n=1}^N x_n
• If the number of observations with x=1 is m, then µ_ML = m/N
(Jacob Bernoulli, 1654–1705)
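The ML estimate above can be checked numerically. A minimal sketch in Python (the function names `bernoulli_mle` and `bernoulli_mean_var` are illustrative, not from the slides):

```python
import random

def bernoulli_mle(data):
    """mu_ML = m / N: the fraction of observations equal to 1."""
    return sum(data) / len(data)

def bernoulli_mean_var(mu):
    """E[x] = mu and var[x] = mu * (1 - mu) for Bern(x | mu)."""
    return mu, mu * (1.0 - mu)

# Draw N samples from Bern(x | mu = 0.3) and recover mu by maximum likelihood.
random.seed(0)
data = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
mu_hat = bernoulli_mle(data)
```

With 100,000 samples the estimate lands close to the true µ = 0.3; for small N it can be far off, which motivates the Bayesian treatment below.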
Binomial Distribution
• Related to the Bernoulli distribution
• Expresses the distribution of m, the number of observations for which x=1, in N trials
• Each sequence is proportional to Bern(x|µ); adding up all ways of obtaining m heads:
  Bin(m|N,µ) = (N choose m) µ^m (1 − µ)^{N−m}
• Mean and variance are
  E[m] = Nµ,  var[m] = Nµ(1 − µ)
[Figure: histogram of the Binomial for N=10 and µ=0.25]
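The pmf, mean, and variance above can be sketched directly from the formulas (a hedged illustration; the helper names are mine, not the slides'):

```python
from math import comb

def binomial_pmf(m, N, mu):
    """Bin(m | N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)."""
    return comb(N, m) * mu**m * (1.0 - mu)**(N - m)

def binomial_mean_var(N, mu):
    """E[m] = N*mu and var[m] = N*mu*(1 - mu)."""
    return N * mu, N * mu * (1.0 - mu)

# For the slide's example (N=10, mu=0.25) the pmf sums to 1 over m = 0..10.
total = sum(binomial_pmf(m, 10, 0.25) for m in range(11))
```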
Beta Distribution
• Beta distribution over µ ∈ [0,1]:
  Beta(µ|a,b) = Γ(a+b)/(Γ(a)Γ(b)) µ^{a−1} (1 − µ)^{b−1}
• Where the Gamma function is defined as
  Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du
• a and b are hyperparameters that control the distribution of parameter µ
• Mean and variance:
  E[µ] = a/(a+b),  var[µ] = ab/((a+b)^2 (a+b+1))
[Figure: Beta distribution as a function of µ for hyperparameter settings (a,b) = (0.1,0.1), (1,1), (2,3), (8,4)]
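A minimal sketch of the density and its moments, built directly from the formulas above (function names are illustrative):

```python
from math import gamma

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) = Gamma(a+b)/(Gamma(a)*Gamma(b)) * mu^(a-1) * (1-mu)^(b-1)."""
    coef = gamma(a + b) / (gamma(a) * gamma(b))
    return coef * mu**(a - 1) * (1.0 - mu)**(b - 1)

def beta_mean_var(a, b):
    """E[mu] = a/(a+b); var[mu] = a*b / ((a+b)^2 * (a+b+1))."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var
```

Note that Beta(µ|1,1) is the uniform distribution on [0,1], matching the (a,b)=(1,1) panel in the figure.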
Bayesian Inference with Beta
• The MLE of µ in the Bernoulli is the fraction of observations with x=1
  – Severely over-fitted for small data sets
• The likelihood function is a product of factors of the form µ^x (1 − µ)^{1−x}
• If the prior distribution of µ is chosen to be proportional to powers of µ and (1 − µ), the posterior will have the same functional form as the prior
  – Called conjugacy
• The Beta has a form suitable as a prior distribution p(µ)
Bayesian Inference with Beta
• Posterior obtained by multiplying the beta prior with the binomial likelihood:
  p(µ|m,l,a,b) ∝ µ^{m+a−1} (1 − µ)^{l+b−1}
  – where m is the number of heads and l = N − m is the number of tails
• It is another beta distribution
  – Effectively increases the value of a by m and b by l
  – As the number of observations increases, the distribution becomes more peaked
• Illustration of one step in the process:
  – Prior: Beta with a=2, b=2
  – Likelihood: N=m=1 with x=1, i.e., µ^1 (1 − µ)^0
  – Posterior: Beta with a=3, b=2
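The conjugate update is just count bookkeeping; a sketch of the one-step example above (helper name is mine):

```python
def beta_posterior(a, b, m, l):
    """Conjugate update: a Beta(a, b) prior times a binomial likelihood with
    m heads and l = N - m tails yields Beta(a + m, b + l)."""
    return a + m, b + l

# The slide's one-step example: Beta(2, 2) prior, single observation x = 1.
a_post, b_post = beta_posterior(2, 2, 1, 0)
```

Sequential updates compose: observing one head and then one tail gives the same posterior as observing both at once.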
Predicting the Next Trial Outcome
• Need the predictive distribution of x given observed D
  – From the sum and product rules:
  p(x=1|D) = ∫_0^1 p(x=1, µ|D) dµ = ∫_0^1 p(x=1|µ) p(µ|D) dµ = ∫_0^1 µ p(µ|D) dµ = E[µ|D]
• The expected value of the posterior distribution can be shown to be
  p(x=1|D) = (m+a)/(m+a+l+b)
  – Which is the fraction of observations (both fictitious and real) that correspond to x=1
• Maximum likelihood and Bayesian results agree in the limit of infinitely many observations
  – On average, uncertainty (variance) decreases with observed data
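The closed-form predictive probability can be sketched in one line (illustrative helper, not from the slides):

```python
def predictive_prob(a, b, m, l):
    """p(x = 1 | D) = E[mu | D] = (m + a) / (m + a + l + b)."""
    return (m + a) / (m + a + l + b)
```

With no data this reduces to the prior mean a/(a+b); as m and l grow it approaches the MLE m/N, illustrating the agreement in the infinite-data limit.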
Summary
• The distribution of a single binary variable is represented by the Bernoulli
• The Binomial is related to the Bernoulli
  – Expresses the distribution of the number of occurrences of x=1 in N trials
• The Beta distribution is the conjugate prior for the Bernoulli
  – The posterior then has the same functional form as the prior
Multinomial Variables
Generalized Bernoulli and Dirichlet
Generalization of the Bernoulli
• Discrete variable that takes one of K values (instead of 2)
• Represent using the 1-of-K scheme
  – Represent x as a K-dimensional vector
  – If K=6 and the observed value is the third one, we represent it as x = (0,0,1,0,0,0)^T
  – Such vectors satisfy Σ_k x_k = 1
• If the probability of x_k=1 is denoted µ_k, then the distribution of x is given by the generalized Bernoulli
  p(x|µ) = ∏_{k=1}^K µ_k^{x_k}, where Σ_k µ_k = 1
Likelihood Function
• Given a data set D of N independent observations x_1,...,x_N
• The likelihood function has the form
  p(D|µ) = ∏_{n=1}^N ∏_{k=1}^K µ_k^{x_nk} = ∏_{k=1}^K µ_k^{m_k}
• Where m_k = Σ_n x_nk is the number of observations with x_k=1
• The maximum likelihood solution (obtained by maximizing the log-likelihood subject to Σ_k µ_k = 1) is
  µ_k^{ML} = m_k / N
  which is the fraction of the N observations for which x_k=1
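The ML solution is again just counting; a minimal sketch assuming rows of `data` are 1-of-K binary vectors (function name is mine):

```python
def multinomial_mle(data):
    """mu_k(ML) = m_k / N, where m_k = sum_n x_nk counts the
    observations of value k in the 1-of-K encoded data set."""
    N = len(data)
    K = len(data[0])
    m = [sum(x[k] for x in data) for k in range(K)]
    return [m_k / N for m_k in m]

# Four observations of a K=3 variable in 1-of-K encoding.
data = [(1, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
mu_hat = multinomial_mle(data)
```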
Generalized Binomial Distribution
• Multinomial distribution:
  Mult(m_1,...,m_K|µ,N) = (N choose m_1 m_2 ... m_K) ∏_{k=1}^K µ_k^{m_k}
• Where the normalization coefficient is the number of ways of partitioning N objects into K groups of sizes m_1,...,m_K, given by
  (N choose m_1 m_2 ... m_K) = N! / (m_1! m_2! ... m_K!)
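The normalization coefficient and pmf can be sketched from those two formulas (illustrative helpers):

```python
from math import comb, factorial

def multinomial_coef(counts):
    """N! / (m_1! * ... * m_K!): the number of ways of partitioning
    N = sum(counts) objects into K groups of the given sizes."""
    c = factorial(sum(counts))
    for m_k in counts:
        c //= factorial(m_k)
    return c

def multinomial_pmf(counts, mu):
    """Mult(m_1, ..., m_K | mu, N) = coef * prod_k mu_k^(m_k)."""
    p = float(multinomial_coef(counts))
    for m_k, mu_k in zip(counts, mu):
        p *= mu_k ** m_k
    return p
```

For K=2 the coefficient reduces to the binomial coefficient, consistent with the Binomial slide.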
Dirichlet Distribution
• Family of prior distributions for the parameters µ_k of the multinomial distribution
• By inspection of the multinomial, the form of the conjugate prior is
  p(µ|α) ∝ ∏_{k=1}^K µ_k^{α_k − 1}
• Normalized form of the Dirichlet distribution:
  Dir(µ|α) = Γ(α_0)/(Γ(α_1)···Γ(α_K)) ∏_{k=1}^K µ_k^{α_k − 1}, where α_0 = Σ_k α_k
(Lejeune Dirichlet, 1805–1859)
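A sketch of the normalized density, evaluated at a point on the simplex (function name is mine; `mu` is assumed to sum to 1):

```python
from math import gamma

def dirichlet_pdf(mu, alpha):
    """Dir(mu | alpha) = Gamma(alpha_0) / prod_k Gamma(alpha_k)
    * prod_k mu_k^(alpha_k - 1), with alpha_0 = sum_k alpha_k."""
    coef = gamma(sum(alpha))
    for a_k in alpha:
        coef /= gamma(a_k)
    d = coef
    for mu_k, a_k in zip(mu, alpha):
        d *= mu_k ** (a_k - 1)
    return d
```

With all α_k = 1 the density is constant over the simplex, and for K=2 it coincides with the Beta distribution.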
Dirichlet over 3 Variables
• Due to the summation constraint, the distribution over the space of {µ_k} is confined to a simplex of dimensionality K−1
  – For K=3 the simplex is a triangle
[Figure: plots of the Dirichlet distribution over the simplex for α_k = 0.1, α_k = 1, and α_k = 10]
Dirichlet Posterior Distribution
• Multiplying prior by likelihood:
  p(µ|D,α) ∝ p(D|µ) p(µ|α) ∝ ∏_{k=1}^K µ_k^{α_k + m_k − 1}
• Which has the form of a Dirichlet distribution:
  p(µ|D,α) = Dir(µ|α + m)
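As with the Beta, the update is pure count bookkeeping; a one-line sketch (helper name is mine):

```python
def dirichlet_posterior(alpha, m):
    """Conjugate update: a Dir(alpha) prior times a multinomial
    likelihood with counts m_k yields Dir(alpha_k + m_k)."""
    return [a_k + m_k for a_k, m_k in zip(alpha, m)]
```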
Summary
• The Multinomial is a generalization of the Bernoulli
  – The variable takes one of K values instead of 2
• The conjugate prior of the Multinomial is the Dirichlet distribution