Machine Learning: Probability Distributions
Sargur N. Srihari
Distributions: Landscape
• Discrete, binary: Bernoulli, Binomial, Beta
• Discrete, multi-valued: Multinomial, Dirichlet
• Continuous: Gaussian, Student's-t, Gamma, Wishart, Exponential
• Angular: Von Mises
• Uniform
Distributions: Relationships
• Discrete, binary
  – Bernoulli: single binary variable x ∈ {0,1}
  – Binomial: N samples of a Bernoulli (Bernoulli is the N=1 case)
  – Beta: conjugate prior; a continuous variable in [0,1]
• Discrete, multi-valued
  – Multinomial: one of K values, represented as a K-dimensional binary vector (Binomial is the K=2 case)
  – Dirichlet: conjugate prior; K random variables in [0,1]
• Continuous
  – Gaussian
  – Student's-t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
  – Gamma: conjugate prior of the univariate Gaussian precision
  – Wishart: conjugate prior of the multivariate Gaussian precision matrix
  – Exponential: special case of the Gamma
  – Gaussian-Gamma: conjugate prior of the univariate Gaussian with unknown mean and precision
  – Gaussian-Wishart: conjugate prior of the multivariate Gaussian with unknown mean and precision matrix
• Angular: Von Mises
• Uniform
Binary Variables
Bernoulli, Binomial and Beta
Bernoulli Distribution
• Expresses the distribution of a single binary-valued random variable x ∈ {0,1}
• Probability of x=1 is denoted by parameter µ, i.e., p(x=1|µ) = µ
  – Therefore p(x=0|µ) = 1 − µ
• Probability distribution has the form Bern(x|µ) = µ^x (1 − µ)^{1−x}
• Mean is E[x] = µ and variance is var[x] = µ(1 − µ)
• Likelihood of N observations D = {x_1,...,x_N} independently drawn from p(x|µ) is
  p(D|µ) = ∏_{n=1}^N µ^{x_n} (1 − µ)^{1−x_n}
• Log-likelihood is
  ln p(D|µ) = Σ_{n=1}^N { x_n ln µ + (1 − x_n) ln(1 − µ) }
• Maximum likelihood estimator, obtained by setting the derivative of ln p(D|µ) wrt µ to zero, is
  µ_ML = (1/N) Σ_{n=1}^N x_n
• If the number of observations with x=1 is m, then µ_ML = m/N
(Jacob Bernoulli, 1654–1705)
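The ML estimate above can be checked numerically. A minimal sketch in Python (the function names `bernoulli_mle` and `bernoulli_mean_var` are illustrative, not from the slides):

```python
import random

def bernoulli_mle(data):
    """mu_ML = m / N: the fraction of observations equal to 1."""
    return sum(data) / len(data)

def bernoulli_mean_var(mu):
    """E[x] = mu and var[x] = mu * (1 - mu) for Bern(x | mu)."""
    return mu, mu * (1.0 - mu)

# Draw N samples from Bern(x | mu = 0.3) and recover mu by maximum likelihood.
random.seed(0)
data = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
mu_hat = bernoulli_mle(data)
```

With 100,000 samples the estimate lands close to the true µ = 0.3; for small N it can be far off, which motivates the Bayesian treatment below.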
Binomial Distribution
• Related to the Bernoulli distribution
• Expresses the distribution of m, the number of observations for which x=1, in N trials
• Each sequence is proportional to Bern(x|µ); adding up all ways of obtaining m heads:
  Bin(m|N,µ) = (N choose m) µ^m (1 − µ)^{N−m}
• Mean and variance are
  E[m] = Nµ,  var[m] = Nµ(1 − µ)
[Figure: histogram of the Binomial for N=10 and µ=0.25]
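The pmf, mean, and variance above can be sketched directly from the formulas (a hedged illustration; the helper names are mine, not the slides'):

```python
from math import comb

def binomial_pmf(m, N, mu):
    """Bin(m | N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)."""
    return comb(N, m) * mu**m * (1.0 - mu)**(N - m)

def binomial_mean_var(N, mu):
    """E[m] = N*mu and var[m] = N*mu*(1 - mu)."""
    return N * mu, N * mu * (1.0 - mu)

# For the slide's example (N=10, mu=0.25) the pmf sums to 1 over m = 0..10.
total = sum(binomial_pmf(m, 10, 0.25) for m in range(11))
```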
Beta Distribution
• Beta distribution over µ ∈ [0,1]:
  Beta(µ|a,b) = Γ(a+b)/(Γ(a)Γ(b)) µ^{a−1} (1 − µ)^{b−1}
• Where the Gamma function is defined as
  Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du
• a and b are hyperparameters that control the distribution of parameter µ
• Mean and variance:
  E[µ] = a/(a+b),  var[µ] = ab/((a+b)^2 (a+b+1))
[Figure: Beta distribution as a function of µ for hyperparameter settings (a,b) = (0.1,0.1), (1,1), (2,3), (8,4)]
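A minimal sketch of the density and its moments, built directly from the formulas above (function names are illustrative):

```python
from math import gamma

def beta_pdf(mu, a, b):
    """Beta(mu | a, b) = Gamma(a+b)/(Gamma(a)*Gamma(b)) * mu^(a-1) * (1-mu)^(b-1)."""
    coef = gamma(a + b) / (gamma(a) * gamma(b))
    return coef * mu**(a - 1) * (1.0 - mu)**(b - 1)

def beta_mean_var(a, b):
    """E[mu] = a/(a+b); var[mu] = a*b / ((a+b)^2 * (a+b+1))."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var
```

Note that Beta(µ|1,1) is the uniform distribution on [0,1], matching the (a,b)=(1,1) panel in the figure.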
Bayesian Inference with Beta
• The MLE of µ in the Bernoulli is the fraction of observations with x=1
  – Severely over-fitted for small data sets
• The likelihood function is a product of factors of the form µ^x (1 − µ)^{1−x}
• If the prior distribution of µ is chosen to be proportional to powers of µ and (1 − µ), the posterior will have the same functional form as the prior
  – Called conjugacy
• The Beta has a form suitable as a prior distribution p(µ)
Bayesian Inference with Beta
• Posterior obtained by multiplying the beta prior with the binomial likelihood:
  p(µ|m,l,a,b) ∝ µ^{m+a−1} (1 − µ)^{l+b−1}
  – where m is the number of heads and l = N − m is the number of tails
• It is another beta distribution
  – Effectively increases the value of a by m and b by l
  – As the number of observations increases, the distribution becomes more peaked
• Illustration of one step in the process:
  – Prior: Beta with a=2, b=2
  – Likelihood: N=m=1 with x=1, i.e., µ^1 (1 − µ)^0
  – Posterior: Beta with a=3, b=2
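The conjugate update is just count bookkeeping; a sketch of the one-step example above (helper name is mine):

```python
def beta_posterior(a, b, m, l):
    """Conjugate update: a Beta(a, b) prior times a binomial likelihood with
    m heads and l = N - m tails yields Beta(a + m, b + l)."""
    return a + m, b + l

# The slide's one-step example: Beta(2, 2) prior, single observation x = 1.
a_post, b_post = beta_posterior(2, 2, 1, 0)
```

Sequential updates compose: observing one head and then one tail gives the same posterior as observing both at once.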
Predicting the Next Trial Outcome
• Need the predictive distribution of x given observed D
  – From the sum and product rules:
  p(x=1|D) = ∫_0^1 p(x=1, µ|D) dµ = ∫_0^1 p(x=1|µ) p(µ|D) dµ = ∫_0^1 µ p(µ|D) dµ = E[µ|D]
• The expected value of the posterior distribution can be shown to be
  p(x=1|D) = (m+a)/(m+a+l+b)
  – Which is the fraction of observations (both fictitious and real) that correspond to x=1
• Maximum likelihood and Bayesian results agree in the limit of infinitely many observations
  – On average, uncertainty (variance) decreases with observed data
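The closed-form predictive probability can be sketched in one line (illustrative helper, not from the slides):

```python
def predictive_prob(a, b, m, l):
    """p(x = 1 | D) = E[mu | D] = (m + a) / (m + a + l + b)."""
    return (m + a) / (m + a + l + b)
```

With no data this reduces to the prior mean a/(a+b); as m and l grow it approaches the MLE m/N, illustrating the agreement in the infinite-data limit.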
Summary
• The distribution of a single binary variable is represented by the Bernoulli
• The Binomial is related to the Bernoulli
  – Expresses the distribution of the number of occurrences of x=1 in N trials
• The Beta distribution is the conjugate prior for the Bernoulli
  – The posterior then has the same functional form as the prior
Multinomial Variables
Generalized Bernoulli and Dirichlet
Generalization of the Bernoulli
• Discrete variable that takes one of K values (instead of 2)
• Represent using the 1-of-K scheme
  – Represent x as a K-dimensional vector
  – If K=6 and the observed value is the third one, we represent it as x = (0,0,1,0,0,0)^T
  – Such vectors satisfy Σ_k x_k = 1
• If the probability of x_k=1 is denoted µ_k, then the distribution of x is given by the generalized Bernoulli
  p(x|µ) = ∏_{k=1}^K µ_k^{x_k}, where Σ_k µ_k = 1
Likelihood Function
• Given a data set D of N independent observations x_1,...,x_N
• The likelihood function has the form
  p(D|µ) = ∏_{n=1}^N ∏_{k=1}^K µ_k^{x_nk} = ∏_{k=1}^K µ_k^{m_k}
• Where m_k = Σ_n x_nk is the number of observations with x_k=1
• The maximum likelihood solution (obtained by maximizing the log-likelihood subject to Σ_k µ_k = 1) is
  µ_k^{ML} = m_k / N
  which is the fraction of the N observations for which x_k=1
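The ML solution is again just counting; a minimal sketch assuming rows of `data` are 1-of-K binary vectors (function name is mine):

```python
def multinomial_mle(data):
    """mu_k(ML) = m_k / N, where m_k = sum_n x_nk counts the
    observations of value k in the 1-of-K encoded data set."""
    N = len(data)
    K = len(data[0])
    m = [sum(x[k] for x in data) for k in range(K)]
    return [m_k / N for m_k in m]

# Four observations of a K=3 variable in 1-of-K encoding.
data = [(1, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
mu_hat = multinomial_mle(data)
```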
Generalized Binomial Distribution
• Multinomial distribution:
  Mult(m_1,...,m_K|µ,N) = (N choose m_1 m_2 ... m_K) ∏_{k=1}^K µ_k^{m_k}
• Where the normalization coefficient is the number of ways of partitioning N objects into K groups of sizes m_1,...,m_K, given by
  (N choose m_1 m_2 ... m_K) = N! / (m_1! m_2! ... m_K!)
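The normalization coefficient and pmf can be sketched from those two formulas (illustrative helpers):

```python
from math import comb, factorial

def multinomial_coef(counts):
    """N! / (m_1! * ... * m_K!): the number of ways of partitioning
    N = sum(counts) objects into K groups of the given sizes."""
    c = factorial(sum(counts))
    for m_k in counts:
        c //= factorial(m_k)
    return c

def multinomial_pmf(counts, mu):
    """Mult(m_1, ..., m_K | mu, N) = coef * prod_k mu_k^(m_k)."""
    p = float(multinomial_coef(counts))
    for m_k, mu_k in zip(counts, mu):
        p *= mu_k ** m_k
    return p
```

For K=2 the coefficient reduces to the binomial coefficient, consistent with the Binomial slide.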
Dirichlet Distribution
• Family of prior distributions for the parameters µ_k of the multinomial distribution
• By inspection of the multinomial, the form of the conjugate prior is
  p(µ|α) ∝ ∏_{k=1}^K µ_k^{α_k − 1}
• Normalized form of the Dirichlet distribution:
  Dir(µ|α) = Γ(α_0)/(Γ(α_1)···Γ(α_K)) ∏_{k=1}^K µ_k^{α_k − 1}, where α_0 = Σ_k α_k
(Lejeune Dirichlet, 1805–1859)
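A sketch of the normalized density, evaluated at a point on the simplex (function name is mine; `mu` is assumed to sum to 1):

```python
from math import gamma

def dirichlet_pdf(mu, alpha):
    """Dir(mu | alpha) = Gamma(alpha_0) / prod_k Gamma(alpha_k)
    * prod_k mu_k^(alpha_k - 1), with alpha_0 = sum_k alpha_k."""
    coef = gamma(sum(alpha))
    for a_k in alpha:
        coef /= gamma(a_k)
    d = coef
    for mu_k, a_k in zip(mu, alpha):
        d *= mu_k ** (a_k - 1)
    return d
```

With all α_k = 1 the density is constant over the simplex, and for K=2 it coincides with the Beta distribution.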
Dirichlet over 3 Variables
• Due to the summation constraint, the distribution over the space of {µ_k} is confined to a simplex of dimensionality K−1
  – For K=3 the simplex is a triangle
[Figure: plots of the Dirichlet distribution over the simplex for α_k = 0.1, α_k = 1, and α_k = 10]
Dirichlet Posterior Distribution
• Multiplying prior by likelihood:
  p(µ|D,α) ∝ p(D|µ) p(µ|α) ∝ ∏_{k=1}^K µ_k^{α_k + m_k − 1}
• Which has the form of a Dirichlet distribution:
  p(µ|D,α) = Dir(µ|α + m)
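As with the Beta, the update is pure count bookkeeping; a one-line sketch (helper name is mine):

```python
def dirichlet_posterior(alpha, m):
    """Conjugate update: a Dir(alpha) prior times a multinomial
    likelihood with counts m_k yields Dir(alpha_k + m_k)."""
    return [a_k + m_k for a_k, m_k in zip(alpha, m)]
```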
Summary
• The Multinomial is a generalization of the Bernoulli
  – The variable takes one of K values instead of 2
• The conjugate prior of the Multinomial is the Dirichlet distribution