APPLIED MACHINE LEARNING
Probability Density Functions, Gaussian Mixture Models
Discrete Probabilities

Consider two variables $x$ and $y$ taking discrete values over the intervals $[1, \ldots, N_x]$ and $[1, \ldots, N_y]$ respectively.

$P(x=i)$: the probability that the variable $x$ takes value $i$.

$0 \le P(x=i) \le 1, \; i = 1, \ldots, N_x$, and $\sum_{i=1}^{N_x} P(x=i) = 1$.

Idem for $P(y=j), \; j = 1, \ldots, N_y$.
Discrete Probabilities

The joint probability is written $P(x, y)$. The joint probability that variable $x$ takes value $i$ and variable $y$ takes value $j$ is $P(x=i, y=j)$, or $P(x=i \wedge y=j)$.

$P(x \mid y)$ is the conditional probability of observing a value for $x$ given a value for $y$.

Bayes' theorem: $P(x \mid y) = \dfrac{P(y \mid x)\, P(x)}{P(y)}$

When $x$ and $y$ are statistically independent:
$P(x \mid y) = P(x)$, $P(y \mid x) = P(y)$, and $P(x, y) = P(x)\, P(y)$.

Matlab Exercise I
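The identities above can be checked numerically on a small joint probability table. The table values below are illustrative assumptions, chosen only so that the entries sum to 1:

```python
# Hypothetical 2x2 joint distribution P(x, y) stored as a dict;
# the probability values are assumptions for illustration only.
P_xy = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

def P_x(i):
    # marginal of x: sum the joint over all values of y
    return sum(p for (x, y), p in P_xy.items() if x == i)

def P_y(j):
    # marginal of y: sum the joint over all values of x
    return sum(p for (x, y), p in P_xy.items() if y == j)

def P_x_given_y(i, j):
    # conditional: P(x=i | y=j) = P(x=i, y=j) / P(y=j)
    return P_xy[(i, j)] / P_y(j)

def P_y_given_x(j, i):
    return P_xy[(i, j)] / P_x(i)

# Bayes' theorem: P(x|y) = P(y|x) P(x) / P(y)
lhs = P_x_given_y(1, 2)
rhs = P_y_given_x(2, 1) * P_x(1) / P_y(2)
assert abs(lhs - rhs) < 1e-12
```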
Discrete Probabilities

The marginal probability that variable $x$ takes value $i$ is given by:

$P(x=i) = \sum_{j=1}^{N_y} P(x=i, y=j)$

(drop the subscripts $x, y$ for simplicity of notation)

• To compute the marginal, one needs the joint distribution $P(x, y)$.
• Often, one does not know it and one can only estimate it.
• If $x$ is a multidimensional variable, the marginal is itself a joint distribution!
Joint Distribution and Curse of Dimensionality

The joint distribution is far richer than the marginals. The marginals of $N$ variables taking $K$ values correspond to $N(K-1)$ probabilities. The joint distribution corresponds to $\sim K^N$ probabilities.

Pros of computing the joint distribution:
Provides statistical dependencies across all variables and the marginal distributions.

Cons:
Computational costs grow exponentially with the number of dimensions (statistical power: roughly 10 samples are needed to estimate each parameter of a model).

Compute solely the conditional if you care only about dependencies across variables (this will be relevant for the lecture on non-linear regression methods).
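The parameter-counting argument above is easy to make concrete; the values of $N$ and $K$ below are arbitrary illustrative choices:

```python
# Parameter counting for N discrete variables, each taking K values.
# N and K are illustrative choices, not values from the lecture.
N, K = 10, 4
marginal_params = N * (K - 1)   # each marginal needs K-1 free probabilities
joint_params = K**N - 1         # joint table has K^N entries, minus one sum-to-1 constraint

assert marginal_params == 30
assert joint_params == 1048575  # already ~10^6 for just 10 variables
```

With ~10 samples per parameter, estimating this joint distribution would require on the order of ten million datapoints, while the marginals need only a few hundred.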
Probability Distributions, Density Functions

$p(x)$, a continuous function, is the probability density function or probability distribution function (PDF) (sometimes also called probability distribution or simply density) of variable $x$.

$p(x) \ge 0 \;\; \forall x, \qquad \int p(x)\, dx = 1$
Probability Distributions, Density Functions

The pdf is not bounded by 1. It can grow unbounded, depending on the value taken by $x$.

[Figure: a pdf $p(x)$ with a sharp peak exceeding 1]
PDF Equivalency with Discrete Probability

The cumulative distribution function (or simply distribution function) of $x$ is:

$D(x^*) = P(x \le x^*) = \int_{-\infty}^{x^*} p(x)\, dx$

$p(x)\, dx$ ~ probability of $x$ to fall within an infinitesimal interval $[x, x + dx]$
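A minimal numerical sketch of this definition, using a standard normal pdf: the integral is approximated with a midpoint rule, truncating the lower limit at $-8$ where the remaining tail mass is negligible, and compared against the closed form via the error function.

```python
import math

# Standard normal pdf (mu = 0, sigma = 1 by default).
def p(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# D(x*) = integral of p from -inf to x*, approximated by a midpoint rule;
# the lower limit -8 is an assumption that the tail below it is negligible.
def D(x_star, lo=-8.0, n=20000):
    h = (x_star - lo) / n
    return sum(p(lo + (k + 0.5) * h) for k in range(n)) * h

# Closed form for the standard normal CDF at 1: (1 + erf(1/sqrt(2))) / 2
exact = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))
assert abs(D(1.0) - exact) < 1e-5
```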
PDF Equivalency with Discrete Probability

Uniform distribution on $x$: the probability that $x$ takes a value in the subinterval $[a, b]$ is given by:

$P(a \le x \le b) = D(b) - D(a) = \int_{a}^{b} p(x)\, dx \le 1$

[Figure: uniform pdf $p(x)$ with the subinterval $[a, b]$ marked]
Expectation

The expectation of the random variable $x$ with probability $P(x)$ (in the discrete case) and pdf $p(x)$ (in the continuous case), also called the expected value or mean, is the mean of the observed values of $x$ weighted by $p(x)$. If $X$ is the set of observations of $x$, then:

When $x$ takes discrete values: $E[x] = \sum_{x \in X} x\, P(x)$

For continuous distributions: $E[x] = \int_{X} x\, p(x)\, dx$
Variance

$\sigma^2$, the variance of a distribution, measures the amount of spread of the distribution around its mean:

$\operatorname{Var}(x) = \sigma^2 = E\big[(x - E[x])^2\big] = E[x^2] - (E[x])^2$

$\sigma$ is the standard deviation of $x$.
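Both the discrete expectation and the two equivalent expressions for the variance can be checked on a toy distribution (the probability values are illustrative assumptions):

```python
# Toy discrete distribution; the probabilities are assumptions for illustration.
P = {1: 0.2, 2: 0.5, 3: 0.3}

# E[f(x)] = sum_x f(x) P(x)
E = lambda f: sum(f(x) * p for x, p in P.items())

mean = E(lambda x: x)                   # E[x] = 0.2 + 1.0 + 0.9 = 2.1
var_def = E(lambda x: (x - mean)**2)    # definition: spread around the mean
var_id = E(lambda x: x**2) - mean**2    # shortcut identity E[x^2] - (E[x])^2

assert abs(mean - 2.1) < 1e-12
assert abs(var_def - var_id) < 1e-12
```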
Parametric PDF

The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:

$p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \mu: \text{mean}, \;\; \sigma^2: \text{variance}$

The Gaussian function is entirely determined by its mean and variance. For this reason, it is referred to as a parametric distribution.

Illustrations from Wikipedia
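A direct transcription of this formula; the second assertion also illustrates the earlier point that a pdf is not bounded by 1, since the peak height $1/(\sqrt{2\pi}\,\sigma)$ exceeds 1 whenever $\sigma < 1/\sqrt{2\pi}$:

```python
import math

# 1-D Gaussian pdf exactly as on the slide; mu and sigma are free parameters.
def gauss_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Peak value at x = mu is 1 / (sqrt(2*pi) * sigma).
assert abs(gauss_pdf(0.0, 0.0, 1.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12

# With a small sigma the density exceeds 1: a pdf is not bounded by 1.
assert gauss_pdf(0.0, 0.0, 0.1) > 1.0
```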
Mean and Variance in PDF

~68% of the data are comprised between +/- 1 sigma.
~95% of the data are comprised between +/- 2 sigmas.
~99.7% of the data are comprised between +/- 3 sigmas.

This is no longer true for arbitrary pdfs!

Illustrations from Wikipedia
Mean and Variance in PDF

[Figure: superposition of 3 Gaussian distributions, $f = \frac{1}{3}(f_1 + f_2 + f_3)$, with the expectation and the 1-sigma interval (68% of the mass) marked.]

Resulting distribution when superposing 3 Gaussian distributions. For pdfs other than the Gaussian distribution, the variance represents a notion of dispersion around the expected value.

Matlab Demo I
Multi-dimensional Gaussian Function

The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:

$p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \mu: \text{mean}, \;\; \sigma^2: \text{variance}$

The multi-dimensional Gaussian or Normal distribution has a pdf given by:

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
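A sketch of the multi-dimensional pdf, assuming NumPy is available. It is cross-checked against the 1-D formula: with a diagonal covariance, the joint factorizes into a product of uni-dimensional Gaussians.

```python
import numpy as np

# Multivariate normal pdf exactly as on the slide.
def mvn_pdf(x, mu, Sigma):
    N = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm)

# Illustrative 2-D example with diagonal covariance (values are assumptions).
x = np.array([0.5, -0.2])
mu = np.zeros(2)
Sigma = np.diag([1.0, 4.0])

# With a diagonal covariance the joint factorizes into two 1-D Gaussians.
p1 = np.exp(-0.5 * 0.5**2 / 1.0) / np.sqrt(2 * np.pi * 1.0)
p2 = np.exp(-0.5 * (-0.2)**2 / 4.0) / np.sqrt(2 * np.pi * 4.0)
assert abs(mvn_pdf(x, mu, Sigma) - p1 * p2) < 1e-12
```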
2-dimensional Gaussian Pdf

[Figure: surface plot of $p(x_1, x_2)$ over the $(x_1, x_2)$ plane, with the isolines $p(x) = \text{cst}$ projected below.]

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
Modeling Data with a Gaussian Function

Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1\ldots M}$:

$\Sigma = \dfrac{1}{M} X X^T$

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
Modeling Data with a Gaussian Function

Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1\ldots M}$:

$\Sigma = \dfrac{1}{M} X X^T$

$\Sigma$ is square and symmetric. It can be decomposed using the eigenvalue decomposition:

$\Sigma = V \Lambda V^T$, with $V$: matrix of eigenvectors, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$: diagonal matrix composed of eigenvalues.

For the 1-std ellipse, the axes' lengths along the 1st and 2nd eigenvectors are equal to $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$. Each isoline corresponds to a scaling of the 1-std ellipse.

[Figure: data ellipse in the $(x_1, x_2)$ plane, with the 1st and 2nd eigenvectors as axes.]
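The construction above can be sketched in NumPy on random toy data (the true covariance used to generate the samples is an arbitrary assumption): build $\Sigma = \frac{1}{M} X X^T$ from centered data, decompose it, and read off the 1-std ellipse axes as the square roots of the eigenvalues.

```python
import numpy as np

# Random illustrative data: M samples of a 2-D Gaussian (true covariance is
# an assumption chosen only to generate data). Columns of X are datapoints.
rng = np.random.default_rng(0)
M = 5000
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=M).T

X = X - X.mean(axis=1, keepdims=True)   # center the data
Sigma = X @ X.T / M                     # Sigma = (1/M) X X^T

# Eigenvalue decomposition of the symmetric matrix: Sigma = V Lambda V^T
eigvals, V = np.linalg.eigh(Sigma)
assert np.allclose(V @ np.diag(eigvals) @ V.T, Sigma, atol=1e-10)

# Axes' lengths of the 1-std ellipse: sqrt of the eigenvalues.
axes = np.sqrt(eigvals)
assert np.all(axes > 0)
```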
Fitting a Single Gauss Function and PCA

PCA identifies a suitable representation of a multivariate data set by decorrelating the dataset.

When projected onto $e^1$ and $e^2$ (the 1st and 2nd eigenvectors), the set of datapoints appears to follow two uncorrelated Normal distributions:

$p\big((e^1)^T X\big) \sim N(\mu_1; \sigma_1^2), \qquad p\big((e^2)^T X\big) \sim N(\mu_2; \sigma_2^2)$

[Figure: data in the $(x_1, x_2)$ plane with the two eigenvectors and the two projected 1-D Gaussians.]
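The decorrelation claim can be verified directly: projecting centered data onto the eigenvectors of its covariance makes the projected coordinates uncorrelated, since $V^T \Sigma V = \Lambda$ is diagonal (random toy data, assuming NumPy):

```python
import numpy as np

# Random illustrative data (true covariance is an assumption for generation).
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=4000).T
X = X - X.mean(axis=1, keepdims=True)

Sigma = X @ X.T / X.shape[1]
_, V = np.linalg.eigh(Sigma)

Y = V.T @ X                      # coordinates of the data in the eigenbasis
C = Y @ Y.T / Y.shape[1]         # covariance of the projected data

# Off-diagonal covariance vanishes: the projections are decorrelated.
assert abs(C[0, 1]) < 1e-10
```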
Marginal, Conditional in Pdf

Consider two random variables $x_1$ and $x_2$ with joint distribution $p(x_1, x_2)$. The marginal probability of $x_1$ is:

$p(x_1) = \int p(x_1, x_2)\, dx_2$

The conditional probability is given by:

$p(x_1 \mid x_2) = \dfrac{p(x_1, x_2)}{p(x_2)}, \qquad p(x_2 \mid x_1) = \dfrac{p(x_1, x_2)}{p(x_1)}$
Marginal, Conditional Pdf of Gauss Functions

The conditional and marginal pdfs of a multi-dimensional Gauss function are all Gauss functions!

[Figure: joint density $p(x_1, x_2)$, the marginal densities of $x_1$ and $x_2$, and the conditional density of $x_2$ given $x_1 = 0$.]

Matlab Exercise II

Illustrations from Wikipedia
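For a 2-D Gaussian the conditional is not only Gaussian, its parameters have a closed form: $\mu_{2|1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and $\sigma^2_{2|1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. This formula is the standard textbook result, not derived on the slide; the numbers below are illustrative assumptions:

```python
import numpy as np

# Illustrative joint Gaussian over (x1, x2); mu and Sigma are assumptions.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
x1 = 0.0   # conditioning value, as in the slide's figure

# Standard closed-form conditional parameters for p(x2 | x1).
mu_cond = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x1 - mu[0])
var_cond = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]

assert abs(mu_cond - 0.0) < 1e-12
assert abs(var_cond - 1.64) < 1e-12   # 2.0 - 0.6*0.6/1.0
```

Note that the conditional variance is always smaller than the marginal variance $\Sigma_{22}$ whenever the variables are correlated: observing $x_1$ reduces the uncertainty about $x_2$.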