APPLIED MACHINE LEARNING: Probability Density Functions, Gaussian Mixture Models (PowerPoint presentation)


  1. APPLIED MACHINE LEARNING: Probability Density Functions, Gaussian Mixture Models

  2. Discrete Probabilities. Consider two variables $x$ and $y$ taking discrete values over the intervals $[1, \ldots, N_x]$ and $[1, \ldots, N_y]$ respectively. $P(x=i)$ is the probability that the variable $x$ takes value $i$, with $0 \le P(x=i) \le 1$ for $i = 1, \ldots, N_x$, and $\sum_{i=1}^{N_x} P(x=i) = 1$. Idem for $P(y=j)$, $j = 1, \ldots, N_y$.

  3. Discrete Probabilities. The joint probability is written $P(x, y)$. The joint probability that variable $x$ takes value $i$ and variable $y$ takes value $j$ is written $P(x=i, y=j)$ or $P(x=i \wedge y=j)$. $P(x \mid y)$ is the conditional probability of observing a value for $x$ given a value for $y$: $P(x \mid y) = \frac{P(x, y)}{P(y)}$. Bayes' theorem: $P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}$. When $x$ and $y$ are statistically independent: $P(x \mid y) = P(x)$, $P(y \mid x) = P(y)$, and $P(x, y) = P(x)\, P(y)$. Matlab Exercise I.
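The relations between joint, conditional, and marginal probabilities can be checked numerically. The course exercise is in Matlab; the following is a minimal plain-Python sketch, where the prior $P(x)$ and the conditional table $P(y \mid x)$ are made-up illustrative numbers, not values from the lecture:

```python
# Illustrative discrete example: x in {1, 2} with a made-up prior P(x)
# and a made-up conditional table P(y=j | x=i), indexed as (j, i).
P_x = {1: 0.6, 2: 0.4}
P_y_given_x = {(1, 1): 0.9, (2, 1): 0.1,   # P(y | x=1)
               (1, 2): 0.2, (2, 2): 0.8}   # P(y | x=2)

def P_y(j):
    """Total probability: P(y=j) = sum_i P(y=j | x=i) P(x=i)."""
    return sum(P_y_given_x[(j, i)] * P_x[i] for i in P_x)

def P_x_given_y(i, j):
    """Bayes' theorem: P(x=i | y=j) = P(y=j | x=i) P(x=i) / P(y=j)."""
    return P_y_given_x[(j, i)] * P_x[i] / P_y(j)

# The posterior over x after observing y=1 is a valid distribution
posterior = [P_x_given_y(i, 1) for i in P_x]
assert abs(sum(posterior) - 1.0) < 1e-12
```

Observing $y=1$ raises the probability of $x=1$ from the prior 0.6 to roughly 0.87, since $y=1$ is much more likely under $x=1$.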

  4. Discrete Probabilities. The marginal probability that variable $x$ takes value $i$ is given by: $P(x=i) = \sum_{j=1}^{N_y} P(x=i, y=j)$ (the $x$, $y$ subscripts are dropped for simplicity of notation).
• To compute the marginal, one needs the joint distribution $P(x, y)$.
• Often, one does not know it and one can only estimate it.
• If $x$ is a multidimensional variable, the marginal is itself a joint distribution!
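Marginalization of a discrete joint table can be sketched in a few lines of Python; the joint values below are invented purely for illustration:

```python
# Made-up joint table P(x=i, y=j) for i in {1, 2}, j in {1, 2, 3}
P_xy = {(1, 1): 0.10, (1, 2): 0.25, (1, 3): 0.15,
        (2, 1): 0.20, (2, 2): 0.05, (2, 3): 0.25}

def P_x(i):
    """Marginal: P(x=i) = sum_j P(x=i, y=j)."""
    return sum(p for (xi, yj), p in P_xy.items() if xi == i)

marginal = {i: P_x(i) for i in (1, 2)}
# Summing y out of the joint leaves a valid distribution over x
assert abs(sum(marginal.values()) - 1.0) < 1e-12
```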

  5. Joint Distribution and Curse of Dimensionality. The joint distribution is far richer than the marginals. The marginals of $N$ variables taking $K$ values correspond to $N(K-1)$ probabilities, whereas the joint distribution corresponds to $\sim K^N$ probabilities. Pros of computing the joint distribution: it provides the statistical dependencies across all variables, as well as the marginal distributions. Cons: computational costs grow exponentially with the number of dimensions (statistical power: roughly 10 samples are needed to estimate each parameter of a model). ⇒ Compute solely the conditional if you care only about dependencies across variables (this will be relevant for the lecture on non-linear regression methods).
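The parameter counts above can be made concrete with a small worked example; the choices of N and K here are arbitrary:

```python
# N variables, each taking K values; N and K are arbitrary illustrative choices.
N, K = 10, 5
marginals = N * (K - 1)   # each marginal has K-1 free probabilities
joint = K ** N - 1        # the full joint table, minus one sum-to-1 constraint
assert marginals == 40
assert joint == 9_765_624  # exponential in N, as the slide warns
```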

  6. Probability Distributions, Density Functions. A continuous function $p(x)$ is the probability density function or probability distribution function (PDF) of variable $x$ (sometimes also called probability distribution or simply density) if $p(x) \ge 0 \;\; \forall x$ and $\int_{-\infty}^{+\infty} p(x)\, dx = 1$.

  7. Probability Distributions, Density Functions. The pdf is not bounded by 1. It can grow unbounded, depending on the value taken by $x$. [Figure: a pdf $p(x)$ whose peak exceeds 1.]

  8. PDF Equivalency with Discrete Probability. The cumulative distribution function (or simply distribution function) of $x$ is: $D(x^*) = P(x \le x^*) = \int_{-\infty}^{x^*} p(x)\, dx$. $p(x)\, dx$ is ~ the probability of $x$ falling within an infinitesimal interval $[x, x+dx]$.

  9. PDF Equivalency with Discrete Probability. [Figure: uniform distribution $p(x)$ on an interval.] The probability that $x$ takes a value in the subinterval $[a, b]$ is given by: $P(x \le b) = D(b) = \int_{-\infty}^{b} p(x)\, dx$ and $P(a \le x \le b) = D(b) - D(a) = \int_{a}^{b} p(x)\, dx \le 1$.
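The relation $P(a \le x \le b) = D(b) - D(a)$ can be checked numerically for a uniform density; the interval $[0, 4]$ and the integration step in this sketch are arbitrary choices:

```python
# Uniform pdf on [0, 4] (an illustrative choice): p(x) = 1/4 inside, 0 outside.
lo, hi = 0.0, 4.0

def p(x):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def D(x_star, steps=10_000):
    """Cumulative distribution D(x*) = integral of p up to x*, by a midpoint sum."""
    if x_star <= lo:
        return 0.0
    h = (min(x_star, hi) - lo) / steps
    return sum(p(lo + (k + 0.5) * h) for k in range(steps)) * h

# P(a <= x <= b) = D(b) - D(a)
a, b = 1.0, 3.0
prob = D(b) - D(a)
assert abs(prob - 0.5) < 1e-9
assert abs(D(hi) - 1.0) < 1e-9   # the pdf integrates to 1
```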

  10. Expectation. The expectation of the random variable $x$ with probability $P(x)$ (in the discrete case) or pdf $p(x)$ (in the continuous case), also called the expected value or mean, is the mean of the observed values of $x$ weighted by $p(x)$. If $X$ is the set of observations of $x$, then: when $x$ takes discrete values, $E\{x\} = \sum_{x \in X} x\, P(x)$; for continuous distributions, $E\{x\} = \int_X x\, p(x)\, dx$.

  11. Variance. $\sigma^2$, the variance of a distribution, measures the amount of spread of the distribution around its mean: $\mathrm{Var}(x) = \sigma^2 = E\{(x - E\{x\})^2\} = E\{x^2\} - (E\{x\})^2$. $\sigma$ is the standard deviation of $x$.
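Both the expectation as a weighted mean and the identity $\mathrm{Var}(x) = E\{x^2\} - (E\{x\})^2$ can be verified on a fair six-sided die, chosen here purely as a familiar discrete example:

```python
# Fair six-sided die: P(x=v) = 1/6 for v in 1..6
values = range(1, 7)
P = 1.0 / 6.0

E = sum(v * P for v in values)               # E{x} = sum_v v P(v)
Var = sum((v - E) ** 2 * P for v in values)  # Var = E{(x - E{x})^2}
E2 = sum(v * v * P for v in values)          # E{x^2}

assert abs(E - 3.5) < 1e-12
assert abs(Var - 35.0 / 12.0) < 1e-12
assert abs(E2 - E * E - Var) < 1e-12         # E{x^2} - (E{x})^2 = Var
```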

  12. Parametric PDF. The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, with $\mu$ the mean and $\sigma^2$ the variance. The Gaussian function is entirely determined by its mean and variance. For this reason, it is referred to as a parametric distribution. Illustrations from Wikipedia.
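A direct transcription of this pdf formula (in Python rather than the course's Matlab) also illustrates slide 7's point that a density is not bounded by 1:

```python
import math

def gauss_pdf(x, mu, sigma2):
    """1D Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# The peak sits at the mean, with height 1 / sqrt(2 pi sigma^2)
assert abs(gauss_pdf(0.0, 0.0, 1.0) - 1.0 / math.sqrt(2.0 * math.pi)) < 1e-12

# With sigma^2 = 0.01 the peak is ~3.99 > 1: densities are not probabilities
assert gauss_pdf(0.0, 0.0, 0.01) > 1.0
```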

  13. Mean and Variance in PDF. For the Gaussian distribution, ~68% of the data are comprised between $\pm 1$ sigma, ~95% between $\pm 2$ sigmas, and ~99.7% between $\pm 3$ sigmas. This is no longer true for arbitrary pdfs! Illustrations from Wikipedia.
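These coverage figures follow from the Gaussian cumulative distribution; since $P(-k\sigma \le x - \mu \le k\sigma) = \mathrm{erf}(k/\sqrt{2})$, they can be checked with the standard library:

```python
import math

def prob_within(k):
    """P(|x - mu| <= k sigma) for a Gaussian, via the error function."""
    return math.erf(k / math.sqrt(2.0))

assert abs(prob_within(1) - 0.6827) < 1e-3   # ~68% within 1 sigma
assert abs(prob_within(2) - 0.9545) < 1e-3   # ~95% within 2 sigmas
assert abs(prob_within(3) - 0.9973) < 1e-3   # ~99.7% within 3 sigmas
```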

  14. Mean and Variance in PDF. [Figure: the distribution $f = \frac{1}{3}(f_1 + f_2 + f_3)$ resulting from superposing 3 Gaussian distributions; expectation 0, with ~68% of the mass within 1 sigma.] For pdfs other than the Gaussian distribution, the variance represents a notion of dispersion around the expected value. Matlab Demo I.

  15. Multi-dimensional Gaussian Function. The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by: $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $\mu$: mean, $\sigma^2$: variance. The multi-dimensional Gaussian or Normal distribution has a pdf given by: $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$. If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.

  16. 2-dimensional Gaussian Pdf. [Figure: surface plot of $p(x_1, x_2)$ over the $(x_1, x_2)$ plane, together with the isolines $p(x) = \mathrm{cst}$.] $p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$; if $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
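For the 2-dimensional case the density can be written out explicitly; this sketch hard-codes the 2x2 inverse and determinant rather than relying on a linear-algebra library:

```python
import math

def gauss2d_pdf(x, mu, cov):
    """2D Gaussian density; cov is a 2x2 nested list, assumed invertible."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[ cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det,  cov[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # Normalization for N = 2: (2 pi)^{N/2} |Sigma|^{1/2} = 2 pi sqrt(det)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# With identity covariance, the density at the mean is 1 / (2 pi)
assert abs(gauss2d_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
           - 1.0 / (2.0 * math.pi)) < 1e-12
```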

  17. Modeling Data with a Gaussian Function. Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1...M}$: $\Sigma = \frac{1}{M} X X^T$. As before, if $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.

  18. Modeling Data with a Gaussian Function. Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1...M}$: $\Sigma = \frac{1}{M} X X^T$. $\Sigma$ is square and symmetric. It can be decomposed using the eigenvalue decomposition: $\Sigma = V \Lambda V^T$, with $V$ the matrix of eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$, $\lambda_1 \ge \ldots \ge \lambda_N \ge 0$, the diagonal matrix of eigenvalues. For the 1-std ellipse, the axes are aligned with the 1st and 2nd eigenvectors and their lengths are equal to $2\sqrt{\lambda_1}$ and $2\sqrt{\lambda_2}$. Each isoline corresponds to a scaling of the 1-std ellipse.
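The covariance construction and its eigen-decomposition can be sketched for the 2-dimensional case; the centered datapoints below are invented, and the closed-form 2x2 eigenvalue formula stands in for a general eigensolver:

```python
import math

# Made-up centered data X (2 rows = dimensions, M columns = datapoints)
X = [[1.0, -1.0, 2.0, -2.0],
     [0.5, -0.5, 1.5, -1.5]]
M = len(X[0])

def cov2(X):
    """Sigma = (1/M) X X^T for 2-dimensional data."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(M):
        for i in range(2):
            for j in range(2):
                S[i][j] += X[i][k] * X[j][k] / M
    return S

def eig2_sym(S):
    """Eigenvalues of a symmetric 2x2 matrix, largest first (closed form)."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    root = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 + root, tr / 2.0 - root

Sigma = cov2(X)
l1, l2 = eig2_sym(Sigma)
# 1-std ellipse axis lengths: 2 sqrt(lambda_1) and 2 sqrt(lambda_2)
axes = (2.0 * math.sqrt(l1), 2.0 * math.sqrt(l2))
assert l1 >= l2 >= 0.0
```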

  19. Fitting a Single Gauss Function and PCA. PCA identifies a suitable representation of a multivariate data set by decorrelating the dataset. When projected onto the eigenvectors $e^1$ and $e^2$, the set of datapoints appears to follow two uncorrelated Normal distributions: $p\left((e^1)^T X\right) \sim N(\mu_1, \lambda_1)$ and $p\left((e^2)^T X\right) \sim N(\mu_2, \lambda_2)$.

  20. Marginal, Conditional in Pdf. Consider two random variables $x_1$ and $x_2$ with joint distribution $p(x_1, x_2)$. The marginal probability of $x_1$ is: $p(x_1) = \int p(x_1, x_2)\, dx_2$. The conditional probability is given by: $p(x_2 \mid x_1) = \frac{p(x_1, x_2)}{p(x_1)}$, and similarly $p(x_1 \mid x_2) = \frac{p(x_1, x_2)}{p(x_2)}$.

  21. Marginal, Conditional Pdf of Gauss Functions. The conditional and marginal pdfs of a multi-dimensional Gauss function are all Gauss functions! [Figure: joint density of $x_1, x_2$; marginal densities of $x_1$ and $x_2$; conditional density of $x_2$ given $x_1 = 0$.] Matlab Exercise II. Illustrations from Wikipedia.
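For a 2-dimensional Gaussian, the conditional has a well-known closed form: $x_2 \mid x_1$ is Gaussian with mean $\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and variance $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. The course exercise is in Matlab; this Python sketch uses an invented mean and covariance:

```python
# Illustrative parameters (not from the lecture)
mu = [0.0, 0.0]
S = [[1.0, 0.8],
     [0.8, 1.0]]

def conditional_x2_given_x1(x1):
    """Mean and variance of the Gaussian conditional p(x2 | x1)."""
    mean = mu[1] + S[1][0] / S[0][0] * (x1 - mu[0])
    var = S[1][1] - S[1][0] * S[0][1] / S[0][0]
    return mean, var

# The case x1 = 0: the conditional is centered at mu2 = 0,
# with variance reduced from 1.0 to 0.36 by conditioning
mean, var = conditional_x2_given_x1(0.0)
assert mean == 0.0
assert abs(var - 0.36) < 1e-12
```

Conditioning always shrinks the variance whenever the cross-covariance $\Sigma_{21}$ is nonzero, which is why a conditional slice of the joint is narrower than the corresponding marginal.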
