APPLIED MACHINE LEARNING
Probability Density Functions, Gaussian Mixture Models
Discrete Probabilities

Consider two variables $x$ and $y$ taking discrete values over the intervals $[1, \ldots, N_x]$ and $[1, \ldots, N_y]$ respectively.

$P(x=i)$: the probability that the variable $x$ takes value $i$.

$0 \le P(x=i) \le 1, \; i = 1, \ldots, N_x$, and $\sum_{i=1}^{N_x} P(x=i) = 1$.

Idem for $P(y=j), \; j = 1, \ldots, N_y$.
Discrete Probabilities

The joint probability is written $P(x, y)$. The joint probability that variable $x$ takes value $i$ and variable $y$ takes value $j$ is $P(x=i, y=j)$, or $P(x=i \wedge y=j)$.

$P(x \mid y)$ is the conditional probability of observing a value for $x$ given a value for $y$.

Bayes' theorem: $P(x \mid y) = \dfrac{P(y \mid x)\, P(x)}{P(y)}$

When $x$ and $y$ are statistically independent:
$P(x \mid y) = P(x)$, $P(y \mid x) = P(y)$, and $P(x, y) = P(x)\, P(y)$.

Matlab Exercise I
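The identities above can be checked numerically on a small joint probability table. The table values below are illustrative assumptions, chosen only so that the entries sum to 1:

```python
# Hypothetical 2x2 joint distribution P(x, y) stored as a dict;
# the probability values are assumptions for illustration only.
P_xy = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

def P_x(i):
    # marginal of x: sum the joint over all values of y
    return sum(p for (x, y), p in P_xy.items() if x == i)

def P_y(j):
    # marginal of y: sum the joint over all values of x
    return sum(p for (x, y), p in P_xy.items() if y == j)

def P_x_given_y(i, j):
    # conditional: P(x=i | y=j) = P(x=i, y=j) / P(y=j)
    return P_xy[(i, j)] / P_y(j)

def P_y_given_x(j, i):
    return P_xy[(i, j)] / P_x(i)

# Bayes' theorem: P(x|y) = P(y|x) P(x) / P(y)
lhs = P_x_given_y(1, 2)
rhs = P_y_given_x(2, 1) * P_x(1) / P_y(2)
assert abs(lhs - rhs) < 1e-12
```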
Discrete Probabilities

The marginal probability that variable $x$ takes value $i$ is given by:

$P(x=i) = \sum_{j=1}^{N_y} P(x=i, y=j)$

(drop the subscripts $x, y$ for simplicity of notation)

• To compute the marginal, one needs the joint distribution $P(x, y)$.
• Often, one does not know it and one can only estimate it.
• If $x$ is a multidimensional variable, the marginal is itself a joint distribution!
Joint Distribution and Curse of Dimensionality

The joint distribution is far richer than the marginals. The marginals of $N$ variables taking $K$ values correspond to $N(K-1)$ probabilities. The joint distribution corresponds to $\sim K^N$ probabilities.

Pros of computing the joint distribution:
Provides statistical dependencies across all variables and the marginal distributions.

Cons:
Computational costs grow exponentially with the number of dimensions (statistical power: roughly 10 samples are needed to estimate each parameter of a model).

Compute solely the conditional if you care only about dependencies across variables (this will be relevant for the lecture on non-linear regression methods).
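The parameter-counting argument above is easy to make concrete; the values of $N$ and $K$ below are arbitrary illustrative choices:

```python
# Parameter counting for N discrete variables, each taking K values.
# N and K are illustrative choices, not values from the lecture.
N, K = 10, 4
marginal_params = N * (K - 1)   # each marginal needs K-1 free probabilities
joint_params = K**N - 1         # joint table has K^N entries, minus one sum-to-1 constraint

assert marginal_params == 30
assert joint_params == 1048575  # already ~10^6 for just 10 variables
```

With ~10 samples per parameter, estimating this joint distribution would require on the order of ten million datapoints, while the marginals need only a few hundred.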
Probability Distributions, Density Functions

$p(x)$, a continuous function, is the probability density function or probability distribution function (PDF) (sometimes also called probability distribution or simply density) of variable $x$.

$p(x) \ge 0 \;\; \forall x, \qquad \int p(x)\, dx = 1$
Probability Distributions, Density Functions

The pdf is not bounded by 1. It can grow unbounded, depending on the value taken by $x$.

[Figure: a pdf $p(x)$ with a sharp peak exceeding 1]
PDF Equivalency with Discrete Probability

The cumulative distribution function (or simply distribution function) of $x$ is:

$D(x^*) = P(x \le x^*) = \int_{-\infty}^{x^*} p(x)\, dx$

$p(x)\, dx$ ~ probability of $x$ to fall within an infinitesimal interval $[x, x + dx]$
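A minimal numerical sketch of this definition, using a standard normal pdf: the integral is approximated with a midpoint rule, truncating the lower limit at $-8$ where the remaining tail mass is negligible, and compared against the closed form via the error function.

```python
import math

# Standard normal pdf (mu = 0, sigma = 1 by default).
def p(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# D(x*) = integral of p from -inf to x*, approximated by a midpoint rule;
# the lower limit -8 is an assumption that the tail below it is negligible.
def D(x_star, lo=-8.0, n=20000):
    h = (x_star - lo) / n
    return sum(p(lo + (k + 0.5) * h) for k in range(n)) * h

# Closed form for the standard normal CDF at 1: (1 + erf(1/sqrt(2))) / 2
exact = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))
assert abs(D(1.0) - exact) < 1e-5
```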
PDF Equivalency with Discrete Probability

Uniform distribution on $x$: the probability that $x$ takes a value in the subinterval $[a, b]$ is given by:

$P(a \le x \le b) = D(b) - D(a) = \int_{a}^{b} p(x)\, dx \le 1$

[Figure: uniform pdf $p(x)$ with the subinterval $[a, b]$ marked]
Expectation

The expectation of the random variable $x$ with probability $P(x)$ (in the discrete case) and pdf $p(x)$ (in the continuous case), also called the expected value or mean, is the mean of the observed values of $x$ weighted by $p(x)$. If $X$ is the set of observations of $x$, then:

When $x$ takes discrete values: $E[x] = \sum_{x \in X} x\, P(x)$

For continuous distributions: $E[x] = \int_{X} x\, p(x)\, dx$
Variance

$\sigma^2$, the variance of a distribution, measures the amount of spread of the distribution around its mean:

$\operatorname{Var}(x) = \sigma^2 = E\big[(x - E[x])^2\big] = E[x^2] - (E[x])^2$

$\sigma$ is the standard deviation of $x$.
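Both the discrete expectation and the two equivalent expressions for the variance can be checked on a toy distribution (the probability values are illustrative assumptions):

```python
# Toy discrete distribution; the probabilities are assumptions for illustration.
P = {1: 0.2, 2: 0.5, 3: 0.3}

# E[f(x)] = sum_x f(x) P(x)
E = lambda f: sum(f(x) * p for x, p in P.items())

mean = E(lambda x: x)                   # E[x] = 0.2 + 1.0 + 0.9 = 2.1
var_def = E(lambda x: (x - mean)**2)    # definition: spread around the mean
var_id = E(lambda x: x**2) - mean**2    # shortcut identity E[x^2] - (E[x])^2

assert abs(mean - 2.1) < 1e-12
assert abs(var_def - var_id) < 1e-12
```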
Parametric PDF

The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:

$p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \mu: \text{mean}, \;\; \sigma^2: \text{variance}$

The Gaussian function is entirely determined by its mean and variance. For this reason, it is referred to as a parametric distribution.

Illustrations from Wikipedia
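A direct transcription of this formula; the second assertion also illustrates the earlier point that a pdf is not bounded by 1, since the peak height $1/(\sqrt{2\pi}\,\sigma)$ exceeds 1 whenever $\sigma < 1/\sqrt{2\pi}$:

```python
import math

# 1-D Gaussian pdf exactly as on the slide; mu and sigma are free parameters.
def gauss_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Peak value at x = mu is 1 / (sqrt(2*pi) * sigma).
assert abs(gauss_pdf(0.0, 0.0, 1.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12

# With a small sigma the density exceeds 1: a pdf is not bounded by 1.
assert gauss_pdf(0.0, 0.0, 0.1) > 1.0
```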
Mean and Variance in PDF

~68% of the data are comprised between +/- 1 sigma.
~95% of the data are comprised between +/- 2 sigmas.
~99.7% of the data are comprised between +/- 3 sigmas.

This is no longer true for arbitrary pdfs!

Illustrations from Wikipedia
Mean and Variance in PDF

[Figure: superposition of 3 Gaussian distributions, $f = \frac{1}{3}(f_1 + f_2 + f_3)$, with the expectation and the 1-sigma interval (68% of the mass) marked.]

Resulting distribution when superposing 3 Gaussian distributions. For pdfs other than the Gaussian distribution, the variance represents a notion of dispersion around the expected value.

Matlab Demo I
Multi-dimensional Gaussian Function

The uni-dimensional Gaussian or Normal distribution is a distribution with pdf given by:

$p(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \mu: \text{mean}, \;\; \sigma^2: \text{variance}$

The multi-dimensional Gaussian or Normal distribution has a pdf given by:

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
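A sketch of the multi-dimensional pdf, assuming NumPy is available. It is cross-checked against the 1-D formula: with a diagonal covariance, the joint factorizes into a product of uni-dimensional Gaussians.

```python
import numpy as np

# Multivariate normal pdf exactly as on the slide.
def mvn_pdf(x, mu, Sigma):
    N = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm)

# Illustrative 2-D example with diagonal covariance (values are assumptions).
x = np.array([0.5, -0.2])
mu = np.zeros(2)
Sigma = np.diag([1.0, 4.0])

# With a diagonal covariance the joint factorizes into two 1-D Gaussians.
p1 = np.exp(-0.5 * 0.5**2 / 1.0) / np.sqrt(2 * np.pi * 1.0)
p2 = np.exp(-0.5 * (-0.2)**2 / 4.0) / np.sqrt(2 * np.pi * 4.0)
assert abs(mvn_pdf(x, mu, Sigma) - p1 * p2) < 1e-12
```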
2-dimensional Gaussian Pdf

[Figure: surface plot of $p(x_1, x_2)$ over the $(x_1, x_2)$ plane, with the isolines $p(x) = \text{cst}$ projected below.]

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
Modeling Data with a Gaussian Function

Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1\ldots M}$:

$\Sigma = \dfrac{1}{M} X X^T$

$p(x; \mu, \Sigma) = \dfrac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$

If $x$ is $N$-dimensional, then $\mu$ is an $N$-dimensional mean vector and $\Sigma$ is an $N \times N$ covariance matrix.
Modeling Data with a Gaussian Function

Construct the covariance matrix from the (centered) set of datapoints $X = \{x^i\}_{i=1\ldots M}$:

$\Sigma = \dfrac{1}{M} X X^T$

$\Sigma$ is square and symmetric. It can be decomposed using the eigenvalue decomposition:

$\Sigma = V \Lambda V^T$, with $V$: matrix of eigenvectors, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$: diagonal matrix composed of eigenvalues.

For the 1-std ellipse, the axes' lengths along the 1st and 2nd eigenvectors are equal to $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$. Each isoline corresponds to a scaling of the 1-std ellipse.

[Figure: data ellipse in the $(x_1, x_2)$ plane, with the 1st and 2nd eigenvectors as axes.]
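The construction above can be sketched in NumPy on random toy data (the true covariance used to generate the samples is an arbitrary assumption): build $\Sigma = \frac{1}{M} X X^T$ from centered data, decompose it, and read off the 1-std ellipse axes as the square roots of the eigenvalues.

```python
import numpy as np

# Random illustrative data: M samples of a 2-D Gaussian (true covariance is
# an assumption chosen only to generate data). Columns of X are datapoints.
rng = np.random.default_rng(0)
M = 5000
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=M).T

X = X - X.mean(axis=1, keepdims=True)   # center the data
Sigma = X @ X.T / M                     # Sigma = (1/M) X X^T

# Eigenvalue decomposition of the symmetric matrix: Sigma = V Lambda V^T
eigvals, V = np.linalg.eigh(Sigma)
assert np.allclose(V @ np.diag(eigvals) @ V.T, Sigma, atol=1e-10)

# Axes' lengths of the 1-std ellipse: sqrt of the eigenvalues.
axes = np.sqrt(eigvals)
assert np.all(axes > 0)
```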
Fitting a Single Gauss Function and PCA

PCA identifies a suitable representation of a multivariate data set by decorrelating the dataset.

When projected onto $e^1$ and $e^2$ (the 1st and 2nd eigenvectors), the set of datapoints appears to follow two uncorrelated Normal distributions:

$p\big((e^1)^T X\big) \sim N(\mu_1; \sigma_1^2), \qquad p\big((e^2)^T X\big) \sim N(\mu_2; \sigma_2^2)$

[Figure: data in the $(x_1, x_2)$ plane with the two eigenvectors and the two projected 1-D Gaussians.]
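The decorrelation claim can be verified directly: projecting centered data onto the eigenvectors of its covariance makes the projected coordinates uncorrelated, since $V^T \Sigma V = \Lambda$ is diagonal (random toy data, assuming NumPy):

```python
import numpy as np

# Random illustrative data (true covariance is an assumption for generation).
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 2.0]], size=4000).T
X = X - X.mean(axis=1, keepdims=True)

Sigma = X @ X.T / X.shape[1]
_, V = np.linalg.eigh(Sigma)

Y = V.T @ X                      # coordinates of the data in the eigenbasis
C = Y @ Y.T / Y.shape[1]         # covariance of the projected data

# Off-diagonal covariance vanishes: the projections are decorrelated.
assert abs(C[0, 1]) < 1e-10
```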
Marginal, Conditional in Pdf

Consider two random variables $x_1$ and $x_2$ with joint distribution $p(x_1, x_2)$. The marginal probability of $x_1$ is:

$p(x_1) = \int p(x_1, x_2)\, dx_2$

The conditional probability is given by:

$p(x_1 \mid x_2) = \dfrac{p(x_1, x_2)}{p(x_2)}, \qquad p(x_2 \mid x_1) = \dfrac{p(x_1, x_2)}{p(x_1)}$
Marginal, Conditional Pdf of Gauss Functions

The conditional and marginal pdfs of a multi-dimensional Gauss function are all Gauss functions!

[Figure: joint density $p(x_1, x_2)$, the marginal densities of $x_1$ and $x_2$, and the conditional density of $x_2$ given $x_1 = 0$.]

Matlab Exercise II

Illustrations from Wikipedia
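For a 2-D Gaussian the conditional is not only Gaussian, its parameters have a closed form: $\mu_{2|1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and $\sigma^2_{2|1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. This formula is the standard textbook result, not derived on the slide; the numbers below are illustrative assumptions:

```python
import numpy as np

# Illustrative joint Gaussian over (x1, x2); mu and Sigma are assumptions.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
x1 = 0.0   # conditioning value, as in the slide's figure

# Standard closed-form conditional parameters for p(x2 | x1).
mu_cond = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x1 - mu[0])
var_cond = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]

assert abs(mu_cond - 0.0) < 1e-12
assert abs(var_cond - 1.64) < 1e-12   # 2.0 - 0.6*0.6/1.0
```

Note that the conditional variance is always smaller than the marginal variance $\Sigma_{22}$ whenever the variables are correlated: observing $x_1$ reduces the uncertainty about $x_2$.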