  1. Some basics in probability and statistics
     Course of Machine Learning, Master Degree in Computer Science, University of Rome "Tor Vergata". Giorgio Gambosi, a.a. 2018-2019.

  2. Discrete random variables: Properties
     A discrete random variable X can take values from some finite or countably infinite set 𝒳. A probability mass function (pmf) associates to each event X = x a probability p(X = x), satisfying:
     • 0 ≤ p(x) ≤ 1 for all x ∈ 𝒳
     • ∑_{x∈𝒳} p(x) = 1
     Note: we shall denote as x the event X = x.
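The two pmf conditions above can be checked mechanically. A minimal Python sketch (Python and the loaded-die numbers are illustrative, not part of the slides), representing a pmf as a dict from values to probabilities:

```python
# A pmf for a hypothetical loaded die: a dict mapping each value x
# to p(X = x).
pmf = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# Every probability lies in [0, 1] ...
assert all(0.0 <= p <= 1.0 for p in pmf.values())
# ... and the probabilities sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```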

  3. Discrete random variables: Joint and conditional probabilities
     Given two events x, y, it is possible to define:
     • the probability p(x, y) = p(x ∧ y) of their joint occurrence
     • the conditional probability p(x|y) of x under the hypothesis that y has occurred
     Union of events: given two events x, y, the probability of x or y is defined as p(x ∨ y) = p(x) + p(y) − p(x, y); in particular, if x and y are mutually exclusive, p(x ∨ y) = p(x) + p(y).
     The same definitions hold for probability distributions.

  4. Discrete random variables: Product rule
     The product rule relates joint and conditional probabilities:
     p(x, y) = p(x|y) p(y) = p(y|x) p(x)
     where p(x) is the marginal probability. In general,
     p(x_1, …, x_n) = p(x_2, …, x_n | x_1) p(x_1)
                    = p(x_3, …, x_n | x_1, x_2) p(x_2|x_1) p(x_1)
                    = ⋯
                    = p(x_n | x_1, …, x_{n−1}) p(x_{n−1} | x_1, …, x_{n−2}) ⋯ p(x_2|x_1) p(x_1)
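The chain-rule factorization can be verified numerically: multiplying out p(x_3|x_1,x_2) p(x_2|x_1) p(x_1) over all assignments must give a normalized joint. A sketch on three binary variables with made-up conditional tables (all numbers are arbitrary assumptions):

```python
from itertools import product

# Hypothetical conditional tables for three binary variables.
p1 = {0: 0.6, 1: 0.4}                                   # p(x1)
p2 = {(0, 0): 0.7, (1, 0): 0.3,                         # p(x2|x1),
      (0, 1): 0.2, (1, 1): 0.8}                         # keys (x2, x1)
p3 = {(x3, x1, x2): 0.5                                 # p(x3|x1,x2),
      for x3, x1, x2 in product((0, 1), repeat=3)}      # uniform here

# Chain rule: p(x1, x2, x3) = p(x3|x1,x2) p(x2|x1) p(x1).
joint = {(x1, x2, x3): p3[(x3, x1, x2)] * p2[(x2, x1)] * p1[x1]
         for x1, x2, x3 in product((0, 1), repeat=3)}

# A valid factorization yields a normalized joint distribution.
assert abs(sum(joint.values()) - 1.0) < 1e-12
```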

  5. Discrete random variables: Sum rule and marginalization
     The sum rule relates the joint probability of two events x, y and the probability of one of them, p(x) (or p(y)):
     p(x) = ∑_{y∈𝒴} p(x, y) = ∑_{y∈𝒴} p(x|y) p(y)
     Applying the sum rule to derive a marginal probability from a joint probability is usually called marginalization.
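Marginalization is just summing out rows or columns of a joint table. A minimal sketch on a hypothetical 2×2 joint pmf (the numbers are illustrative):

```python
# Hypothetical joint pmf over two binary variables.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

# Sum rule: p(x) = sum_y p(x, y), and symmetrically for p(y).
p_x = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
p_y = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Both marginals are themselves normalized distributions.
assert abs(sum(p_x.values()) - 1.0) < 1e-12
assert abs(sum(p_y.values()) - 1.0) < 1e-12
# Equivalently p(x) = sum_y p(x|y) p(y), with p(x|y) = p(x,y)/p(y).
for x in (0, 1):
    assert abs(p_x[x] - sum((joint[(x, y)] / p_y[y]) * p_y[y]
                            for y in (0, 1))) < 1e-12
```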

  6. Discrete random variables: Bayes rule and terminology
     Since p(x, y) = p(x|y) p(y) and p(x, y) = p(y|x) p(x), and since
     p(y) = ∑_{x∈𝒳} p(x, y) = ∑_{x∈𝒳} p(y|x) p(x)
     it results that
     p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / ∑_{x′∈𝒳} p(y|x′) p(x′)
     Terminology:
     • p(x): prior probability of x (before knowing that y occurred)
     • p(x|y): posterior of x (if y has occurred)
     • p(y|x): likelihood of y given x
     • p(y): evidence of y
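A worked numerical instance of Bayes' rule, using a hypothetical diagnostic scenario (x = condition present, y = positive test; all probabilities below are made-up illustration values, not from the slides):

```python
p_x = 0.01                 # prior p(x)
p_y_given_x = 0.95         # likelihood p(y|x)
p_y_given_not_x = 0.05     # p(y | not x)

# Evidence via the sum rule: p(y) = sum_x p(y|x) p(x).
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y).
posterior = p_y_given_x * p_x / p_y

# A rare condition stays fairly unlikely even after a positive test.
assert abs(p_y - 0.059) < 1e-12
assert 0.16 < posterior < 0.17
```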

  7. Independence: Definition
     Two random variables X, Y are independent (X ⊥⊥ Y) if their joint probability is equal to the product of their marginals:
     p(x, y) = p(x) p(y)
     or, equivalently,
     p(x|y) = p(x)    p(y|x) = p(y)
     The condition p(x|y) = p(x), in particular, states that, if two variables are independent, knowing the value of one does not add any knowledge about the other one.

  8. Independence: Conditional independence
     Two random variables X, Y are conditionally independent w.r.t. a third r.v. Z (X ⊥⊥ Y | Z) if
     p(x, y|z) = p(x|z) p(y|z)
     Conditional independence does not imply (absolute) independence, and vice versa.

  9. Continuous random variables: Probability density function
     A continuous random variable X can take values from a continuous infinite set 𝒳. Its probability is defined through the cumulative distribution function (cdf) F(x) = p(X ≤ x). The probability that X lies in an interval (a, b] is then p(a < X ≤ b) = F(b) − F(a).
     The probability density function (pdf) is defined as
     f(x) = dF(x)/dx
     As a consequence,
     p(a < X ≤ b) = ∫_a^b f(x) dx
     and p(x < X ≤ x + dx) ≈ f(x) dx for a sufficiently small dx.
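The relation p(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx can be checked numerically. A sketch using the exponential distribution with rate 1 as a simple concrete example (F(x) = 1 − e^(−x), f(x) = e^(−x) for x ≥ 0; the choice of distribution and interval is an assumption for illustration):

```python
import math

F = lambda x: 1.0 - math.exp(-x)   # cdf of Exponential(1)
f = lambda x: math.exp(-x)         # pdf = dF/dx

# Midpoint-rule approximation of the integral of f over (a, b].
a, b, steps = 0.5, 2.0, 100_000
h = (b - a) / steps
integral = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# The integral of the density matches F(b) - F(a).
assert abs(integral - (F(b) - F(a))) < 1e-9
```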

  10. Sum rule and continuous random variables
     In the case of continuous random variables, their probability density functions relate as follows:
     f(x) = ∫_𝒴 f(x, y) dy = ∫_𝒴 f(x|y) f(y) dy

  11. Expectation: Definition
     Let x be a discrete random variable with distribution p(x), and let g : ℝ → ℝ be any function: the expectation of g(x) w.r.t. p(x) is
     E_p[g(x)] = ∑_{x∈V_x} g(x) p(x)
     If x is a continuous r.v. with probability density f(x), then
     E_f[g(x)] = ∫_{−∞}^{+∞} g(x) f(x) dx
     Mean value (the particular case g(x) = x):
     E_p[x] = ∑_{x∈V_x} x p(x)    E_f[x] = ∫_{−∞}^{+∞} x f(x) dx
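The discrete expectation E_p[g(x)] = ∑_x g(x) p(x) is a one-line sum. A sketch on a fair die (the die and the functions g are illustrative choices):

```python
# pmf of a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def expectation(g, pmf):
    """E_p[g(x)] = sum_x g(x) p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)               # E[x]
second_moment = expectation(lambda x: x * x, pmf)  # E[x^2]

assert abs(mean - 3.5) < 1e-12         # (1+2+...+6)/6
assert abs(second_moment - 91 / 6) < 1e-12
```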

  12. Elementary properties of expectation
     • E[a] = a for each a ∈ ℝ
     • E[a f(x)] = a E[f(x)] for each a ∈ ℝ
     • E[f(x) + g(x)] = E[f(x)] + E[g(x)]

  13. Variance: Definition
     Var[X] = E[(x − E[x])²]
     We may easily derive:
     E[(x − E[x])²] = E[x² − 2E[x]x + E[x]²]
                    = E[x²] − 2E[x]E[x] + E[x]²
                    = E[x²] − E[x]²
     Some elementary properties:
     • Var[a] = 0 for each a ∈ ℝ
     • Var[a f(x)] = a² Var[f(x)] for each a ∈ ℝ
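The identity Var[x] = E[x²] − E[x]² derived above can be confirmed exactly on a small pmf. A sketch, again on a fair die (an illustrative choice):

```python
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
# Variance from the definition, E[(x - E[x])^2] ...
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())
# ... and from the derived identity, E[x^2] - E[x]^2.
var_moments = sum(x * x * p for x, p in pmf.items()) - mean ** 2

assert abs(var_def - var_moments) < 1e-12
assert abs(var_def - 35 / 12) < 1e-12   # fair-die variance
```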

  14. Probability distributions
     Given a discrete random variable X ∈ V_X, the corresponding probability distribution is a function p(x) = P(X = x) such that:
     • 0 ≤ p(x) ≤ 1
     • ∑_{x∈V_X} p(x) = 1
     • ∑_{x∈A} p(x) = P(X ∈ A), with A ⊆ V_X
     [Figure: plot of a pmf p(x) against x]

  15. Some definitions: Cumulative distribution
     Given a continuous random variable X ∈ ℝ, the corresponding cumulative probability distribution is a function F(x) = P(X ≤ x) such that:
     • 0 ≤ F(x) ≤ 1
     • lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1
     • x ≤ y ⟹ F(x) ≤ F(y)
     [Figure: plot of a cdf F(x) against x]

  16. Some definitions: Probability density
     Given a continuous random variable X ∈ ℝ with differentiable cumulative distribution F(x), the probability density is defined as
     f(x) = dF(x)/dx
     By definition of derivative, for a sufficiently small Δx,
     Pr(x ≤ X ≤ x + Δx) ≈ f(x) Δx
     The following properties hold:
     • f(x) ≥ 0
     • ∫_{−∞}^{+∞} f(x) dx = 1
     • ∫_{x∈A} f(x) dx = P(X ∈ A)
     [Figure: plot of a pdf f(x) against x]

  17. Bernoulli distribution: Definition
     Let x ∈ {0, 1}; then x ∼ Bernoulli(p), with 0 ≤ p ≤ 1, if
     p(x) = p if x = 1,  p(x) = 1 − p if x = 0
     or, equivalently,
     p(x) = p^x (1 − p)^{1−x}
     This is the probability that, given a coin with head (H) probability p (and tail (T) probability 1 − p), a coin toss results in x ∈ {H, T}.
     Mean and variance: E[x] = p, Var[x] = p(1 − p)
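A Monte Carlo sanity check of the Bernoulli mean and variance: the sample mean should approach p and the sample variance p(1 − p). A sketch with p = 0.3 and a fixed seed (both arbitrary illustration choices):

```python
import random

random.seed(0)
p = 0.3
n = 100_000

# Draw n Bernoulli(p) samples.
samples = [1 if random.random() < p else 0 for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

# E[x] = p and Var[x] = p(1 - p), up to sampling noise.
assert abs(mean - p) < 0.01
assert abs(var - p * (1 - p)) < 0.01
```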

  18. Extension to multiple outcomes
     Assume k possible outcomes (for example, a die toss). In this case, a generalization of the Bernoulli distribution is considered, usually named the categorical distribution:
     p(x) = ∏_{j=1}^{k} p_j^{x_j}
     where (p_1, …, p_k) are the probabilities of the different outcomes (∑_{j=1}^{k} p_j = 1) and x_j = 1 iff the j-th outcome occurs.

  19. Binomial distribution: Definition
     Let x ∈ ℕ; then x ∼ Binomial(n, p), with 0 ≤ p ≤ 1, if
     p(x) = C(n, x) p^x (1 − p)^{n−x} = n! / (x!(n − x)!) · p^x (1 − p)^{n−x}
     This is the probability that, given a coin with head (H) probability p, a sequence of n independent coin tosses results in x heads.
     Mean and variance: E[x] = np, Var[x] = np(1 − p)
     [Figure: plot of the pmf p(x) against x]
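The binomial pmf is directly computable with the binomial coefficient. A sketch checking normalization and E[x] = np for n = 10, p = 0.5 (arbitrary illustration values):

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

assert abs(sum(probs) - 1.0) < 1e-12          # normalization
mean = sum(x * px for x, px in enumerate(probs))
assert abs(mean - n * p) < 1e-12              # E[x] = np
```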

  20. Poisson distribution: Definition
     Let x ∈ ℕ; then x ∼ Poisson(λ), with λ > 0, if
     p(x) = e^{−λ} λ^x / x!
     This is the probability that an event with average frequency λ occurs x times in the next time unit.
     Mean and variance: E[x] = λ, Var[x] = λ
     [Figure: plot of the pmf p(x) against x]
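The Poisson pmf and its mean E[x] = λ can be checked by summing over a long enough prefix of ℕ (the tail beyond x = 100 is negligible for small λ; λ = 4 is an arbitrary illustration value):

```python
import math

def poisson_pmf(x, lam):
    """p(x) = e^(-lambda) lambda^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 4.0
probs = [poisson_pmf(x, lam) for x in range(101)]

# Truncated normalization and truncated mean.
assert abs(sum(probs) - 1.0) < 1e-9
mean = sum(x * px for x, px in enumerate(probs))
assert abs(mean - lam) < 1e-9                 # E[x] = lambda
```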

  21. Normal (gaussian) distribution: Definition
     Let x ∈ ℝ; then x ∼ Normal(µ, σ²), with µ ∈ ℝ, σ > 0, if
     f(x) = (1 / √(2πσ²)) e^{−(x − µ)² / (2σ²)}
     Mean and variance: E[x] = µ, Var[x] = σ²
     [Figure: plot of the pdf f(x) against x]
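A numerical sanity check of the density above: it should integrate to 1 over the real line and have mean µ. A sketch using a midpoint rule on [µ − 10σ, µ + 10σ], where the tail mass is negligible (µ = 1, σ = 2 are arbitrary illustration values):

```python
import math

mu, sigma = 1.0, 2.0
f = lambda x: (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
               / math.sqrt(2 * math.pi * sigma ** 2))

lo, hi, steps = mu - 10 * sigma, mu + 10 * sigma, 200_000
h = (hi - lo) / steps
xs = [lo + (i + 0.5) * h for i in range(steps)]

total = sum(f(x) for x in xs) * h        # ~ integral of f
mean = sum(x * f(x) for x in xs) * h     # ~ E[x]

assert abs(total - 1.0) < 1e-6
assert abs(mean - mu) < 1e-6
```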

  22. Beta distribution: Definition
     Let x ∈ [0, 1]; then x ∼ Beta(α, β), with α, β > 0, if
     f(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
     where
     Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du
     is a generalization of the factorial to the real field ℝ: in particular, Γ(n) = (n − 1)! if n ∈ ℕ.
     Mean and variance: E[x] = α / (α + β), Var[x] = αβ / ((α + β)²(α + β + 1))
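Two of the facts above can be checked numerically with the standard library's `math.gamma`: the relation Γ(n) = (n − 1)! for integer n, and the mean E[x] = α/(α + β) via numerical integration of x·f(x) over [0, 1] (α = 2, β = 4 are arbitrary illustration values):

```python
import math

# 1) Gamma generalizes the factorial: Gamma(n) = (n - 1)!.
for n in range(1, 8):
    assert abs(math.gamma(n) - math.factorial(n - 1)) < 1e-6

# 2) Beta mean, via midpoint-rule integration of x * f(x).
a, b = 2.0, 4.0
const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
f = lambda x: const * x**(a - 1) * (1 - x)**(b - 1)

steps = 100_000
mean = sum((i + 0.5) / steps * f((i + 0.5) / steps)
           for i in range(steps)) / steps

assert abs(mean - a / (a + b)) < 1e-6   # E[x] = alpha/(alpha+beta)
```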

  23. Beta distribution
     [Figure: plots of the Beta pdf f(x) against x for (α, β) = (1, 1), (0.7, 0.7), (2, 2), (2, 4), (6, 4), (10, 10)]

  24. Multivariate distributions: Definition for k = 2 discrete variables
     Given two discrete r.v. X, Y, their joint distribution is
     p(x, y) = P(X = x, Y = y)
     The following properties hold:
     1. 0 ≤ p(x, y) ≤ 1
     2. ∑_{x∈V_X} ∑_{y∈V_Y} p(x, y) = 1

  25. Multivariate distributions: Definition for k = 2 continuous variables
     Given two continuous r.v. X, Y, their cumulative joint distribution is defined as
     F(x, y) = P(X ≤ x, Y ≤ y)
     The following properties hold:
     1. 0 ≤ F(x, y) ≤ 1
     2. lim_{x,y→∞} F(x, y) = 1 and lim_{x,y→−∞} F(x, y) = 0
     If F(x, y) is differentiable everywhere w.r.t. both x and y, the joint probability density is
     f(x, y) = ∂²F(x, y) / (∂x ∂y)
     The following property derives:
     3. ∫∫_{(x,y)∈A} f(x, y) dx dy = P((X, Y) ∈ A)

  26. Covariance: Definition
     Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
     As for the variance, we may derive:
     Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
               = E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
               = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
               = E[XY] − E[X]E[Y]
     Moreover, the following properties hold:
     1. Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]
     2. If X ⊥⊥ Y then Cov[X, Y] = 0
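Both the derivation Cov[X, Y] = E[XY] − E[X]E[Y] and property 1 can be verified exactly on a small joint pmf. A sketch on a hypothetical table of two correlated binary variables (numbers are illustrative):

```python
# Hypothetical joint pmf: X and Y tend to agree, so Cov > 0.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - ex) * (y - ey))

# Cov[X, Y] = E[XY] - E[X]E[Y].
assert abs(cov - (E(lambda x, y: x * y) - ex * ey)) < 1e-12

# Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
var_sum = E(lambda x, y: (x + y - ex - ey) ** 2)
assert abs(var_sum - (var_x + var_y + 2 * cov)) < 1e-12
```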
