  1. Some basics in probability and statistics
     Course of Machine Learning, Master Degree in Computer Science, University of Rome "Tor Vergata". Giorgio Gambosi, a.a. 2018-2019.

  2. Discrete random variables: Properties
     A discrete random variable X can take values from some finite or countably infinite set 𝒳. A probability mass function (pmf) associates to each event X = x a probability p(X = x), satisfying:
     • 0 ≤ p(x) ≤ 1 for all x ∈ 𝒳
     • ∑_{x∈𝒳} p(x) = 1
     Note: we shall denote as x the event X = x.
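The two pmf conditions above can be checked mechanically. A minimal Python sketch (Python and the loaded-die numbers are illustrative, not part of the slides), representing a pmf as a dict from values to probabilities:

```python
# A pmf for a hypothetical loaded die: a dict mapping each value x
# to p(X = x).
pmf = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

# Every probability lies in [0, 1] ...
assert all(0.0 <= p <= 1.0 for p in pmf.values())
# ... and the probabilities sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```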

  3. Discrete random variables: Joint and conditional probabilities
     Given two events x, y, it is possible to define:
     • the probability p(x, y) = p(x ∧ y) of their joint occurrence
     • the conditional probability p(x|y) of x under the hypothesis that y has occurred
     Union of events: given two events x, y, the probability of x or y is defined as p(x ∨ y) = p(x) + p(y) − p(x, y); in particular, if x and y are mutually exclusive, p(x ∨ y) = p(x) + p(y).
     The same definitions hold for probability distributions.

  4. Discrete random variables: Product rule
     The product rule relates joint and conditional probabilities:
     p(x, y) = p(x|y) p(y) = p(y|x) p(x)
     where p(x) is the marginal probability. In general,
     p(x_1, …, x_n) = p(x_2, …, x_n | x_1) p(x_1)
                    = p(x_3, …, x_n | x_1, x_2) p(x_2|x_1) p(x_1)
                    = ⋯
                    = p(x_n | x_1, …, x_{n−1}) p(x_{n−1} | x_1, …, x_{n−2}) ⋯ p(x_2|x_1) p(x_1)
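The chain-rule factorization can be verified numerically: multiplying out p(x_3|x_1,x_2) p(x_2|x_1) p(x_1) over all assignments must give a normalized joint. A sketch on three binary variables with made-up conditional tables (all numbers are arbitrary assumptions):

```python
from itertools import product

# Hypothetical conditional tables for three binary variables.
p1 = {0: 0.6, 1: 0.4}                                   # p(x1)
p2 = {(0, 0): 0.7, (1, 0): 0.3,                         # p(x2|x1),
      (0, 1): 0.2, (1, 1): 0.8}                         # keys (x2, x1)
p3 = {(x3, x1, x2): 0.5                                 # p(x3|x1,x2),
      for x3, x1, x2 in product((0, 1), repeat=3)}      # uniform here

# Chain rule: p(x1, x2, x3) = p(x3|x1,x2) p(x2|x1) p(x1).
joint = {(x1, x2, x3): p3[(x3, x1, x2)] * p2[(x2, x1)] * p1[x1]
         for x1, x2, x3 in product((0, 1), repeat=3)}

# A valid factorization yields a normalized joint distribution.
assert abs(sum(joint.values()) - 1.0) < 1e-12
```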

  5. Discrete random variables: Sum rule and marginalization
     The sum rule relates the joint probability of two events x, y and the probability of one of them, p(x) (or p(y)):
     p(x) = ∑_{y∈𝒴} p(x, y) = ∑_{y∈𝒴} p(x|y) p(y)
     Applying the sum rule to derive a marginal probability from a joint probability is usually called marginalization.
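Marginalization is just summing out rows or columns of a joint table. A minimal sketch on a hypothetical 2×2 joint pmf (the numbers are illustrative):

```python
# Hypothetical joint pmf over two binary variables.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

# Sum rule: p(x) = sum_y p(x, y), and symmetrically for p(y).
p_x = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
p_y = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Both marginals are themselves normalized distributions.
assert abs(sum(p_x.values()) - 1.0) < 1e-12
assert abs(sum(p_y.values()) - 1.0) < 1e-12
# Equivalently p(x) = sum_y p(x|y) p(y), with p(x|y) = p(x,y)/p(y).
for x in (0, 1):
    assert abs(p_x[x] - sum((joint[(x, y)] / p_y[y]) * p_y[y]
                            for y in (0, 1))) < 1e-12
```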

  6. Discrete random variables: Bayes rule and terminology
     Since p(x, y) = p(x|y) p(y) and p(x, y) = p(y|x) p(x), and since
     p(y) = ∑_{x∈𝒳} p(x, y) = ∑_{x∈𝒳} p(y|x) p(x)
     it results that
     p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / ∑_{x′∈𝒳} p(y|x′) p(x′)
     Terminology:
     • p(x): prior probability of x (before knowing that y occurred)
     • p(x|y): posterior of x (if y has occurred)
     • p(y|x): likelihood of y given x
     • p(y): evidence of y
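A worked numerical instance of Bayes' rule, using a hypothetical diagnostic scenario (x = condition present, y = positive test; all probabilities below are made-up illustration values, not from the slides):

```python
p_x = 0.01                 # prior p(x)
p_y_given_x = 0.95         # likelihood p(y|x)
p_y_given_not_x = 0.05     # p(y | not x)

# Evidence via the sum rule: p(y) = sum_x p(y|x) p(x).
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y).
posterior = p_y_given_x * p_x / p_y

# A rare condition stays fairly unlikely even after a positive test.
assert abs(p_y - 0.059) < 1e-12
assert 0.16 < posterior < 0.17
```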

  7. Independence: Definition
     Two random variables X, Y are independent (X ⊥⊥ Y) if their joint probability is equal to the product of their marginals:
     p(x, y) = p(x) p(y)
     or, equivalently,
     p(x|y) = p(x)    p(y|x) = p(y)
     The condition p(x|y) = p(x), in particular, states that, if two variables are independent, knowing the value of one does not add any knowledge about the other one.

  8. Independence: Conditional independence
     Two random variables X, Y are conditionally independent w.r.t. a third r.v. Z (X ⊥⊥ Y | Z) if
     p(x, y|z) = p(x|z) p(y|z)
     Conditional independence does not imply (absolute) independence, and vice versa.

  9. Continuous random variables: Probability density function
     A continuous random variable X can take values from a continuous infinite set 𝒳. Its probability is defined through the cumulative distribution function (cdf) F(x) = p(X ≤ x). The probability that X lies in an interval (a, b] is then p(a < X ≤ b) = F(b) − F(a).
     The probability density function (pdf) is defined as
     f(x) = dF(x)/dx
     As a consequence,
     p(a < X ≤ b) = ∫_a^b f(x) dx
     and p(x < X ≤ x + dx) ≈ f(x) dx for a sufficiently small dx.
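The relation p(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx can be checked numerically. A sketch using the exponential distribution with rate 1 as a simple concrete example (F(x) = 1 − e^(−x), f(x) = e^(−x) for x ≥ 0; the choice of distribution and interval is an assumption for illustration):

```python
import math

F = lambda x: 1.0 - math.exp(-x)   # cdf of Exponential(1)
f = lambda x: math.exp(-x)         # pdf = dF/dx

# Midpoint-rule approximation of the integral of f over (a, b].
a, b, steps = 0.5, 2.0, 100_000
h = (b - a) / steps
integral = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# The integral of the density matches F(b) - F(a).
assert abs(integral - (F(b) - F(a))) < 1e-9
```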

  10. Sum rule and continuous random variables
     In the case of continuous random variables, their probability density functions relate as follows:
     f(x) = ∫_𝒴 f(x, y) dy = ∫_𝒴 f(x|y) f(y) dy

  11. Expectation: Definition
     Let x be a discrete random variable with distribution p(x), and let g : ℝ → ℝ be any function: the expectation of g(x) w.r.t. p(x) is
     E_p[g(x)] = ∑_{x∈V_x} g(x) p(x)
     If x is a continuous r.v. with probability density f(x), then
     E_f[g(x)] = ∫_{−∞}^{+∞} g(x) f(x) dx
     Mean value (the particular case g(x) = x):
     E_p[x] = ∑_{x∈V_x} x p(x)    E_f[x] = ∫_{−∞}^{+∞} x f(x) dx
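The discrete expectation E_p[g(x)] = ∑_x g(x) p(x) is a one-line sum. A sketch on a fair die (the die and the functions g are illustrative choices):

```python
# pmf of a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def expectation(g, pmf):
    """E_p[g(x)] = sum_x g(x) p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)               # E[x]
second_moment = expectation(lambda x: x * x, pmf)  # E[x^2]

assert abs(mean - 3.5) < 1e-12         # (1+2+...+6)/6
assert abs(second_moment - 91 / 6) < 1e-12
```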

  12. Elementary properties of expectation
     • E[a] = a for each a ∈ ℝ
     • E[a f(x)] = a E[f(x)] for each a ∈ ℝ
     • E[f(x) + g(x)] = E[f(x)] + E[g(x)]

  13. Variance: Definition
     Var[X] = E[(x − E[x])²]
     We may easily derive:
     E[(x − E[x])²] = E[x² − 2E[x]x + E[x]²]
                    = E[x²] − 2E[x]E[x] + E[x]²
                    = E[x²] − E[x]²
     Some elementary properties:
     • Var[a] = 0 for each a ∈ ℝ
     • Var[a f(x)] = a² Var[f(x)] for each a ∈ ℝ
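The identity Var[x] = E[x²] − E[x]² derived above can be confirmed exactly on a small pmf. A sketch, again on a fair die (an illustrative choice):

```python
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
# Variance from the definition, E[(x - E[x])^2] ...
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())
# ... and from the derived identity, E[x^2] - E[x]^2.
var_moments = sum(x * x * p for x, p in pmf.items()) - mean ** 2

assert abs(var_def - var_moments) < 1e-12
assert abs(var_def - 35 / 12) < 1e-12   # fair-die variance
```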

  14. Probability distributions
     Given a discrete random variable X ∈ V_X, the corresponding probability distribution is a function p(x) = P(X = x) such that:
     • 0 ≤ p(x) ≤ 1
     • ∑_{x∈V_X} p(x) = 1
     • ∑_{x∈A} p(x) = P(X ∈ A), with A ⊆ V_X
     [Figure: plot of a pmf p(x) against x]

  15. Some definitions: Cumulative distribution
     Given a continuous random variable X ∈ ℝ, the corresponding cumulative probability distribution is a function F(x) = P(X ≤ x) such that:
     • 0 ≤ F(x) ≤ 1
     • lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1
     • x ≤ y ⟹ F(x) ≤ F(y)
     [Figure: plot of a cdf F(x) against x]

  16. Some definitions: Probability density
     Given a continuous random variable X ∈ ℝ with differentiable cumulative distribution F(x), the probability density is defined as
     f(x) = dF(x)/dx
     By definition of derivative, for a sufficiently small Δx,
     Pr(x ≤ X ≤ x + Δx) ≈ f(x) Δx
     The following properties hold:
     • f(x) ≥ 0
     • ∫_{−∞}^{+∞} f(x) dx = 1
     • ∫_{x∈A} f(x) dx = P(X ∈ A)
     [Figure: plot of a pdf f(x) against x]

  17. Bernoulli distribution: Definition
     Let x ∈ {0, 1}; then x ∼ Bernoulli(p), with 0 ≤ p ≤ 1, if
     p(x) = p if x = 1,  p(x) = 1 − p if x = 0
     or, equivalently,
     p(x) = p^x (1 − p)^{1−x}
     This is the probability that, given a coin with head (H) probability p (and tail (T) probability 1 − p), a coin toss results in x ∈ {H, T}.
     Mean and variance: E[x] = p, Var[x] = p(1 − p)
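A Monte Carlo sanity check of the Bernoulli mean and variance: the sample mean should approach p and the sample variance p(1 − p). A sketch with p = 0.3 and a fixed seed (both arbitrary illustration choices):

```python
import random

random.seed(0)
p = 0.3
n = 100_000

# Draw n Bernoulli(p) samples.
samples = [1 if random.random() < p else 0 for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

# E[x] = p and Var[x] = p(1 - p), up to sampling noise.
assert abs(mean - p) < 0.01
assert abs(var - p * (1 - p)) < 0.01
```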

  18. Extension to multiple outcomes
     Assume k possible outcomes (for example, a die toss). In this case, a generalization of the Bernoulli distribution is considered, usually named the categorical distribution:
     p(x) = ∏_{j=1}^{k} p_j^{x_j}
     where (p_1, …, p_k) are the probabilities of the different outcomes (∑_{j=1}^{k} p_j = 1) and x_j = 1 iff the j-th outcome occurs.

  19. Binomial distribution: Definition
     Let x ∈ ℕ; then x ∼ Binomial(n, p), with 0 ≤ p ≤ 1, if
     p(x) = C(n, x) p^x (1 − p)^{n−x} = n! / (x!(n − x)!) · p^x (1 − p)^{n−x}
     This is the probability that, given a coin with head (H) probability p, a sequence of n independent coin tosses results in x heads.
     Mean and variance: E[x] = np, Var[x] = np(1 − p)
     [Figure: plot of the pmf p(x) against x]
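The binomial pmf is directly computable with the binomial coefficient. A sketch checking normalization and E[x] = np for n = 10, p = 0.5 (arbitrary illustration values):

```python
import math

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

assert abs(sum(probs) - 1.0) < 1e-12          # normalization
mean = sum(x * px for x, px in enumerate(probs))
assert abs(mean - n * p) < 1e-12              # E[x] = np
```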

  20. Poisson distribution: Definition
     Let x ∈ ℕ; then x ∼ Poisson(λ), with λ > 0, if
     p(x) = e^{−λ} λ^x / x!
     This is the probability that an event with average frequency λ occurs x times in the next time unit.
     Mean and variance: E[x] = λ, Var[x] = λ
     [Figure: plot of the pmf p(x) against x]
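The Poisson pmf and its mean E[x] = λ can be checked by summing over a long enough prefix of ℕ (the tail beyond x = 100 is negligible for small λ; λ = 4 is an arbitrary illustration value):

```python
import math

def poisson_pmf(x, lam):
    """p(x) = e^(-lambda) lambda^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 4.0
probs = [poisson_pmf(x, lam) for x in range(101)]

# Truncated normalization and truncated mean.
assert abs(sum(probs) - 1.0) < 1e-9
mean = sum(x * px for x, px in enumerate(probs))
assert abs(mean - lam) < 1e-9                 # E[x] = lambda
```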

  21. Normal (gaussian) distribution: Definition
     Let x ∈ ℝ; then x ∼ Normal(µ, σ²), with µ ∈ ℝ, σ > 0, if
     f(x) = (1 / √(2πσ²)) e^{−(x − µ)² / (2σ²)}
     Mean and variance: E[x] = µ, Var[x] = σ²
     [Figure: plot of the pdf f(x) against x]
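A numerical sanity check of the density above: it should integrate to 1 over the real line and have mean µ. A sketch using a midpoint rule on [µ − 10σ, µ + 10σ], where the tail mass is negligible (µ = 1, σ = 2 are arbitrary illustration values):

```python
import math

mu, sigma = 1.0, 2.0
f = lambda x: (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
               / math.sqrt(2 * math.pi * sigma ** 2))

lo, hi, steps = mu - 10 * sigma, mu + 10 * sigma, 200_000
h = (hi - lo) / steps
xs = [lo + (i + 0.5) * h for i in range(steps)]

total = sum(f(x) for x in xs) * h        # ~ integral of f
mean = sum(x * f(x) for x in xs) * h     # ~ E[x]

assert abs(total - 1.0) < 1e-6
assert abs(mean - mu) < 1e-6
```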

  22. Beta distribution: Definition
     Let x ∈ [0, 1]; then x ∼ Beta(α, β), with α, β > 0, if
     f(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
     where
     Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du
     is a generalization of the factorial to the real field ℝ: in particular, Γ(n) = (n − 1)! if n ∈ ℕ.
     Mean and variance: E[x] = α / (α + β), Var[x] = αβ / ((α + β)²(α + β + 1))
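Two of the facts above can be checked numerically with the standard library's `math.gamma`: the relation Γ(n) = (n − 1)! for integer n, and the mean E[x] = α/(α + β) via numerical integration of x·f(x) over [0, 1] (α = 2, β = 4 are arbitrary illustration values):

```python
import math

# 1) Gamma generalizes the factorial: Gamma(n) = (n - 1)!.
for n in range(1, 8):
    assert abs(math.gamma(n) - math.factorial(n - 1)) < 1e-6

# 2) Beta mean, via midpoint-rule integration of x * f(x).
a, b = 2.0, 4.0
const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
f = lambda x: const * x**(a - 1) * (1 - x)**(b - 1)

steps = 100_000
mean = sum((i + 0.5) / steps * f((i + 0.5) / steps)
           for i in range(steps)) / steps

assert abs(mean - a / (a + b)) < 1e-6   # E[x] = alpha/(alpha+beta)
```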

  23. Beta distribution
     [Figure: plots of the Beta pdf f(x) against x for (α, β) = (1, 1), (0.7, 0.7), (2, 2), (2, 4), (6, 4), (10, 10)]

  24. Multivariate distributions: Definition for k = 2 discrete variables
     Given two discrete r.v. X, Y, their joint distribution is
     p(x, y) = P(X = x, Y = y)
     The following properties hold:
     1. 0 ≤ p(x, y) ≤ 1
     2. ∑_{x∈V_X} ∑_{y∈V_Y} p(x, y) = 1

  25. Multivariate distributions: Definition for k = 2 continuous variables
     Given two continuous r.v. X, Y, their cumulative joint distribution is defined as
     F(x, y) = P(X ≤ x, Y ≤ y)
     The following properties hold:
     1. 0 ≤ F(x, y) ≤ 1
     2. lim_{x,y→∞} F(x, y) = 1 and lim_{x,y→−∞} F(x, y) = 0
     If F(x, y) is differentiable everywhere w.r.t. both x and y, the joint probability density is
     f(x, y) = ∂²F(x, y) / (∂x ∂y)
     The following property derives:
     3. ∫∫_{(x,y)∈A} f(x, y) dx dy = P((X, Y) ∈ A)

  26. Covariance: Definition
     Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
     As for the variance, we may derive:
     Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
               = E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
               = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
               = E[XY] − E[X]E[Y]
     Moreover, the following properties hold:
     1. Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]
     2. If X ⊥⊥ Y then Cov[X, Y] = 0
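Both the derivation Cov[X, Y] = E[XY] − E[X]E[Y] and property 1 can be verified exactly on a small joint pmf. A sketch on a hypothetical table of two correlated binary variables (numbers are illustrative):

```python
# Hypothetical joint pmf: X and Y tend to agree, so Cov > 0.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation of g(x, y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - ex) * (y - ey))

# Cov[X, Y] = E[XY] - E[X]E[Y].
assert abs(cov - (E(lambda x, y: x * y) - ex * ey)) < 1e-12

# Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
var_sum = E(lambda x, y: (x + y - ex - ey) ** 2)
assert abs(var_sum - (var_x + var_y + 2 * cov)) < 1e-12
```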
