Chapter II.2: Basic Probability Theory and Statistics
1. What is a probability?
1.1. Probability spaces, events, and random variables
2. Distributions
2.1. Discrete distributions
2.2. Continuous distributions
3. Moments, independence, and Bayes’ rule
3.1. Expectation, variance, and higher moments
3.2. Independence
3.3. Bayes’ rule
4. Bounds and convergence
5. Statistical inference
Wasserman, Ch. 1–5
IR&DM, WS'13/14, 24 October 2013
What is a probability
• “If I throw a die, I will probably get 4 or less”
• “I’ll probably go running after this lecture”
• The term “probability” here means different things
  – The outcome of a repeatable experiment
  – My personal belief
Views on probability
• In the classical definition, probability is shared equally among all outcomes, provided the outcomes are equally likely
  – “Equally likely” is decided based on physical symmetries or the like
• In frequentism, a probability is the frequency with which something happens over repeated experiments
  – Requires an infinite number of repetitions
• In subjectivism (Bayesianism), probability refers to my subjective “degree of belief”
  – But everybody’s belief is different
Axiomatic approach: sample spaces and events
• A sample space Ω is the set of all possible outcomes of an experiment
  – An element e ∈ Ω is a sample outcome or realization
• Subsets E ⊆ Ω are events
• Examples:
  – If we toss a coin twice, Ω = {HH, HT, TH, TT}
    • Event “Second toss is tails” is A = {HT, TT}
  – If we toss a coin until we get tails, Ω = {T, HT, HHT, HHHT, HHHHT, HHHHHT, …}
  – If we measure a temperature in kelvins, Ω = {x ∈ ℝ : x ≥ 0}
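The two-toss example can be written out directly as sets. This is a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
from itertools import product

# Sample space for two coin tosses: every length-2 sequence of H and T.
omega = {"".join(t) for t in product("HT", repeat=2)}

# The event "second toss is tails" is the subset of outcomes ending in T.
second_is_tails = {e for e in omega if e[1] == "T"}

print(sorted(omega))
print(sorted(second_is_tails))
```

An event is just a subset of the sample space, so `second_is_tails <= omega` holds by construction.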
Axiomatic approach: probability measures
• A collection $\mathcal{F} \subseteq 2^{\Omega}$ is a σ-algebra of Ω if
  – $\Omega \in \mathcal{F}$
  – If $A \in \mathcal{F}$, then $(\Omega \setminus A) \in \mathcal{F}$
  – If $A_1, A_2, A_3, \ldots \in \mathcal{F}$, then $(\bigcup_i A_i) \in \mathcal{F}$
• A function $\Pr: \mathcal{F} \to [0, 1]$ is a probability measure if
  – Axiom 1: $\Pr[A] \ge 0$ for every $A \in \mathcal{F}$
  – Axiom 2: $\Pr[\Omega] = 1$
  – Axiom 3: If $A_1, A_2, \ldots$ are pairwise disjoint, then $\Pr[\bigcup_i A_i] = \sum_i \Pr[A_i]$ (countably many $A_i$)
Intermission: some combinatorics
• The power set of a set A, $2^A$ (or $\mathcal{P}(A)$), is the collection of all subsets of A
  – If A = {1, 2, 3}, then $2^A$ = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
  – The size of the power set is $2^{|A|}$
    • If A is finite, this is a natural number
    • If A = ℕ, this is the cardinality of the real numbers
    • If A = ℝ, this is a strictly larger cardinality than that of the real numbers
• The number of size-k subsets of A is $\binom{|A|}{k} = \frac{|A|!}{k!\,(|A|-k)!}$
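For a small finite set, both counting facts can be checked by enumeration. A sketch, assuming the helper name `power_set` (it is illustrative, not from the lecture):

```python
from itertools import combinations
from math import comb

def power_set(a):
    """Return all subsets of the finite set a as a list of frozensets."""
    items = sorted(a)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]

A = {1, 2, 3}
subsets = power_set(A)
size_two = [s for s in subsets if len(s) == 2]

print(len(subsets), 2 ** len(A))        # size of the power set vs. 2^|A|
print(len(size_two), comb(len(A), 2))   # size-2 subsets vs. binomial coefficient
```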
Axiomatic approach: probability spaces and further properties
• A probability space is a triple (Ω, $\mathcal{F}$, Pr)
  – The σ-algebra $\mathcal{F}$ contains all the events to which we can assign a probability
    • If Ω is finite or countably infinite, we can have $\mathcal{F} = 2^{\Omega}$
    • If Ω is uncountable, $2^{\Omega}$ can contain sets that cannot be assigned a probability (non-measurable sets), so $\mathcal{F}$ is usually a smaller collection
• From the axioms we can derive that
  – Pr[∅] = 0
  – If A ⊆ B, then Pr[A] ≤ Pr[B]
  – Pr[Ω \ A] = 1 – Pr[A]
  – Pr[A ∪ B] = Pr[A] + Pr[B] – Pr[A ∩ B]
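The derived properties can be spot-checked on a small finite probability space. The sketch below assumes a fair six-sided die with the uniform measure; the sets `A` and `B` are arbitrary illustrative choices, and a check on one example is of course not a proof.

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, every subset is an event.
omega = frozenset(range(1, 7))

def pr(event):
    """Uniform probability measure on the finite sample space omega."""
    return Fraction(len(event & omega), len(omega))

A = frozenset({1, 2})
B = frozenset({2, 3, 4})

checks = {
    "Pr[empty set] = 0": pr(frozenset()) == 0,
    "A subset of (A or B) gives Pr[A] <= Pr[A or B]": pr(A) <= pr(A | B),
    "Pr[complement of A] = 1 - Pr[A]": pr(omega - A) == 1 - pr(A),
    "Pr[A or B] = Pr[A] + Pr[B] - Pr[A and B]":
        pr(A | B) == pr(A) + pr(B) - pr(A & B),
}
for name, ok in checks.items():
    print(name, ok)
```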
Axiomatic approach: random variables
• A random variable (r.v.) is a function X: Ω → ℝ such that {e ∈ Ω : X(e) ≤ r} ∈ $\mathcal{F}$ for all r ∈ ℝ
  – This is needed to define probabilities like Pr[a ≤ X ≤ b]
  – Pr[X = x] is shorthand for Pr[{e ∈ Ω : X(e) = x}]
• An r.v. is discrete if it takes at most countably many distinct values
  – None of the complexities applies
• An r.v. is continuous if it varies continuously in one or more intervals
  – These are the ones that cause problems
Example r.v.’s
• Indicator variable $\mathbf{1}_E$ or $\chi_E$ for event E ∈ $\mathcal{F}$
  – $\mathbf{1}_E(x) = 1$ if x ∈ E and $\mathbf{1}_E(x) = 0$ otherwise
  – $\Pr[E] = \Pr[\mathbf{1}_E = 1]$
• Let r.v. X be the number of heads in 10 coin flips
  – If e = HTTTTTHHTT, then X(e) = 3
  – Discrete r.v.
• Let r.v. Y be the room temperature of my kitchen (in Celsius)
  – If e = “00:22 on 22 Oct”, then Y(e) = 22.7
  – Continuous r.v.
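A random variable is an ordinary function on outcomes, which the following sketch makes concrete for the coin-flip example. It assumes a fair coin (uniform measure over all 2^10 sequences); the event `E` ("first flip is heads") and the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed setting: 10 flips of a fair coin, all sequences equally likely.
omega = ["".join(t) for t in product("HT", repeat=10)]

def X(e):
    """Random variable: number of heads in the outcome e."""
    return e.count("H")

def indicator(event):
    """Return the indicator variable of the given event (a set of outcomes)."""
    return lambda e: 1 if e in event else 0

def pr(predicate):
    """Probability of the set of outcomes satisfying predicate."""
    return Fraction(sum(1 for e in omega if predicate(e)), len(omega))

E = {e for e in omega if e[0] == "H"}   # hypothetical event: first flip is heads
one_E = indicator(E)

print(X("HTTTTTHHTT"))
print(pr(lambda e: e in E), pr(lambda e: one_E(e) == 1))
print(pr(lambda e: X(e) == 3))
```

Here `pr(lambda e: X(e) == 3)` spells out the shorthand Pr[X = 3] as the probability of the set of outcomes that X maps to 3.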
Some diagrams (1)
• The Venn diagram is a way to visualize the combinatorial relationships of three sets
[Figure: Venn diagram of three sets A, B, and C, with the regions A ∩ B, A ∩ C, B ∩ C, and A ∩ B ∩ C labeled]
• The inclusion–exclusion principle for three sets: Pr[A ∪ B ∪ C] = Pr[A] + Pr[B] + Pr[C] – Pr[A ∩ B] – Pr[A ∩ C] – Pr[B ∩ C] + Pr[A ∩ B ∩ C]
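The three-set inclusion–exclusion identity can be verified numerically on a toy example. The sketch assumes a uniform measure on {1, …, 12}; the three sets are arbitrary illustrative choices.

```python
from fractions import Fraction

omega = set(range(1, 13))   # assumed uniform sample space {1, ..., 12}

def pr(event):
    return Fraction(len(event), len(omega))

A = {e for e in omega if e % 2 == 0}   # even numbers
B = {e for e in omega if e % 3 == 0}   # multiples of 3
C = {e for e in omega if e <= 6}       # at most 6

lhs = pr(A | B | C)
rhs = (pr(A) + pr(B) + pr(C)
       - pr(A & B) - pr(A & C) - pr(B & C)
       + pr(A & B & C))
print(lhs, rhs)
```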
Some diagrams (2)
• An r.v. X that takes a finite number of values partitions the sample space into finitely many sets (the pre-images of the values of X)
  – If X is the roll of a die, we have $E_1 = \{e \in \Omega : X(e) = 1\} = X^{-1}(1)$, and similarly for $E_2, E_3, \ldots, E_6$
  – If Y is the indicator variable for “X ≥ 2”, we get two sets, $Y^{-1}(0) = E_1$ and $Y^{-1}(1) = E_2 \cup \cdots \cup E_6$
[Figure: the sample space partitioned into cells 1 to 6, grouped by the value of Y (0 or 1)]
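The partition induced by a random variable can be computed by grouping outcomes by their value. A small sketch, assuming the outcomes of the die roll are simply the numbers 1 to 6; `preimages` is an illustrative helper name.

```python
def preimages(rv, omega):
    """Group the outcomes in omega by the value the random variable assigns."""
    parts = {}
    for e in omega:
        parts.setdefault(rv(e), set()).add(e)
    return parts

omega = range(1, 7)           # assumed: outcome e is the number rolled

def X(e):
    return e                  # the roll itself

def Y(e):
    return 1 if X(e) >= 2 else 0   # indicator of "X >= 2"

print(preimages(X, omega))
print(preimages(Y, omega))
```

The pre-images are pairwise disjoint and their union is the whole sample space, which is exactly what makes them a partition.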
Distributions
• The cumulative distribution function (cdf) of r.v. X is a function $F_X: \mathbb{R} \to [0, 1]$, $F_X(x) = \Pr[X \le x]$
• If X is discrete, the probability mass function (pmf) of X is $f_X(x) = \Pr[X = x]$
• If X is continuous, the probability density function (pdf) of X is a function $f_X$ for which
  – $f_X(x) \ge 0$ for all x
  – $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
  – We have that $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
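For a discrete r.v., the cdf is obtained by summing the pmf over all values up to x. The sketch below assumes a fair die as the example; it is only an illustration of the definitions.

```python
from fractions import Fraction

SUPPORT = range(1, 7)   # assumed example: one roll of a fair die

def pmf(x):
    """f_X(x) = Pr[X = x]."""
    return Fraction(1, 6) if x in SUPPORT else Fraction(0)

def cdf(x):
    """F_X(x) = Pr[X <= x], the sum of the pmf over support points <= x."""
    return sum((pmf(k) for k in SUPPORT if k <= x), Fraction(0))

for x in (0, 2.5, 3, 6):
    print(x, cdf(x))
```

The cdf is defined for every real x, not only for values the variable can take, which is why `cdf(2.5)` is meaningful even though `pmf(2.5)` is zero.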
Example of a CDF and PDF
[Figure: two plots over x from –5 to 5, the CDF (vertical axis from 0 to 1) and the PDF (vertical axis from 0 to 0.5)]
Some discrete distributions
• Uniform distribution over {1, 2, …, m}
  – $\Pr[X = k] = 1/m$ for 1 ≤ k ≤ m
• Bernoulli distribution with parameter p
  – Binary, single coin toss
  – $\Pr[X = k] = p^k (1-p)^{1-k}$ for k ∈ {0, 1}
• Binomial distribution with parameters p and n
  – n repeated Bernoulli experiments with parameter p
  – $\Pr[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$ for 0 ≤ k ≤ n
• Geometric distribution with parameter p
  – $\Pr[X = k] = (1-p)^k p$ for k ≥ 0
• Poisson distribution with rate parameter λ
  – $\Pr[X = k] = e^{-\lambda} \lambda^k / k!$
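The mass functions above translate almost literally into code. The following sketch uses only the standard library; the function names and the parameter values in the checks are illustrative choices.

```python
from math import comb, exp, factorial

def bernoulli_pmf(k, p):
    return p**k * (1 - p)**(1 - k)                 # k in {0, 1}

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)    # 0 <= k <= n

def geometric_pmf(k, p):
    return (1 - p)**k * p                          # k >= 0

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)       # k >= 0

# Each pmf should sum to (approximately) 1 over its support; the infinite
# supports are truncated where the remaining mass is negligible.
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(geometric_pmf(k, 0.3) for k in range(200)))
print(sum(poisson_pmf(k, 4.0) for k in range(60)))
```

A binomial distribution with n = 1 reduces to the Bernoulli distribution, which gives a quick consistency check between the two formulas.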
Some continuous distributions
• Uniform distribution in the interval [a, b]
  – $f_X(x) = \frac{1}{b-a}$ for x ∈ [a, b]
• Exponential distribution with rate λ
  – Time between two events in a Poisson process
  – $f_X(x) = \lambda e^{-\lambda x}$ for x ≥ 0
• t-distribution with ν degrees of freedom
  – Typical distribution for test statistics
  – $f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
• χ² distribution with k degrees of freedom
  – $f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$
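For a continuous distribution, probabilities come from integrating the density. The sketch below integrates the exponential pdf numerically with a simple midpoint rule and compares the result with the closed form 1 − e^{−λx} of the exponential cdf (a standard fact, not stated on the slide). The rate λ = 2 and the helper `integrate` are illustrative assumptions.

```python
from math import exp

def exponential_pdf(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

lam = 2.0
numeric = integrate(lambda x: exponential_pdf(x, lam), 0.0, 3.0)
closed_form = 1 - exp(-lam * 3.0)
print(numeric, closed_form)
```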
Normal (Gaussian) distribution
• Two parameters, µ (mean) and σ² (variance)
  – $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• For the standard normal distribution, µ = 0 and σ² = 1
• Many, many applications
[Figure: two plots over x from –5 to 5, with vertical axes from 0 to 1 and from 0 to 0.5]
• R.v. X is log-normally distributed if its logarithm is normally distributed
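The normal density is easy to evaluate directly. The cdf has no elementary closed form, but it can be written with the error function as Φ(x) = ½(1 + erf((x − µ)/(σ√2))); that identity is a standard fact, not something stated on the slide. A sketch with illustrative function names:

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of the normal distribution with mean mu and variance sigma2."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def normal_cdf(x, mu=0.0, sigma2=1.0):
    """Cdf of the normal distribution, expressed via the error function."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))

print(normal_pdf(0.0))      # peak of the standard normal density
print(normal_cdf(0.0))      # half of the mass lies below the mean
print(normal_cdf(1.96))     # close to 0.975
```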
Multivariate distributions
• If X and Y are two discrete variables, their joint mass function is $f_{X,Y}(x, y) = \Pr[X = x, Y = y]$
  – For continuous variables the joint density is a non-negative function $f_{X,Y}$ s.t.
    • $\int_{\mathbb{R}} \int_{\mathbb{R}} f_{X,Y}(x, y)\,dx\,dy = 1$
    • for any (measurable) A ⊆ ℝ × ℝ, $\Pr[(X, Y) \in A] = \iint_A f_{X,Y}(x, y)\,dx\,dy$
• The marginal distribution (mass or density function) for X is
  – for discrete X: $f_X(x) = \Pr[X = x] = \sum_y f_{X,Y}(x, y)$
  – for continuous X: $f_X(x) = \int_{\mathbb{R}} f_{X,Y}(x, y)\,dy$
• All these concepts extend naturally to more than two variables
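In the discrete case, marginalization is just summing the joint table over the other variable. The joint mass function below is a made-up example for illustration, and `marginal` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction

# Made-up joint mass function f_{X,Y}(x, y) for two binary variables.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal(joint, axis):
    """Sum the joint pmf over all variables except the one at position axis."""
    out = defaultdict(Fraction)
    for values, p in joint.items():
        out[values[axis]] += p
    return dict(out)

f_X = marginal(joint, 0)
f_Y = marginal(joint, 1)
print(f_X)
print(f_Y)
```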
Multivariate normal distribution
• A.k.a. multidimensional Gaussian distribution
• Two parameters, mean vector µ and covariance matrix Σ
  – For n variables, $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$
• The density function is
  $f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right\}$
• In the standard multivariate normal distribution, µ is all-zeros and Σ is the identity, giving
  $f(x) = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2} x^T x\right\}$
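In the standard case (zero mean, identity covariance) the density needs no matrix algebra, and it factorizes into a product of univariate standard normal densities. The sketch below checks that factorization at one arbitrary point; it covers only the standard case, not a general Σ, and the function names are illustrative.

```python
from math import exp, pi, prod, sqrt

def std_mvn_pdf(x):
    """Density of the standard multivariate normal at the point x (a sequence)."""
    n = len(x)
    return (2 * pi) ** (-n / 2) * exp(-0.5 * sum(xi * xi for xi in x))

def std_normal_pdf(x):
    """Density of the univariate standard normal."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

point = [0.3, -1.2, 2.0]
joint_density = std_mvn_pdf(point)
product_density = prod(std_normal_pdf(xi) for xi in point)
print(joint_density, product_density)
```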
Bivariate normal distribution
[Figure: bivariate normal distribution]
Independence, moments & Bayes’ rule
• Two events A and B are independent if Pr[A ∩ B] = Pr[A] Pr[B]
• Two r.v.’s X and Y are independent if $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ for all x, y
• The conditional probability of A given B is Pr[A | B] = Pr[A ∩ B] / Pr[B]
  – Assumes Pr[B] > 0
  – If A and B are independent, Pr[A | B] = Pr[A]
• The conditional pmf/pdf is $f_{X|Y}(x \mid y) = f_{X,Y}(x, y) / f_Y(y)$
  – For independent X and Y, $f_{X|Y}(x \mid y) = f_X(x)$
• A and B are conditionally independent given C if Pr[A ∩ B | C] = Pr[A | C] Pr[B | C]
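Independence and conditional probability can be checked by counting on a small sample space. The sketch assumes two tosses of a fair coin; the events `A`, `B`, and `C` and the helper names are illustrative choices.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}   # assumed: two tosses of a fair coin

def pr(event):
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """Pr[a | b]; requires Pr[b] > 0."""
    return pr(a & b) / pr(b)

def independent(a, b):
    return pr(a & b) == pr(a) * pr(b)

A = {"HH", "HT"}          # first toss is heads
B = {"HT", "TT"}          # second toss is tails
C = {"HH", "HT", "TH"}    # at least one head

print(independent(A, B), cond(A, B), pr(A))   # independent: Pr[A | B] = Pr[A]
print(independent(A, C), cond(A, C), pr(A))   # dependent: knowing C changes Pr[A]
```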