
Review of Probabilities and Basic Statistics (10-701 Recitation 1, 1/25/2013)



  1. Review of Probabilities and Basic Statistics. Carnegie Mellon University, 10-701 Machine Learning, Spring 2013. TA: Ina Fiterau (4th-year PhD student, MLD); Alex Smola; Barnabas Poczos. Recitation 1, 1/25/2013.

  2. Overview: Introduction to Probability Theory; Random Variables and Independent RVs; Properties of Common Distributions; Estimators, Unbiased Estimators, Risk; Conditional Probabilities and Independence; Bayes' Rule and Probabilistic Inference.

  3. Review: the concept of probability. Sample space Ω: the set of all possible outcomes. Event E ⊆ Ω: a subset of the sample space. Probability measure: maps events to the unit interval, answering "how likely is it that event E will occur?" Kolmogorov axioms: (1) P(E) ≥ 0; (2) P(Ω) = 1; (3) countable additivity: for pairwise disjoint events E₁, E₂, …, P(E₁ ∪ E₂ ∪ ⋯) = Σ_{i=1}^∞ P(E_i).
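The three axioms can be checked mechanically on a finite sample space. A minimal sketch, assuming a fair six-sided die as the running example (the die is an illustration, not from the slides):

```python
from fractions import Fraction

# Sample space for one roll of a fair die; each outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """P(E) as the sum of the probabilities of E's outcomes."""
    return sum(p[o] for o in event)

# Axiom 1: non-negativity for any event.
assert prob({2, 4, 6}) >= 0
# Axiom 2: the whole sample space has probability 1.
assert prob(omega) == 1
# Axiom 3 (finite case of countable additivity): additivity over disjoint events.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert prob(evens | odds) == prob(evens) + prob(odds)
```

Using `Fraction` keeps the arithmetic exact, so the equalities hold without floating-point tolerances.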

  4. Reasoning with events. Venn diagrams: P(A) = Vol(A) / Vol(Ω). Event union and intersection: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Properties of event union/intersection: commutativity, A ∪ B = B ∪ A and A ∩ B = B ∩ A; associativity, A ∪ (B ∪ C) = (A ∪ B) ∪ C; distributivity, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
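Inclusion-exclusion and distributivity can both be verified with set operations under a counting measure. A sketch assuming the hypothetical events "multiples of 2", "multiples of 3", and "greater than 8" on {1, …, 12} (the specific events are illustrative):

```python
from fractions import Fraction

# Counting measure on a finite sample space: P(A) = |A| / |Omega|.
omega = set(range(1, 13))

def prob(event):
    return Fraction(len(event), len(omega))

A = {n for n in omega if n % 2 == 0}   # multiples of 2
B = {n for n in omega if n % 3 == 0}   # multiples of 3
C = {n for n in omega if n > 8}

# Inclusion-exclusion for the union of two events.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Distributivity: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
assert A & (B | C) == (A & B) | (A & C)
```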

  5. Reasoning with events. De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ. Proof of law #1 by double containment: show (A ∪ B)ᶜ ⊆ Aᶜ ∩ Bᶜ …, then show Aᶜ ∩ Bᶜ ⊆ (A ∪ B)ᶜ ….

  6. Reasoning with events. Disjoint (mutually exclusive) events: P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B). Examples: A and Aᶜ; the cells S₁, …, S₆ of a partition. NOT the same as independent events: successive coin flips, for instance, are independent but not disjoint.

  7. Partitions. A partition S₁, …, Sₙ: the events cover the sample space, S₁ ∪ ⋯ ∪ Sₙ = Ω, and are pairwise disjoint, S_i ∩ S_j = ∅ for i ≠ j. Event reconstruction: P(A) = Σ_{i=1}^n P(A ∩ S_i). Boole's inequality: P(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i). Bayes' rule: P(S_i | A) = P(A | S_i) P(S_i) / Σ_{j=1}^n P(A | S_j) P(S_j).
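Event reconstruction and Bayes' rule over a partition can be sketched directly. The priors and likelihoods below are hypothetical numbers chosen for illustration, not from the slides:

```python
from fractions import Fraction

# Hypothetical partition S1, S2, S3 with priors P(Si) and likelihoods P(A | Si).
prior = {"S1": Fraction(1, 2), "S2": Fraction(1, 3), "S3": Fraction(1, 6)}
likelihood = {"S1": Fraction(1, 10), "S2": Fraction(3, 10), "S3": Fraction(6, 10)}

# Event reconstruction: P(A) = sum_i P(A | Si) P(Si).
p_a = sum(likelihood[s] * prior[s] for s in prior)

# Bayes' rule: P(Si | A) = P(A | Si) P(Si) / P(A).
posterior = {s: likelihood[s] * prior[s] / p_a for s in prior}

# The posteriors over a partition must themselves sum to 1.
assert sum(posterior.values()) == 1
```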


  9. Random Variables. A random variable associates a value with the outcome of a randomized event. Sample space 𝒳: the possible values of the r.v. X. Example (event to random variable): draw 2 numbers between 1 and 4; let the r.v. X be their sum.

     E:     11 12 13 14 21 22 23 24 31 32 33 34 41 42 43 44
     X(E):   2  3  4  5  3  4  5  6  4  5  6  7  5  6  7  8

     Induced probability function on 𝒳:

     x:       2     3     4     5     6     7     8
     P(X=x):  1/16  2/16  3/16  4/16  3/16  2/16  1/16
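The induced probability function above can be reproduced by enumerating the 16 equally likely ordered pairs:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Draw two numbers between 1 and 4 (ordered, with replacement); X is their sum.
outcomes = list(product(range(1, 5), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Induced probability function on the values of X.
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf)
```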

  10. Cumulative Distribution Functions. F_X(x) = P(X ≤ x) for all x ∈ 𝒳. The CDF completely determines the probability distribution of an RV. A function F(x) is a CDF iff: (1) lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1; (2) F(x) is a non-decreasing function of x; (3) F(x) is right-continuous: for all x₀, lim_{x→x₀, x>x₀} F(x) = F(x₀).
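These CDF properties can be checked on the step CDF of the earlier sum-of-two-draws example (the pmf below assumes that example):

```python
from fractions import Fraction

# Step CDF of a discrete r.v. (sum of two draws from {1..4}): F(x) = P(X <= x).
pmf = {2: Fraction(1, 16), 3: Fraction(2, 16), 4: Fraction(3, 16),
       5: Fraction(4, 16), 6: Fraction(3, 16), 7: Fraction(2, 16),
       8: Fraction(1, 16)}

def cdf(x):
    return sum(p for v, p in pmf.items() if v <= x)

# F is non-decreasing and reaches 0 below the support and 1 above it.
values = [cdf(x) for x in range(0, 10)]
assert all(a <= b for a, b in zip(values, values[1:]))
assert cdf(1) == 0 and cdf(8) == 1
```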

  11. Identically distributed RVs. Two random variables X₁ and X₂ are identically distributed iff for all sets of values A, P(X₁ ∈ A) = P(X₂ ∈ A). So that means the variables are equal? NO. Example: toss a coin 3 times and let X_H and X_T represent the number of heads and tails respectively. They have the same distribution, but X_H = 3 − X_T.

  12. Discrete vs. Continuous RVs. Discrete: 𝒳 is discrete, the CDF is a step function, and the probability mass function is f_X(x) = P(X = x) for all x. Continuous: 𝒳 is continuous, the CDF is continuous, and the probability density f_X satisfies F_X(x) = ∫_{−∞}^{x} f_X(u) du for all x.

  13. Interval Probabilities. Obtained by integrating the area under the density: P(x₁ ≤ X ≤ x₂) = ∫_{x₁}^{x₂} f_X(x) dx. This explains why P(X = x) = 0 for continuous distributions: P(X = x) ≤ lim_{ε→0, ε>0} [F_X(x) − F_X(x − ε)] = 0.
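A sketch of interval probabilities by numerical integration, assuming the standard normal density as the example (the slides do not fix a particular distribution):

```python
import math

# Density of the standard normal, integrated numerically over [x1, x2].
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def interval_prob(x1, x2, steps=100_000):
    """P(x1 <= X <= x2) by the trapezoid rule over the density."""
    h = (x2 - x1) / steps
    total = 0.5 * (phi(x1) + phi(x2))
    total += sum(phi(x1 + i * h) for i in range(1, steps))
    return total * h

# A single point carries zero probability: the interval [x, x] has width 0.
assert interval_prob(1.0, 1.0) == 0.0
# Compare against the closed form via the error function.
exact = 0.5 * (math.erf(2 / math.sqrt(2)) - math.erf(-1 / math.sqrt(2)))
assert abs(interval_prob(-1.0, 2.0) - exact) < 1e-6
```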

  14. Moments. Expectations: the expected value of a function g depending on a r.v. X ~ P is E[g(X)] = ∫ g(x) P(x) dx. nth moment of a probability distribution: μₙ = ∫ xⁿ P(x) dx; mean: μ = μ₁. nth central moment: μₙ′ = ∫ (x − μ)ⁿ P(x) dx; variance: σ² = μ₂′.
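In the discrete case the integrals become sums, which makes the moment definitions easy to check. A sketch using the sum-of-two-draws pmf from the earlier example:

```python
from fractions import Fraction

# Moments of a discrete distribution: mu_n = sum x^n P(x),
# central moments mu'_n = sum (x - mu)^n P(x), variance = mu'_2.
pmf = {2: Fraction(1, 16), 3: Fraction(2, 16), 4: Fraction(3, 16),
       5: Fraction(4, 16), 6: Fraction(3, 16), 7: Fraction(2, 16),
       8: Fraction(1, 16)}

def moment(n):
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(n):
    mu = moment(1)
    return sum((x - mu) ** n * p for x, p in pmf.items())

mean = moment(1)                 # mu = mu_1
variance = central_moment(2)     # sigma^2 = mu'_2
# Same variance via mu_2 - mu_1^2.
assert variance == moment(2) - moment(1) ** 2
```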

  15. Multivariate Distributions. Example: uniformly draw X and Y from the set {1, 2, 3}; let W = X + Y and V = |X − Y|.

      Joint distribution of (W, V), with marginals:

             V=0   V=1   V=2  | P(W)
      W=2    1/9   0     0    | 1/9
      W=3    0     2/9   0    | 2/9
      W=4    1/9   0     2/9  | 3/9
      W=5    0     2/9   0    | 2/9
      W=6    1/9   0     0    | 1/9
      P(V)   3/9   4/9   2/9  | 1

      For a set A: P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y). Marginal: f_Y(y) = Σₓ f(x, y). For independent RVs: f(x₁, …, xₙ) = f_{X₁}(x₁) ⋯ f_{Xₙ}(xₙ).
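The joint/marginal table above (two uniform draws from {1, 2, 3}; one derived variable is the sum, the other the absolute difference) can be rebuilt by enumerating the nine equally likely outcomes:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Enumerate the 9 equally likely pairs; the sum and absolute difference
# of each pair index a cell of the joint distribution.
joint = defaultdict(Fraction)
for x, y in product(range(1, 4), repeat=2):
    joint[(x + y, abs(x - y))] += Fraction(1, 9)

# Marginal of the absolute difference: sum the joint over all sums.
marginal_v = defaultdict(Fraction)
for (w, v), p in joint.items():
    marginal_v[v] += p

print(dict(marginal_v))
```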


  17. Bernoulli. X = 1 with probability p, X = 0 with probability 1 − p, where 0 ≤ p ≤ 1. Mean and variance: E[X] = 1·p + 0·(1 − p) = p; Var[X] = (1 − p)²p + (0 − p)²(1 − p) = p(1 − p). MLE: the sample mean. Connections to other distributions: if X₁, …, Xₙ ~ Bern(p), then Y = Σ_{i=1}^n X_i is Binomial(n, p); the geometric distribution counts the number of Bernoulli trials needed to get one success.
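A quick simulation sketch of the MLE claim, assuming p = 0.3 and a fixed seed for reproducibility (both choices are illustrative):

```python
import random

# Simulated Bernoulli(p) sample; the MLE of p is the sample mean.
random.seed(0)  # fixed seed so the sketch is reproducible
p = 0.3
sample = [1 if random.random() < p else 0 for _ in range(100_000)]

p_hat = sum(sample) / len(sample)   # MLE: sample mean
var_hat = p_hat * (1 - p_hat)       # plug-in estimate of p(1 - p)

assert abs(p_hat - p) < 0.01
assert abs(var_hat - p * (1 - p)) < 0.01
```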

  18. Binomial. P(X = x; n, p) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ. Mean and variance: E[X] = Σ_{x=0}^n x C(n, x) pˣ (1 − p)ⁿ⁻ˣ = … = np; Var[X] = np(1 − p). NOTE: Var X = E[X²] − (E X)². The sum of independent binomials with the same p is binomial; conditionals on binomials are binomial.
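The mean, and the variance via the identity Var X = E[X²] − (E X)², can be verified exactly from the pmf; n = 10 and p = 1/4 are arbitrary illustrative choices:

```python
from fractions import Fraction
from math import comb

# Binomial pmf and its first two moments, checked against np and np(1-p).
n, p = 10, Fraction(1, 4)

def pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

mean = sum(x * pmf(x) for x in range(n + 1))
second = sum(x ** 2 * pmf(x) for x in range(n + 1))

assert mean == n * p                          # E[X] = np
assert second - mean ** 2 == n * p * (1 - p)  # Var X = E[X^2] - (E X)^2
```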

  19. Properties of the Normal Distribution. Operations on normally distributed variables: if X₁, X₂ ~ Norm(0, 1) independently, then X₁ ± X₂ ~ N(0, 2) and X₁ / X₂ ~ Cauchy(0, 1). If X₁ ~ Norm(μ₁, σ₁²), X₂ ~ Norm(μ₂, σ₂²) and X₁ ⊥ X₂, then Z = X₁ + X₂ ~ Norm(μ₁ + μ₂, σ₁² + σ₂²). If (X, Y) is jointly normal with means (μ_x, μ_y) and covariance matrix [[σ_X², ρσ_Xσ_Y], [ρσ_Xσ_Y, σ_Y²]], then X + Y is still normally distributed: the mean is the sum of the means, and the variance is σ_{X+Y}² = σ_X² + σ_Y² + 2ρσ_Xσ_Y, where ρ is the correlation.
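The correlated case can be checked empirically. A sketch assuming standard normals with correlation ρ = 0.5, generated via Y = ρX + √(1 − ρ²)·Z (the seed and ρ are illustrative):

```python
import math
import random

# Correlated standard normals, then an empirical check that
# Var(X + Y) = sigma_X^2 + sigma_Y^2 + 2*rho*sigma_X*sigma_Y = 2 + 2*rho here.
random.seed(0)
rho = 0.5
n = 200_000

sums = []
for _ in range(n):
    x = random.gauss(0, 1)
    z = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho * rho) * z   # Corr(x, y) = rho
    sums.append(x + y)

mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n

assert abs(var - (2 + 2 * rho)) < 0.05
```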

