  1. Basic Concepts
     Probability and Statistics
     G. Urvoy-Keller, urvoy@unice.fr

  2. Outline
     - Basic concepts
     - Probability
     - Conditional probability
     - Moments
     - Common distributions: Binomial, Zipf, Poisson, Uniform, Normal, Beta, Gamma

  3. Basic Concepts
     - A random experiment is an experiment whose outcome cannot be predicted with certainty.
     - The sample space is the set of all possible outcomes of an experiment.
     - The outcomes of random experiments are called random variables and are often denoted by uppercase letters (e.g. X).
     - Random variables can be discrete or continuous.
     - An event is a subset of outcomes in the sample space.
     - Mutually exclusive events: two events that cannot occur together.
     - Extension: n events are mutually exclusive if every pair of them is mutually exclusive.

  4. Probability
     - Probability is the measure of the likelihood that some event will occur.
     - Historically, there are two ways of computing probabilities:
       - Equal-likelihood model (classical theory):
         - For an event E we count (without running an experiment) the number n of favorable outcomes.
         - We also know the total number N of possible outcomes.
         - We then set P = n/N.
         - We thus assume that all outcomes are equally likely.
         - Works well for coin tossing, die tossing, and cards.
       - Relative-frequency method:
         - Can be used when the outcomes are not all equally likely.
         - An "active" method in which the experiment is carried out n times.
         - If the event E occurred f times, then P = f/n.
     - The modern theory of probability is based on an axiomatic theory.
     - The probability of an event is computed from:
       - a probability density function in the case of a continuous random variable,
       - a probability mass function in the case of a discrete random variable.
     - Common convention: the term density (or pdf) is used for both discrete and continuous random variables.
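As a quick illustration of the relative-frequency method, here is a minimal Python sketch; it assumes a fair six-sided die simulated with the standard random module, and the event and number of trials are illustrative choices rather than anything from the slides.

```python
import random

def relative_frequency(event, trials=100_000):
    """Estimate P(event) by running the random experiment `trials` times."""
    hits = sum(event(random.randint(1, 6)) for _ in range(trials))
    return hits / trials

# Event E: the die shows an even number (true probability 1/2).
p_even = relative_frequency(lambda outcome: outcome % 2 == 0)
print(f"Estimated P(even) = {p_even:.3f}")  # close to 0.5 for large `trials`
```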

  5. Probability in the case of a continuous random variable
     Let f(x) = P(x < X < x+dx)/dx be the probability density function (pdf). Then:

         P(a ≤ X ≤ b) = ∫_a^b f(x) dx

     [Figure: the probability is the area under the pdf f(x) between a and b.]
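A hedged numerical sketch of this integral, assuming a standard normal pdf and the interval [0, 1] purely as an example (scipy provides the pdf and the quadrature):

```python
from scipy import integrate
from scipy.stats import norm

# P(a <= X <= b) as the area under the pdf between a and b.
a, b = 0.0, 1.0
prob, _abs_err = integrate.quad(norm.pdf, a, b)  # integrate the standard normal pdf
print(f"P({a} <= X <= {b}) = {prob:.4f}")        # ~0.3413, matches norm.cdf(b) - norm.cdf(a)
```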

  6. Probability in the case of a discrete random variable
     Let f(x) be the probability mass function (pmf). Then:

         P(a ≤ X ≤ b) = ∑_{a ≤ x ≤ b} f(x)

     [Figure: the probability is the sum of the pmf values f(x) for a ≤ x ≤ b.]

  7. Cumulative Distribution Function
     The cdf F(x) is the probability that the random variable X is less than or equal to x:

         F(x) = ∫_{-∞}^{x} f(u) du          (continuous case)

         F(x) = ∑_{x_i ≤ x} f(x_i)          (discrete case)

     [Figure: cdfs converge to 1 as x grows.]
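A small sketch showing that the discrete cdf is just the running sum of the pmf; the binomial distribution and its parameters below are illustrative choices (anticipating the distributions listed in the outline):

```python
import numpy as np
from scipy.stats import binom

# Build the cdf of a discrete r.v. by accumulating its pmf (binomial n=10, p=0.3 as an example).
n, p = 10, 0.3
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)
cdf = np.cumsum(pmf)                         # F(x) = sum of f(x_i) for x_i <= x
print(cdf[-1])                               # ~1.0: the cdf converges to 1
print(np.allclose(cdf, binom.cdf(k, n, p)))  # True: matches the library cdf
```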

  8. Axioms of Probability
     Let S be the sample space and E be an event (i.e., a subset of S).
     - Axiom 1: the probability of event E must be between 0 and 1: 0 ≤ P(E) ≤ 1
     - Axiom 2: P(S) = 1
     - Axiom 3: for mutually exclusive events E_1, E_2, ..., E_n:

         P(E_1 ∪ E_2 ∪ ... ∪ E_n) = ∑_{i=1}^{n} P(E_i)

  9. Axioms of Probability
     - Axiom 1 states that a probability must lie between 0 and 1. In particular, a pdf or pmf must be non-negative and sum (or integrate) to 1.
     - Axiom 2 says that some outcome must occur and that the sample space covers all possible outcomes.
     - Axiom 3 makes it possible to compute the probability that at least one of several mutually exclusive events occurs by summing their individual probabilities.

  10. Conditional Probability and Independence
     - The conditional probability of event E given event F is defined as:

         P(E | F) = P(E ∩ F) / P(F)

     - P(E ∩ F) is the probability that E and F occur together.
     - P(F) acts as a "re-normalization" factor.
     - Example: for mutually exclusive events E and F, P(E ∩ F) = 0 and thus P(E | F) = 0. This reflects a very strong dependence between the two events!
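A minimal sketch of this definition, assuming a fair die and two illustrative events (not taken from the slides):

```python
from fractions import Fraction

# Conditional probability P(E | F) = P(E ∩ F) / P(F), illustrated on a fair die.
sample_space = set(range(1, 7))
E = {4, 5, 6}     # the roll is at least 4
F = {2, 4, 6}     # the roll is even

def P(event):
    return Fraction(len(event), len(sample_space))

print(P(E & F) / P(F))  # 2/3: knowing the roll is even raises the chance it is >= 4
```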

  11. Conditional Probability and Independence
     - Independence: two events E and F are said to be independent if:

         P(E | F) = P(E)

       which is equivalent to:

         P(E ∩ F) = P(E) P(F)

     - Definition for the case of n events: E_1, ..., E_n are said to be independent if every subset E_(1), E_(2), ..., E_(k) satisfies:

         P(E_(1) ∩ E_(2) ∩ ... ∩ E_(k)) = P(E_(1)) × P(E_(2)) × ... × P(E_(k))

     - Independence is not transitive! If E_1 is independent of E_2 and E_2 of E_3, E_1 may still depend on E_3.
     - Independence is symmetric: if E is independent of F, then F is independent of E, since

         P(F | E) = P(E | F) P(F) / P(E)

  12. Conditional Probability - Illustration
     - It has been shown that there are many free-riders in Gnutella networks.
     - Free-riders: clients that retrieve documents but do not provide any data to other peers.
     - A natural question when studying such systems is: "How many files does a client share with its peers?"
     - Due to free-riding, you will find very low figures. It is thus better to split the above question into two sub-questions:
       - What is the probability that a client is a free-rider?
       - What is the probability that a non-free-rider shares n files?

  13. Conditional Probability - Illustration
     Let:
     - Q be the random variable that denotes the number of files offered by a client
     - S be the random variable that denotes the type of client:
       - F: free-rider
       - non-F: not a free-rider
     The previous questions can then be formulated as:

         P(S = F)   and   P(Q = n | S = non-F)
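A rough sketch of how these two quantities could be estimated from observations; the list of per-client file counts below is made up for illustration, and "zero files offered" is assumed to identify a free-rider:

```python
from collections import Counter

# Hypothetical dataset: the number of files each observed client offers.
clients = [0, 0, 0, 12, 0, 3, 0, 7, 0, 3, 0, 0, 25, 3, 0]

free_riders = [q for q in clients if q == 0]
sharers = [q for q in clients if q > 0]

p_free_rider = len(free_riders) / len(clients)                        # estimate of P(S = F)
counts = Counter(sharers)
p_q_given_sharer = {n: c / len(sharers) for n, c in counts.items()}   # P(Q = n | S = non-F)

print(f"P(S = F) ≈ {p_free_rider:.2f}")
print(p_q_given_sharer)
```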

  14. Independence - Illustration
     A die is tossed twice. Consider the following events:
     - A: the first toss gives an odd number
     - B: the second toss gives an odd number
     - C: the sum of the two tosses is an odd number
     Any pair of these events is independent. Indeed:
     - P(A) = P(B) = P(C) = 1/2
     - P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4
       (to obtain an odd sum, exactly one of the two tosses must be odd)
     Still, P(A ∩ B ∩ C) = 0. Hence (A, B, C) are not independent.
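The claim can be checked by brute-force enumeration of the 36 equally likely outcomes; a small sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate both tosses of a fair die to verify pairwise, but not mutual, independence.
outcomes = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 1            # first toss is odd
B = lambda o: o[1] % 2 == 1            # second toss is odd
C = lambda o: (o[0] + o[1]) % 2 == 1   # the sum is odd

print(P(lambda o: A(o) and B(o)) == P(A) * P(B))   # True
print(P(lambda o: A(o) and C(o)) == P(A) * P(C))   # True
print(P(lambda o: B(o) and C(o)) == P(B) * P(C))   # True
print(P(lambda o: A(o) and B(o) and C(o)))         # 0: the three events are not independent
```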

  15. Total Probability Theorem
     Theorem: let E_1, E_2, ..., E_n be n mutually exclusive events such that ∪_i E_i = S (S is the sample space) and P(E_i) ≠ 0. Let B be an event. Then:

         P(B) = ∑_{i=1}^{n} P(B | E_i) P(E_i)

     Proof: B = ∪_i (B ∩ E_i), and the events B ∩ E_i are mutually exclusive, so P(B) = ∑_i P(B ∩ E_i) = ∑_i P(B | E_i) P(E_i).
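A quick numerical check of the theorem, using a fair die and an illustrative partition of its sample space:

```python
from fractions import Fraction

# Partition E_1 = {1, 2}, E_2 = {3, 4}, E_3 = {5, 6} covers the sample space of a fair die.
S = set(range(1, 7))
partition = [{1, 2}, {3, 4}, {5, 6}]
B = {2, 4, 6}  # the roll is even

def P(event):
    return Fraction(len(event & S), len(S))

total = sum((P(B & E) / P(E)) * P(E) for E in partition)  # sum of P(B | E_i) P(E_i)
print(total == P(B))  # True
```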

  16. Bayes Theorem
     - Bayes' theorem makes it possible to estimate "a posteriori" probabilities from "a priori" probabilities.
     - Consider the following problem: one wants to evaluate the efficiency of a test for a disease. Let:
       - A = event that the test states that the person is infected
       - B = event that the person is infected
       - A^c = event that the test states that the person is not infected
       - B^c = event that the person is not infected
     - Suppose we have the following a-priori information:
       - P(A | B) = P(A^c | B^c) = 0.95, obtained from tests on well-defined populations
       - P(B) = 0.005
     - A good measure of the efficiency of the test is the "a posteriori" probability P(B | A).

  17. Bayes Theorem
     Theorem: given an event F and a set of mutually exclusive events E_1, E_2, ..., E_n whose union makes up the entire sample space:

         P(E_i | F) = P(E_i) P(F | E_i) / ∑_{k=1}^{n} P(F | E_k) P(E_k)

     P(E_i | F) is the "a posteriori" information; P(E_i) is the "a priori" information. Derivation of the theorem is straightforward using the definition of conditional probabilities.

  18. Bayes Theorem
     Applied to the "disease test" problem stated before, we obtain:

         P(B | A) = P(B) P(A | B) / [P(B) P(A | B) + P(B^c) P(A | B^c)]
                  = 0.005 × 0.95 / [0.005 × 0.95 + 0.995 × (1 - 0.95)] = 0.087

     - Thus, when the test is positive, the person is in fact infected in only 8.7% of the cases. Very bad!
     - Conclusion: even though the "a priori" tests were correct in 95% of the cases, this is not enough, due to the scarcity of the disease.
     - For example, with P(B) = 0.1 (and the same test accuracy) we would have obtained P(B | A) = 68%, which is not that good either...
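A small sketch reproducing both figures; the function name and parameters are illustrative, and it assumes the test's sensitivity and specificity are both 0.95 as stated on the previous slide:

```python
def posterior(prior, sensitivity, specificity):
    """P(infected | positive test) via Bayes' theorem."""
    p_pos_given_infected = sensitivity       # P(A | B)
    p_pos_given_healthy = 1 - specificity    # P(A | B^c)
    numerator = prior * p_pos_given_infected
    denominator = numerator + (1 - prior) * p_pos_given_healthy
    return numerator / denominator

print(f"{posterior(0.005, 0.95, 0.95):.3f}")  # 0.087
print(f"{posterior(0.1, 0.95, 0.95):.3f}")    # 0.679, i.e. about 68%
```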

  19. Mean and Variance
     The mean or average value E[X] = μ of a distribution provides a measure of the central tendency of the distribution:

         E[X] = ∫_{-∞}^{+∞} x f(x) dx            (continuous case)

         E[X] = ∑_{i=1}^{+∞} x_i f(x_i)          (discrete case)

     The variance V(X) = σ^2 of a random variable (r.v.) X measures the average dispersion around the mean μ:

         σ^2 = V(X) = E[(X - μ)^2] = ∫_{-∞}^{+∞} (x - μ)^2 f(x) dx       (continuous case)

         σ^2 = V(X) = E[(X - μ)^2] = ∑_{i=1}^{+∞} (x_i - μ)^2 f(x_i)     (discrete case)

  20. Mean and Variance
     E[·] is a linear operator:

         E[αX] = α E[X]              (α a scalar)
         E[X + Y] = E[X] + E[Y]      (X, Y r.v.)

     Practical formula:

         V(X) = E[(X - μ)^2] = E[X^2 - 2μX + μ^2] = E[X^2] - 2μ E[X] + μ^2 = E[X^2] - μ^2
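A quick empirical check of the practical formula; the exponential sample below is an arbitrary illustrative choice:

```python
import numpy as np

# Check that V(X) = E[(X - mu)^2] equals E[X^2] - mu^2 on simulated data.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

mu = x.mean()
var_definition = ((x - mu) ** 2).mean()    # E[(X - mu)^2]
var_practical = (x ** 2).mean() - mu ** 2  # E[X^2] - mu^2
print(np.isclose(var_definition, var_practical))  # True: the two formulas agree
```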

  21. Coefficient of Variation
     - σ = √V(X) is called the standard deviation of the r.v. X
     - C = σ / μ is called the coefficient of variation of the r.v. X
     - Interpretation:
       - "C measures the level of divergence of X with respect to its mean", or
       - "C measures the variation of X in units of its mean"
     - C makes it possible to compare two distributions with different means
     - C is independent of the chosen unit

  22. Coefficient of Variation
     To illustrate C, let us consider two sets of values drawn from normal distributions (defined later):
     - Distribution 1 with μ = 1, σ = 10   => C = 10
     - Distribution 2 with μ = 100, σ = 10 => C = 0.1
     Looking at the pdfs alone, you might miss how close to or far from their means the values can be:

         Set 1:  -11    -9.6   16     1.6    -11    0.59   -10    -12    -1.6   11
         Set 2:  106.2  108    109.4  90.08  102.1  102.4  89.92  92.58  110.8  98.69
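A small sketch reproducing the comparison with freshly drawn samples (sample sizes and the random seed are arbitrary, so the printed values are only approximate):

```python
import numpy as np

# Coefficient of variation C = sigma / mu for two samples with the same sigma
# but very different means, mirroring the slide's two normal distributions.
rng = np.random.default_rng(1)
set1 = rng.normal(loc=1.0, scale=10.0, size=100_000)    # mu = 1,   sigma = 10
set2 = rng.normal(loc=100.0, scale=10.0, size=100_000)  # mu = 100, sigma = 10

cov = lambda x: x.std() / x.mean()
print(f"C(set 1) ≈ {cov(set1):.1f}")   # roughly 10: huge spread relative to the mean
print(f"C(set 2) ≈ {cov(set2):.2f}")   # roughly 0.1: values stay close to the mean, relatively
```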
