Review of probability
Nuno Vasconcelos, UCSD
Probability
• probability is the language for dealing with processes that are non-deterministic
• examples:
  – if I flip a coin 100 times, how many heads can I expect to see?
  – what is the weather going to be like tomorrow?
  – are my stocks going to be up or down?
  – am I in front of a classroom or is this just a picture of it?
Sample space
• the most important concept is that of a sample space
• our process defines a set of events
  – these are the outcomes or states of the process
• example:
  – we roll a pair of dice
  – call the value on the up face at the nth toss xn
  – note that possible events such as
    ▪ odd number on second throw (x2 odd)
    ▪ two sixes
    ▪ x1 = 2 and x2 = 6
  – can all be expressed as combinations of the sample space events
  [figure: the 36 outcomes (x1, x2) laid out on a 6×6 grid, with x1 and x2 ranging from 1 to 6]
Sample space
• the sample space is the list of possible events that satisfies the following properties:
  – finest grain: all possible distinguishable events are listed separately
  – mutually exclusive: if one event happens the others do not (if x1 = 5 it cannot be anything else)
  – collectively exhaustive: any possible outcome can be expressed as a union of sample space events
• the mutually exclusive property simplifies the calculation of the probability of complex events
• collectively exhaustive means that there is no possible outcome to which we cannot assign a probability
Probability measure
• probability of an event:
  – a number expressing the chance that the event will be the outcome of the process
• a probability measure satisfies three axioms:
  – P(A) ≥ 0 for any event A
  – P(universal event) = 1
  – if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
• e.g.
  – P(x1 ≥ 0) = 1
  – P(x1 even ∪ x1 odd) = P(x1 even) + P(x1 odd)
Probability measure
• the last axiom
  – combined with the mutually exclusive property of the sample set
  – allows us to easily assign probabilities to all possible events
• back to our dice example:
  – suppose that the probability of any pair (x1, x2) is 1/36
  – we can compute the probabilities of all "union" events
  – P(x2 odd) = 18 × 1/36 = 1/2
  – P(U) = 36 × 1/36 = 1
  – P(two sixes) = 1/36
  – P(x1 = 2 and x2 = 6) = 1/36
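As a quick check (not part of the original slides), here is a short Python sketch that enumerates the dice sample space and recovers the probabilities above by counting; the variable names are my own:

    from itertools import product

    sample_space = list(product(range(1, 7), repeat=2))  # all 36 pairs (x1, x2)
    p = 1.0 / len(sample_space)                          # uniform measure: 1/36 per pair

    p_x2_odd  = sum(p for (x1, x2) in sample_space if x2 % 2 == 1)    # 18/36 = 1/2
    p_U       = sum(p for _ in sample_space)                          # 36/36 = 1
    p_two_six = sum(p for (x1, x2) in sample_space if x1 == x2 == 6)  # 1/36

    print(p_x2_odd, p_U, p_two_six)  # 0.5 1.0 0.0277...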
Probability measure
• note that there are many ways to define the universal event U
  – e.g. A = {x2 odd}, B = {x2 even}, U = A ∪ B
  – on the other hand, U = (1,1) ∪ (1,2) ∪ (1,3) ∪ … ∪ (6,6)
• the fact that the sample space is finest grain, exhaustive, and mutually exclusive, together with the measure axioms, makes the whole procedure consistent
Random variables
• a random variable X
  – is a function that assigns a real value to each sample space event
  – we have already seen one such function: P_X(x1, x2) = 1/36 for all (x1, x2)
• notation:
  – specify both the random variable and the value that it takes in your probability statements
  – we do this by writing the random variable as a subscript, P_X, and the value as the argument
  – P_X(x1, x2) = 1/36 means Prob[X = (x1, x2)] = 1/36
  – without this, probability statements can be hopelessly confusing
Random variables
• two types of random variables:
  – discrete and continuous
  – this really refers to the type of values the RV can take
• if it can take only one of a finite set of possibilities, we call it discrete
  – this is the dice example we saw; there are only 36 possibilities
Random variables
• if it can take values in a real interval we say that the random variable is continuous
• e.g. consider the sample space of weather temperature
  – we know that it could be any number between −50 and 150 degrees
  – random variable T ∈ [−50, 150]
  – note that the extremes do not have to be very precise; we can just say that P(T < −45°) = 0
• most probability notions apply equally well to discrete and continuous random variables
Discrete RV
• for a discrete RV the probability assignments are given by a probability mass function (PMF)
  – this can be thought of as a normalized histogram
  – it satisfies the following properties:
    0 ≤ P_X(a) ≤ 1, ∀a
    Σ_a P_X(a) = 1
• example: the random variable X ∈ {1, 2, 3, …, 20}, where X = i if the grade of student z on the class is between 5i and 5(i+1)
  – from the histogram we see that P_X(14) = α
  [figure: normalized histogram of grades, with the bar at X = 14 of height α]
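A minimal Python sketch of the PMF-as-normalized-histogram idea; the grade data and the 20 bins here are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    grades = rng.uniform(0, 100, size=200)                  # hypothetical grades
    counts, edges = np.histogram(grades, bins=20, range=(0, 100))
    pmf = counts / counts.sum()                             # normalize: sum_a P_X(a) = 1

    assert np.all((pmf >= 0) & (pmf <= 1))                  # 0 <= P_X(a) <= 1
    assert np.isclose(pmf.sum(), 1.0)
    print(pmf[13])   # P_X(14) if bins are numbered 1..20 (index 13 is the 14th bin)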
Continuous RV
• for a continuous RV the probability assignments are given by a probability density function (PDF)
  – this is just a continuous function
  – it satisfies the following properties:
    P_X(a) ≥ 0, ∀a
    ∫ P_X(a) da = 1
• example: the Gaussian random variable of mean µ and variance σ²
    P_X(a) = (1 / (√(2π) σ)) exp{ −(a − µ)² / (2σ²) }
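A small Python sketch (an illustration, not from the slides) that implements this Gaussian PDF and checks the normalization axiom numerically:

    import numpy as np

    def gaussian_pdf(a, mu, sigma):
        # P_X(a) = exp(-(a - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
        return np.exp(-(a - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    a = np.linspace(-10.0, 10.0, 20001)
    da = a[1] - a[0]
    print((gaussian_pdf(a, 0.0, 1.0) * da).sum())  # ~1, as the axiom requires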
Discrete vs continuous RVs
• in general the same, up to replacing summations by integrals
• note that PDF means "density of probability"
  – this is probability per unit of the variable
  – the probability of any particular value is always zero (unless the distribution has a discontinuity, i.e. a point mass, there)
  – we can only talk about
    Pr(a ≤ X ≤ b) = ∫_a^b P_X(t) dt
  – note also that PDFs are not upper bounded
  – e.g. the Gaussian goes to a Dirac delta when the variance goes to zero
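To make the integral concrete, a Python sketch that approximates Pr(−1 ≤ X ≤ 1) for a standard Gaussian with a Riemann sum; the interval [−1, 1] is an arbitrary choice for illustration:

    import numpy as np

    def std_normal_pdf(t):
        return np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

    t = np.linspace(-1.0, 1.0, 20001)       # a = -1, b = 1
    dt = t[1] - t[0]
    print((std_normal_pdf(t) * dt).sum())   # ~0.683: mass within one standard deviation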
Multiple random variables
• frequently we have problems with multiple random variables
  – e.g. at the doctor's, you are mostly a collection of random variables
    ▪ x1: temperature
    ▪ x2: blood pressure
    ▪ x3: weight
    ▪ x4: cough
    ▪ …
• we can summarize this as
  – a vector X = (x1, …, xn) of n random variables
  – P_X(x1, …, xn) is the joint probability distribution
Marginalization
• an important notion for multiple random variables is marginalization: P(cold)?
  – e.g. having a cold does not depend on blood pressure and weight
  – all that matters are fever and cough
  – that is, we need to know P_{X1,X4}(a, b)
• we marginalize with respect to a subset of variables (in this case X1 and X4)
• this is done by summing (or integrating) the others out, as in the sketch below:
    P_{X1,X4}(x1, x4) = Σ_{x2,x3} P_{X1,X2,X3,X4}(x1, x2, x3, x4)
    P_{X1,X4}(x1, x4) = ∫∫ P_{X1,X2,X3,X4}(x1, x2, x3, x4) dx2 dx3
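A minimal Python sketch of marginalization, assuming the joint PMF is stored as a 4-dimensional array; the table values are random placeholders, not real data:

    import numpy as np

    rng = np.random.default_rng(0)
    joint = rng.random((3, 4, 5, 2))    # axes: x1, x2, x3, x4
    joint /= joint.sum()                # normalize into a valid joint PMF

    p_x1_x4 = joint.sum(axis=(1, 2))    # sum x2 and x3 out
    assert p_x1_x4.shape == (3, 2)
    assert np.isclose(p_x1_x4.sum(), 1.0)   # still a valid PMF over (x1, x4)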
Conditional probability
• another very important notion: P_{Y|X}(sick | cough)?
  – so far, the doctor has P_{X1,X4}(fever, cough)
  – this still does not allow a diagnosis
  – for this we need a new variable Y with two states, Y ∈ {sick, not sick}
  – the doctor measures the fever and cough levels; these are no longer unknowns, or even random quantities
  – the question of interest is "what is the probability that the patient is sick given the measured values of fever and cough?"
• this is exactly the definition of conditional probability
  – what is the probability that Y takes a given value, given the observations for X?
    P_{Y|X1,X4}(sick | 98, high)
Conditional probability
• note the very important difference between conditional and joint probability
• joint probability is a hypothetical question with respect to all variables
  – what is the probability that you will be sick and cough a lot?
    P_{Y,X}(sick, cough)?
Conditional probability
• conditional probability means that you know the values of some variables
  – what is the probability that you are sick given that you cough a lot?
    P_{Y|X}(sick | cough)?
  – "given" is the key word here
  – conditional probability is very important because it allows us to structure our thinking
  – it shows up again and again in the design of intelligent systems
Conditional probability
• fortunately it is easy to compute
  – we simply normalize the joint by the probability of what we know
    P_{Y|X1}(sick | 98) = P_{Y,X1}(sick, 98) / P_{X1}(98)
  – note that this makes sense, since
    P_{Y|X1}(sick | 98) + P_{Y|X1}(not sick | 98) = 1
  – and, by the marginalization equation,
    P_{Y,X1}(sick, 98) + P_{Y,X1}(not sick, 98) = P_{X1}(98)
  – the definition of conditional probability
    ▪ just makes these two statements coherent
    ▪ simply says that, given what we know, we still have a valid probability measure
    ▪ the universal event {sick} ∪ {not sick} still has probability 1 after the observation
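A Python sketch of this normalization on a small hypothetical joint table over Y ∈ {sick, not sick} and a fever reading X1 ∈ {98, 104}; the numbers are invented for illustration:

    import numpy as np

    #                  X1 = 98   X1 = 104
    joint = np.array([[0.05,     0.14],     # Y = sick
                      [0.80,     0.01]])    # Y = not sick

    p_98 = joint[:, 0].sum()            # P_X1(98), by marginalization
    p_sick_98 = joint[0, 0] / p_98      # P_{Y|X1}(sick | 98)
    p_not_98  = joint[1, 0] / p_98      # P_{Y|X1}(not sick | 98)

    assert np.isclose(p_sick_98 + p_not_98, 1.0)   # still a valid measure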
The chain rule of probability
• an important consequence of the definition of conditional probability
  – note that, from this definition,
    P_{Y,X1}(y, x1) = P_{Y|X1}(y | x1) P_{X1}(x1)
  – more generally, it has the form
    P_{X1,…,Xn}(x1, …, xn) = P_{X1|X2,…,Xn}(x1 | x2, …, xn)
                             × P_{X2|X3,…,Xn}(x2 | x3, …, xn)
                             × … × P_{Xn−1|Xn}(xn−1 | xn) × P_{Xn}(xn)
• combined with marginalization, this allows us to make hard probability questions simple
The chain rule of probability
• e.g. what is the probability that you will be sick and have a fever of 104°?
    P_{Y,X1}(sick, 104) = P_{Y|X1}(sick | 104) P_{X1}(104)
  – this breaks down a hard question (probability of sick and 104) into two easier questions
  – P_{Y|X1}(sick | 104): everyone knows that this is close to one ("You have a cold!"), so
    P_{Y|X1}(sick | 104) ≈ 1
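The same hypothetical table from the earlier sketch can be used to check the chain rule numerically; again the numbers are invented, only the identity matters:

    import numpy as np

    joint = np.array([[0.05, 0.14],    # rows: Y = sick / not sick
                      [0.80, 0.01]])   # columns: X1 = 98 / 104

    p_x1 = joint.sum(axis=0)           # marginal P_X1, by summing Y out
    cond = joint / p_x1                # P_{Y|X1}(y | x1): each column normalized

    assert np.allclose(cond * p_x1, joint)   # chain rule recovers the joint
    print(cond[0, 1])                  # P_{Y|X1}(sick | 104), close to 1 in this table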