

  1. Probability Theory as Extended Logic: A short introduction into quantitative reasoning with incomplete information
  • Axiomatic derivation of probability theory.
  • Bayes' theorem and posterior probabilities vs. p-values and confidence intervals.
  • Model selection: inferring dependence between variables.
  • Prior probabilities: symmetry transformations and the maximum entropy principle.
  • Stochastic processes: generating functions and the central limit theorem.
  Erik van Nimwegen, Division of Bioinformatics, Biozentrum, Universität Basel, Swiss Institute of Bioinformatics
  "I cannot conceal the fact here that in the specific application of these rules, I foresee many things happening which can cause one to be badly mistaken if he does not proceed cautiously." (Jacob Bernoulli, Ars Conjectandi, Basel 1705)

  2. Probability Theory as Extended Logic
  (Photo: E.T. Jaynes in 1982.) Almost everything in this lecture can be found in this book. Jaynes left the book unfinished when he died in 1998. The unfinished version was available on the internet for many years (it still is). It was edited by a former student and was finally published in 2003.
  From logic to extended logic
  Aristotelian logic is a calculus of propositions. It tells us how to deduce the truth or falsity of certain statements from the truth or falsity of other statements.
  Assume: if A is true then B is true (or in symbols: B|A). Then:
  • A is true, therefore B is true.
  • B is false, therefore A is false.
  But in reality it is almost always necessary to reason like this:
  • B is true, therefore A becomes more plausible.
  • A is false, therefore B becomes less plausible.
  Or even: if A is true then B becomes more plausible.
  • B is true, therefore A becomes more plausible.
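The weak syllogisms become quantitative once the rules introduced on the next slide are applied. Below is a minimal numeric sketch (not part of the original slides; all plausibilities are made-up illustration values) of "B is true, therefore A becomes more plausible":

```python
# Minimal numeric sketch of the weak syllogism "B is true, therefore A becomes
# more plausible". All plausibilities below are made-up illustration values.
p_A = 0.10            # prior plausibility of A
p_B_given_A = 1.0     # "if A is true then B is true"
p_B_given_notA = 0.5  # B can also occur without A (assumed value)

# Plausibility of B, summing over the two ways B can come about:
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # = 0.55

# Product rule (next slide): P(A B) = P(B|A) P(A) = P(A|B) P(B)
p_A_given_B = p_B_given_A * p_A / p_B                   # ~= 0.18

print(p_A_given_B > p_A)   # True: learning that B is true raised the plausibility of A
```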

  3. From logic to extended logic
  R.T. Cox (1946):
  1. Plausibilities are represented by real numbers and depend on the information we have, i.e. P(x|I) is the plausibility of x given our information I.
  2. Plausibilities should match common sense: they should reduce to logic for statements that we know to be true or false, and should go up and down in accordance with common sense.
  3. Consistency: if a plausibility can be derived in multiple ways, all ways should give the same answer.
  The solution is unique and matches probability theory à la Laplace.
  The two quantitative rules
  (1)  P(A|I) + P(¬A|I) = 1
  A certainly true statement has probability 1, a false statement has probability 0. The probability that the statement is true determines the probability that the statement is false.
  (2)  P(AB|I) = P(A|BI) P(B|I) = P(B|AI) P(A|I)
  The probability of A and B given the information I can be written either as the probability of B given I times the probability of A given B and I, or as the probability of A given I times the probability of B given A and I.
  Example: the probability that there is liquid water and life on Mars is the probability that there is liquid water times the probability of life given liquid water, or the probability of life times the probability of liquid water given life.
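As a small worked illustration of the two rules, here is the Mars example with made-up plausibilities (none of these numbers come from the lecture); the point is only that rule (1) fixes the plausibility of the negation and that both factorizations in rule (2) refer to the same quantity:

```python
# Made-up plausibilities for the Mars example; purely illustrative values.
p_water = 0.20             # P(liquid water | I)
p_life_given_water = 0.15  # P(life | water, I)
p_life = 0.03              # P(life | I)

# Rule (1): the plausibility of "no liquid water" is fixed by that of "liquid water".
p_no_water = 1 - p_water                                 # = 0.80

# Rule (2): two factorizations of the same quantity P(water AND life | I).
p_both = p_life_given_water * p_water                    # P(life|water) P(water) = 0.03
p_water_given_life = p_both / p_life                     # consistency then fixes P(water|life) = 1.0

print(p_both, p_water_given_life, p_no_water)
```

With these particular (assumed) numbers P(water|life, I) = 1, i.e. the chosen plausibilities happen to encode that life would require liquid water.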

  4. Assigning probabilities using symmetry
  • Assume n mutually exclusive and exhaustive hypotheses A_i:
    P(A_i A_j | I) = 0 for all i ≠ j,   and   Σ_{i=1}^{n} P(A_i | I) = 1.
  • Assume you know nothing else.
  • Consistency now demands that:  P(A_i | I) = 1/n for all i.
  Proof:
  • Any relabelling of our hypotheses changes our problem into an equivalent problem. That is, the same information I applies to all.
  • When the supplied information I is the same, the assignment of probabilities has to be the same.
  • Unless all P(A_i | I) are equal, this will be violated.
  Contrast with 'frequency' interpretation of probabilities
  • In orthodox probability theory a probability is associated with a random variable and records the physical tendency for something to happen in repeated trials. Example: the probability of "a coin coming up heads when thrown" is a feature of the coin and can be determined by repeated experiment.

  5. Contrast with 'frequency' interpretation of probabilities (continued)
  • Quote from William Feller (An Introduction to Probability Theory and Its Applications, 1950): "The number of possible distributions of cards in Bridge is almost 10^30. Usually we agree to consider them as equally probable. For a check of this convention more than 10^30 experiments would be required."
  • Is this really how anyone reasons? Example: say that I tell you that I went to the store, bought a normal deck of cards, and dealt 1000 Bridge hands, making sure to shuffle well between every two deals. I found that the king and queen of hearts were always in the same hand. What would you think?
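A rough back-of-the-envelope check (not part of the original slides) makes the point quantitative. Assuming fair dealing, two named cards land in the same Bridge hand with probability 12/51 per deal, so observing this 1000 times in a row has an astronomically small probability:

```python
import math

# Probability that two specific cards (e.g. the king and queen of hearts) land in
# the same hand when 52 cards are dealt fairly into 4 hands of 13: given where the
# king lands, the queen occupies one of the remaining 51 positions, 12 of which
# are in the king's hand.
p_same_hand = 12 / 51                        # ~0.235 per deal

# Probability of seeing this in all of 1000 deals under fair dealing; work in
# log10 because the number underflows ordinary floating point.
log10_p = 1000 * math.log10(p_same_hand)     # ~ -628
print(f"per deal: {p_same_hand:.3f}, over 1000 deals: about 10^{log10_p:.0f}")
```

Long before anything like 10^30 experiments, one would conclude that the deck or the shuffling is not what was claimed: the probability assignment reflects one's information about the situation, not only a physical tendency verified by repetition.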

  6. Assessing the evolutionary divergence of two genomes
  • A reference genome has G genes.
  • A different strain of the species is isolated and we want to estimate the number g of its genes that are mutated with respect to the reference genome.
  • To estimate this we sequence one gene at a time from the new strain and compare it with the reference genome, scoring each sequenced gene as wildtype or mutant.
  After sequencing (m + w) genes we have m mutants and w wildtypes. What do we now know about the number g of all genes that are mutants?
  Formalizing our information:
  • We have no information on whether the two genomes are closely or distantly related, so a priori g = G is as likely as g = 0 or any other value.
  • Assuming the number of mutants g is given, there is no information about which of the G genes are the mutants.
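A minimal simulation sketch of the sampling process just described (not from the original slides; G and the true g are made-up values) can make the setup concrete; the next slide gives the exact probability of any such observation series:

```python
import random

# Toy simulation of the sequencing experiment described above.
G = 100        # genes in the reference genome (made-up value)
g_true = 30    # mutated genes in the new strain (unknown in the real experiment)

# Given g, we have no information about *which* genes are the mutants,
# so any assignment of the g mutations over the G genes is equivalent.
genome = ['mu'] * g_true + ['wt'] * (G - g_true)
random.shuffle(genome)

# Sequence genes one at a time without replacement and record the outcomes.
observations = random.sample(genome, 10)
m, w = observations.count('mu'), observations.count('wt')
print(observations)
print(f"m = {m} mutants, w = {w} wildtypes out of {m + w} sequenced genes")
```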

  7. Assessing the evolutionary divergence of two genomes
  Formalizing our information:
  • Prior probability that g genes are mutant, given our information:
    P(g|I) = 1/(G+1)
  • Assuming g mutants, the probability that the first sequenced gene is a mutant or a wildtype:
    P(μ|g) = g/G,   P(wt|g) = (G-g)/G
  • The probabilities for the first two sequenced genes are:
    P(μ,μ|g) = g(g-1) / [G(G-1)],   P(μ,wt|g) = g(G-g) / [G(G-1)],
  and so on:
    P(wt,μ|g) = (G-g)g / [G(G-1)],   P(wt,wt|g) = (G-g)(G-g-1) / [G(G-1)]
  Generally, the probability for a particular series of mutant/wildtype observations containing m mutants and w wildtypes is given by:
    P(m,w|g) = [ g(g-1)···(g-m+1) · (G-g)(G-g-1)···(G-g-w+1) ] / [ G(G-1)···(G-m-w+1) ]
  or
    P(m,w|g) = g! (G-g)! (G-m-w)! / [ (g-m)! (G-g-w)! G! ]
  We now know the prior probability P(g|I) that a certain number of genes are mutants. We know the likelihood P(m,w|g) of observing a given string of observations given g. We want to know the posterior probability P(g|m,w) of g given m and w.
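From the uniform prior and this likelihood, the posterior follows by normalization over g (the factor counting which of the m+w sequenced genes were the mutants does not depend on g, so it cancels). A minimal numerical sketch, with made-up values for G, m and w:

```python
import math
import numpy as np

# Sketch of the posterior computation described above, using the uniform prior
# P(g|I) = 1/(G+1) and the likelihood of a particular observation sequence,
#   P(m, w | g) = g! (G-g)! (G-m-w)! / [ (g-m)! (G-g-w)! G! ].
# The values of G, m and w below are made-up examples, not from the lecture.
G, m, w = 100, 7, 3

def log_likelihood(g):
    """log P(m, w | g); returns -inf when g cannot produce the observations."""
    if g < m or G - g < w:
        return -math.inf
    logfact = lambda n: math.lgamma(n + 1)   # log(n!)
    return (logfact(g) - logfact(g - m)
            + logfact(G - g) - logfact(G - g - w)
            + logfact(G - m - w) - logfact(G))

g_values = np.arange(G + 1)
# With a uniform prior the posterior is just the normalized likelihood.
log_post = np.array([log_likelihood(g) for g in g_values])
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("posterior mode of g:", g_values[np.argmax(post)])
print("posterior mean of g:", float((g_values * post).sum()))
```

As expected, the posterior concentrates around g ≈ G·m/(m+w), with a width that shrinks as more genes are sequenced.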
