

1. Bayesians Can Learn From Old Data
William H. Jefferys, University of Texas at Austin and University of Vermont
27th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2007

2. Abstract
In a paper that has been widely cited within the philosophy of science community, Glymour claims to show that Bayesians cannot learn from old data. His argument contains elementary errors, ones which E. T. Jaynes and others have often warned about. I explain exactly where Glymour went wrong and how to handle the problem correctly. When the problem is fixed, it is seen that Bayesians, just like logicians, can indeed learn from old data.

3. Outline
1. General Overview: Standard Logic; Probability and Logic; A Toy Example
2. Glymour's Argument: Counterexample to Glymour's Argument; Glymour's Friend; Where Glymour Went Wrong
3. Summary and Conclusions

4. Logic Is Fundamental
- Combine propositions A, B, ... having truth-values with the logical operations ∧, ∨, →, ¬ to calculate the truth-values of the combined propositions.
- Time-independent: the results do not depend on when we learn the truth-values.

5. Cox's Theorem
- Probability theory extends logic to degrees of plausibility on [0, 1].
- Probability theory is the unique extension of logic to degrees of plausibility, given some obvious requirements (Cox).
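Concretely (a standard statement added here for reference, not text from the slide), the unique extension that Cox's theorem picks out is ordinary probability theory, obeying the product and sum rules:

```latex
P(A \wedge B \mid C) = P(A \mid B, C)\, P(B \mid C),
\qquad
P(A \mid C) + P(\neg A \mid C) = 1 .
```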

6. Jaynes' Desiderata
Jaynes proposes three desiderata that must be satisfied by any reasonable extension of logic to a theory of plausibility:
1. If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
2. The calculation takes into account all of the evidence relevant to the question. It does not arbitrarily ignore some of the information, basing its conclusion only on what remains. It is "completely nonideological."
3. Equivalent states of knowledge are always represented by equivalent plausibility assignments.

7. A Toy Example
- Two theories, T and its negation ¬T.
- Two possible observations, E and its negation ¬E.
- T → E and ¬T → ¬E.
- Logic is time-independent; these relations don't depend on when we learn the truth or falsity of E.
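As an illustration (my sketch, not part of the talk), one can enumerate the four joint truth assignments and keep those consistent with T → E and ¬T → ¬E; only the worlds where T and E share a truth-value survive, so learning E settles T no matter when E is learned:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b

# All four joint truth assignments for (T, E)
worlds = product([True, False], repeat=2)

# Keep only the worlds consistent with T -> E and (not T) -> (not E)
consistent = [(t, e) for t, e in worlds
              if implies(t, e) and implies(not t, not e)]

print(consistent)  # [(True, True), (False, False)]
```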

8. Example: Relativity vs. Newtonian Physics
- T is general relativity. It predicts that we will observe E = anomalous perihelion motion of Mercury.
- ¬T is Newtonian physics. It predicts that we will observe ¬E = no anomalous perihelion motion.
- Assume we observe E or ¬E with 100% certainty. Then T → E and ¬T → ¬E.

9. We Can Learn From Logic and Knowledge of E
- We know E is true.
- From logic we conclude: ¬T → ¬E.
- By contraposition, E → T.
- Therefore, T is true and ¬T is false.

10. The Likelihood
Logic translates into probability statements:
- T → E yields P(E | T) = 1 and P(¬E | T) = 0.
- ¬T → ¬E yields P(E | ¬T) = 0 and P(¬E | ¬T) = 1.
Up to a common factor, these four equations are the likelihood function for the two cases of observing E and ¬E.
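A quick numerical check (my sketch, not from the slides): plugging this deterministic likelihood into Bayes' theorem, any prior strictly between 0 and 1 is driven to a posterior of 1 by the observation of E:

```python
def posterior_T_given_E(prior_T: float) -> float:
    """Bayes' theorem with the toy example's deterministic likelihood:
    P(E|T) = 1 and P(E|not T) = 0."""
    p_E_given_T, p_E_given_notT = 1.0, 0.0
    # Law of total probability: P(E) = P(E|T)P(T) + P(E|not T)P(not T)
    p_E = p_E_given_T * prior_T + p_E_given_notT * (1.0 - prior_T)
    return p_E_given_T * prior_T / p_E

for prior in (0.01, 0.5, 0.99):
    print(prior, "->", posterior_T_given_E(prior))  # always -> 1.0
```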

11. The Argument
- Know that E is true.
- Therefore, P(E) = 1 ???
- Therefore, P(E | T) = 1.
- Therefore, P(T | E) = P(E | T) P(T) / P(E) = P(T).
- Since P(T | E) = P(T), we haven't learned anything.
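The step marked "???" is where the argument goes off the rails. If P(E) is instead computed from the law of total probability with the toy example's likelihood (my reconstruction, consistent with the counterexample on the following slides), it equals the prior P(T), not 1, and the update is informative:

```latex
P(E) = P(E \mid T)\,P(T) + P(E \mid \neg T)\,P(\neg T)
     = 1 \cdot P(T) + 0 \cdot (1 - P(T)) = P(T),
\qquad
P(T \mid E) = \frac{P(E \mid T)\,P(T)}{P(E)} = \frac{P(T)}{P(T)} = 1 .
```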

12. Violates Desideratum #1 and Cox's Theorem
- Glymour's argument says that unless P(T) = 1, we cannot conclude that P(T | E) = 1, i.e., that T | E is true.
- But logic tells us that T | E is true, period.
- Cox's Theorem and Desideratum #1 guarantee that all valid calculations must arrive at the same conclusion. The calculation from logic is manifestly correct; therefore, Glymour's argument cannot be valid.

13. Directly Contradicts Logic
- Glymour's questionable assumption that P(E) = 1 generates a contradiction with logic.
- If P(E) = 1, then P(E | X) = 1 for any X.
- In particular, P(E | ¬T) = 1. Therefore, ¬T → E.
- But we know that ¬T → ¬E.
- Glymour's argument thus leads to an absurd conclusion: that Newtonian physics predicts anomalous perihelion motion of Mercury.
- Therefore, P(E) ≠ 1, and Glymour's argument falls apart.

14. It's Even More Absurd
- Glymour's logic goes even further: it says that if E is old data, then for any theory X whatsoever, P(E | X) = 1, so X → E, and thus X predicts that we will observe E.
- This is obviously absurd; the predictions of a theory depend only on the theory, and are independent of observations we may or may not have made.
- Under Glymour's reasoning, if E is "old data," then every theory X is an example of Jaynes' dreaded "Sure Thing®" theory, under which E is just what the theory predicts will be observed.

15. Violates Desideratum #3
- "Wigner's Friend" shows how people with different initial states of knowledge arrive at the same conclusions when their knowledge is made to coincide.
- Glymour's friend Tom is ignorant of E, so he is entitled to regard E, when he learns it, as "new" data.
- Suppose Tom chooses P(T) = 1/2, informs Glymour, and Glymour agrees on this prior.
- Using Bayes' theorem, Tom can calculate in advance that if E is observed, then T is true, and if ¬E is observed, then ¬T is true.
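Written out (my reconstruction of the arithmetic the slide alludes to), Tom's advance calculation with the prior P(T) = 1/2 is:

```latex
P(T \mid E) = \frac{1 \cdot \tfrac{1}{2}}{1 \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2}} = 1,
\qquad
P(\neg T \mid \neg E) = \frac{1 \cdot \tfrac{1}{2}}{1 \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2}} = 1 .
```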

16. Violates Desideratum #3
- When he is informed that E was observed, Tom concludes that P(T | E) = 1, so T is true.
- Glymour concludes only that P(T | E) = P(T) = 1/2.
- Tom and Glymour have the same priors and now know the same relevant facts, but they have reached different conclusions.
- This violates Jaynes' Desideratum #3.

17. All Probability Is Conditional
- Jaynes: a fruitful source of error, and even of apparent paradoxes, in probability theory is failing to condition properly and explicitly on all of the background information used.
- Let B represent all background information except E.
- In the toy example, for instance, B includes the physics, e.g., T → E.
- B can also be regarded as including the priors, P(T | B) ...
- This point of view makes Glymour's error embarrassingly obvious: just systematically insert the conditioning information E, B that he actually used into his proof.
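With B made explicit, the correctly conditioned form of Bayes' theorem for the toy example reads (standard form; the only change from the earlier slides is that B is written everywhere):

```latex
P(T \mid E, B) = \frac{P(E \mid T, B)\,P(T \mid B)}{P(E \mid B)} .
```

Used once, with E genuinely new relative to the conditioning information B, this gives P(T | E, B) = 1; conditioning on E a second time yields only the vacuous identity on the next slide.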

18. What Glymour Actually Proved
- P(E | E, B) = 1 !!!
- P(E | E, T, B) = 1
- P(T | E, E, B) = P(E | E, T, B) P(T | E, B) / P(E | E, B) = P(T | E, B)
- Glymour has actually proved that no one can abuse the Bayesian machinery by using the same evidence twice.

19. It's Logic, Not Epistemology
- Glymour's failure to condition explicitly on all the background information he used misled him into thinking that P(E) is an epistemological statement about our knowledge of E at a given point t in time (he even used subscripts t to make this point). This viewpoint is incorrect.
- P(E | B) is a logical relationship between B and E.
- Its value depends on B.
- It predicts the probability of observing E, given that we know only the theory and the priors, under the mixture model defined by the likelihood and priors.
