How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
Filip Železný, ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics


  1. "Discover how to discover best"
  How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
  Filip Železný
  ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics
  The Gerstner Laboratory for Intelligent Decision Making and Control

  2. Introduction
  :: Traditional scientific discovery: a human forming a hypothesis explaining observations of some natural phenomena.
  :: Computer-based scientific discovery, usually employing machine learning algorithms.

  3. Automated Discovery
  :: Computer programs constructing hypotheses from data
  − Machine Learning
  − Data Mining
  − Knowledge Discovery in Databases
  :: Highlight: the Robot Scientist project (UK)
  − Robot develops predicate-logic hypotheses in functional genomics
  − Designs optimal experiments to validate hypotheses
  − Carries out the experiments physically
  − King et al., Nature vol. 427, 2004

  4. Meta-Discovery
  :: Viewing computer-based scientific discovery as an empirical phenomenon
  :: Inferring hypotheses about it.

  5. Phase Transitions
  :: Originally: runtime statistics of problem-solving algorithms on randomly generated problem instances. Example: propositional logic SATisfiability.
  [Figure 1: the NP-complete SAT problem under Davis-Putnam search. Two panels plot the avg. # backtracks (up to 50,000 for soluble, 500,000 for insoluble instances) against the #clauses/#variables ratio.]
  :: ← Under-constrained (many solutions) vs. → over-constrained (small search space): the hardest problems lie on the transition between the two.
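
A minimal sketch of the classic experiment (an illustration, not the code behind Figure 1): generate random 3-SAT instances at several clause-to-variable ratios and count the backtracks of a bare-bones Davis-Putnam (DPLL) search. The counts peak near the transition ratio, around 4.3 for 3-SAT.

    import random

    def random_3sat(n_vars, n_clauses, rng):
        """Each clause: 3 distinct variables, each negated with probability 1/2."""
        return [[v if rng.random() < 0.5 else -v
                 for v in rng.sample(range(1, n_vars + 1), 3)]
                for _ in range(n_clauses)]

    def dpll(clauses, assignment, stats):
        """Bare-bones DPLL; unit propagation omitted for brevity, so keep n small."""
        simplified = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return False                  # empty clause: conflict
            simplified.append(rest)
        if not simplified:
            return True                       # all clauses satisfied
        var = abs(simplified[0][0])           # naive choice of branching variable
        for value in (True, False):
            assignment[var] = value
            if dpll(simplified, assignment, stats):
                return True
            del assignment[var]
            stats["backtracks"] += 1          # a failed branch = one backtrack
        return False

    rng = random.Random(0)
    n = 20
    for ratio in (2.0, 4.3, 8.0):  # under-constrained / near-transition / over-constrained
        stats = {"backtracks": 0}
        sat = dpll(random_3sat(n, int(ratio * n), rng), {}, stats)
        print(f"m/n = {ratio}: sat = {sat}, backtracks = {stats['backtracks']}")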

  6. Phase Transitions in Learning?
  :: "Inductive Logic Programming" (ILP): first-order logic representation of data / hypotheses.
  :: Example: biochemistry. Predicting mutagenic activity from compound structure.
  :: Example hypothesis:
  active(A) ← atm(A, B, c, 10, C) ∧ atm(A, D, c, 10, C) ∧ bond(A, B, D, 1)
  :: Verifying the rule for given examples (chemical compounds) ≡ a SAT problem
  :: Empirical studies (Serra et al., IJCAI 01; Botta et al., JMLR 4:2003): ILP systems tend to generate hypotheses in the phase-transition region.
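
To see why rule verification behaves like SAT, here is a minimal sketch; the atom/bond encoding and the compound below are hypothetical stand-ins, not the actual mutagenesis data. Checking whether the hypothesis above covers a compound amounts to searching for a consistent binding of the variables B, D and C:

    from itertools import product

    # Hypothetical compound: atoms as (atom_id, element, atom_type, charge),
    # bonds as (atom1, atom2, bond_type).
    atoms = [("a1", "c", 10, 0.013), ("a2", "c", 10, 0.013), ("a3", "o", 40, -0.38)]
    bonds = [("a1", "a2", 1), ("a2", "a3", 2)]

    def covers(atoms, bonds):
        """Brute-force binding search; real ILP systems search far more cleverly."""
        carbons = [a for a in atoms if a[1] == "c" and a[2] == 10]  # atm(A,_,c,10,_)
        for b, d in product(carbons, carbons):
            if b[0] == d[0]:
                continue                      # skip trivial self-pairing (no self-bonds)
            if b[3] != d[3]:
                continue                      # shared variable C forces equal charges
            # the bond may be stored in either direction in this toy encoding
            if (b[0], d[0], 1) in bonds or (d[0], b[0], 1) in bonds:
                return True                   # binding found: the rule fires
        return False

    print(covers(atoms, bonds))  # True: a1 and a2 are single-bonded type-10 carbons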

  7. Heavy-Tailed Runtime Distributions
  :: What goes on in the PT region? Model the runtime distributions.
  :: P(not achieving a solution in time t)
  − normal: decays exponentially with t
  − heavy-tailed: decays by a power law (may have infinite moments, e.g. the mean)
  [Figure: the survival function 1 − F(x) plotted against # backtracks ~ CPU time, both on log scales.]
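
A small numeric comparison of the two decay regimes (illustrative parameters, not the talk's data); with a Pareto shape below 1 the mean is infinite:

    import math

    mean = 1000.0            # assumed mean backtrack count for the exponential model
    alpha, xm = 0.8, 100.0   # assumed Pareto tail: shape < 1 means infinite mean

    for t in (1e3, 1e4, 1e5):
        exp_tail = math.exp(-t / mean)                    # exponential decay
        par_tail = (xm / t) ** alpha if t > xm else 1.0   # power-law decay
        print(f"t = {t:>8.0f}   exponential: {exp_tail:.2e}   heavy-tailed: {par_tail:.2e}")

By t = 10^5 the exponential tail is astronomically small while the power-law tail still exceeds 10^-3; on a log-log plot the power law shows up as a straight line.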

  8. Heavy-Tailed Runtime Distributions
  :: HT distributions: a "statistical curiosity" of the early 20th century:
  − V. Pareto: income distributions
  − B. Mandelbrot: fractal phenomena in nature
  :: Empirical finding (Gomes et al., J. Autom. Reasoning 2001): important combinatorial problems/algorithms exhibit heavy-tailed RTDs. The survey covered randomized algorithms and/or randomized problem instances.
  :: In hypothesis learning (Zelezny et al., ILP 2002): heavy-tailed RTDs manifest themselves in ILP.
  − Not merely a consequence of the involved hypothesis checking (= SAT)
  − HT RTDs also in terms of the # of hypotheses searched

  9. Restarted Randomized Search
  :: HT RTD: intriguing consequences.
  − The probability of finding a solution in the next ∆t, given none was found up to time t, is f(t)∆t / (1 − F(t)).
  − This hazard rate decreases with t.
  − The longer you search, the lower your chances...
  :: So it makes sense to restart the search every now and then?!
  :: Indeed:
  − Non-restarted search RT cdf F(t): infinite mean, but F(γ) > 0 for some γ > 0.
  − Search restarted each time the "cut-off" time γ is reached. RT cdf after N restarts: F_γ(N) = 1 − (1 − F(γ))^N: exponential decay ⇒ finite mean.
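
A minimal simulation (assuming a Pareto runtime model, not the paper's data) of why restarts help: each attempt succeeds within the cutoff γ with probability F(γ), so the number of restarts is geometric and the total runtime has a finite mean even though a single non-restarted run does not:

    import random

    random.seed(0)
    alpha, xm = 0.8, 1.0   # Pareto with shape < 1: a single run has infinite mean
    gamma = 100.0          # restart cutoff

    def pareto_runtime():
        # inverse-CDF sampling; 1 - random() lies in (0, 1], avoiding division by zero
        return xm / (1.0 - random.random()) ** (1.0 / alpha)

    def restarted_runtime(cutoff):
        total = 0.0
        while True:
            t = pareto_runtime()
            if t <= cutoff:
                return total + t   # success within this restart
            total += cutoff        # give up at the cutoff and restart

    runs = [restarted_runtime(gamma) for _ in range(100_000)]
    f_gamma = 1 - (xm / gamma) ** alpha   # F(γ) for the Pareto cdf
    print(f"F(γ) = {f_gamma:.3f}, observed mean total runtime = {sum(runs) / len(runs):.1f}")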

  10. Restarted Randomized Search in ILP
  :: Expected runtime of an ILP algorithm with restart cut-off time γ, to find a hypothesis of given quality. On a log scale: orders-of-magnitude performance gains.
  [Figure: 3D surface of cost (log scale, 1:200000) over score (1:20) and cutoff (log scale, 1:65536).]
  :: Large empirical study (Zelezny et al., ILP 2004):
  − 100-200 Condor cluster PCs, UW Madison
  − SGI Altix supercomputer, CTU Prague
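
How such a cost-versus-cutoff curve can be computed (a sketch under the same assumed Pareto model, not the study's data): the expected total runtime with cutoff γ is E[min(T, γ)] / F(γ), estimated here from samples and swept over γ:

    import random

    random.seed(1)
    alpha, xm = 0.8, 1.0
    samples = [xm / (1.0 - random.random()) ** (1.0 / alpha) for _ in range(200_000)]

    def expected_cost(cutoff):
        finished = [t for t in samples if t <= cutoff]
        f_gamma = len(finished) / len(samples)     # empirical F(γ)
        if f_gamma == 0:
            return float("inf")
        # E[min(T, γ)]: successful attempts cost T, failed ones cost the full cutoff
        per_attempt = (sum(finished) + (len(samples) - len(finished)) * cutoff) / len(samples)
        return per_attempt / f_gamma               # geometric number of attempts

    for gamma in (2, 10, 100, 1_000, 10_000):
        print(f"γ = {gamma:>6}: E[cost] ≈ {expected_cost(gamma):.1f}")

The sweep shows the qualitative shape of the slide's surface: cost rises steadily as the cutoff moves away from its optimum.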

  11. Occam's Razor: Empirical Assessment
  :: William of Ockham, 14th-century English logician: "Entities should not be multiplied beyond necessity."
  :: Traditional machine-learning interpretation: "If several hypotheses explain the data with roughly the same accuracy, keep the simplest."
  :: Reasons:
  1. Evident: ease of human interpretation
  2. Postulated: predictive ability (theory does not give a clue)
  :: Thanks to automated discovery, Reason 2 can be empirically tested.

  12. Occam's Razor: Empirical Assessment
  :: Some seminal empirical studies (Holte, Mach Learn 1993) apparently support the simplicity bias, but there is a misinterpretation here.
  :: The detrimental effect on predictive accuracy is due to
  − too many hypotheses tested,
  − rather than too complex hypotheses tested.
  The relation between hypothesis-space size and average hypothesis complexity is only incidental.
  :: Domingos (Data Mining & Knowl Disc 1999) reviews empirical evidence against Reason 2 for Occam's razor. Successes of:
  − ensemble learning (combining numerous complex hypotheses)
  − support vector machines (transforming data to high-dimensional spaces)
  − excessive search leading to simple yet inaccurate hypotheses (Quinlan et al., IJCAI 1995)

  13. Computerized Meta-Learning
  :: So far: meta-discovery, i.e. hypothesizing about machine discovery itself.
  :: Now shifting to Meta-Learning: "Learn how to learn best"

  14. Meta-Learning Achievements
  :: Traditional approaches: see the Mach Learn special issue on Meta-Learning, 54:2004. Examples:
  − Meta-hypothesize on which learning algorithm is best for given data (a toy sketch follows this slide).
  − Predict the range of parameters (e.g. kernel width for SVMs) from meta-data.
  :: Unorthodox approaches (Maloberti, Sebag: Mach Learn 55(2):2004):
  − Detect the position of the problem w.r.t. the phase-transition region.
  − Use it to determine the best learning algorithm.
  :: Other: Bensusan (ECML 1998) meta-learns how much pruning should be used.
  − Pruning ≈ simplifying hypotheses at some sacrifice in accuracy
  − Occam's-razor motivated (title: "God does not always shave with Occam's razor")
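
A toy sketch of the traditional setup from the first bullet (the meta-dataset and meta-features below are invented for illustration): describe each past dataset by a few meta-features, record which learner won, and predict the winner for a new dataset, here with a simple 1-nearest-neighbour rule:

    import math

    # invented meta-data: (n_examples, n_attributes, class_entropy) -> best algorithm
    meta_data = [
        ((100,   5, 0.40), "decision_tree"),
        ((5000, 40, 0.90), "svm"),
        ((200,  10, 0.50), "decision_tree"),
        ((8000, 60, 0.95), "svm"),
    ]

    def predict_best(features):
        """1-nearest-neighbour over crudely rescaled meta-features."""
        scales = (10_000.0, 100.0, 1.0)   # naive scaling; a real system would normalize
        def dist(a, b):
            return math.dist([x / s for x, s in zip(a, scales)],
                             [x / s for x, s in zip(b, scales)])
        return min(meta_data, key=lambda row: dist(row[0], features))[1]

    print(predict_best((6000, 50, 0.9)))  # -> "svm"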

  15. Speculations
  :: Given that Meta-Learning is useful, would Meta-Meta-Learning be?
  :: And Meta-...-Meta-learning, n times over?
  [Figure: a picture nested recursively n levels deep, each level repeating the formula s = ½gt² and the caption "Simple rules work best."]
  :: What if n is infinite? (Much like Lisp/Prolog meta-interpretation towers.)
