How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
Filip Železný, ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics


  1. "Discover how to discover best"
  How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
  Filip Železný
  ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics
  The Gerstner Laboratory for Intelligent Decision Making and Control

  2. Introduction
  :: Traditional scientific discovery: a human forming a hypothesis explaining observations of some natural phenomena.
  :: Computer-based scientific discovery, usually employing machine learning algorithms.

  3. Automated Discovery
  :: Computer programs constructing hypotheses from data
  − Machine Learning
  − Data Mining
  − Knowledge Discovery in Databases
  :: Highlight: the Robot Scientist project (UK)
  − Robot develops predicate-logic hypotheses in functional genomics
  − Designs optimal experiments to validate hypotheses
  − Carries out the experiments physically
  − King et al., Nature vol. 427, 2004

  4. Meta-Discovery
  :: Viewing computer-based scientific discovery as an empirical phenomenon
  :: Inferring hypotheses about it.

  5. Phase Transitions
  :: Originally: runtime statistics of problem-solving algorithms on randomly generated problem instances. Example: propositional logic SATisfiability.
  [Figure 1: the NP-complete SAT problem under Davis-Putnam search. Two panels plot the avg. # backtracks (up to 50,000 for soluble, 500,000 for insoluble instances) against the #clauses/#variables ratio.]
  :: ← Under-constrained (many solutions) vs. → over-constrained (small search space): the hardest problems lie on the transition between the two.
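
A minimal sketch of the classic experiment (an illustration, not the code behind Figure 1): generate random 3-SAT instances at several clause-to-variable ratios and count the backtracks of a bare-bones Davis-Putnam (DPLL) search. The counts peak near the transition ratio, around 4.3 for 3-SAT.

    import random

    def random_3sat(n_vars, n_clauses, rng):
        """Each clause: 3 distinct variables, each negated with probability 1/2."""
        return [[v if rng.random() < 0.5 else -v
                 for v in rng.sample(range(1, n_vars + 1), 3)]
                for _ in range(n_clauses)]

    def dpll(clauses, assignment, stats):
        """Bare-bones DPLL; unit propagation omitted for brevity, so keep n small."""
        simplified = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                      # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return False                  # empty clause: conflict
            simplified.append(rest)
        if not simplified:
            return True                       # all clauses satisfied
        var = abs(simplified[0][0])           # naive choice of branching variable
        for value in (True, False):
            assignment[var] = value
            if dpll(simplified, assignment, stats):
                return True
            del assignment[var]
            stats["backtracks"] += 1          # a failed branch = one backtrack
        return False

    rng = random.Random(0)
    n = 20
    for ratio in (2.0, 4.3, 8.0):  # under-constrained / near-transition / over-constrained
        stats = {"backtracks": 0}
        sat = dpll(random_3sat(n, int(ratio * n), rng), {}, stats)
        print(f"m/n = {ratio}: sat = {sat}, backtracks = {stats['backtracks']}")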

  6. Phase Transitions in Learning?
  :: "Inductive Logic Programming" (ILP): first-order logic representation of data / hypotheses.
  :: Example: biochemistry. Predicting mutagenic activity from compound structure.
  :: Example hypothesis:
  active(A) ← atm(A, B, c, 10, C) ∧ atm(A, D, c, 10, C) ∧ bond(A, B, D, 1)
  :: Verifying the rule for given examples (chemical compounds) ≡ a SAT problem
  :: Empirical studies (Serra et al., IJCAI 01; Botta et al., JMLR 4:2003): ILP systems tend to generate hypotheses in the phase-transition region.
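
To see why rule verification behaves like SAT, here is a minimal sketch; the atom/bond encoding and the compound below are hypothetical stand-ins, not the actual mutagenesis data. Checking whether the hypothesis above covers a compound amounts to searching for a consistent binding of the variables B, D and C:

    from itertools import product

    # Hypothetical compound: atoms as (atom_id, element, atom_type, charge),
    # bonds as (atom1, atom2, bond_type).
    atoms = [("a1", "c", 10, 0.013), ("a2", "c", 10, 0.013), ("a3", "o", 40, -0.38)]
    bonds = [("a1", "a2", 1), ("a2", "a3", 2)]

    def covers(atoms, bonds):
        """Brute-force binding search; real ILP systems search far more cleverly."""
        carbons = [a for a in atoms if a[1] == "c" and a[2] == 10]  # atm(A,_,c,10,_)
        for b, d in product(carbons, carbons):
            if b[0] == d[0]:
                continue                      # skip trivial self-pairing (no self-bonds)
            if b[3] != d[3]:
                continue                      # shared variable C forces equal charges
            # the bond may be stored in either direction in this toy encoding
            if (b[0], d[0], 1) in bonds or (d[0], b[0], 1) in bonds:
                return True                   # binding found: the rule fires
        return False

    print(covers(atoms, bonds))  # True: a1 and a2 are single-bonded type-10 carbons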

  7. Heavy-Tailed Runtime Distributions
  :: What goes on in the PT region? Model the runtime distributions.
  :: P(not achieving a solution in time t)
  − normal: decays exponentially with t
  − heavy-tailed: decays by a power law (may have infinite moments, e.g. the mean)
  [Figure: the survival function 1 − F(x) plotted against # backtracks ~ CPU time, both on log scales.]
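
A small numeric comparison of the two decay regimes (illustrative parameters, not the talk's data); with a Pareto shape below 1 the mean is infinite:

    import math

    mean = 1000.0            # assumed mean backtrack count for the exponential model
    alpha, xm = 0.8, 100.0   # assumed Pareto tail: shape < 1 means infinite mean

    for t in (1e3, 1e4, 1e5):
        exp_tail = math.exp(-t / mean)                    # exponential decay
        par_tail = (xm / t) ** alpha if t > xm else 1.0   # power-law decay
        print(f"t = {t:>8.0f}   exponential: {exp_tail:.2e}   heavy-tailed: {par_tail:.2e}")

By t = 10^5 the exponential tail is astronomically small while the power-law tail still exceeds 10^-3; on a log-log plot the power law shows up as a straight line.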

  8. Heavy-Tailed Runtime Distributions
  :: HT distributions: a "statistical curiosity" of the early 20th century:
  − V. Pareto: income distributions
  − B. Mandelbrot: fractal phenomena in nature
  :: Empirical finding (Gomes et al., J. Autom. Reasoning 2001): important combinatorial problems/algorithms exhibit heavy-tailed RTDs. The survey covered randomized algorithms and/or randomized problem instances.
  :: In hypothesis learning (Zelezny et al., ILP 2002): heavy-tailed RTDs manifest themselves in ILP.
  − Not merely a consequence of the involved hypothesis checking (= SAT)
  − HT RTDs also in terms of the # of hypotheses searched

  9. Restarted Randomized Search
  :: HT RTD: intriguing consequences.
  − The probability of finding a solution in the next ∆t, given none was found up to time t, is f(t)∆t / (1 − F(t)).
  − This hazard rate decreases with t.
  − The longer you search, the lower your chances...
  :: So it makes sense to restart the search every now and then?!
  :: Indeed:
  − Non-restarted search RT cdf F(t): infinite mean, but F(γ) > 0 for some γ > 0.
  − Search restarted each time the "cut-off" time γ is reached. RT cdf after N restarts: F_γ(N) = 1 − (1 − F(γ))^N: exponential decay ⇒ finite mean.
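
A minimal simulation (assuming a Pareto runtime model, not the paper's data) of why restarts help: each attempt succeeds within the cutoff γ with probability F(γ), so the number of restarts is geometric and the total runtime has a finite mean even though a single non-restarted run does not:

    import random

    random.seed(0)
    alpha, xm = 0.8, 1.0   # Pareto with shape < 1: a single run has infinite mean
    gamma = 100.0          # restart cutoff

    def pareto_runtime():
        # inverse-CDF sampling; 1 - random() lies in (0, 1], avoiding division by zero
        return xm / (1.0 - random.random()) ** (1.0 / alpha)

    def restarted_runtime(cutoff):
        total = 0.0
        while True:
            t = pareto_runtime()
            if t <= cutoff:
                return total + t   # success within this restart
            total += cutoff        # give up at the cutoff and restart

    runs = [restarted_runtime(gamma) for _ in range(100_000)]
    f_gamma = 1 - (xm / gamma) ** alpha   # F(γ) for the Pareto cdf
    print(f"F(γ) = {f_gamma:.3f}, observed mean total runtime = {sum(runs) / len(runs):.1f}")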

  10. Restarted Randomized Search in ILP
  :: Expected runtime of an ILP algorithm with restart cut-off time γ, to find a hypothesis of given quality. On a log scale: orders-of-magnitude performance gains.
  [Figure: 3D surface of cost (log scale, 1:200000) over score (1:20) and cutoff (log scale, 1:65536).]
  :: Large empirical study (Zelezny et al., ILP 2004):
  − 100-200 Condor cluster PCs, UW Madison
  − SGI Altix supercomputer, CTU Prague
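
How such a cost-versus-cutoff curve can be computed (a sketch under the same assumed Pareto model, not the study's data): the expected total runtime with cutoff γ is E[min(T, γ)] / F(γ), estimated here from samples and swept over γ:

    import random

    random.seed(1)
    alpha, xm = 0.8, 1.0
    samples = [xm / (1.0 - random.random()) ** (1.0 / alpha) for _ in range(200_000)]

    def expected_cost(cutoff):
        finished = [t for t in samples if t <= cutoff]
        f_gamma = len(finished) / len(samples)     # empirical F(γ)
        if f_gamma == 0:
            return float("inf")
        # E[min(T, γ)]: successful attempts cost T, failed ones cost the full cutoff
        per_attempt = (sum(finished) + (len(samples) - len(finished)) * cutoff) / len(samples)
        return per_attempt / f_gamma               # geometric number of attempts

    for gamma in (2, 10, 100, 1_000, 10_000):
        print(f"γ = {gamma:>6}: E[cost] ≈ {expected_cost(gamma):.1f}")

The sweep shows the qualitative shape of the slide's surface: cost rises steadily as the cutoff moves away from its optimum.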

  11. Occam's Razor: Empirical Assessment
  :: William of Ockham, 14th-century English logician: "Entities should not be multiplied beyond necessity."
  :: Traditional machine-learning interpretation: "If several hypotheses explain the data with roughly the same accuracy, keep the simplest."
  :: Reasons:
  1. Evident: ease of human interpretation
  2. Postulated: predictive ability (theory does not give a clue)
  :: Thanks to automated discovery, Reason 2 can be empirically tested.

  12. Occam's Razor: Empirical Assessment
  :: Some seminal empirical studies (Holte, Mach Learn 1993) apparently support the simplicity bias, but there is a misinterpretation here.
  :: The detrimental effect on predictive accuracy is due to
  − too many hypotheses tested,
  − rather than too complex hypotheses tested.
  The relation between hypothesis-space size and average hypothesis complexity is only incidental.
  :: Domingos (Data Mining & Knowl Disc 1999) reviews empirical evidence against Reason 2 for Occam's razor. Successes of:
  − ensemble learning (combining numerous complex hypotheses)
  − support vector machines (transforming data to high-dimensional spaces)
  − excessive search leading to simple yet inaccurate hypotheses (Quinlan et al., IJCAI 1995)

  13. Computerized Meta-Learning
  :: So far: meta-discovery, i.e. hypothesizing about machine discovery itself.
  :: Now shifting to Meta-Learning: "Learn how to learn best"

  14. Meta-Learning Achievements
  :: Traditional approaches: see the Mach Learn special issue on Meta-Learning, 54:2004. Examples:
  − Meta-hypothesize on which learning algorithm is best for given data (a toy sketch follows this slide).
  − Predict the range of parameters (e.g. kernel width for SVMs) from meta-data.
  :: Unorthodox approaches (Maloberti, Sebag: Mach Learn 55(2):2004):
  − Detect the position of the problem w.r.t. the phase-transition region.
  − Use it to determine the best learning algorithm.
  :: Other: Bensusan (ECML 1998) meta-learns how much pruning should be used.
  − Pruning ≈ simplifying hypotheses at some sacrifice in accuracy
  − Occam's-razor motivated (title: "God does not always shave with Occam's razor")
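
A toy sketch of the traditional setup from the first bullet (the meta-dataset and meta-features below are invented for illustration): describe each past dataset by a few meta-features, record which learner won, and predict the winner for a new dataset, here with a simple 1-nearest-neighbour rule:

    import math

    # invented meta-data: (n_examples, n_attributes, class_entropy) -> best algorithm
    meta_data = [
        ((100,   5, 0.40), "decision_tree"),
        ((5000, 40, 0.90), "svm"),
        ((200,  10, 0.50), "decision_tree"),
        ((8000, 60, 0.95), "svm"),
    ]

    def predict_best(features):
        """1-nearest-neighbour over crudely rescaled meta-features."""
        scales = (10_000.0, 100.0, 1.0)   # naive scaling; a real system would normalize
        def dist(a, b):
            return math.dist([x / s for x, s in zip(a, scales)],
                             [x / s for x, s in zip(b, scales)])
        return min(meta_data, key=lambda row: dist(row[0], features))[1]

    print(predict_best((6000, 50, 0.9)))  # -> "svm"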

  15. Speculations
  :: Given that Meta-Learning is useful, would Meta-Meta-Learning be?
  :: And Meta-...-Meta-learning, n times over?
  [Figure: a picture nested recursively n levels deep, each level repeating the formula s = ½gt² and the caption "Simple rules work best."]
  :: What if n is infinite? (Much like Lisp/Prolog meta-interpretation towers.)
