
Data Farming: Getting the Most Out of Moore’s Law and Cluster Computing



  1. Data Farming: Getting the Most Out of Moore’s Law and Cluster Computing

  2. Data Mining vs. Data Farming
     • Miners seek valuable buried nuggets
       - Miners have no control over what’s there or how hard it is to separate it out
       - Data Mining seeks valuable information buried within massive amounts of data
     • Farmers cultivate to maximize yield
       - Farmers manipulate the environment to their advantage: pest control, irrigation, fertilizer, etc.
       - Data Farming manipulates simulation models to advantage with designed experimentation

  3. Simulation in DoD
     • DoD uses complex, high-dimensional simulation models as an important tool in its decision-making processes for diverse areas such as logistics, humanitarian aid, peace support operations, anti-piracy and anti-terrorist efforts, future force planning, and combat modeling
     • Many simulations involve dozens, hundreds, or thousands of “factors” that can be set to different levels

  4. Abstracting Simulation
     [Diagram: Inputs → Simulation Model → Outputs]
     • A computer simulation transforms inputs to outputs
     • Pareto Principle - a small subset of the inputs dominates in determining the outputs

  5. Design of Experiments
     “The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

  6. Design of Experiments
     “The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)
     But simulation experiments are different...

  7. Design of Experiments
     “The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)
     But simulation experiments are different...
     Typical assumptions for physical experiments
       – Small/moderate # of factors
       – Univariate response
       – Homogeneous error
       – Linear
       – Sparse effects
       – Higher order interactions negligible
       – Normal errors

  8. Design of Experiments
     “The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)
     But simulation experiments are different...
     Typical assumptions for physical experiments
       – Small/moderate # of factors
       – Univariate response
       – Homogeneous error
       – Linear
       – Sparse effects
       – Higher order interactions negligible
       – Normal errors
     Characteristics of typical simulation models
       – Large # of factors
       – Many output measures of interest
       – Heterogeneous error
       – Non-linear
       – Many significant effects
       – Significant higher order interactions
       – Varied error structure


  10. Why Do We Need DOE?
      Without a good plan for changing multiple factors simultaneously:
      • We limit the insights possible (can’t “untangle” effects)
      • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

  11. Why Do We Need DOE?
      Without a good plan for changing multiple factors simultaneously:
      • We limit the insights possible (can’t “untangle” effects)
      • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions
      A Simple Example: Capture the Flag

  12. Why Do We Need DOE?
      Without a good plan for changing multiple factors simultaneously:
      • We limit the insights possible (can’t “untangle” effects)
      • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions
      A Simple Example: Capture the Flag
        Speed   Stealth   Success?
        Low     Low       No
        High    High      Yes
      [Plot: Stealth vs. Speed]

  13. Why Do We Need DOE?
      Without a good plan for changing multiple factors simultaneously:
      • We limit the insights possible (can’t “untangle” effects)
      • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions
      A Simple Example: Capture the Flag
        Speed   Stealth   Success?
        Low     Low       No
        High    High      Yes
      [Plot: Stealth vs. Speed]
      Which is more important, stealth or speed?

  14. Why Do We Need DOE?
      Without a good plan for changing multiple factors simultaneously:
      • We limit the insights possible (can’t “untangle” effects)
      • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions
      A Simple Example: Capture the Flag
        Speed   Stealth   Success?
        Low     Low       No
        High    High      Yes
      [Plot: Stealth vs. Speed]
      Which is more important, stealth or speed?
      No way to tell! The factors are “confounded”
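The confounding in the Capture the Flag example can be checked mechanically. The sketch below assumes a hypothetical coding of Low as 0 and High as 1; the coding, the variable names, and the three candidate explanations are illustrative and not taken from the slides. Because Speed and Stealth take identical values in both runs, every candidate explanation fits the data equally well.

```python
# Hypothetical coding (an assumption for illustration): Low = 0, High = 1.
runs = [
    {"speed": 0, "stealth": 0, "success": 0},  # Low, Low   -> No
    {"speed": 1, "stealth": 1, "success": 1},  # High, High -> Yes
]

speed = [r["speed"] for r in runs]
stealth = [r["stealth"] for r in runs]
success = [r["success"] for r in runs]

# The two factor columns are identical in every run, so they are confounded.
confounded = speed == stealth

# Three competing (hypothetical) explanations of the outcome.
explanations = {
    "speed only": lambda s, t: s,
    "stealth only": lambda s, t: t,
    "both required": lambda s, t: s and t,
}

# Check which explanations reproduce the observed outcomes exactly.
fits = {
    name: [f(s, t) for s, t in zip(speed, stealth)] == success
    for name, f in explanations.items()
}

print("confounded:", confounded)
for name, ok in fits.items():
    print(f"{name}: fits the data = {ok}")
```

All three explanations fit, which is the sense in which these two runs cannot say whether stealth or speed matters more.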

  15. One-at-a-Time Variation?

  16. One-at-a-Time Variation?
        Speed   Stealth   Success?
        Low     Low       No
        High    Low       No
        Low     High      No
      [Plot: Stealth vs. Speed]

  17. One-at-a-Time Variation?
        Speed   Stealth   Success?
        Low     Low       No
        High    Low       No
        Low     High      No
      [Plot: Stealth vs. Speed]
      If we vary Speed and Stealth separately, we (incorrectly) conclude neither contributes to success!

  18. One-at-a-Time Variation? No!
        Speed   Stealth   Success?
        Low     Low       No
        High    Low       No
        Low     High      No
      [Plot: Stealth vs. Speed]


  20. One-at-a-Time Variation? No!
        Speed   Stealth   Success?
        Low     Low       No
        High    Low       No
        Low     High      No
      [Plot: Stealth vs. Speed]
      By varying Speed and Stealth together rather than separately, we see there is an “interaction”

  21. One-at-a-Time Variation? No!
        Speed   Stealth   Success?
        Low     Low       No
        High    Low       No
        Low     High      No
      [Plot: Stealth vs. Speed]
      By varying Speed and Stealth together rather than separately, we see there is an “interaction”
      This is a “factorial” or “gridded” design
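A minimal sketch contrasting one-at-a-time variation with a factorial design. It assumes a hypothetical response function in which the flag is captured only when both Speed and Stealth are High, which is consistent with the outcomes listed in the slide tables; the function and variable names are illustrative.

```python
from itertools import product

def success(speed, stealth):
    """Hypothetical response: capture succeeds only when both factors are High."""
    return speed == "High" and stealth == "High"

levels = ["Low", "High"]

# One-at-a-time: start from the baseline and change a single factor per run.
one_at_a_time = [("Low", "Low"), ("High", "Low"), ("Low", "High")]

# Factorial (gridded): every combination of factor levels.
factorial = list(product(levels, repeat=2))

oat_results = {run: success(*run) for run in one_at_a_time}
factorial_results = {run: success(*run) for run in factorial}

# One-at-a-time never observes a success; the factorial design does.
print("one-at-a-time sees a success:", any(oat_results.values()))
print("factorial sees a success:", any(factorial_results.values()))

# Interaction: the effect of raising Speed depends on the level of Stealth.
speed_effect_low_stealth = success("High", "Low") - success("Low", "Low")
speed_effect_high_stealth = success("High", "High") - success("Low", "High")
print("Speed effect at Low Stealth:", speed_effect_low_stealth)
print("Speed effect at High Stealth:", speed_effect_high_stealth)
```

The two Speed effects differ, and that difference is what the slides call an interaction. It is only visible because the factorial design includes the High/High run.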

  22. Finer Grids
      • Which output would you prefer to see?
        [Figure: two plots, each with axes Stealth and Speed]
      • The fly in the ointment - Studying two factors at this level of detail requires 11 × 11 = 121 experiments. Three factors would take 11 × 11 × 11 = 1,331 experiments.

  23. Finer Grids
      • Which output would you prefer to see?
        [Figure: two plots, each with axes Stealth and Speed]
      • The fly in the ointment - Studying two factors at this level of detail requires 11 × 11 = 121 experiments. Three factors would take 11 × 11 × 11 = 1,331 experiments.
      Factorial designs grow exponentially with the number of factors!
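The growth of a gridded design can be sketched in a few lines. The 11 levels per factor come from the slide; the 0-to-1 scale for the levels and the helper name `grid_size` are assumptions made for illustration.

```python
from itertools import product

def grid_size(levels_per_factor, n_factors):
    """Number of runs in a full factorial (gridded) design."""
    return levels_per_factor ** n_factors

# 11 evenly spaced levels per factor (the 0..1 scale is a hypothetical choice).
levels = [i / 10 for i in range(11)]

# Enumerate the two-factor grid explicitly to confirm the count.
two_factor_grid = list(product(levels, repeat=2))
print("two factors:", len(two_factor_grid))

# The count multiplies by 11 for every additional factor.
for k in (2, 3, 5, 10):
    print(f"{k} factors at 11 levels: {grid_size(11, k):,} runs")
```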

  24. How Bad is That?
      • Consider a model with 100 factors
      • Study each factor at only two levels
      This would require $2^{100}$ experiments
      $2^{100} \approx 10^{30}$, i.e., a “one” followed by thirty zeros!

  25. How Bad is That?
      • Consider a model with 100 factors
      • Study each factor at only two levels
      This would require $2^{100}$ experiments
      $2^{100} \approx 10^{30}$, i.e., a “one” followed by thirty zeros!
      If we could perform one billion experiments per second and started running experiments at the big bang, we would have completed less than 1/2500th of the total number of experiments!
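The arithmetic on this slide can be checked directly. The sketch below assumes an age of the universe of about 13.8 billion years and a 365.25-day year; neither figure appears in the slides, so treat both as assumptions.

```python
# Total runs for 100 factors at two levels each.
experiments = 2 ** 100

# One billion experiments per second (from the slide).
rate_per_second = 10 ** 9

# Assumptions, not from the slides: age of the universe and length of a year.
age_of_universe_years = 13.8e9
seconds_per_year = 365.25 * 24 * 3600

completed = rate_per_second * age_of_universe_years * seconds_per_year
fraction_done = completed / experiments

print(f"2^100 = {experiments:.3e}")
print(f"fraction completed = {fraction_done:.2e}")
print("less than 1/2500:", fraction_done < 1 / 2500)
```

Under these assumptions the completed fraction comes out below 1/2500, in line with the slide's claim.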

  26. Can Moore’s Law Save Us?
      • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
      • In recent years, this has produced “petaflop” computers

  27. Can Moore’s Law Save Us?
      • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
      • In recent years, this has produced “petaflop” computers
        Petaflop = 1000 trillion ops/second
        Cost of “Roadrunner” = $133 million

  28. Can Moore’s Law Save Us?
      • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
      • In recent years, this has produced “petaflop” computers
        Petaflop = 1000 trillion ops/second
        Cost of “Roadrunner” = $133 million
      • Using the Roadrunner supercomputer would reduce the time required for our experiment to a mere 40 million years
      • This is better, but still not good enough to be of practical use
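The 40-million-year figure can be reproduced under one simplifying assumption that the slides do not state: each experiment costs a single operation on a machine sustaining one petaflop. A real simulation run needs far more than one operation, so this is an optimistic bound.

```python
# Total runs for 100 factors at two levels each.
experiments = 2 ** 100

# Petaflop = 1000 trillion operations per second (from the slide).
ops_per_second = 1000 * 10 ** 12

# Assumption for illustration: one operation per experiment.
ops_per_experiment = 1

seconds_per_year = 365.25 * 24 * 3600  # assumed year length
seconds_needed = experiments * ops_per_experiment / ops_per_second
years_needed = seconds_needed / seconds_per_year

print(f"years needed: {years_needed:.3e}")
```

Under that assumption the result is on the order of 40 million years, matching the slide.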

  29. We Need New Types of Designs
      [Figure label: Efficient R5 FF and CCD]

  30. We Need New Types of Designs
      [Figure label: Efficient R5 FF and CCD]
      Factorial (gridded) designs are most familiar


  32. We Need New Types of Designs
      [Figure label: Efficient R5 FF and CCD]
      We have focused on Latin hypercubes

  33. We Need New Types of Designs
      [Figure label: Efficient R5 FF and CCD]
      and sequential approaches


  35. Nearly Orthogonal Latin Hypercubes
      [Figure: pairwise scatterplot matrix for factors A through G; each axis ranges from -1.0 to 1.0]
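A minimal sketch of a plain random Latin hypercube, using only the Python standard library. This is not the nearly orthogonal construction named on the slide: it simply permutes evenly spaced levels independently for each factor and then reports the largest absolute pairwise correlation, which is the quantity a nearly orthogonal design keeps small. The run count (17), the seed, and all function names are illustrative assumptions; the seven factors and the -1 to 1 range follow the figure.

```python
import random

def latin_hypercube(n_runs, n_factors, rng):
    """Random Latin hypercube: each column is a permutation of n_runs levels in [-1, 1]."""
    levels = [-1 + 2 * i / (n_runs - 1) for i in range(n_runs)]
    columns = []
    for _ in range(n_factors):
        col = levels[:]
        rng.shuffle(col)  # each level is used exactly once per factor
        columns.append(col)
    # Transpose so that each row is one experiment (one design point).
    return [list(row) for row in zip(*columns)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def max_abs_correlation(design):
    """Largest absolute correlation between any two factor columns."""
    cols = list(zip(*design))
    return max(
        abs(pearson(cols[i], cols[j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )

rng = random.Random(0)  # fixed seed so the sketch is repeatable
design = latin_hypercube(n_runs=17, n_factors=7, rng=rng)

print("runs:", len(design), "factors:", len(design[0]))
print("max |correlation|:", round(max_abs_correlation(design), 3))
```

For comparison, a two-level factorial over the same seven factors would need 2^7 = 128 runs, while this design samples 17 distinct levels of every factor in 17 runs.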
