  1. Computational Complexity and New Computing Approaches. Approved for unlimited release, SAND2017-0924 C. Erik P. DeBenedictis, Center for Computing Research, Sandia National Laboratories. Wildly Heterogeneous Post-CMOS Technologies Meet Software. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. Overview
     Logic devices fall into categories by potential upside
     • A large class of devices is limited by thermodynamics
       – CMOS is in this large class and has a big head start
       – A common limit precludes any from being a lot better than the others
     • However, differences are worth exploiting
     • How do we compare within the large class?
     Solution: complexity theory based on a kT measure
     • Discounts CMOS’ maturity advantage by assessing physical limits of energy efficiency in units of kT
     • Use algorithmic complexity to assess devices’ ability to combine into useful functions
     • Need analog vs. digital kT comparisons to work

  3. Scope of Talk is the Red Class
     Name of approach | Performance limit or other capability | Investment to date
     Neural networks (irrespective of implementation) | Learning and maybe intelligence [1] | Billions
     Quantum computing (superconducting electronics) | Quantum speedup | Billions
     Neuromorphic computing, i.e. implementations of neural networks | Thermodynamic (kT) [1] | Billions
     Novel devices: spintronics, carbon nanotubes, Josephson junctions, new memories, etc. | Thermodynamic (kT) [2] | Millions (each)
     Analog computing | Thermodynamic (kT) [3] | Millions
     “3D+architecture,” i.e. continuation of Moore’s law | Thermodynamic (kT) [4] | Trillion
     Reversible computing | Arbitrarily low energy/op | Millions
     [1] DeBenedictis, Erik P. "Rebooting Computers as Learning Machines." Computer 49.6 (2016): 84-87.
     [2] DeBenedictis, Erik P. "The Boolean Logic Tax." Computer 49.4 (2016): 79-82.
     [3] DeBenedictis, Erik P. "Computational Complexity and New Computing Approaches." Computer 49.12 (2016): 76-79.
     [4] DeBenedictis, Erik P. "It’s Time to Redefine Moore’s Law Again." Computer 50.2 (2017): 40-43 (in press).

  4. Overview of Example
     Memristor-based neural networks as an example
     • Analog memristor-based neural networks are claimed to be more energy-efficient than a digital implementation
     • Difficulties in comparison
       – Scale: measured memristor circuits are small, but a GPU cluster can execute billions of synapses
       – Precision: memristors typically have a dozen levels, but GPUs use floating point
     Analyzing via complexity theory based on a kT measure
     • Let’s compare limits
       – Digital kT limits via Landauer, etc.
       – Analog kT limits from circuit theory
     • Result (below, will derive)
     • Interpretation (will discuss)
       – There is a parameter space of scale and precision where each is best
     E_digital = ~24 ln(1/p_error) log₂²(L) N kT
     E_analog = ~1/24 ln(1/p_error) L² N² kT
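
To make these limits concrete, here is a minimal Python sketch (mine, not from the talk) that evaluates both closed-form expressions in units of kT; the example values L = 16, N = 100, and p_error = 1e-3 are illustrative assumptions rather than numbers from the slides.

```python
import math

def e_digital(L, N, p_error, kT=1.0):
    """Digital dot-product limit from slide 6: ~24 ln(1/p_error) log2^2(L) N kT."""
    return 24.0 * math.log(1.0 / p_error) * math.log2(L) ** 2 * N * kT

def e_analog(L, N, p_error, kT=1.0):
    """Analog (memristor crossbar) limit from slides 7-8: ~1/24 ln(1/p_error) L^2 N^2 kT."""
    return (1.0 / 24.0) * math.log(1.0 / p_error) * L ** 2 * N ** 2 * kT

# Illustrative point: 16 levels (~4 bits of precision), 100-term dot product
print(e_digital(16, 100, 1e-3))  # ~2.7e5 kT
print(e_analog(16, 100, 1e-3))   # ~7.4e5 kT -> digital wins at this scale and precision
```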

  5. Novelty of Next Few Slides
     • To compare digital and analog, we need a p_error, or the probability that the answer will be wrong. Reliability goes up with energy, so we need a common reference point.
     • Analog circuits are limited by thermal noise of magnitude kT, but the theory is not organized in the same way as digital minimum energy.
     • The terminology has to line up.

  6. Digital Minimum Energy
     Digital circuit and its minimum energy:
     • Vectors v and w are inputs
     • p_error per input is e^(−E_signal/kT)
     • Leading to gate energy E_gate = ~2 ln(1/p_error) kT, assuming 2 inputs
     • L distinguishable levels require log₂(L)-bit binary numbers
     • Multiplier array is about 6 log₂²(L) N gates; assume 100% overhead (2×)
     E_digital = ~24 ln(1/p_error) log₂²(L) N kT
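
The digital bound can be assembled step by step from the bullets above. The sketch below is my illustration, not the speaker's code: it builds E_digital from the gate energy, the bit width, the gate count, and the assumed 2× overhead, and the call parameters are again illustrative assumptions.

```python
import math

def e_gate(p_error, kT=1.0):
    """Gate energy chosen so that p_error per input ~ exp(-E_signal/kT),
    with 2 inputs per gate: E_gate ~ 2 ln(1/p_error) kT."""
    return 2.0 * math.log(1.0 / p_error) * kT

def e_digital(L, N, p_error, kT=1.0):
    """Dot product of two N-vectors with L-level entries, done digitally:
    log2(L)-bit multipliers of ~6 log2(L)^2 gates each, N of them,
    plus an assumed 100% (2x) overhead."""
    bits = math.log2(L)
    gates = 6.0 * bits ** 2 * N
    return 2.0 * gates * e_gate(p_error, kT)  # = ~24 ln(1/p_error) log2^2(L) N kT

print(e_digital(L=16, N=100, p_error=1e-3))  # matches the closed form: ~2.7e5 kT
```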

  7. Analog Minimum Energy I
     Analog circuit
     • Inputs are v and w = 1/g, with conductivities ranging over 0…g_max
     • V_pn = V_n A_v √(ln(1/p_error)), where V_pn is the peak noise
     • L = 2V/V_pn, where V is the supply and V_pn is the peak noise at the amplifier
     Circuit analysis
     • P_n = 4kTf = V_n² · ½ g_max N, where P_n (V_n) is the noise power (voltage) at the amplifier and f is the amplifier bandwidth
     • P_dot(B) = 1/6 V² g_max N, where P_dot is the power of the dot product
     • E_dot = P_dot(B) / (2f), where E_dot is the energy at the Nyquist frequency

  8. Analog Minimum Energy II
     So now what happens?
     • Landauer’s contribution was to establish implementation-independent minimum energies for computation.
     • The previous slide was just a bunch of circuit equations
       – Two equations with g_max
       – Two equations with V
       – Two equations with f
     • If Landauer was right, the circuit values should cancel. Hmm. Let’s try…
     E_analog = ~1/24 ln(1/p_error) L² N² kT

  9. Comparison of Minimum Energies
     Each “wins” in a region of the parameter space
     How can this be right? The human brain is misplaced
     • Well, actually, the human brain is digital.
     • Tell story: neuroscientist Brad looked at the result and said “oh yeah, biology uses level-based signaling in C. elegans and retinas…” Ha, only small scale. So maybe god/evolution figured this out already.
     E_digital = ~24 ln(1/p_error) log₂²(L) N kT
     E_analog = ~1/24 ln(1/p_error) L² N² kT
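
One way to see the "each wins in a region" claim is to set the two formulas equal: the ln(1/p_error) factor and one power of N cancel, leaving a crossover dot-product length N* = 576 log₂²(L) / L². The sketch below is my rendering of that algebra, not code from the talk.

```python
import math

def crossover_length(L):
    """Dot-product length N at which the digital and analog kT limits are
    equal for L-level precision; analog is cheaper only for shorter dot products."""
    return 576.0 * math.log2(L) ** 2 / L ** 2

for L in (2, 4, 8, 16, 64):
    print(f"L = {L:3d} levels: analog wins below ~{crossover_length(L):.0f} terms")
```

For a dozen or so levels (typical of memristors), analog only wins for dot products of a few tens of terms, which is consistent with the "only small scale" remark above.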

  10. What’s Different? Look here
      Variable energy per multiply, at equal precision
      Divide by N for energy per arithmetic operation:
     E_digital / N = ~24 ln(1/p_error) log₂²(L) kT
     E_analog / N = ~1/24 ln(1/p_error) L² N kT
      The energy consumed by an analog multiply depends on how many times the result is added up.
      …or maybe, multiplies are free, but adds are not?
      Why? Circuit equations rule, but intuitively, signals flow backwards through the memristor array (show the audience).
     Consequence: Algorithms do not readily transport from analog to digital and vice versa.

  11. Second Example: Ultra Low-energy Synapse
      The kT limits approach can be applied to a quasi-analog neural synapse
      Achieves much less than kT energy dissipation per training cycle
      Why?
      Most neural network learning is merely verifying that the system has learned what it needs to know
      Only state changes need to dissipate energy
      Ref: DeBenedictis, Erik P., et al. "A path toward ultra-low-energy computing." 2016 IEEE International Conference on Rebooting Computing (ICRC). IEEE, 2016.

  12. Landauer’s Method Extracted From his Paper
     System: p → p1, q → q1, r → r1
     prob  | p q r | p1 q1 r1 | Si (k's) | State | Sf (k's)
     0.125 | 1 1 1 | 1 1 1    | 0.25993  | α     | 0.25993
     0.125 | 1 1 0 | 0 1 0    | 0.25993  | β     | 0.25993
     0.125 | 1 0 1 | 1 0 1    | 0.25993  | γ     | 0.367811
     0.125 | 1 0 0 | 0 0 0    | 0.25993  | δ     | 0.367811
     0.125 | 0 1 1 | 1 0 1    | 0.25993  | γ     | 0
     0.125 | 0 1 0 | 0 0 0    | 0.25993  | δ     | 0
     0.125 | 0 0 1 | 1 0 1    | 0.25993  | γ     | 0
     0.125 | 0 0 0 | 0 0 0    | 0.25993  | δ     | 0
     Si (k's) = 2.079442; Sf (k's) = 1.255482; Si − Sf (k's) = 0.823959
     Typically of the order of kT for each irreversible function. From source: “…typically of the order of kT for each irreversible function.”
     [Landauer 61] Landauer, Rolf. "Irreversibility and heat generation in the computing process." IBM Journal of Research and Development 5.3 (1961): 183-191.

  13. Backup: Details
      Each input combination gets a row
      Each input combination k has probability p_k, with the p_k's summing to 1
      S_i (i for input) is the sum of all −p_k log p_k's
      Each unique output combination is analyzed
      Rows merge if the machine produces the same output
      Each output combination k has probability p_k, with the p_k's summing to 1
      S_f (f for final) is the sum of all −p_k log p_k's
      Minimum energy is S_i − S_f (entropies in units of k, so the energy is of order kT)
      Notes
        Input states that don't merge do not raise minimum energy
        Inputs that merge raise minimum energy based on their probability
        Assumption: all input combinations are equally probable
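
The bookkeeping on this slide and the table on slide 12 are easy to mechanize. Below is a small Python sketch of the entropy accounting; note that the specific three-input function used (q1 = p AND q, with r copied into p1 and r1) is my reading of the slide-12 table rather than something stated explicitly in the source.

```python
import math
from collections import defaultdict

def landauer_entropy_drop(mapping, probs=None):
    """Return S_i, S_f, and S_i - S_f (in units of k) for a map from input
    states to output states; rows merge when the machine produces the same output."""
    states = list(mapping)
    if probs is None:                       # assume all inputs equally probable
        probs = {s: 1.0 / len(states) for s in states}
    s_i = -sum(p * math.log(p) for p in probs.values())
    out = defaultdict(float)
    for s, o in mapping.items():
        out[o] += probs[s]                  # merged rows pool their probability
    s_f = -sum(p * math.log(p) for p in out.values())
    return s_i, s_f, s_i - s_f

# Three-input function inferred from slide 12: p1 = r, q1 = p AND q, r1 = r
table = {(p, q, r): (r, p & q, r)
         for p in (0, 1) for q in (0, 1) for r in (0, 1)}
si, sf, drop = landauer_entropy_drop(table)
print(si, sf, drop)  # ~2.0794, ~1.2555, ~0.8240 k's -> minimum dissipation ~0.82 kT
```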

  14. Example: a Learning Machine
     This “learning machine” example exceeds the energy efficiency limits of Boolean logic. The learning machine continues indefinitely, monitors the environment for knowledge, yet usually just verifies that it has learned what it needs to know. Say “causes” (lion, apple, and night) and “effects” (danger, food, and sleep) have value 1.
     Example input: {lion, danger} {apple, food} {night, sleep} {lion, danger} {apple, food} {night, sleep} {lion, danger} {apple, food} {night, sleep} {lion, danger, food} {apple, food} {night, sleep} {lion, danger} {lion, danger}
     Functional example: Signals create currents; a core flips at 1.5. The machine continuously monitors the environment for {1, 1} or {-1, -1} pairs and remembers them in the state of a magnetic core (old-style magnetic cores). Theoretically, there is no need for energy consumption unless state changes.
     (Figure: a table with columns lion, apple, night, danger, food, sleep marking which cause/effect pairs the cores have stored.)
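
As a toy illustration of "no energy consumption unless state changes," the following sketch (my own, using the event names and example input from slide 14) counts core flips while the machine watches the environment; after the first few observations it is almost always merely verifying what it already knows.

```python
causes = ["lion", "apple", "night"]
effects = ["danger", "food", "sleep"]

# One magnetic-core-like bit per (cause, effect) pair; energy is spent
# only when a bit actually flips from "unlearned" to "learned".
core = {(c, e): 0 for c in causes for e in effects}
state_changes = 0

stream = [  # the example input from slide 14
    {"lion", "danger"}, {"apple", "food"}, {"night", "sleep"},
    {"lion", "danger"}, {"apple", "food"}, {"night", "sleep"},
    {"lion", "danger"}, {"apple", "food"}, {"night", "sleep"},
    {"lion", "danger", "food"}, {"apple", "food"}, {"night", "sleep"},
    {"lion", "danger"}, {"lion", "danger"},
]

for observation in stream:
    for c in causes:
        for e in effects:
            if c in observation and e in observation:   # a {1, 1} coincidence
                if core[(c, e)] == 0:
                    core[(c, e)] = 1                    # state change: dissipates energy
                    state_changes += 1
                # else: already learned -- verification only, no dissipation

print("observations:", len(stream), "| energy-dissipating flips:", state_changes)
```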
