The IEEE Rebooting Computing Initiative and the International Roadmap for Devices and Systems
Tom Conte
Co-Chair, IEEE Rebooting Computing Initiative
Vice Chair, International Roadmap for Devices and Systems
Schools of CS & ECE, Georgia Institute of Technology
tom@conte.us
Why does computing need a “reboot”?
Moore's Law for 2D predicted to end in 2021
– But few architects care because…
Transistors have been getting smaller but cannot be clocked faster
– From an architect’s perspective: 10nm isn’t any better than 14nm, which was only marginally better than 22nm
The Power Wall: Single-thread exponential performance scaling already ended in 2005
A history of modern computing: How we got here
1945: Von Neumann’s report describing computer arch.
1955: Manchester Transistor Computer, IBM 709T
1965: Software industry begins (IBM 360), Moore #1
1975: Moore’s Law update; Dennard’s geometric scaling rule
1985: “Killer micros”: HPC, general-purpose hitch a ride on Moore’s law
1995: Slowdown in CMOS wires: superscalar era begins
In 1995, wire delays impact pipelining: Superscalar begins
[Figure: processor performance vs. Moore’s law]
Source: Sanjay Patel, UIUC (used with permission)
We hid parallelism extraction with Superscalar Processor Microarchitectures
[Figure: superscalar pipeline. Branch predictor and instruction cache feed instruction fetch; decode & dispatch; register file; schedule; issue N independent instructions; execute in parallel on multiple ALUs with a data cache; reorder instructions]
Very few of these “tricks” are energy efficient
How we got here, part 2
1945: Von Neumann’s report describing computer arch.
1955: Manchester Transistor Computer, IBM 709T
1965: Software industry begins (IBM 360)
1975: Moore’s Law; Dennard’s geometric scaling rule
1985: “Killer micros”: HPC, general-purpose hitch a ride on Moore’s law
1995: Slowdown in CMOS wires: superscalar era begins
2005: The Power Wall: Single-thread exponential scaling ends (Intel Prescott)
…
Intel P4 Prescott: Q1 2005, 200 W/cm²
Multicore era begins
Dilemma: Could not clock a single core aggressively, AND transistors per chip kept increasing
Solution: Clock multiple cores conservatively
How we got here, part 3
1945: Von Neumann’s report describing computer arch.
1955: Manchester Transistor Computer, IBM 709T
1965: Software industry begins (IBM 360)
1975: Moore’s Law; Dennard’s geometric scaling rule
1985: “Killer micros”: HPC, general-purpose hitch a ride on Moore’s law
1995: Slowdown in CMOS wires: superscalar era begins
2005: The Power Wall: Single-thread exponential scaling ends (Intel Prescott)
2012: Realizing the problem: IEEE Rebooting Computing Initiative founded
IEEE Rebooting Computing
Goal: Rethink Everything: Turing & Von Neumann to now
Why IEEE? Encompasses the whole computing stack
– Circuits & Systems Society
– Council on Electronic Design Automation
IEEE Rebooting Computing Summit 1: 2013 Dec. 12-13 (summary online)
– Invitation only
– Three Pillars:
  – Energy Efficiency
  – Security
  – Applications/HCI
IEEE Rebooting Computing Summit 2: 2014 May 14-16
– Engines of Computation
  – Adiabatic/Reversible Computing
  – Approximate Computing
  – Neuromorphic Computing
  – Augmentation of CMOS
[Figure: Rebooting Computing pillars (Energy Efficiency, Applications/HCI, Security) and the Engine Room]
RCI Summit 2: Ways to compute
Many alternatives
– New switch
– 3D Integration
– Adiabatic/Reversible logic
– Unreliable switch
– Approximate, Stochastic
– Cryogenic
– Neuromorphic accelerators
– Analog neuromorphic
– Quantum
– …
Not all are general-purpose drop-ins
– (nor do they need to be)
There was a common phenomenon we discovered…
“You talking to me?!?”
The phenomenal success of von Neumann caused all other approaches to be labeled as “lunatic fringe”
Biases against taking risks remain today
IEEE Rebooting Computing Summit 3: 2014 Oct. 23-24
– Algorithms and Architectures
  – Random algorithms
  – HCI and Applications
  – Also: Security, Approximate Computing
– ITRS joins forces with RCI
[Figure: Rebooting Computing pillars (Energy Efficiency, Applications/HCI, Security), Algorithms & Architectures, and the Engine Room]
IEEE Rebooting Computing Summit 4: 2015 Dec. 10-11
Goal: coordinating efforts between:
– Industry (HP, Intel, NVIDIA)
– US: DOE, DARPA, IARPA, NSF
Goal 2: How to roadmap the future
[Figure: Rebooting Computing pillars (Energy Efficiency, Applications/HCI, Security), Algorithms & Architectures, and the Engine Room]
RCI: “Software drives the computer industry”
Questions for software industry:
– How valuable is legacy software?
– What computing resources do the emerging applications need?
– How long and how much investment will it take to train a new generation of programmers?
Degrees of Pain vs. Gain…
Potential Approaches vs. Disruption in Computing Stack
[Figure: computing stack layers (Algorithm, Language, API, Architecture, ISA, Microarchitecture, FU, logic, device) against four approach levels, shaded from No Disruption to Total Disruption. Level 1: “More Moore”; Level 2: hidden changes; Level 3: architectural changes; Level 4: non von Neumann computing]
Level 1: More Moore
Software: Legacy code works without issue
New switch candidates:
– Logic examples: Tunneling FET, CNFET, superconducting electronics
– Memory examples: MRAM, memristor, PCM, …
More Moore: A better switch?
Courtesy Dimitri Nikonov and Ian Young
3D Architecture example:
3D vs. 2D Cost Reduction
[Figure labels: deposition and etch; lithography and etch; gate; bulk Si]
Level 1: More Moore
Software: Legacy code works without issue
New switch candidates:
– Logic examples: Tunneling FET, CNFET, superconducting electronics
– Memory examples: MRAM, memristor, PCM
Moore’s law will go to 3D
Potential Approaches vs. Disruption in Computing Stack
[Figure: computing stack layers (Algorithm, Language, API, Architecture, ISA, Microarchitecture, FU, logic, device) against four approach levels, shaded from No Disruption to Total Disruption. Level 1: “More Moore”; Level 2: hidden changes; Level 3: architectural changes; Level 4: non von Neumann computing]
Level 2: Not CMOS, but hidden
Software: Legacy code works, but may require performance tuning
Superscalar in 1995 was an example
Microarchitectural changes to
– Use unreliable switch logic, and/or
– Use cryogenic superconducting
– Reversible computing
Lowering voltage gives quadratic improvement in power, but devices become unreliable below 1 V
Probability of signal error grows as energy of signal is reduced below 20 kT
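The trade-off on this slide can be made concrete with a small sketch. It assumes the usual dynamic-power relation P = C·V²·f for the quadratic claim, and a first-order Boltzmann factor exp(-E/kT) as a stand-in for the error probability. The Boltzmann model and both function names are illustrative assumptions, not something the slide specifies.

```python
import math


def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Relative dynamic power after a voltage change, assuming P = C * V^2 * f
    with capacitance C and frequency f held constant."""
    return (v_new / v_old) ** 2


def thermal_error_probability(energy_in_kt: float) -> float:
    """ASSUMED first-order model: probability that thermal noise flips a signal
    whose energy is `energy_in_kt` multiples of kT, taken as exp(-E / kT)."""
    return math.exp(-energy_in_kt)


if __name__ == "__main__":
    # Halving the supply voltage quarters the dynamic power under this model.
    print("power ratio at half voltage:", dynamic_power_ratio(0.5, 1.0))
    # Under the assumed model, the error probability rises quickly below 20 kT.
    for e in (40, 20, 10, 5):
        print(f"E = {e:>2} kT -> p_error ~ {thermal_error_probability(e):.2e}")
```

Under these assumptions, each reduction in signal energy multiplies the error probability, which is why the power saving cannot be taken without some form of error correction.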
Traditional Fault Tolerant Computing Reliability
“Triple Modular Redundancy” (TMR)
– ~200% overhead in area and energy to correct an error due to a single bit flip
– Lose all power benefit of lower voltage
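A minimal sketch of the TMR idea, for illustration only: the same function runs on three replicas and a bitwise majority vote masks a fault in any one of them. The `faults` XOR masks are a hypothetical fault-injection model, and the function names are not taken from the slides.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words: each output bit is set when at least
    two of the three inputs have that bit set."""
    return (a & b) | (a & c) | (b & c)


def tmr(fn, x: int, faults=(0, 0, 0)) -> int:
    """Run `fn` on three replicas and vote on the results.

    `faults` is a HYPOTHETICAL fault model: one XOR mask per replica that
    flips bits in that replica's output."""
    outputs = [fn(x) ^ mask for mask in faults]
    return majority_vote(*outputs)


def increment(v: int) -> int:
    return v + 1


if __name__ == "__main__":
    # A single bit flip in one replica is outvoted by the other two.
    print(tmr(increment, 41, faults=(0, 0b100, 0)))
    # The same bit flipped in two replicas wins the vote and is not corrected.
    print(tmr(increment, 41, faults=(0b100, 0b100, 0)))
```

The two extra replicas are where the roughly 200% area and energy overhead on the slide comes from.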
Redundant Residue Numbers can also correct errors
Range = 3*5*2*7 = 210; mod 11 and mod 13 are the redundant moduli

              decimal  mod 3  mod 5  mod 2  mod 7  mod 11  mod 13
              13       1      3      1      6      2       0
              14       2      4      0      0      3       1
add, case 1   27       0      2      1      6      7->5    1
add, case 2   27       0      1->2   1      6      5       1
add, case 3   27       0      1->2   1      6      7->5    1

Chinese Remainder Theorem: compute |X’|11 and |X’|13, then test |X’|mc == |X|mc ?

How to correct?
Case 1: (0,2,1,6) ⇔ 27; X’ = 27; |27|11 = 5; |27|13 = 1
  |X’|m5 = 5, |X|m5 = 7; |X’|m6 = 1, |X|m6 = 1
  -> replace |X|m5 with |X’|m5
Case 2: (0,1,1,6) ⇔ 111; X’ = 111; |111|11 = 1; |111|13 = 7
  |X’|m5 = 1, |X|m5 = 5; |X’|m6 = 7, |X|m6 = 1
  -> check error correction table
Case 3: Two errors. A double-error detection algorithm could be used to detect the errors, but it cannot correct them.

This has been around for a long time (1968): time to look again
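The three cases above can be reproduced with a short sketch. It uses the slide's moduli (3, 5, 2, 7 with redundant 11 and 13) and the Chinese Remainder Theorem check. The slide's error correction table is not given, so for case 2 this sketch substitutes a brute-force search: drop one non-redundant residue at a time and accept the reconstruction only if it lands in the legitimate range 0..209. That substitution, and all function names, are assumptions made for illustration.

```python
from math import prod

NONREDUNDANT = (3, 5, 2, 7)          # information moduli from the slide
REDUNDANT = (11, 13)                 # redundant (check) moduli from the slide
MODULI = NONREDUNDANT + REDUNDANT
LEGIT_RANGE = prod(NONREDUNDANT)     # 210


def encode(x):
    """Residue vector of x over all six moduli."""
    return tuple(x % m for m in MODULI)


def add(a, b):
    """Residue-wise addition; each digit is computed independently."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, MODULI))


def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction for pairwise-coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total


def check_and_correct(word):
    """Return (value, status), assuming at most one erroneous residue.

    value is None when the word cannot be corrected under that assumption."""
    k = len(NONREDUNDANT)
    x_prime = crt(word[:k], NONREDUNDANT)
    bad = [i for i in range(k, len(MODULI)) if x_prime % MODULI[i] != word[i]]
    if not bad:
        return x_prime, "no error"
    if len(bad) == 1:
        # Only one check digit disagrees: the check digit itself is wrong.
        return x_prime, "redundant residue error"
    # Both check digits disagree: suspect a non-redundant residue.
    # ASSUMED stand-in for the slide's correction table: trial deletion.
    candidates = []
    for skip in range(k):
        keep = [i for i in range(len(MODULI)) if i != skip]
        x = crt([word[i] for i in keep], [MODULI[i] for i in keep])
        if x < LEGIT_RANGE:
            candidates.append(x)
    if len(candidates) == 1:
        return candidates[0], "non-redundant residue error"
    return None, "detected, not correctable"


if __name__ == "__main__":
    print(add(encode(13), encode(14)))            # residues of 27
    print(check_and_correct((0, 2, 1, 6, 7, 1)))  # case 1
    print(check_and_correct((0, 1, 1, 6, 5, 1)))  # case 2
    print(check_and_correct((0, 1, 1, 6, 7, 1)))  # case 3
```

For these three inputs the sketch corrects cases 1 and 2 back to 27 and flags case 3 as detected only. In general a double error is not guaranteed to be flagged by a single-error decoder like this one.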
RRNS Core Microarchitecture
50% overhead (<< 200%!)
100x more power efficient
B. Deng, et al., “Computationally-redundant energy-efficient processing for y'all (CREEPY),” Proceedings of the 2016 IEEE International Conference on Rebooting Computing (ICRC), (San Diego, CA), Oct. 17-19, 2016.
Superconducting: smaller, lower power, same performance
Same-scale comparison (2’ x 2’)
Titan at ORNL (#2 of Top500) vs. Superconducting Supercomputer:
– Performance: 17.6 PFLOP/s (#2 in world*) vs. 20 PFLOP/s (~1x)
– Memory: 710 TB (0.04 B/FLOPS) vs. 5 PB (0.25 B/FLOPS) (7x)
– Power: 8,200 kW avg. (not included: cooling, storage memory) vs. 80 kW total power (includes cooling) (0.01x)
– Space: 4,350 ft² (404 m², not including cooling) vs. ~200 ft² (includes cooling) (0.05x)
– Cooling: additional power, space and infrastructure required vs. all cooling shown
Courtesy of M. Manheimer, IARPA Cryogenic Computing Complexity (C3) Program
MIT-LL Fully-Planarized Nb Josephson Junction Process
Target 10-Nb-layer Process
Process Features
• Nb/AlOx/Nb JJ technology
• 10 kA/cm² (100 µA/µm²) baseline
• 200-mm Si substrates
• 4-, 8- & 10-Nb layer nodes
• Feature sizes to 500 nm
• Full planarization for uniformity
• Transition to stacked/stud vias
[Figure: SFQ-4ee (8-Nb-layer) cross-section, 2 µm scale bar]