Panel Session: Amending Moore’s Law for Embedded Applications
James C. Anderson
MIT Lincoln Laboratory
HPEC04, 29 September 2004
This work is sponsored by the HPEC-SI (high performance embedded computing software initiative) under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government. Reference to any specific commercial product, trade name, trademark or manufacturer does not constitute or imply endorsement.
MIT Lincoln Laboratory 000523-jca-1 KAM 10/7/2004
Objective, Questions for the Panel & Schedule
• Objective: identify & characterize factors that affect the impact of Moore’s Law on embedded applications
• Questions for the panel
– 1) Moore’s Law: what’s causing the slowdown?
– 2) What is the contribution of Moore’s Law to improvements at the embedded system level?
– 3) Can we preserve historical improvement rates for embedded applications?
Panel members & audience may hold diverse, evolving opinions
• Schedule
– 1540-1600: panel introduction & overview
– 1600-1620: guest speaker Dr. Robert Schaller
– 1620-1650: panelist presentations
– 1650-1720: open forum
– 1720-1730: conclusions & the way ahead
Panel Session: Amending Moore’s Law for Embedded Applications
• Moderator: Dr. James C. Anderson, MIT Lincoln Laboratory
• Dr. Richard Linderman, Air Force Research Laboratory
• Dr. Mark Richards, Georgia Institute of Technology
• Mr. David Martinez, MIT Lincoln Laboratory
• Dr. Robert R. Schaller, College of Southern Maryland
Four Decades of Progress at the System Level
• 1965
– Gordon Moore publishes “Cramming more components onto integrated circuits”
– Computers lose badly at chess
• 1997
– Robert Schaller publishes “Moore’s Law: past, present and future”
– Deep Blue (1270kg) beats chess champ Kasparov
• 2002-2004
– Mark Richards (with Gary Shaw) publishes “Sustaining the exponential growth of embedded digital signal processing capability”
– Chess champ Kramnik ties Deep Fritz & Kasparov ties Deep Junior (10K lines C++ running on 15 GIPS server using 3 Gbytes)
• ~2005
– Deep Dew hand-held chess champ (0.6L & 0.6kg) uses 22 AA cells (Li/FeS₂, 22W for 3.5 hrs) & COTS parts incl. voice I/O chip
• ~2008
– Deep Yogurt has 1/3 the size & power of Deep Dew, with 3X improvement in 3 yrs
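The Deep Dew and Deep Yogurt figures on the timeline can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "1/3 the size & power" means dividing both volume and power draw by three.

```python
# Figures quoted on the slide for the notional Deep Dew hand-held unit.
DEEP_DEW_VOLUME_L = 0.6    # liters
DEEP_DEW_POWER_W = 22.0    # watts
DEEP_DEW_RUNTIME_H = 3.5   # hours
DEEP_DEW_CELLS = 22        # AA cells (Li/FeS2)


def battery_energy_wh(power_w, runtime_h):
    """Total energy drawn at constant power over the stated runtime."""
    return power_w * runtime_h


def scale_unit(volume_l, power_w, factor):
    """Shrink volume and power by the same factor (assumed reading of '1/3 the size & power')."""
    return volume_l / factor, power_w / factor


if __name__ == "__main__":
    total_wh = battery_energy_wh(DEEP_DEW_POWER_W, DEEP_DEW_RUNTIME_H)
    print(f"Deep Dew battery energy: {total_wh:.1f} Wh")
    print(f"Energy per AA cell:      {total_wh / DEEP_DEW_CELLS:.2f} Wh")

    yogurt_volume_l, yogurt_power_w = scale_unit(DEEP_DEW_VOLUME_L, DEEP_DEW_POWER_W, 3)
    print(f"Deep Yogurt: {yogurt_volume_l:.2f} L, {yogurt_power_w:.2f} W")
```

Under these assumptions the battery pack supplies 77 Wh in total (3.5 Wh per cell), and Deep Yogurt would occupy about 0.2 L and draw about 7.3 W.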
Power per Unit Volume (Watts/Liter) for Representative Systems ca. 2003
[Figure: log-log plot. Horizontal axis: Computation Efficiency (GIPS/watt), 1.00E-02 to 1.00E+03. Vertical axis: Computation Density (GIPS/Liter), 1.00E-01 to 1.00E+08. Throughput in GIPS (billions of Dhrystone instructions/sec).]
• Power-per-unit-volume reference levels
– 8000 Watts/Liter nuclear reactor core (RBMK-1500 reactor)
– 70 W/L limit for convection-cooled cards (typical for conduction-cooled)
– 1.6 W/L moderately active human (human vs. machine “Turing Tests”)
• Plotted points
– PowerPC 750FX (0.13µm, 800 MHz)
– Active die volume (1µm depth)
– Die (1mm thick)
– Packaged device
– Computer card
– Deep Fritz & Deep Junior chess server
– Hand-held unit feasible with COTS parts 4Q03, but not built
– Chess champs’ brains (human chess champs Kramnik & Kasparov)
• Other plot label: Kramnik & Deep Fritz
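The chart's two axes determine power per unit volume directly, since (GIPS/Liter) divided by (GIPS/watt) gives Watts/Liter. The sketch below illustrates that relationship using the hand-held unit's 0.6 L and 22 W from the timeline slide. The 15 GIPS throughput is an assumption borrowed from the chess server figure, and the function names are hypothetical.

```python
# Reference level quoted on the chart.
CONVECTION_LIMIT_W_PER_L = 70.0


def chart_coordinates(throughput_gips, volume_l, power_w):
    """Return (efficiency in GIPS/watt, density in GIPS/Liter) for one system."""
    return throughput_gips / power_w, throughput_gips / volume_l


def power_density_w_per_l(gips_per_liter, gips_per_watt):
    """Watts/Liter implied by a point on the chart: density divided by efficiency."""
    return gips_per_liter / gips_per_watt


if __name__ == "__main__":
    # ASSUMPTION: the hand-held unit delivers the same 15 GIPS as the chess server.
    efficiency, density = chart_coordinates(throughput_gips=15.0, volume_l=0.6, power_w=22.0)
    watts_per_liter = power_density_w_per_l(density, efficiency)
    print(f"Efficiency: {efficiency:.2f} GIPS/watt")
    print(f"Density:    {density:.1f} GIPS/Liter")
    print(f"Power/volume: {watts_per_liter:.1f} W/L")
    print("Within convection-cooled limit:", watts_per_liter <= CONVECTION_LIMIT_W_PER_L)
```

The throughput cancels out of the ratio, so the resulting power per unit volume (about 37 W/L for 22 W in 0.6 L) does not depend on the assumed 15 GIPS. Only the position of the point along a constant-W/L line does.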
System-level Improvements Falling Short of Historical Moore’s Law
[Figure: log-log plot. Horizontal axis: Computation Efficiency (GFLOPS/Watt), 0.01 to 100.00. Vertical axis: Computation Density (GFLOPS/Liter), 1 to 10000.]
• GFLOPS (billions of 32 bit floating-point operations/sec) sustained for 1K complex FFT using 6U form factor convection-cooled COTS multiprocessor cards <55W, 2Q04 data
• Y2K points: FPGA 3/99, ASIC 7/99, RISC 3/00
• 2010 points: SRAM-based FPGA 2/09, general-purpose RISC with on-chip vector processor 2/10, special-purpose ASIC 6/10
• Moore’s Law slope: 100X in 10 yrs
• COTS ASIC & FPGA improvements outpacing general-purpose processors, but all fall short of historical Moore’s Law
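The "100X in 10 yrs" slope can be restated as an annual growth factor or a doubling time, which makes it easier to compare against other rates. The following is a minimal sketch of that conversion. The function names are illustrative and not from the presentation.

```python
import math


def annual_factor(gain, years):
    """Constant yearly multiplier that compounds to `gain` over `years`."""
    return gain ** (1.0 / years)


def doubling_time_years(gain, years):
    """Years needed to double at the constant exponential rate implied by `gain` over `years`."""
    return years * math.log(2.0) / math.log(gain)


if __name__ == "__main__":
    print(f"Annual factor: {annual_factor(100, 10):.3f}X per year")
    print(f"Doubling time: {doubling_time_years(100, 10):.2f} years")
```

A 100X gain in 10 years works out to roughly 1.58X per year, or a doubling about every 1.5 years.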
Timeline for ADC Sampling Rate & COTS Processors (2Q04)
[Figure: semi-log timeline. Horizontal axis: Year, 1/1/1992 to 1/1/2010. Vertical axes: logarithmic, 1 to 10,000, with one axis labeled Rate (MSPS).]
• Pair of analog-to-digital converters provide data to processor card for 32 bit floating-point 1K complex FFT
• Curve labels
– 12- to 14-bit ADCs
– General-purpose µP, DSP & RISC (w/ vector processor)
– SRAM-based FPGAs
– Special-purpose ASICs
– Highest-performance 6U form factor multiprocessor cards <55W
• Slope annotations: Moore’s Law slope 4X in 3 yrs; 3X in 3 yrs; 2X in 3 yrs
• Projections assume future commercial market for 1 GSPS 12-bit ADCs & 50 GFLOPS cards with 8 Gbytes/sec I&O
• Open systems architecture goal: mix old & new general- & special-purpose cards, with upgrades as needed (from 1992-2003, a new card could replace four 3-yr-old cards)
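The card-replacement remark follows from the slope annotations: at 4X in 3 yrs, one new card matches four 3-year-old cards. The sketch below applies the same reasoning to the other two slopes on the chart. It assumes smooth exponential growth between the quoted points, and the function name is hypothetical.

```python
def cards_replaced(gain_per_period, period_years, card_age_years):
    """Number of old cards one new card can replace, assuming constant exponential growth."""
    return gain_per_period ** (card_age_years / period_years)


if __name__ == "__main__":
    # The three slopes annotated on the chart, each quoted per 3 years.
    for gain in (4, 3, 2):
        three_year = cards_replaced(gain, 3, 3)
        ten_year = cards_replaced(gain, 3, 10)
        print(f"{gain}X in 3 yrs: replaces {three_year:.0f} three-year-old cards, "
              f"about {ten_year:.0f}X over 10 yrs")
```

At 4X in 3 yrs the 10-year gain is about 100X, matching the historical Moore's Law slope quoted on the previous slide. At 3X and 2X in 3 yrs it drops to roughly 39X and 10X.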