An Overview of Supercomputers, Clusters and Grid

Jack Dongarra
University of Tennessee and Oak Ridge National Laboratory
3/18/2005

Year    Floating Point operations / second (Flop/s)
1941    1
1945    100
1949    1,000 (1 KiloFlop/s, KFlop/s)
1951    10,000
1961    100,000
1964    1,000,000 (1 MegaFlop/s, MFlop/s)
1968    10,000,000
1975    100,000,000
1987    1,000,000,000 (1 GigaFlop/s, GFlop/s)
1992    10,000,000,000
1993    100,000,000,000
1997    1,000,000,000,000 (1 TeraFlop/s, TFlop/s)
2000    10,000,000,000,000
2003    35,000,000,000,000 (35 TFlop/s)

A Growth-Factor of a Billion in Performance in a Career

[Figure: performance of leading machines, 1950 to 2010, on a log scale marked 1 KFlop/s (10^3), 1 MFlop/s (10^6), 1 GFlop/s (10^9), 1 TFlop/s (10^12) and 1 PFlop/s (10^15). Machines plotted: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, CDC 7600, IBM 360/195, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific, IBM BG/L. Era labels: Scalar, Super Scalar, Vector, Parallel, Super Scalar/Vector/Parallel. Annotation: 2X Transistors/Chip Every 1.5 Years.]

TOP500
H. Meuer, H. Simon, E. Strohmaier, & JD

- Listing of the 500 most powerful Computers in the World
- Yardstick: Rmax from LINPACK MPP
    Ax=b, dense problem
    [Figure: TPP performance, Rate versus Size]
- Updated twice a year
    SC'xy in the States in November
    Meeting in Mannheim, Germany in June
- All data available from www.top500.org

TOP500 Performance - November 2004

[Figure: TOP500 performance, 1993 to 2004, on a log scale from 100 Mflop/s to 1 Pflop/s. SUM rises from 1.167 TF/s to 1.127 PF/s, N=1 from 59.7 GF/s to 70.72 TF/s, and N=500 from 0.4 GF/s to 850 GF/s. Systems marked: Fujitsu 'NWT' (NAL), Intel ASCI Red (Sandia), IBM ASCI White (LLNL), NEC Earth Simulator, IBM BlueGene/L, My Laptop.]

Vibrant Field for High Performance Computers

♦ Cray X1, XD1, XT3
♦ SGI Altix
♦ IBM Regatta
♦ IBM Blue Gene/L
♦ IBM eServer
♦ Sun
♦ HP
♦ Dawning
♦ Bull NovaScale
♦ Lanovo
♦ Fujitsu PrimePower
♦ Hitachi SR11000
♦ NEC SX-8
♦ Apple
♦ Coming soon …
    - Cray BlackWidow
    - Galactic Computing (Steve Chen)

Architecture/Systems Continuum

Tightly Coupled

♦ Custom processor with custom interconnect
    - Cray X1
    - NEC SX-8
    - IBM Regatta
    - IBM Blue Gene/L
  Properties:
    ♦ Best processor performance for codes that are not "cache friendly"
    ♦ Good communication performance
    ♦ Simplest programming model
    ♦ Most expensive
♦ Commodity processor with custom interconnect
    - SGI Altix (Intel Itanium 2)
    - Cray XT3 (Red Storm) (AMD Opteron)
  Properties:
    ♦ Good communication performance
    ♦ Good scalability
♦ Commodity processor with commodity interconnect
    - Clusters (Pentium, Itanium, Opteron, Alpha; GigE, Infiniband, Myrinet, Quadrics)
    - NEC TX7
    - IBM eServer
    - Dawning
  Properties:
    ♦ Best price/performance (for codes that work well with caches and are latency tolerant)
    ♦ More complex programming model

Loosely Coupled

[Figure: share of TOP500 systems by category (Custom, Hybrid, Commod), 0% to 100%, Jun-93 to Jun-04.]
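The year-by-year table on the title slide can be turned into an average doubling time, which makes it easy to compare against the "2X Transistors/Chip Every 1.5 Years" annotation. The following is a minimal sketch, not part of the original slides: the function name `doubling_time_years` and the `MILESTONES` list are illustrative, and the calculation assumes steady exponential growth between the two chosen endpoints.

```python
import math

# (year, Flop/s) pairs copied from the table on the title slide.
MILESTONES = [
    (1941, 1),
    (1945, 100),
    (1949, 1_000),
    (1951, 10_000),
    (1961, 100_000),
    (1964, 1_000_000),
    (1968, 10_000_000),
    (1975, 100_000_000),
    (1987, 1_000_000_000),
    (1992, 10_000_000_000),
    (1993, 100_000_000_000),
    (1997, 1_000_000_000_000),
    (2000, 10_000_000_000_000),
    (2003, 35_000_000_000_000),
]


def doubling_time_years(start, end):
    """Average years per doubling between two (year, Flop/s) milestones.

    Assumes (hypothetically) steady exponential growth between the endpoints.
    """
    (year0, flops0), (year1, flops1) = start, end
    doublings = math.log2(flops1 / flops0)
    return (year1 - year0) / doublings


if __name__ == "__main__":
    overall = doubling_time_years(MILESTONES[0], MILESTONES[-1])
    print(f"1941 to 2003: one doubling every {overall:.2f} years on average")
```

Picking other endpoints from `MILESTONES` shows how uneven the growth was from decade to decade, which a single average hides.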
Architectures / Systems

[Figure: number of TOP500 systems (0 to 500) by architecture, 1993 to 2004. Categories: SIMD, Single Proc., Cluster, Constellation, SMP, MPP.]

Processor Types

[Figure: number of TOP500 systems (0 to 500) by processor type, 1993 to 2004. Categories: SIMD, Vector, Scalar, Sparc, MIPS, intel, HP, Power, Alpha.]

Top500 Performance by Manufacturer (11/04)

Manufacturer   Share
IBM            49%
HP             21%
others         14%
SGI            7%
NEC            4%
Fujitsu        2%
Cray           2%
Hitachi        1%
Sun            0%
Intel          0%

Commodity Processors

♦ Intel Pentium Nocona
    - 3.6 GHz, peak = 7.2 Gflop/s
    - Linpack 100 = 1.8 Gflop/s
    - Linpack 1000 = 3.1 Gflop/s
♦ AMD Opteron
    - 2.2 GHz, peak = 4.4 Gflop/s
    - Linpack 100 = 1.3 Gflop/s
    - Linpack 1000 = 3.1 Gflop/s
♦ Intel Itanium 2
    - 1.5 GHz, peak = 6 Gflop/s
    - Linpack 100 = 1.7 Gflop/s
    - Linpack 1000 = 5.4 Gflop/s
♦ HP PA RISC
♦ Sun UltraSPARC IV
♦ HP Alpha EV68
    - 1.25 GHz, 2.5 Gflop/s peak
♦ MIPS R16000

Commodity Interconnects

♦ Gig Ethernet
♦ Myrinet
♦ Infiniband
♦ QsNet
♦ SCI

[Figure: topology diagrams labelled Clos, Fat tree, Torus.]

                   Switch     Cost     Cost      Cost     MPI Lat / 1-way / Bi-Dir
                   topology   NIC      Sw/node   Node     (us)    / MB/s  / MB/s
Gigabit Ethernet   Bus        $   50   $   50    $  100   30  / 100 / 150
SCI                Torus      $1,600   $    0    $1,600   5   / 300 / 400
QsNetII (R)        Fat Tree   $1,200   $1,700    $2,900   3   / 880 / 900
QsNetII (E)        Fat Tree   $1,000   $  700    $1,700   3   / 880 / 900
Myrinet (D card)   Clos       $  595   $  400    $  995   6.5 / 240 / 480
Myrinet (E card)   Clos       $  995   $  400    $1,395   6   / 450 / 900
IB 4x              Fat Tree   $1,000   $  400    $1,400   6   / 820 / 790

Interconnects / Systems

[Figure: number of TOP500 systems (0 to 500) by interconnect, 1993 to 2004. Categories: Others, Infiniband, Quadrics, Gigabit Ethernet, Cray Interconnect, Myrinet, SP Switch, Crossbar, N/A.]
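The MPI latency and 1-way bandwidth columns of the commodity interconnect table can be combined into a rough estimate of how long a single message takes. The sketch below assumes a simple linear cost model (time = latency + size / bandwidth), which the slides do not state; the names `INTERCONNECTS`, `message_time_us` and `half_performance_bytes` are illustrative, and MB is taken to mean 10^6 bytes.

```python
# (MPI latency in microseconds, 1-way bandwidth in MB/s), copied from the
# commodity interconnect table.
INTERCONNECTS = {
    "Gigabit Ethernet": (30.0, 100.0),
    "SCI": (5.0, 300.0),
    "QsNetII (R)": (3.0, 880.0),
    "QsNetII (E)": (3.0, 880.0),
    "Myrinet (D card)": (6.5, 240.0),
    "Myrinet (E card)": (6.0, 450.0),
    "IB 4x": (6.0, 820.0),
}


def message_time_us(name, nbytes):
    """Estimated one-way time in microseconds for a message of nbytes.

    Assumed model: latency + size / bandwidth. With MB = 10^6 bytes,
    1 MB/s equals 1 byte per microsecond, so no unit conversion is needed.
    """
    latency_us, bandwidth_mb_s = INTERCONNECTS[name]
    return latency_us + nbytes / bandwidth_mb_s


def half_performance_bytes(name):
    """Message size at which transfer time equals latency under this model.

    Below this size the message is latency-dominated; above it,
    bandwidth-dominated.
    """
    latency_us, bandwidth_mb_s = INTERCONNECTS[name]
    return latency_us * bandwidth_mb_s


if __name__ == "__main__":
    for name in INTERCONNECTS:
        small = message_time_us(name, 1_000)
        large = message_time_us(name, 1_000_000)
        print(f"{name:18s} 1 KB: {small:8.1f} us   1 MB: {large:10.1f} us")
```

Under this model, small messages are governed almost entirely by the latency column and large ones by the bandwidth column, which is one way to read the continuum slide's remark that commodity clusters suit latency-tolerant codes.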