Planned Developments of High End Systems Around the World
Jack Dongarra, Innovative Computing Laboratory, University of Tennessee / Oak Ridge National Laboratory / University of Manchester
1/17/2008

Planned Development of HPC
• Quick look at current state of HPC through the "eyes" of the Top500
• The Japanese Efforts
• The European Initiatives
• The state of China's HPC
• India's machine
The Top500 List — H. Meuer, H. Simon, E. Strohmaier, & JD
- Listing of the 500 most powerful computers in the world
- Yardstick: Rmax from LINPACK — solving Ax=b for a dense problem (TPP performance rate); a minimal timing sketch follows the chart summary below
- Updated twice a year: at SC'xy in the States in November, and at the meeting in Germany in June
- All data available from www.top500.org

Performance Development, 1993-2007 (chart): the sum of all 500 systems has reached 6.96 PF/s; the N=1 system, the IBM BlueGene/L, stands at 478 TF/s; the N=500 entry level is 5.9 TF/s (my laptop: 0.4 GF/s). Earlier N=1 milestones marked on the curve include the Fujitsu 'NWT', Intel ASCI Red, IBM ASCI White, and the NEC Earth Simulator. The N=500 curve trails the N=1 curve by roughly 6-8 years.
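As a rough illustration of what the LINPACK yardstick measures, here is a minimal sketch (not the actual HPL benchmark code; the problem size is an arbitrary illustrative choice): time a dense Ax=b solve and convert the standard operation count into a Gflop/s rate.

```python
# Minimal sketch of the LINPACK-style measurement behind Rmax (illustrative only,
# not the real HPL benchmark): time a dense solve of Ax = b and report Gflop/s.
import time
import numpy as np

n = 4000                                   # illustrative size; real HPL runs use much larger n
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.time()
x = np.linalg.solve(A, b)                  # LU factorization plus triangular solves
elapsed = time.time() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard LINPACK operation count
print(f"n = {n}: {flops / elapsed / 1e9:.1f} Gflop/s")
```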
Top500 Systems, November 2007 (chart of Rmax vs. rank): the top system is at 478 Tflop/s and the entry point at rank 500 is 5.9 Tflop/s (roughly a 1.3K-core cluster with GigE). 7 systems exceed 100 Tflop/s, 21 systems exceed 50 Tflop/s, 50 systems exceed 19 Tflop/s, and 149 systems exceed 10 Tflop/s.

30th Edition: The TOP10
1. IBM Blue Gene/L (eServer Blue Gene, dual-core .7 GHz, custom) — DOE, Lawrence Livermore Nat Lab, USA, 2007, 212,992 cores, Rmax 478 TF/s
2. IBM Blue Gene/P (quad-core .85 GHz, custom) — Forschungszentrum Jülich, Germany, 2007, 65,536 cores, Rmax 167 TF/s
3. SGI Altix ICE 8200 Xeon (quad-core 3 GHz, hybrid) — SGI/New Mexico Computing Applications Center, USA, 2007, 14,336 cores, Rmax 127 TF/s
4. HP Cluster Platform Xeon (dual-core 3 GHz, commodity) — Computational Research Laboratories, TATA SONS, India, 2007, 14,240 cores, Rmax 118 TF/s
5. HP Cluster Platform (dual-core 2.66 GHz, commodity) — Government Agency, Sweden, 2007, 13,728 cores, Rmax 102.8 TF/s
6. Cray Opteron (dual-core 2.4 GHz, hybrid) — DOE, Sandia Nat Lab, USA, 2007, 26,569 cores, Rmax 102.2 TF/s
7. Cray Opteron (dual-core 2.6 GHz, hybrid) — DOE, Oak Ridge National Lab, USA, 2006, 23,016 cores, Rmax 101.7 TF/s
8. IBM eServer Blue Gene/L (dual-core .7 GHz, custom) — IBM Thomas J. Watson Research Center, USA, 2005, 40,960 cores, Rmax 91.2 TF/s
9. Cray Opteron (dual-core 2.6 GHz, hybrid) — DOE, Lawrence Berkeley Nat Lab, USA, 2006, 19,320 cores, Rmax 85.4 TF/s
10. IBM eServer Blue Gene/L (dual-core .7 GHz, custom) — Stony Brook/BNL, NY Center for Computational Sciences, USA, 2006, 36,864 cores, Rmax 82.1 TF/s
Performance of the Top50 (chart, share of Top50 performance by country): the United States dominates with 32 systems and 67% of the performance (DOE alone accounts for 10 of them: 7 NNSA + 3 Office of Science); the remaining third is spread across Germany (4 systems), Japan, France, Italy, India, Sweden, the UK, Spain, Russia, the Netherlands, and Taiwan.

DOE NNSA systems:
• LLNL — BG/L (IBM): PowerPC; 212,992 cores; peak 596 TF; memory 73.7 TB
• LLNL — Purple (IBM): Power 5; 12,208 cores; peak 92.8 TF; memory 48.8 TB
• SNL — Red Storm (Cray): AMD dual-core; 27,200 cores; peak 127.5 TF; memory 40 TB
• SNL — Thunderbird (Dell): Intel Xeon; 9,024 cores; peak 53 TF; memory 6 TB
• LANL — RoadRunner (IBM): AMD dual-core; 18,252 cores; peak 81.1 TF; memory 27.6 TB
• LANL — Q (HP): Alpha; 8,192 cores; peak 20.5 TF; memory 13 TB
LANL Roadrunner — A Petascale System in 2008
• "Connected Unit" cluster: 192 Opteron nodes (180 with 2 dual-Cell blades, connected with 4 PCIe x8 links)
• ≈ 13,000 Cell HPC chips, contributing ≈ 1.33 PetaFlop/s; ≈ 7,000 dual-core Opterons
• ~18 clusters joined by a 2nd-stage InfiniBand 4x DDR interconnect (18 sets of 12 links to 8 switches)
• Based on the 100 Gflop/s (DP) Cell chip (a back-of-the-envelope check of the peak figure follows the DOE Office of Science summary below)
• Approval by DOE 12/07; first CU being built today; expect a May Pflop/s run; full system to LANL in December 2008

DOE Office of Science systems:
• LBNL — Franklin (Cray XT): AMD dual-core; 19,320 cores; peak 100.4 TF; memory 39 TB
• LBNL — Bassi (IBM): PowerPC; 976 cores; peak 7.4 TF; memory 3.5 TB
• LBNL — Seaborg (IBM): Power3; peak 9.9 TF; memory 7.3 TB
• ORNL — Jaguar (Cray XT): AMD dual-core; 11,706 cores; peak 119.4 TF; memory 46 TB; upgrading to 250 TF
• ORNL — Phoenix (Cray X1): Cray vector; 1,024 cores; peak 18.3 TF; memory 2 TB
• ANL — BG/P (IBM): PowerPC; 131,072 cores; peak 111 TF; memory 65.5 TB
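Returning to the Roadrunner slide above, a back-of-the-envelope check of the quoted "≈ 1.33 PetaFlop/s from Cell" figure. The 102.4 Gflop/s double-precision peak per Cell chip used here is an assumption; the slide itself only quotes the rounded "100 Gflop/s (DP)" value.

```python
# Back-of-the-envelope check of Roadrunner's "~1.33 PetaFlop/s from Cell" figure.
# The 102.4 Gflop/s double-precision peak per Cell chip is an assumed value;
# the slide itself only quotes a rounded 100 Gflop/s (DP).
cell_chips = 13_000                  # "≈ 13,000 Cell HPC chips" from the slide
gflops_per_cell = 102.4              # assumed DP peak per chip
cell_peak_pflops = cell_chips * gflops_per_cell / 1e6   # Gflop/s -> Pflop/s
print(f"Cell contribution: {cell_peak_pflops:.2f} Pflop/s")   # ≈ 1.33 Pflop/s
```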
NSF HPC Systems available on TeraGrid (10/01/2007)
(The chart does not show the newest systems: LSU Queen Bee, TACC Ranger, and the Tennessee Cray XT/Baker.)

NSF — new TeraGrid systems (system; peak TF/s; memory TB; type):
• LSU Queen Bee: 50.7 TF/s; 5.3 TB; 680n 2s 4c Dell 2.33 GHz Intel Xeon, 8-way SMP cluster, 8 GB/node, IB
• UT-TACC Ranger: 504 TF/s; 123 TB; Sun Constellation, 3936n 4s 4c 2.0 GHz AMD Barcelona, 16-way SMP cluster, 32 GB/node, IB
• UTK/ORNL (Track 2b): 164 TF/s; 17.8 TB; Cray XT4, 4456n 1s 4c AMD Budapest (April 2008); followed by Cray Baker (80,000 cores), 1,000 TF/s, 80 TB, expected 2Q09
• Track 2c: ??; proposals under evaluation today
• Track 1 (UIUC): sustained Pflop/s; to be deployed in 2011
Japanese Efforts
• TiTech TSUBAME
• T2K effort
• Next Generation Supercomputer Effort

TSUBAME — No. 1 in Japan since June 2006 (has beaten the Earth Simulator, and all the other university centers combined):
• Sun Galaxy 4 (Opteron dual-core, 8-socket): 10,480 cores / 655 nodes; 32-128 GB per node, 21.4 TBytes total; 50.4 TFlop/s
• ClearSpeed CSX600 SIMD accelerators: 360 boards (now 648), 35 TFlop/s (now 52.2 TFlop/s)
• Peak: originally 85 TFlop/s, today 103 TFlop/s
• Voltaire ISR9288 InfiniBand: 10 Gbps x2, ~1310+50 ports, ~13.5 Terabits/s (3 Tbits bisection)
• Storage: 1.1 Pbyte (now 1.6 PB) — 1.0 Pbyte Sun "Thumper" + 0.1 Pbyte NEC iStore; Lustre FS, NFS, CIFS, WebDAV (over IP); 50 GB/s aggregate I/O BW (now 60 GB/s)
• OS: Linux (SuSE 9, 10); NAREGI Grid middleware
• 4-year procurement cycle, $7 mil/year
Universities of Tsukuba, Tokyo, Kyoto (T2K)
• The results of the bidding were announced on December 25, 2007.
  - The specification requires a commodity cluster with quad-core Opterons (Barcelona).
• The three systems share the same architecture at each site, based on the concept of the Open Supercomputer:
  - Open architecture (commodity x86)
  - Open software (Linux, open source)
• University of Tokyo: 140 Tflop/s (peak) from Hitachi
• University of Tsukuba: 95 Tflop/s (peak) from Cray Inc.
• Kyoto University: 61 Tflop/s (peak) from Fujitsu
• The systems will be installed in summer 2008.
• Individual procurements: not a single big procurement for all three systems

NEC SX-9 — Peak 839 Tflop/s
• 102.4 Gflop/s per CPU
• 16 CPUs per unit
• 512 units max
• Expected to ship in March 2008
• Early customers: German Weather Service (DWD) — 39 TF/s, €39 M, operational in 2010; Meteo France — sub-100 TF/s system; Tohoku University, Japan — 26 TF/s
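The quoted 839 Tflop/s peak for a maximal SX-9 configuration follows directly from the per-CPU and per-unit figures above; a minimal sketch of the arithmetic:

```python
# Peak of a maximally configured NEC SX-9, using the figures quoted on the slide.
gflops_per_cpu = 102.4
cpus_per_unit = 16
max_units = 512

peak_tflops = gflops_per_cpu * cpus_per_unit * max_units / 1000.0   # Gflop/s -> Tflop/s
print(f"SX-9 maximum configuration: {peak_tflops:.0f} Tflop/s")     # ≈ 839 Tflop/s
```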