Today's World-wide Computing Grid for the Large Hadron Collider (WLCG): A Petascale Facility - Moving to Exascale?
Sverre Jarp, CERN openlab CTO
18 May 2011
Agenda
• Quick overview of CERN and the Large Hadron Collider
• Computing by the LHC experiments
• CERN openlab and future R&D
• Conclusions
CERN and LHC
What is CERN?
• CERN is the world's largest particle physics centre
• Particle physics is about:
  - the elementary particles that all matter in the Universe is made of
  - the fundamental forces which hold matter together
• Particle physics requires special tools to create and study new particles:
  - Accelerators
  - Particle detectors
  - Powerful computer systems
• CERN is also:
  - 2250 staff (physicists, engineers, technicians, ...)
  - Some 10'000 visiting scientists (most of the world's particle physicists), coming from 500 universities and representing 80 nationalities
What is the LHC?
• The Large Hadron Collider can collide beams of protons at a design energy of 2 * 7 TeV
• Inaugurated Sept. 2008; restart Nov. 2009
• Reached 3.5 TeV (March 2010)
• 2011/12: two years at 3.5 TeV before upgrade
• Using the latest super-conducting technologies, it operates at about -271 °C, just above absolute zero: the coldest place in the Universe
• With its 27 km circumference, the accelerator is the largest superconducting installation in the world
• Four experiments, with detectors as 'big as cathedrals': ALICE, ATLAS, CMS, LHCb
Collisions at LHC
ATLAS: general purpose LHC detector (7000 tons)
ATLAS under construction (picture taken in 2005)
Compact Muon Solenoid (CMS, 12500 tons)
CMS event @ 3.5 TeV
A CMS collision
LHC Computing
Data Handling and Computation for Physics Analysis
• Online: the trigger performs event selection and filtering on the detector data
• Offline reconstruction: raw data (100%) is processed into event summary data (~10%)
• Batch physics analysis extracts analysis objects by physics topic (~1%); events can later be reprocessed
• Interactive physics analysis offline with ROOT
• Offline event simulation with GEANT4
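To make the interactive analysis step concrete, a minimal ROOT sketch is shown below; the file name, tree name and branch name are hypothetical placeholders, and this is illustrative only, not the experiments' actual analysis code:

    // Minimal ROOT analysis sketch (illustrative): loop over reconstructed
    // events in a TTree and fill a histogram of a single quantity.
    // "analysis_objects.root", "Events" and "muon_pt" are hypothetical names.
    #include "TFile.h"
    #include "TTree.h"
    #include "TH1F.h"
    #include "TCanvas.h"

    void analyse()
    {
        TFile* f = TFile::Open("analysis_objects.root");
        TTree* events = static_cast<TTree*>(f->Get("Events"));

        float muonPt = 0.f;
        events->SetBranchAddress("muon_pt", &muonPt);

        TH1F h("h_muon_pt", "Muon p_{T};p_{T} [GeV];Events", 100, 0., 100.);

        const Long64_t n = events->GetEntries();
        for (Long64_t i = 0; i < n; ++i) {   // event loop: one event at a time
            events->GetEntry(i);
            h.Fill(muonPt);
        }

        TCanvas c;
        h.Draw();
        c.SaveAs("muon_pt.png");
    }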
HEP programming paradigm
• All events are independent
• This trivial parallelism has been exploited by High Energy Physics for decades
• Compute one event after the other in a single process
• Advantage: large jobs can be split into N efficient processes, each responsible for processing M events
  - Built-in scalability (see the sketch below)
• Disadvantage: memory is needed by each process
  - With 2 - 4 GB per process, a dual-socket server with octa-core processors needs 32 - 64 GB
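As a rough illustration of this paradigm (not the actual experiment frameworks), the sketch below splits a run of events across independent worker processes, each looping over its own contiguous event range; the event count, process count and processEvent() body are placeholder assumptions. The price is that every process carries its own full copy of the application's memory (geometry, calibrations, etc.), which is where the 2 - 4 GB per process comes from.

    // Illustrative sketch: event-level parallelism via N independent processes.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    void processEvent(long i) { /* reconstruct / analyse one event */ }

    int main()
    {
        const long nEvents = 1000000;   // assumed job size
        const int  nProcs  = 8;         // e.g. one worker per core
        const long perProc = nEvents / nProcs;

        for (int p = 0; p < nProcs; ++p) {
            if (fork() == 0) {                       // child: fully independent process
                const long first = p * perProc;
                const long last  = (p == nProcs - 1) ? nEvents : first + perProc;
                for (long i = first; i < last; ++i)
                    processEvent(i);                 // events need no communication
                _exit(0);
            }
        }
        while (wait(nullptr) > 0) {}                 // parent waits for all workers
        std::puts("all events processed");
        return 0;
    }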
Rationale for Grids
• The LHC computing requirements are simply too huge for a single site:
  - Impractical to build such a huge facility in one place
  - Modern wide-area networks have made distances shrink (but latency still has to be kept in mind)
  - The users are not necessarily at CERN
  - Political resistance to funding everything at CERN
• So, we are spreading the burden!
[Pie charts: shares of CPU, Disk and Tape provided by CERN, all Tier-1s and all Tier-2s; CERN supplies only a minor fraction (roughly 12 - 18% of CPU and disk, about a third of tape), with the Tier-1 and Tier-2 sites providing the rest]
World-wide LHC Computing Grid
• W-LCG: largest Grid service in the world
• Built on top of EGEE and OSG
• Almost 160 sites in 34 countries
• More than 250'000 IA processor cores (w/Linux)
• One hundred petabytes of storage
Excellent 10 Gb W-LCG connectivity
• Tier-2 and Tier-1 sites are inter-connected by the general purpose research networks
• Any Tier-2 may access data at any Tier-1
[Map: Tier-1 sites in Germany, France, Canada, the USA (two), Taiwan, the Nordic countries, the United Kingdom, Italy, the Netherlands and Spain, each surrounded by their associated Tier-2s]
First year of LHC data (Tier0 and Grid)
• Impressive numbers, we believe!
• Stored ~ 15 PB in 2010
• Writing up to 70 TB / day to tape (~ 70 tapes per day)
• 1 M jobs/day
[Charts: data written to tape (GB/day), and jobs run per month by experiment (ALICE, ATLAS, CMS, LHCb), Jan 2008 - Apr 2011]
CERN's offline capacity
• High-throughput computing based on reliable "commodity" technology:
  - Scientific Linux
  - All inclusive: 7'800 dual-socket servers (64'000 cores)
  - Disk storage: 63'000 TB (usable) on 64'000 drives
  - Tape storage: 34'000 TB on 45'000 cartridges (56'000 slots and 160 drives)
Computer Centre
• Even CERN has a power problem
• We are going to move from 2.9 MW to 3.5 MW
• Beyond this we will establish a remote Tier-0 in 2013!
W-LCG: A distributed supercomputer
With some 250'000 IA cores, W-LCG compares well in core count with the TOP10 (Nov. 10):

  Name/Location                    Core count
  Tianhe-1 (Tianjin)               186'368
  Jaguar (Oak Ridge)               224'162
  Nebulae - Dawning (NSCS)         120'640
  Tsubame 2.0 (GSIC, Tokyo)         73'278
  Hopper (DOE/NERSC)               153'408
  Tera-100 - Bull (CEA)            138'368
  Roadrunner (DOE/LANL)            122'400
  Kraken XT5 (Tennessee)            98'928
  Jugene (Jülich)                  294'912
  Cielo (DOE/SNL)                  107'152
Insatiable appetite for computing
• During the era of the LEP accelerator (and beyond), compute power doubled every year
• We are desperately looking at all opportunities for this to continue
CERN openlab
• IT Department's main R&D focus
• Framework for collaboration with industry
• Evaluation, integration and validation of cutting-edge technologies that can serve the LHC Computing Grid
• Sequence of 3-year agreements:
  - 2003 - 2005: Phase I, the "opencluster" project
  - 2006 - 2011: Phase II & III, dedicated Competence Centres
[Timeline: openlab I through openlab V alongside WLCG and other CERN entities, Jan 2003 - Jan 2015; 10 years of existence]
CERN openlab structure
• A solid set of Competence Centres, with strong support from Management and Communications:
  - Automation and Controls CC (Siemens)
  - Database CC (Oracle)
  - Networking CC (HP)
  - Platform CC (Intel)
EXASCALE Capacity Computing R&D
• In openlab, we want to start an R&D project for Exascale
• Project goals:
  - Identify constraints which might inhibit growth in CERN's Tier-0 and in the W-LCG in the future
  - Understand which software and hardware components must be moved towards the Exascale range
Intel's "Many Integrated Core" Architecture
• Announced at ISC10 (June 2010): S. Jarp on stage with K. Skaugen/Intel
• Current version (codename "Knights Ferry SDP"):
  - Enhanced x86 instruction set with vector extensions
  - 32 cores, 4-way multithreaded, 512-bit vector units
• Successful (easy) porting of our benchmark applications:
  - ALICE Trackfitter/Trackfinder
  - Multithreaded Geant4 prototype
  - Maximum likelihood data analysis prototype
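The kernels that port well to such wide vector units are typically data-parallel loops over independent events or tracks. The sketch below, assuming a simple Gaussian model and dummy data, shows the general shape of a maximum-likelihood evaluation that a compiler can vectorize and that OpenMP can spread across many cores; it is not the actual openlab benchmark code:

    // Illustrative negative log-likelihood kernel: each iteration is independent,
    // so the loop can be auto-vectorized and parallelized on a many-core device.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double negLogLikelihood(const std::vector<double>& x, double mean, double sigma)
    {
        const double logNorm = std::log(sigma * std::sqrt(2.0 * M_PI));
        double nll = 0.0;
        #pragma omp parallel for reduction(+:nll)
        for (long i = 0; i < static_cast<long>(x.size()); ++i) {
            const double z = (x[i] - mean) / sigma;
            nll += 0.5 * z * z + logNorm;    // -log of a Gaussian probability density
        }
        return nll;
    }

    int main()
    {
        std::vector<double> data(1 << 20, 0.5);   // dummy data for illustration
        std::printf("NLL = %f\n", negLogLikelihood(data, 0.0, 1.0));
        return 0;
    }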