
Today's World-wide Computing Grid for the Large Hadron Collider (WLCG): A Petascale Facility - Moving to Exascale? - PowerPoint PPT Presentation



  1. Today's World-wide Computing Grid for the Large Hadron Collider (WLCG): A Petascale Facility - Moving to Exascale? Sverre Jarp, CERN openlab CTO, 18 May 2011

  2. Agenda
     - Quick overview of CERN and the Large Hadron Collider
     - Computing by the LHC experiments
     - CERN openlab and future R&D
     - Conclusions

  3. CERN and the LHC

  4. What is CERN?
     - CERN is the world's largest particle physics centre.
     - Particle physics is about the elementary particles that all matter in the Universe is made of, and the fundamental forces which hold matter together.
     - Particle physics requires special tools to create and study new particles: accelerators and particle detectors, backed by powerful computer systems.
     - CERN is also: about 2,250 staff (physicists, engineers, technicians, ...) and some 10,000 visiting scientists (most of the world's particle physicists), coming from 500 universities and representing 80 nationalities.

  5. What is the LHC?
     - The Large Hadron Collider can collide beams of protons at a design energy of 2 x 7 TeV.
     - Inaugurated Sept. 2008; restarted Nov. 2009; reached 3.5 TeV per beam in March 2010.
     - 2011/12: two years at 3.5 TeV before the upgrade.
     - Using the latest superconducting technologies, it operates at about -271 °C, just above the temperature of absolute zero: the coldest place in the Universe.
     - With its 27 km circumference, the accelerator is the largest superconducting installation in the world.
     - Four experiments, with detectors as 'big as cathedrals': ALICE, ATLAS, CMS, LHCb.

  6. Collisions at the LHC

  7. ATLAS - a general-purpose LHC detector (7,000 tons)

  8. ATLAS under construction (picture taken in 2005)

  9. Compact Muon Solenoid (CMS) - 12,500 tons

  10. CMS event at 3.5 TeV

  11. A CMS collision

  12. LHC Computing

  13. Data Handling and Computation for Physics Analysis
     - Online: the detector trigger performs selection and filtering, producing raw data (100% of the volume).
     - Offline reconstruction (and periodic event reprocessing) turns raw data into event summary data (~10% of the volume).
     - Batch physics analysis extracts analysis objects by physics topic (~1% of the volume); offline analysis is done with ROOT (a minimal loop of this kind is sketched below).
     - Event simulation is performed offline with GEANT4 and feeds the same chain.
     - Interactive physics analysis then works on the extracted analysis objects.
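
A minimal sketch (not from the talk) of the batch/interactive analysis step named above: a loop over an event-summary tree with ROOT, filling one histogram. The file name, tree name and branch name are hypothetical.

    // Minimal ROOT-based event loop; input names are hypothetical.
    #include "TFile.h"
    #include "TTree.h"
    #include "TH1F.h"

    int main() {
        TFile* f = TFile::Open("event_summary.root");        // hypothetical input file
        TTree* t = static_cast<TTree*>(f->Get("Events"));    // hypothetical tree name
        float invMass = 0.f;
        t->SetBranchAddress("invMass", &invMass);            // hypothetical branch

        TH1F h("h_mass", "Invariant mass;m [GeV];events", 100, 0., 200.);
        const Long64_t n = t->GetEntries();
        for (Long64_t i = 0; i < n; ++i) {                   // the classic HEP event loop
            t->GetEntry(i);
            if (invMass > 0.f) h.Fill(invMass);              // trivial selection cut
        }
        h.SaveAs("h_mass.root");                             // persist the result
        f->Close();
        return 0;
    }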

  14. HEP programming paradigm
     - All events are independent: this trivial parallelism has been exploited by High Energy Physics for decades.
     - Compute one event after the other in a single process.
     - Advantage: large jobs can be split into N efficient processes, each responsible for processing M events - built-in scalability (see the sketch after this list).
     - Disadvantage: memory is needed by each process. With 2-4 GB per process, a dual-socket server with octa-core processors needs 32-64 GB.
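
A minimal sketch (not from the talk) of the process-level split described above: M independent events are divided among N worker processes, each running the same serial event loop on its own slice. processEvent() is a hypothetical stand-in for per-event reconstruction or analysis, and the counts are illustrative.

    // Split M independent events across N worker processes (POSIX fork).
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    void processEvent(long /*eventIndex*/) { /* hypothetical per-event work */ }

    int main() {
        const long M = 1000000;   // total events in the job (illustrative)
        const int  N = 8;         // worker processes, e.g. one per core

        for (int w = 0; w < N; ++w) {
            if (fork() == 0) {                      // child: handle one slice
                const long begin =  w      * M / N;
                const long end   = (w + 1) * M / N;
                for (long i = begin; i < end; ++i)
                    processEvent(i);                // events are independent
                _exit(0);
            }
        }
        while (wait(nullptr) > 0) {}                // parent: wait for all workers
        std::printf("%ld events processed by %d processes\n", M, N);
        return 0;
    }

Each worker is a full, independent copy of the application, which is exactly why the per-process memory quoted above multiplies with the number of cores in the server.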

  15. Rationale for Grids
     - The LHC computing requirements are simply too huge for a single site: it is impractical to build such a huge facility in one place.
     - Modern wide-area networks have made distances shrink, but latency still has to be kept in mind.
     - The users are not necessarily at CERN, and there is political resistance to funding everything at CERN.
     - So, we are spreading the burden. The slide's pie charts show the CPU, disk and tape shares of CERN, all Tier-1s and all Tier-2s: tape is split roughly CERN 34% / Tier-1s 66%; for CPU and disk, CERN provides roughly 12-18%, the Tier-1s roughly 39-55%, and the Tier-2s roughly 33-43%.

  16. World-wide LHC Computing Grid
     - W-LCG: the largest Grid service in the world, built on top of EGEE and OSG.
     - Almost 160 sites in 34 countries.
     - More than 250,000 IA processor cores (with Linux).
     - One hundred petabytes of storage.

  17. Excellent 10 Gb W-LCG connectivity
     - Tier-2 and Tier-1 sites are inter-connected by the general-purpose research networks.
     - Any Tier-2 may access data at any Tier-1.
     - [Map: Tier-1 sites in Canada, France, Germany, Italy, the Netherlands, the Nordic countries, Spain, Taiwan, the United Kingdom and the USA, each serving many Tier-2s.]

  18. First year of LHC data (Tier-0 and Grid)
     - Impressive numbers, we believe: ~15 PB stored in 2010, writing up to 70 TB/day to tape (~70 tapes per day).
     - About 1 million jobs run per day.
     - [Charts: data written to tape (GB/day) and jobs run per month, Jan 2008 to Apr 2011, broken down by experiment (ALICE, ATLAS, CMS, LHCb).]

  19. CERN's offline capacity
     - High-throughput computing based on reliable "commodity" technology, running Scientific Linux.
     - All inclusive: 7,800 dual-socket servers (64,000 cores).
     - Disk storage: 63,000 TB (usable) on 64,000 drives.
     - Tape storage: 34,000 TB on 45,000 cartridges, with 56,000 slots and 160 drives.

  20. Computer Centre
     - Even CERN has a power problem: we are going to move from 2.9 MW to 3.5 MW.
     - Beyond this we will establish a remote Tier-0 in 2013.

  21. W-LCG: a distributed supercomputer
     - Compared to the TOP10 (Nov. 2010), the W-LCG offers about 250,000 IA cores.
     - Name/Location - core count:
       Tianhe-1 (Tianjin): 186,368
       Jaguar (Oak Ridge): 224,162
       Nebulae - Dawning (NSCS): 120,640
       Tsubame 2.0 (GSIC, Tokyo): 73,278
       Hopper (DOE/NERSC): 153,408
       Tera-100 - Bull (CEA): 138,368
       Roadrunner (DOE/LANL): 122,400
       Kraken XT5 (Tennessee): 98,928
       Jugene (Jülich): 294,912
       Cielo (DOE/SNL): 107,152

  22. Insatiable appetite for computing
     - During the era of the LEP accelerator (and beyond), compute power doubled every year.
     - We are desperately looking at all opportunities for this to continue.

  23. CERN openlab
     - The IT Department's main R&D focus: a framework for collaboration with industry.
     - Evaluation, integration and validation of cutting-edge technologies that can serve the LHC Computing Grid.
     - A sequence of three-year agreements: 2003-2005, Phase I, the "opencluster" project; 2006-2011, Phases II & III, dedicated Competence Centres.
     - [Timeline: openlab phases 0 through V, Jan 2003 to Jan 2015, alongside WLCG and other CERN entities - 10 years of existence.]

  24. CERN openlab structure
     - A solid set of Competence Centres, with strong support from Management and Communications.
     - Automation and Controls CC (Siemens), Database CC (Oracle), Networking CC (HP), Platform CC (Intel).

  25. Exascale Capacity Computing R&D
     - In openlab, we want to start an R&D project for Exascale.
     - Project goals: identify constraints which might inhibit growth in CERN's Tier-0 and in the W-LCG in the future; understand which software and hardware components must be moved towards the Exascale range.

  26. Intel's "Many Integrated Core" Architecture
     - Announced at ISC10 (June 2010); S. Jarp on stage with K. Skaugen/Intel.
     - Current version (codename "Knights Ferry SDP"): enhanced x86 instruction set with vector extensions; 32 cores, 4-way multithreaded, 512-bit vector units.
     - Successful (easy) porting of our benchmark applications: ALICE Trackfitter/Trackfinder, the multithreaded Geant4 prototype, and a maximum-likelihood data analysis prototype (an illustrative kernel of this kind is sketched below).
     - (Graphics: Intel)
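
To illustrate why such applications port well to wide vector units, here is a minimal sketch (not from the talk) of a maximum-likelihood style kernel: every data point is independent apart from a sum reduction, so the loop maps naturally onto SIMD hardware. The Gaussian model and the data are purely illustrative.

    // Data-parallel negative-log-likelihood kernel (illustrative).
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double negLogLikelihood(const std::vector<double>& x, double mu, double sigma) {
        const double kSqrt2Pi = 2.5066282746310002;     // sqrt(2*pi)
        const double norm = 1.0 / (sigma * kSqrt2Pi);
        double nll = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {    // independent iterations: vectorizable
            const double z = (x[i] - mu) / sigma;
            nll -= std::log(norm) - 0.5 * z * z;        // -log of the Gaussian pdf
        }
        return nll;
    }

    int main() {
        std::vector<double> data(1 << 20, 1.0);         // illustrative data set
        std::printf("NLL = %f\n", negLogLikelihood(data, 0.0, 2.0));
        return 0;
    }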
