
Physics Computing at CERN – Helge Meinhard, CERN IT Department



  1. Physics Computing at CERN. Helge Meinhard, CERN IT Department. OpenLab Student Lecture, 21 July 2011.

  2. Location (1) Building 513 (opposite of restaurant no. 2)

  3. Building 513 (1): Large building with 2700 m² of floor space for computing equipment; capacity for 2.9 MW of electricity and 2.9 MW of air and water cooling. Chillers and transformers on site.

  4. Building 513 (2) – Ongoing Work. Reasons for the upgrade:
     - Cooling: insufficient cooling for the critical UPS room; the computer centre is not cooled when running on UPS without diesel; insufficient cooling when running on diesel (pumps and ventilation units running, but no chiller and insufficient stored cold water).
     - Power: insufficient critical power available; no redundancy for critical UPS (> 240 kW), currently running at 340 kW; no redundancy for physics UPS (> 2.7 MW), aiming to run at 2.9 MW by end of year.
     - Other: limited fire protection in B513; critical areas and physics areas strongly coupled (they share the same locations, cooling infrastructure and fire risks).

  5. Building 513 (3) – Ongoing Work. Scope of the upgrade:
     - Dedicated cooling infrastructure for critical equipment (decoupled from physics).
     - New building for the cooling system (construction underway in front of the building).
     - New dedicated room for critical equipment, plus new electrical rooms and critical ventilation systems in the 'Barn'.
     - Critical equipment that cannot be moved to the new rooms (networking area and telecoms rooms) to get new dedicated cooling.
     - Increase in critical UPS power to 600 kW (with a new critical UPS room) and overall power to 3.5 MW.
     - Restore N+1 redundancy for all UPS systems.

  6. Building 513 (4) – Ongoing Work.

  7. Location (2): Building 613: small machine room for tape libraries (about 200 m from Building 513). Hosting centre about 15 km from CERN: 35 m², about 100 kW, critical equipment.

  8. Computing Service Categories: two coarse-grained computing categories: (1) computing infrastructure and administrative computing; (2) physics data flow and data processing.

  9. Task overview
     - Communication tools: mail, Web, Twiki, GSM, ...
     - Productivity tools: office software, software development, compilers, visualization tools, engineering software, ...
     - Computing capacity: CPU processing, data repositories, personal storage, software repositories, metadata repositories, ...
     - Needs underlying infrastructure: network and telecom equipment; processing, storage and database computing equipment; management and monitoring software; maintenance and operations; authentication and security.

  10. CERN CC currently (June 2011): Data Centre Operations (Tier 0)
     - 24x7 operator support and system administration services to support 24x7 operation of all IT services.
     - Hardware installation and retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year.
     - Management and automation framework for large-scale Linux clusters.

  11. Infrastructure Services: software environment and productivity tools.
     - User registration and authentication: 22'000 registered users.
     - Web services: 10'000 web sites.
     - Mail: 2 million emails/day (99% spam); 18'000 mail boxes.
     - Tool accessibility: Windows, Office, CadCam, ...
     - Home directories (DFS, AFS): ~400 TB, backup service, ~2 billion files.
     - PC management: software and patch installations.
     - Infrastructure needed: > 400 servers.

  12. Network Overview: a central, high-speed network backbone connects the experiments (e.g. ATLAS), all CERN buildings (12'000 active users), the computer-centre processing clusters, and the World Wide Grid centres.

  13. Monitoring: large-scale monitoring
     - Surveillance of all nodes in the computer centre.
     - Hundreds of parameters in various time intervals, from minutes to hours, per node and service.
     - Database storage and interactive visualisation.
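
Purely as an illustration of the idea (this is not CERN's monitoring framework; the node name, sampling interval and stored parameters are invented for the example), a minimal collector that samples a few per-node parameters and keeps time-stamped records for later visualisation could look like this:

```python
# Minimal monitoring sketch (illustrative only, not CERN's framework).
# Samples a few per-node parameters at a fixed interval and stores
# time-stamped records in SQLite for later querying/visualisation.
import os
import sqlite3
import time

DB = sqlite3.connect("metrics.db")
DB.execute("CREATE TABLE IF NOT EXISTS metrics (ts REAL, node TEXT, name TEXT, value REAL)")

def sample_node(node: str) -> dict:
    """Collect a handful of example parameters for this node (Unix-only)."""
    load1, load5, load15 = os.getloadavg()   # 1/5/15-minute load averages
    return {"load1": load1, "load5": load5, "load15": load15}

def record(node: str) -> None:
    ts = time.time()
    for name, value in sample_node(node).items():
        DB.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)", (ts, node, name, value))
    DB.commit()

if __name__ == "__main__":
    for _ in range(3):            # a real collector would run continuously
        record("examplenode01")   # hypothetical node name
        time.sleep(60)            # sampling interval: one minute
```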

  14. Bookkeeping: Database Services
     - More than 125 Oracle database instances on > 400 service nodes, total ~ 100 TB.
     - Bookkeeping of physics events for the experiments; meta data for the physics events (e.g. detector conditions); management of data processing; highly compressed and filtered event data; ...
     - Also: LHC machine parameters, human resource information, financial bookkeeping, material bookkeeping and material flow control, LHC and detector construction details, ...

  15. HEP analyses
     - Statistical quantities over many collisions: histograms; one event doesn't prove anything.
     - Comparison of statistics from real data with expectations from simulations; the simulations are based on known models.
     - Statistically significant deviations show that the known models are not sufficient.
     - Need more simulated data than real data: to cover various models, and so that the comparison is dominated by the statistical error of the real data, not of the simulation (see the sketch below).
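
As a toy illustration of this comparison (not code from the slides; the "data" are random numbers standing in for a measured quantity), a per-bin chi-square between a data histogram and a ten-times-larger, scaled simulation sample might be computed like this:

```python
# Toy data-vs-simulation histogram comparison
# (random numbers stand in for a measured quantity; not real analysis code).
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(91.2, 2.5, size=10_000)    # "real" events
sim  = rng.normal(91.2, 2.5, size=100_000)   # 10x more simulated events

bins = np.linspace(80, 100, 41)
n_data, _ = np.histogram(data, bins)
n_sim, _  = np.histogram(sim, bins)

scale = len(data) / len(sim)                 # normalise simulation to data
expected = n_sim * scale

# Per-bin chi-square; the variance includes data and (scaled) simulation
# statistics, which is why more simulated events shrink the simulation's share.
var = n_data + expected * scale
mask = var > 0
chi2 = np.sum((n_data[mask] - expected[mask]) ** 2 / var[mask])
ndof = mask.sum()
print(f"chi2/ndof = {chi2:.1f}/{ndof}")
```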

  16. Data Handling and Computation for Physics Analyses (diagram, les.robertson@cern.ch): detector → event filter (selection & reconstruction) → raw data → reconstruction → event summary data → batch physics analysis → analysis objects (extracted by physics topic) → interactive physics analysis, with event reprocessing and event simulation feeding the chain of processed event data.

  17. Data Flow – online
     - Detector: 150 million electronics channels → 1 PBytes/s.
     - Level 1 filter and selection: fast-response electronics (FPGAs, embedded processors) very close to the detector → 150 GBytes/s.
     - High-level filter and selection: O(1000) servers for processing, Gbit Ethernet network → 0.6 GBytes/s.
     - N x 10 Gbit links to the CERN computer centre.
     - Limits: essentially the budget and the downstream data flow pressure (the implied reduction factors are worked out in the sketch below).
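
Treating the quoted rates as decimal units (an assumption; 1 PB/s = 10^15 bytes/s), the rejection factors implied by these numbers can be checked quickly:

```python
# Back-of-the-envelope reduction factors implied by the quoted rates
# (decimal units assumed: 1 PB/s = 1e15 B/s, 1 GB/s = 1e9 B/s).
detector   = 1e15        # B/s off the detector
level1     = 150e9       # B/s after the Level 1 filter
high_level = 0.6e9       # B/s after the high-level filter, into the computer centre

print(f"Level 1 rejection:    {detector / level1:,.0f}x")   # ~6,700x
print(f"High-level rejection: {level1 / high_level:,.0f}x") # 250x
print(f"Overall reduction:    {detector / high_level:.1e}x")# ~1.7e6x
```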

  18. Data Flow – offline
     - LHC, 4 detectors: 1000 million events/s; filter and first selection; 1...2 GB/s into the computer centre.
     - Store on disk and tape; create sub-samples; export copies; world-wide analysis; physics: explanation of nature. Further rates quoted on the slide: 10 GB/s and 3 GB/s.
     - Example formula shown (Z line shape):
       $\sigma_{f\bar f}(s) \approx \sigma^0_{f\bar f}\,\frac{s\,\Gamma_Z^2}{(s-m_Z^2)^2 + s^2\Gamma_Z^2/m_Z^2}$
       with
       $\sigma^0_{f\bar f} = \frac{12\pi}{m_Z^2}\,\frac{\Gamma_{ee}\,\Gamma_{f\bar f}}{\Gamma_Z^2}$ and
       $\Gamma_{f\bar f} = \frac{G_F\,m_Z^3}{6\sqrt{2}\,\pi}\,(v_f^2 + a_f^2)\,N_{\mathrm{col}}$.
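
To make the formula concrete, a small numerical sketch using standard values of the constants (G_F, m_Z, Γ_Z, sin²θ_W; these values are not given on the slide) reproduces the tree-level Γ_ee of roughly 84 MeV and a peak e+e- cross-section of about 2 nb:

```python
# Numerical check of the Z line-shape formulas on the slide.
# Constants below are standard values, not taken from the slide.
import math

G_F   = 1.16637e-5     # Fermi constant, GeV^-2
m_Z   = 91.1876        # Z mass, GeV
Gam_Z = 2.4952         # total Z width, GeV
sin2w = 0.2312         # sin^2(theta_W)
GEV2_TO_NB = 0.3894e6  # 1 GeV^-2 = 0.3894 mb = 3.894e5 nb

def gamma_ff(v_f, a_f, n_col):
    """Partial width Gamma_ff = G_F m_Z^3 / (6 sqrt(2) pi) * (v_f^2 + a_f^2) * N_col."""
    return G_F * m_Z**3 / (6 * math.sqrt(2) * math.pi) * (v_f**2 + a_f**2) * n_col

# Electrons: a_e = -1/2, v_e = -1/2 + 2 sin^2(theta_W)
Gam_ee = gamma_ff(-0.5 + 2 * sin2w, -0.5, 1)
print(f"Gamma_ee ~ {1e3 * Gam_ee:.1f} MeV")   # ~83 MeV at tree level

def sigma(s, Gam_ff):
    """Breit-Wigner cross-section sigma_ff(s), returned in nb."""
    sigma0 = 12 * math.pi / m_Z**2 * Gam_ee * Gam_ff / Gam_Z**2
    return sigma0 * s * Gam_Z**2 / ((s - m_Z**2)**2 + s**2 * Gam_Z**2 / m_Z**2) * GEV2_TO_NB

print(f"sigma_ee at the peak ~ {sigma(m_Z**2, Gam_ee):.2f} nb")   # ~2 nb
```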

  19. SI Prefixes (table of SI prefixes: kilo 10³, mega 10⁶, giga 10⁹, tera 10¹², peta 10¹⁵, exa 10¹⁸, ...). Source: wikipedia.org
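
Since the following slides quote volumes in TB, PB and EB, here is a tiny helper (a sketch, not from the slides) that formats a byte count using these decimal prefixes:

```python
# Format a byte count with the decimal SI prefixes from the table above.
PREFIXES = [("EB", 1e18), ("PB", 1e15), ("TB", 1e12), ("GB", 1e9), ("MB", 1e6), ("kB", 1e3)]

def si_format(n_bytes: float) -> str:
    for symbol, factor in PREFIXES:
        if n_bytes >= factor:
            return f"{n_bytes / factor:.1f} {symbol}"
    return f"{n_bytes:.0f} B"

print(si_format(15e15))   # the 15 Petabytes per year from the next slide -> "15.0 PB"
```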

  20. Data Volumes at CERN
     - Each year: 15 Petabytes, stored cumulatively over LHC running. Tower of CDs: which height? (worked out in the sketch below)
     - Only real data and derivatives; simulated data not included; the total of simulated data is even larger.
     - Compare with (numbers from mid 2010): Library of Congress: 200 TB; e-mail (w/o spam): 30 PB (30 trillion mails at 1 kB each); photos: 1 EB (500 billion photos at 2 MB each, 50 PB on Facebook); Web: 1 EB; telephone calls: 50 EB; ... growing exponentially ...
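
A quick answer to the tower-of-CDs question, assuming 700 MB capacity and 1.2 mm thickness per CD (assumptions, not stated on the slide):

```python
# The "tower of CDs" question, with assumed CD parameters
# (700 MB capacity, 1.2 mm thickness -- not stated on the slide).
annual_volume = 15e15      # 15 PB per year, in bytes
cd_capacity   = 700e6      # bytes per CD (assumed)
cd_thickness  = 1.2e-3     # metres per CD (assumed)

n_cds  = annual_volume / cd_capacity
height = n_cds * cd_thickness
print(f"~{n_cds/1e6:.0f} million CDs, a tower ~{height/1e3:.0f} km high per year")
```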

  21. Physical and Logical Connectivity (complexity/scale increases from components to the world-wide cluster)
     - Components (CPU, disk, memory, mainboard) → CPU and disk servers: operating system, device drivers.
     - CPU and disk servers → cluster, local fabric: network, interconnects; resource management software.
     - Cluster, local fabric → world-wide cluster: wide area network; grid and cloud management software.

  22. Computing Building Blocks: commodity market components: not cheap, but cost effective! Simple components, but many of them.
     - CPU server or worker node: dual CPU, quad core, 16 or 24 GB memory.
     - Disk server = CPU server + RAID controller + 24 SATA disks.
     - Tape server = CPU server + fibre channel connection + tape drive.
     - Market trends more important than technology trends. Always watch TCO: Total Cost of Ownership (rough example below).
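
A rough sketch of why TCO matters more than the purchase price alone; every number below is invented for the example and is not a CERN figure:

```python
# Rough TCO illustration: purchase price is only part of the cost.
# All numbers below are invented for the example, not CERN figures.
purchase_chf     = 2500    # one disk server (assumed)
power_kw         = 0.5     # average draw (assumed)
chf_per_kwh      = 0.12    # electricity + cooling overhead (assumed)
lifetime_years   = 4
ops_chf_per_year = 200     # repairs, admin share (assumed)

running = lifetime_years * (power_kw * 24 * 365 * chf_per_kwh + ops_chf_per_year)
tco = purchase_chf + running
print(f"TCO over {lifetime_years} years: {tco:.0f} CHF "
      f"({running / tco:.0%} of it is running cost)")
```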

  23. Hardware Management
     - Almost 12'000 servers installed in the centre; assume 3...4 years lifetime for the equipment.
     - Key factors: power efficiency, performance, reliability.
     - Demands by experiments require investments of ~15 MCHF/year for new PC hardware and infrastructure.
     - Infrastructure and operation setup needed for ~3'500 nodes installed per year and ~3'500 nodes removed per year (see the check below): installation in racks, cabling, automatic installation, Linux software environment.
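
A quick consistency check of these numbers (a sketch; only the slide's ~12'000 servers and 3...4 year lifetime are used):

```python
# Replacement-rate check: ~12'000 servers with a 3-4 year lifetime
# implies roughly the quoted ~3'500 installations and removals per year.
servers = 12_000
for lifetime in (3, 4):
    print(f"{lifetime}-year lifetime -> ~{servers / lifetime:,.0f} nodes replaced per year")
```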

  24. Functional Units
     - Detectors → data import and export.
     - Disk storage.
     - Tape storage: 'active' archive and backup.
     - Event processing capacity: CPU servers.
     - Meta-data storage: databases.
