RHIC real-time data reconstruction using Magellan cloud computing
Jan Balewski for the STAR Collaboration
2011 OSG All Hands, March 7-11, 2011, Harvard Medical School, Boston
[Figures: STAR event; Magellan cluster at NERSC]
Outline
• STAR experiment at RHIC
• Computing requirements for real data analysis
• STAR encounters with Cloud-like computing
• Deployment of real time data processing
• Benefits of “instantaneous” data analysis
• Summary + ...
RHIC & Cloud, 2011 OSG All Hands, Jan Balewski, MIT
STAR experiment at RHIC
~600 collaborators from ~50 institutions and ~12 countries
[Figure labels: RHIC, 1.2 km; STAR; Long Island, NY]
BNL: Brookhaven National Laboratory, Upton NY, USA
Explore properties of proton spin using W boson
[Figure label: Magnetic Resonance Imaging]
Explore properties of proton spin using W boson
Phys. Rev. Lett. 106, 062002 (2011).
“Exploring the mystery of proton spin has been one of the key scientific research goals at RHIC,” said Steven Vigdor, Brookhaven’s Associate Laboratory Director for Nuclear and Particle Physics. ... “The W boson measurements [will help us] ... in quantitative understanding of proton spin structure and dynamics.”
http://www.bnl.gov/bnlweb/pubaf/pr/PR_display.asp?prID=1232
Recorded collision of two high-energy protons
[Figure: event display with the two incoming protons labeled]
Reconstruction of the particles emerging from a collision of two protons is a computational challenge.
Computational challenges at STAR for W physics
Data Acquisition
• STAR records ‘events’ at 1 kHz
• data rate ~1 GiB/sec
• event file: 5 GB with 15,000 events
Data Reconstruction
• reconstruction of 1 event: 10 seconds
• time to process 5 GB event file: 40 hours
• 10,000 CPUs needed for true real-time event processing
Analysis requires calibration of detector response
• Quality: crude, available within an hour. ‘fastOffline’ reconstruction of 15% of events, used to monitor performance of detector (→ cloud computing application)
• Quality: preliminary, available within a month. Start of first data pass
• Quality: final, available within 6 months. Full data pass over all qualified events, used for publication of results
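The reconstruction figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the constant and function names are illustrative and not part of any STAR software.

```python
# Numbers taken from the slide; all names are illustrative.
EVENT_RATE_HZ = 1_000      # STAR records events at 1 kHz
SEC_PER_EVENT = 10         # reconstruction time for one event
EVENTS_PER_FILE = 15_000   # events in one 5 GB event file


def hours_per_file(events=EVENTS_PER_FILE, sec_per_event=SEC_PER_EVENT):
    """Single-CPU time needed to reconstruct one event file, in hours."""
    return events * sec_per_event / 3600


def cpus_for_real_time(rate_hz=EVENT_RATE_HZ, sec_per_event=SEC_PER_EVENT):
    """CPUs needed so that reconstruction keeps pace with data taking."""
    return rate_hz * sec_per_event


print(f"{hours_per_file():.1f} hours per file")   # close to the 40 hours quoted
print(f"{cpus_for_real_time()} CPUs")             # the 10,000 CPUs quoted
```

Each event occupies one CPU for 10 seconds, so a 1 kHz stream keeps 1,000 × 10 CPUs busy at any moment.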
Traditional in-house data analysis model
[Diagram: Experiment → raw data at 1 kHz for 3 months → HPSS → raw data at 300 Hz for 1 year → in-house computing → results]
In-house computing: farm of 2000 dual-core machines running a highly customized analysis package
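The rates on this slide can be related with a rough estimate. The sketch below assumes that "3 months" is the data-taking period at 1 kHz and "1 year" is the time the farm needs at 300 Hz, which is one reading of the diagram; it also borrows the 10 seconds per event quoted on the computational-challenges slide. Function names are illustrative.

```python
def months_to_process(taking_months, taking_rate_hz, processing_rate_hz):
    """Time to work through a backlog recorded faster than it is processed."""
    return taking_months * taking_rate_hz / processing_rate_hz


def farm_rate_hz(machines, cores_per_machine, sec_per_event):
    """Upper bound on the event rate if every core reconstructs one event at a time."""
    return machines * cores_per_machine / sec_per_event


# Assumed reading of the slide: 3 months of data taking at 1 kHz, processed at 300 Hz.
print(months_to_process(3, 1000, 300), "months")   # same order as the "1 year" quoted

# 2000 dual-core machines at 10 s per event (figure from the earlier slide).
print(farm_rate_hz(2000, 2, 10), "Hz")             # comparable to the 300 Hz quoted
```

Under these assumptions the farm needs several times the data-taking period to finish a pass, which is the gap that outsourcing part of the computation is meant to close.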
Virtualization enables outsourcing of computation
The STAR Virtual Machine (VM) is born ... at first on a laptop ...
1) More recently, the STAR VM is prepared on a PC at NERSC
2) It is packed ‘from inside’ and shipped to Amazon EC2, Magellan@NERSC, Magellan@ANL, etc.
• Validate once, re-use multiple times.
• The same results are obtained ANYWHERE → virtualization allows normalization of resources
• Reproducibility of old code results rests in the archived old VM; there is no need to retain hardware
STAR encounters with VMs
date | facility | tools | type of task | # of VMs | # jobs/VM | total CPU days | calendar days | total input (TB) | total output (TB) | remarks
2009, March | Amazon EC2 | Nimbus, Globus, PBS batch | simu | 100 | 1 | 500 | 5 | 0 | 0.3 | works like normal globus GK grid site
2009, November | Amazon EC2 | EC2 | simu | 10 | 1 or 2 | 1 | 1 | 0 | 0.01 | use commercial interface
2010, February | GLOW, Uni Wisconsin, Madison | CondorVM | simu | 430 | 1 | 130 | 0.6 | 0 | 0.1 | call home model
2010, July | Clemson Uni, SC | Kestrel, QEMU-KVM | simu | 1000 | 1 | 17,000 | 20 | 0 | 7 | VM lifetime 24 h, no ssh to VM
2011, February | Magellan, NERSC | Eucalyptus | data reco | 20 | 6 or 7 | 600+ | 20+ | 2 | 1 | almost real-time processing
[Map labels: GLOW, Amazon EC2, NERSC, STAR, Clemson]
Largest STAR simulations (ever) at Clemson
✦ STAR MC simulations with partonic pT > 2 GeV, PYTHIA event generator
✦ Virtual Machine prepared with STAR software stack and deployed to over 1000 machines
✦ Using cloud computing at Clemson University in South Carolina (ranked #85 best supercomputer)
✦ Over 12 billion events generated
✦ Took over 400,000 CPU hours and generated 7 TB of data, transferred to BNL
✦ Largest physics simulation on cloud, largest STAR simulation in CPU hours
✦ Benefit: shortened the PhD study of an MIT student by a year
[Plot, July 2010: number of machines (available, working, idle) vs. date, Jul 17 to Jul 31; vertical axis 0 to 1400]
Today: Magellan @ NERSC
20 nodes allocated to STAR
Employing VM technology to separate experiment-specific requirements from facility infrastructure
Real-time distributed processing of 2011 data
[Diagram: the STAR experiment @ BNL writes raw data to HPSS at BNL; raw data goes to the STAR VMs (clone STAR VM × 20) on Magellan @ NERSC; reco events return to BNL; a DB with time-dependent detector calibration feeds the reconstruction]
Topology of connectivity RCF ↔ VMs
Magellan/Eucalyptus: 20 VMs × 7 jobs = 140 jobs
1 job: input 5 GB, duration 1-3 days
[Diagram: RCF @ BNL (stargrid01.bnl.gov, HPSS) and NERSC (carver.nersc.gov) are linked by globus-url-copy and globus-job-run, which push raw data; cache areas are labeled "20 TB" and "3 TB gpfs", with a pool for results; STAR VM #1, #2, #3, #4, ... each get raw data and put results]
Each STAR VM: 8 cores, 20 GB RAM, 80 GB local scratch disk, STAR software, local DB
DB: fresh snapshot available every 2 hours; asynchronous local DBs updated periodically
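The per-VM numbers on this slide can be turned into a small capacity check. This is a minimal sketch using only the figures quoted above; the `VMSpec` class, its field names, and the idea that each job's input file sits on the VM's local scratch disk are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """Per-VM figures from the slide; field names are illustrative."""
    cores: int = 8
    ram_gb: float = 20.0
    scratch_gb: float = 80.0
    jobs: int = 7
    input_gb_per_job: float = 5.0


def cluster_summary(n_vms=20, vm=VMSpec()):
    """Aggregate load for n_vms identical VMs."""
    total_jobs = n_vms * vm.jobs
    return {
        "total_jobs": total_jobs,
        "input_in_flight_gb": total_jobs * vm.input_gb_per_job,
        "ram_per_job_gb": vm.ram_gb / vm.jobs,
        # Assumption: every running job keeps its input on local scratch.
        "scratch_used_by_input_gb": vm.jobs * vm.input_gb_per_job,
        "spare_cores_per_vm": vm.cores - vm.jobs,
    }


for key, value in cluster_summary().items():
    print(f"{key}: {value}")
```

With 7 jobs on 8 cores, one core per VM stays free for the local DB and data transfers, and the input for all running jobs fits on the scratch disk under the stated assumption.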
Model of coordination of VMs
Model citizen
• acts autonomously
• highly specialized
• aggregated output from many individuals serves a higher purpose
Principles of VM operation:
1. Acts w/o supervision
2. Protects own integrity
3. Initiates connection to host
• acquires input
• performs task
• returns results to host
• rests for ‘5 minutes’
[Figure labels: pagoda nest; ants nest]
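The operating cycle listed above can be sketched as a pull-style worker loop. This is a minimal illustration and not the STAR implementation: the function `run_worker`, its three callables, and the `max_cycles` and `sleep` parameters are hypothetical names introduced here so the loop can be exercised without a real host.

```python
import time
from typing import Callable, Optional


def run_worker(
    acquire_input: Callable[[], Optional[str]],
    perform_task: Callable[[str], str],
    return_results: Callable[[str], None],
    rest_seconds: float = 300.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the cycle; the VM always contacts the host, never the reverse."""
    completed = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        try:
            work = acquire_input()            # VM initiates the connection
            if work is not None:
                result = perform_task(work)   # e.g. reconstruct one event file
                return_results(result)        # send output back to the host
                completed += 1
        except Exception as err:              # protect own integrity: keep running
            print(f"cycle {cycle} failed: {err}")
        sleep(rest_seconds)                   # rest for '5 minutes'
    return completed


# Stand-in host: a list of file names acts as the work queue.
queue = ["file_001", "file_002"]
results = []
n_done = run_worker(
    acquire_input=lambda: queue.pop(0) if queue else None,
    perform_task=lambda name: f"reco({name})",
    return_results=results.append,
    rest_seconds=0.0,
    max_cycles=3,
)
print(n_done, results)
```

Because every connection originates from the VM, the host needs no login access to the workers, which fits setups where ssh to the VM is unavailable.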