The IceCube data pipeline: from the South Pole to publication

Jakob van Santen (jakob.van.santen@desy.de), PyData Berlin, 2016-05-21


  1. The IceCube data pipeline: from the South Pole to publication Jakob van Santen jakob.van.santen@desy.de PyData Berlin, 2016-05-21

  2. Deutsches Elektronen-Synchrotron (DESY), Zeuthen: Helmholtz research institute with ~200 scientists, postdocs, and students studying high-energy astrophysics with gamma rays and neutrinos

  3. IceCube South Pole Neutrino Observatory (outline)
      ‣ What’s a neutrino?
      ‣ What are we trying to learn?
      ‣ Why look for them at the South Pole?
      ‣ How does IceCube find neutrinos?

  4. What’s a neutrino? Charged (electromagnetic interactions) vs. neutral (weak interactions only); 2.5e6 times less massive

  5. Sources of neutrinos: radioactive decay, the Sun, nuclear reactors, man-made particle accelerators, cosmic accelerators. Energy scale, increasing: ~10^6 eV, ~10^9 eV, ~10^15 eV. Images: Wikipedia, chemistryviews.com, N. Svoboda, CERN

  7. Cosmic rays
      Something accelerates nuclei to macroscopic energies… but we don’t know what, or where!
      [Figure: cosmic-ray energy spectrum, E^2.6 F(E) [GeV^1.6 m^-2 s^-1 sr^-1] vs. E [eV] from 10^13 to 10^20 eV, with the Knee, 2nd Knee, and Ankle marked and 1 Joule indicated; PRD 86:010001 (2013)]
      [Figure panels: Tibet-III, 5 TeV (Amenomori et al., ICRC 2011); IceCube-59, 20 TeV (Abbasi et al., ApJ, 746, 33, 2012)]
      Neutrinos can point back to the cosmic accelerators!

  9. South Pole Station: 90 deg South, 2835 m above sea level, on ~2800 m of pure, clear ice. Images: NASA, USAF

  10. South Pole Station. [Photo labels: IceCube Lab, Main station. Photo: Haley Buffman/NSF]

  11. IceCube: a cubic-kilometer neutrino telescope buried in ice. [Diagram labels: IceCube Lab (data center); Digital Optical Module (single-pixel camera)]

  13. IceCube data pipeline
      [Diagram: Data acquisition (South Pole, real time); Simulation; Feature calculation & event selection (offline); Analysis; Science!]
      Challenges:
      ‣ Getting data out of the South Pole
      ‣ Generating simulated data
      ‣ Allowing non-expert users to configure & extend data pipeline for many distinct science topics
      ‣ Distributing data to analyzers

  14. A neutrino event in IceCube. [Event display: color ⇔ time, size ⇔ light intensity; labels: Neutrino, Interaction, Muon]

  15. Raw data
      ‣ 1 neutrino for every 1 million penetrating muons
      ‣ ~10 high-energy neutrino events per year
      ‣ Need features to select them!
      [Figure: 10 milliseconds of raw data]

  16. Feature calculation & data selection

  17. IceTray: IceCube’s processing framework
      ‣ Core written in ~20k lines of C++
      ‣ User interface exposed via boost::python
      ‣ Two main components:
        • I3Frame: container for event data
        • I3Module: manipulates I3Frames
      ‣ Data storage in files
      Images: boost.org

  18. I3Frames
      I3Frame: dictionary of [immutable] C++ objects related to a single event

          In [1]: from icecube import icetray, dataio, dataclasses
          In [2]: f = dataio.I3File('hese.i3.bz2')
          In [3]: print(f.pop_frame(icetray.I3Frame.DAQ))
          [ I3Frame (DAQ):
            'CalibrationErrata' [DAQ] ==> I3Vector<OMKey> (137)
            'FilterMask' [DAQ] ==> I3Map<string, I3FilterResult> (749)
            'I3Geometry' [Geometry] ==> I3Geometry (401222)
            'I3TriggerHierarchy' [DAQ] ==> I3Tree<I3Trigger> (616)
            'OfflinePulses' [DAQ] ==> I3Map<OMKey, vector<I3RecoPulse> > (52917)
            'PoleCascadeLinefit' [DAQ] ==> I3Particle (150)
            'PoleMuonLlhFit' [DAQ] ==> I3Particle (150)
            'PoleMuonLlhFitFitParams' [DAQ] ==> I3LogLikelihoodFitParams (68)
            'PoleToIParams' [DAQ] ==> I3TensorOfInertiaFitParams (78)
          ]

      Flexible! Schema can change from event to event.
      An “I3” file is a stream of serialized I3Frames. boost::serialization provides load/save, object versioning, etc.

  19. I3Modules
      ‣ I3Module: single-purpose processing stage
      ‣ User (physicist) configures module chain in Python
      ‣ An I3Module can:
        • Add new objects to the frame
        • Remove objects from the frame
        • Drop the frame
      [Diagram: Frames passed along a chain of I3Modules]

          tray = I3Tray()
          tray.Add("I3Reader", filenamelist="foo.i3")
          tray.Add('HomogenizedQTot', Output='HomogenizedQTot', Pulses='OfflinePulsesHLC')
          tray.Add("I3Writer", filename="bar.i3")
          tray.Execute()
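
The three things a module can do to a frame (add an object, remove an object, drop the frame) can be sketched in plain Python. This is a toy illustration and not the IceTray API: a frame is modelled as an ordinary dict, and `run_chain`, `add_total_charge`, `drop_small_events`, `remove_raw_pulses`, and the threshold value are all hypothetical.

```python
# Toy illustration of a module chain; NOT the IceTray API.
# A "frame" is a plain dict; a "module" is a function that returns the
# (possibly modified) frame, or None to drop it.

def add_total_charge(frame):
    # Add a new object to the frame.
    frame["TotalCharge"] = sum(frame["Pulses"])
    return frame

def drop_small_events(frame):
    # Drop the frame entirely if it fails a cut (threshold is made up).
    return frame if frame["TotalCharge"] >= 10.0 else None

def remove_raw_pulses(frame):
    # Remove an object that later stages no longer need.
    del frame["Pulses"]
    return frame

def run_chain(frames, modules):
    """Push each frame through the modules in order; yield the survivors."""
    for frame in frames:
        for module in modules:
            frame = module(frame)
            if frame is None:
                break          # frame was dropped, skip remaining modules
        else:
            yield frame        # reached only if no module dropped the frame

events = [{"Pulses": [1.0, 2.5]}, {"Pulses": [8.0, 7.5, 3.0]}]
chain = [add_total_charge, drop_small_events, remove_raw_pulses]
survivors = list(run_chain(events, chain))
print(survivors)
```

Keeping every stage single-purpose is what lets a user reorder or swap stages without touching the others.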

  20. User-defined I3Modules

          class Counter(icetray.I3ConditionalModule):
              def __init__(self, context):
                  super(Counter, self).__init__(context)
                  # declare a parameter: name, description, default value
                  self.AddParameter("Key", "Name of counter to put in the frame", "Count")

              def Configure(self):
                  # read the configured parameter and reset the counter
                  self.key = self.GetParameter("Key")
                  self.counter = 0

              def Physics(self, frame):
                  # store the running count in the frame, then pass the frame on
                  frame[self.key] = icetray.I3Int(self.counter)
                  self.counter += 1
                  self.PushFrame(frame)

          tray.Add(Counter, Name="CountCount")

      Prototype rapidly in Python, rewrite in C++ as needed

  21. Filtering at the South Pole
      ‣ IceCube Lab: 3000 events/s, 1 TB/day
      ‣ Satellite relay: 300 events/s, 100 GB/day
      ‣ IceCube Data Warehouse (Madison, WI): 4 PB and counting
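
As a rough consistency check on the rates quoted on this slide, the average event size they imply can be computed directly. The sketch below assumes that 3000 events/s goes with 1 TB/day (at the IceCube Lab) and 300 events/s with 100 GB/day (over the satellite relay), and that the units are decimal; `bytes_per_event` is a hypothetical helper.

```python
# Back-of-the-envelope check using only the figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60

def bytes_per_event(bytes_per_day, events_per_second):
    """Average event size implied by a daily volume and an event rate."""
    return bytes_per_day / (events_per_second * SECONDS_PER_DAY)

TB = 1e12  # assumption: decimal terabytes
GB = 1e9   # assumption: decimal gigabytes

at_pole = bytes_per_event(1 * TB, 3000)          # assumed pairing: before filtering
via_satellite = bytes_per_event(100 * GB, 300)   # assumed pairing: after filtering

print(f"{at_pole:.0f} bytes/event at the Pole")
print(f"{via_satellite:.0f} bytes/event over the satellite link")
print(f"event-rate reduction: {3000 / 300:.0f}x")
```

Under these assumptions both stages come out at a few kilobytes per event, so the tenfold reduction in volume would come from sending fewer events rather than smaller ones.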

  22. Aside: grid computing
      ‣ Simulation requires tens of millions of CPU and GPU hours
      ‣ Opportunistic computing on academic grids in the US and Europe with HTCondor glide-ins and custom Python middleware
        • Some Linux flavor (usually a Red Hat variant)
        • Software provisioned on CVMFS (HTTP-based read-only filesystem)

  23. Data formats for analysis
      ‣ I3Frame: flexible, but inefficient for partial reads
      ‣ Analysis development means reading the same data over and over again → tabular formats
      ‣ tableio: framework for turning irregular event data into table rows
      [Diagram: event data (I3Frame) → specific coercion for each object (I3ParticleConverter, I3FilterMaskConverter) → abstract table row (I3TableRow) → format-specific backend (HDF5, ROOT) → pytables, pandas, h5py, etc.]
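
The idea of coercing irregular per-event objects into flat table rows can be sketched in plain Python. This is not the tableio API: `Particle`, `particle_to_row`, `filter_mask_to_row`, `CONVERTERS`, and `frames_to_tables` are hypothetical names, and frames are modelled as dicts.

```python
# Toy sketch of "irregular event data -> table rows"; NOT the tableio API.
from dataclasses import dataclass

@dataclass
class Particle:          # hypothetical stand-in for a reconstructed particle
    energy: float
    zenith: float

def particle_to_row(p):
    # Type-specific coercion: one flat dict of columns per object.
    return {"energy": p.energy, "zenith": p.zenith}

def filter_mask_to_row(mask):
    # A mapping of filter name -> passed? becomes one boolean column each.
    return {name: bool(passed) for name, passed in mask.items()}

# One converter per object type.
CONVERTERS = {Particle: particle_to_row, dict: filter_mask_to_row}

def frames_to_tables(frames, keys):
    """Build one table (a list of row dicts) per requested frame key."""
    out = {key: [] for key in keys}
    for event_id, frame in enumerate(frames):
        for key in keys:
            if key not in frame:       # schema may differ between events
                continue
            obj = frame[key]
            row = {"event_id": event_id}
            row.update(CONVERTERS[type(obj)](obj))
            out[key].append(row)
    return out

frames = [
    {"Fit": Particle(1.2e3, 0.4), "FilterMask": {"MuonFilter": True}},
    {"FilterMask": {"MuonFilter": False}},   # no "Fit" in this event
]
result = frames_to_tables(frames, ["Fit", "FilterMask"])
print(result["Fit"])
print(result["FilterMask"])
```

Carrying an `event_id` column in every table is one way to rejoin rows from different tables when an object is missing from some events.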

  24. Analysis

  25. Histogramming
      Most IceCube analyses use binned data.
      Pro
      ‣ Predicted mean in each bin is straightforward to calculate with Monte Carlo
      ‣ Statistics are easy to understand
      Con
      ‣ Have to choose how to bin
      [Figure: events per year (10^0 to 10^9) vs. number of collected photons (10^1 to 10^4); labels: Pre-selection, Penetrating muons, Atmos. neutrinos, Blindness]
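
A minimal sketch of how a predicted mean per bin can be obtained from weighted Monte Carlo events with NumPy. The simulated sample, the constant weight, and the binning are invented for illustration and are not IceCube's.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical simulated sample: one observable and one weight per event.
# Each weight is the expected number of such events per year.
n_photons = 10 ** rng.uniform(1, 4, size=100_000)
weights = np.full(n_photons.size, 2.0e-3)    # made-up constant weight

# Logarithmic bins from 10^1 to 10^4 collected photons.
edges = np.logspace(1, 4, 31)

# Predicted mean per bin = sum of the weights falling in that bin.
mean, _ = np.histogram(n_photons, bins=edges, weights=weights)

# Statistical uncertainty of the prediction = sqrt(sum of squared weights).
sigma = np.sqrt(np.histogram(n_photons, bins=edges, weights=weights**2)[0])

print(mean.sum())
```

The choice of `edges` is exactly the "have to choose how to bin" cost: the prediction changes shape with the binning even though the underlying sample does not.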

  26. dashi: histograms that do more
      numpy.histogramdd()-backed histogram objects with built-in
      ‣ summary statistics
      ‣ manipulation methods: add, multiply, slice, project, etc.
      ‣ storage in hdf5 datasets
      https://github.com/emiddell/dashi

          import dashi
          import tables
          from numpy import linspace

          # create & fill 3d histogram
          h = dashi.histogram.histogram(3, (linspace(0, 1, 101),)*3)
          h.fill(get_3d_data())
          # project out dimension 1
          h.project([0,2])
          # plot a 1-d slice
          h[1,1,:].line(differential=True)
          # store for later
          with tables.open_file('foo.hdf5', 'a') as hdf:
              dashi.histsave(h, hdf, '/', 'my_histogram')
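
For readers without dashi installed, the same operations can be approximated with bare NumPy. This sketch is not dashi's implementation; it only shows what filling, projecting, and slicing amount to on a `numpy.histogramdd` result, using random stand-in data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.uniform(0, 1, size=(10_000, 3))     # stand-in for real 3-d data

edges = (np.linspace(0, 1, 101),) * 3          # 100 bins per dimension
counts, _ = np.histogramdd(data, bins=edges)   # shape (100, 100, 100)

# "Project out" dimension 1: sum over that axis, keeping dimensions 0 and 2.
projected = counts.sum(axis=1)                 # shape (100, 100)

# A 1-d slice at fixed bins in dimensions 0 and 1.
slice_1d = counts[1, 1, :]                     # shape (100,)

# Differential form: divide the counts by the bin widths.
widths = np.diff(edges[2])
density = slice_1d / widths

print(counts.shape, projected.shape, slice_1d.shape)
```

A histogram object earns its keep by carrying the bin edges along through these operations, which the bare arrays above do not.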

  27. Example: discovering astrophysical neutrinos
      Simple event selection based on 2 features:
      ‣ > 6000 photon hits
      ‣ hit pattern starts inside detector volume
      28 events survived in 2 years of data.
      [Diagram: Veto; ν ✓, μ ✘]
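
A two-feature selection like this one maps naturally onto boolean masks. In the sketch below the feature arrays and their values are invented; only the 6000-hit threshold comes from the slide.

```python
import numpy as np

# Hypothetical per-event features; the values are invented for illustration.
n_photon_hits = np.array([150, 7200, 9800, 6500, 300])
starts_inside = np.array([True, True, False, True, False])

MIN_HITS = 6000   # threshold quoted on the slide

# An event survives only if it passes both cuts.
selected = (n_photon_hits > MIN_HITS) & starts_inside

print(selected)
print(f"{selected.sum()} of {selected.size} events survive")
```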

  28. Analysis
      Q: What is the chance that the data is a fluctuation of the background?
      Bin data in observable space (energy, zenith angle), compare counts to predicted mean in each bin.
      A: < 5e-7 (discovery!) → doi:10.1126/science.1242856
      [Figure (IceCube Preliminary): declination (degrees, -80 to 80) vs. deposited EM-equivalent energy in detector (TeV, 10^2 to 10^3); showers and tracks]
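
The question on this slide can be illustrated with the simplest possible case: a single bin with a known background mean. This is a deliberately reduced sketch and not the analysis behind the quoted result, which compares many bins at once; `poisson_tail` is a hypothetical helper and the numbers fed to it are invented.

```python
import math

def poisson_tail(n_observed, mean_background):
    """P(N >= n_observed) for a Poisson-distributed N with the given mean."""
    # Computed as 1 - P(N <= n_observed - 1), summing the pmf term by term.
    term = math.exp(-mean_background)   # P(N = 0)
    cdf = 0.0
    for k in range(n_observed):
        cdf += term
        term *= mean_background / (k + 1)   # P(N = k + 1) from P(N = k)
    return 1.0 - cdf

# Illustrative numbers only, not taken from the analysis.
p = poisson_tail(n_observed=10, mean_background=2.0)
print(f"p-value: {p:.2e}")
```

For very small p-values the `1.0 - cdf` subtraction loses precision, so a production calculation would sum the upper tail directly or use a statistics library.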
