Physics, Big Data Analysis and Philosophy (and all the rest)
Wolfgang Rhode
... eritis sicut Deus ... (“... you will be like God ...”)
Plato’s Cave Analogy
Are true conclusions possible?
´ Within the world of logic? Yes
´ Within the world of mathematics? Yes
´ Within the world of observations? No
  ´ Accuracy of observations
  ´ Conclusion from the observed effect to its cause
´ Within the world of teleology? No
  ´ To reach a goal? The program running on us -?-> Big Data Analysis
Consequence
´ Ancient and medieval philosophy:
  ´ Rationally invented systems based on logic and mathematics
  ´ However: these systems are inappropriate to describe nature.
´ Galileo Galilei:
  ´ Introduction of the experiment as a means to understand nature
  ´ The book of nature is written in the language of mathematics
How to justify conclusions from experiments to the world of theories?
´ Success of classical physics (Newton … Einstein)
  ´ Btw.: Classical physics is deterministic, with no probability elements
´ Aggravating: Success of Thermodynamics and Quantum Mechanics (etc.)
  ´ Probability-based, in very different ways.
´ Epistemology: Various attempts to answer the question, but no success … until ...
Karl Popper: The Logic of Scientific Discovery (1934)
´ Theories are (somehow) more or less rationally invented
´ Scientific theories allow an experimental test
  ´ Disagreement: rejection & search for a better theory
  ´ Agreement: acceptance for the moment & search for a more decisive experiment
´ Critical Rationalism
  ´ Logic-based theory
  ´ Is it necessary to reject a theory just because of one non-fitting experiment?
Plato’s Cave Revisited
The physical Problem: “Cave equation”
g = measured numbers
b = background
A = Kernel = transfer function = detector properties
f = wanted function
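The equation itself is not preserved in the text of this slide. Assuming it has the usual form of such an inverse problem (a Fredholm integral equation of the first kind, which is an assumption and not taken from the slide), the four symbols would be related as follows. The variable names x (true quantity) and y (measured quantity) are illustrative.

```latex
g(y) = \int A(y, x)\, f(x)\, \mathrm{d}x + b(y)
```

Read this way, the experiment delivers g, the simulation provides A and b, and the analysis has to solve for f.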
Computer Science View
[Diagram; recoverable labels: Scientific Question; Experiment (Sensors, Data Acquisition); Monte Carlo Simulation; Trigger; Streams; Signal Identification; Data Reduction, Data Storage; Data Pre-Processing; Regulation, Improvements; Analysis; Inverse Problems; Evaluation; Concept Shift; Complex Models; Programming; Evaluation]
Monte Carlo: Virtual Reality
´ Physics … all that we know ...
  ´ Energy Spectra
  ´ Directional Distributions
  ´ Sky Maps
´ Data: … all we can measure ...
  ´ Charges
  ´ Times
  ´ Locations
Data Analysis: Monte Carlo Description (Goal: Measurement of the neutrino energy spectrum)
´ Monte Carlo Simulation of
  ´ Signal: Neutrinos
    ´ close to correct spatial and energy distribution
    ´ Neutrino interactions leading to charged particles (i.e. muons)
    ´ Muon interactions (path, range, deposited energy...)
    ´ Cherenkov light (production by charged particles, propagation in ice)
    ´ Light detection and measurement (photomultiplier, readout electronics)
  ´ Physical Background: Cosmic ray interactions in the atmosphere
    ´ → (....) recorded light in the detector, correct simulation
  ´ Technical Background: correct simulation
    ´ Radioactivity of the ice or the detector itself → light
    ´ Photomultiplier noise (“signal without external reason”)
    ´ Artefacts of the readout electronics → fake signal
IceCube - Simulations
[Diagram of the simulation chain; recoverable labels, two entries are illegible in the source]
§ Generators: NuGen, CORSIKA
§ Lepton propagator
§ Photon propagator: photonics, PPC, clsim
§ Detector: Electronics, Trigger
Atmospheric Muons (Corsika) - 1 Year Run Time of the Detector - One Configuration
TRIGGER: Which Data might we want?
´ Keep as much of the signal as possible.
´ Discard as much of the background as possible.
´ Decide within ~1 ms
´ → Write data to disk if N detectors within a volume V and a time interval T have seen a signal.
´ Hardware (FPGA) and software realization possible.
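The trigger condition above (N detectors within a volume V and a time interval T) can be sketched in a few lines. This is only an illustration under stated assumptions: the volume V is approximated by a sphere of a given radius around a seed hit, and the function name, the hit layout `(time, position, detector_id)` and all numbers are hypothetical, not taken from any real trigger system.

```python
from math import dist


def trigger(hits, n_min, radius, window):
    """Return True if at least n_min distinct detectors fired close together.

    hits   : list of (time, (x, y, z), detector_id) tuples (assumed layout)
    radius : radius of the sphere standing in for the volume V
    window : length of the time interval T (same unit as the hit times)
    """
    hits = sorted(hits, key=lambda h: h[0])  # order hits by time
    for i, (t0, pos0, _) in enumerate(hits):
        detectors = set()
        # Collect all later hits inside the time window and the sphere.
        for t, pos, det_id in hits[i:]:
            if t - t0 > window:
                break
            if dist(pos0, pos) <= radius:
                detectors.add(det_id)
        if len(detectors) >= n_min:
            return True  # keep the event: write data to disk
    return False  # discard the event


# Three nearby hits within 20 time units, plus one late isolated hit.
example_hits = [
    (0.0, (0.0, 0.0, 0.0), 1),
    (10.0, (5.0, 0.0, 0.0), 2),
    (20.0, (0.0, 5.0, 0.0), 3),
    (5000.0, (0.0, 0.0, 0.0), 4),
]
keep = trigger(example_hits, n_min=3, radius=10.0, window=100.0)
```

A hardware (FPGA) realization would implement the same coincidence logic with fixed-size buffers rather than Python lists.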
First Analysis Steps: Which Events might we want?
´ Still: Keep as much of the signal as possible.
´ Still: Discard as much of the background as possible.
´ Decide within ~ minutes - hours (Computing Farm)
´ Can the event be reconstructed at all?
´ Is the result physical?
  ´ (IceCube: Movement with velocity of light?, Upward?)
´ Do different algorithms lead to compatible results?
The Problem:
g = measured numbers
b = background
A = Kernel = transfer function = detector properties
f = wanted function
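In binned form, the relation between these four symbols is commonly written as g = A f + b. The sketch below assumes that form (it is not stated on the slide) and uses made-up numbers for a two-bin toy detector; `fold` and `unfold_2x2` are hypothetical helper names.

```python
def fold(A, f, b):
    """Forward model: g_i = sum_j A[i][j] * f[j] + b[i]."""
    return [
        sum(a_ij * f_j for a_ij, f_j in zip(row, f)) + b_i
        for row, b_i in zip(A, b)
    ]


def unfold_2x2(A, g, b):
    """Naive inversion for a 2x2 kernel: f = A^-1 (g - b)."""
    (a, c), (d, e) = A
    det = a * e - c * d
    if det == 0:
        raise ValueError("kernel is singular, f cannot be recovered")
    r0, r1 = g[0] - b[0], g[1] - b[1]  # background-subtracted measurement
    return [(e * r0 - c * r1) / det, (a * r1 - d * r0) / det]


# Toy example: 20 % of the events migrate into the neighbouring bin.
A = [[0.8, 0.2], [0.2, 0.8]]
f_true = [100.0, 50.0]   # wanted function (unknown in a real experiment)
b = [5.0, 5.0]           # background per bin
g = fold(A, f_true, b)   # what the detector would record
f_est = unfold_2x2(A, g, b)
```

Direct inversion works here only because the toy kernel is small and well conditioned and the numbers carry no statistical fluctuations; realistic kernels usually require a regularized unfolding method.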
Signal – Background Separation
How to obtain a clean signal data set?
[Figure: event classes in the detector (N/S orientation): atmospheric µ, badly reconstructed µ, well reconstructed µ, µ-neutrino with ν CC interaction → µ]
Track length > 200 m
Zenith angle > 86 deg
Track interruption < 400 m
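The three conditions on this slide can be read as straight cuts applied to each reconstructed track. A minimal sketch, assuming a track is described by exactly these three quantities; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    length_m: float        # reconstructed track length in metres
    zenith_deg: float      # reconstructed zenith angle in degrees
    interruption_m: float  # track interruption in metres


def passes_cuts(track):
    """Apply the three straight cuts quoted on the slide."""
    return (
        track.length_m > 200.0
        and track.zenith_deg > 86.0
        and track.interruption_m < 400.0
    )


candidates = [
    Track(length_m=350.0, zenith_deg=120.0, interruption_m=50.0),
    Track(length_m=150.0, zenith_deg=120.0, interruption_m=50.0),  # too short
    Track(length_m=350.0, zenith_deg=80.0, interruption_m=50.0),   # zenith too small
]
selected = [t for t in candidates if passes_cuts(t)]
```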
§ High dimensional data set
§ Many redundant, irrelevant, noisy variables
§ Maybe data and Monte Carlo do not fit at all in certain variables
§ “Many dimensions”: complicate and maybe prevent the separation
→ Selection of a small number of appropriate variables
§ Variable Selection
  1. Data - Monte Carlo comparison: remove ill variables from the analysis
     § Calculations correct?
     § Approximations OK?
     § Procedures applicable to data and Monte Carlo?
  2. Remove redundant and meaningless variables
  3. MRMR feature selection
MRMR: Minimum Redundancy Maximum Relevance
Stability of the MRMR Selection:
Jaccard Index: $J = \frac{|A \cap B|}{|A \cup B|}$
Kuncheva’s Index: $I_C(A, B) = \frac{r n - k^2}{k (n - k)}$, with $k = |A| = |B|$ and $r = |A \cap B|$
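Both stability indices compare two selected variable sets A and B, for example from two runs of the selection on different subsamples. The sketch below follows the two formulas on this slide; the slide does not define n, so reading it as the total number of candidate variables is an assumption here (it is the usual meaning in Kuncheva's index). The function names are hypothetical.

```python
def jaccard_index(a, b):
    """J = |A ∩ B| / |A ∪ B| for two selected variable sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are trivially identical
    return len(a & b) / len(a | b)


def kuncheva_index(a, b, n):
    """I_C(A, B) = (r*n - k^2) / (k*(n - k)), with k = |A| = |B|, r = |A ∩ B|.

    n is assumed to be the total number of candidate variables.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both selections must contain the same number of variables")
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))


run_1 = {"var1", "var2", "var3"}
run_2 = {"var2", "var3", "var4"}
j = jaccard_index(run_1, run_2)          # 2 shared of 4 distinct variables
i_c = kuncheva_index(run_1, run_2, n=10)
```

Unlike the Jaccard index, Kuncheva's index corrects for the overlap expected by chance, which matters when k is a sizeable fraction of n.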
§ Variable Selection
  1. Data - Monte Carlo comparison: remove ill variables from the analysis
  2. Remove redundant and meaningless variables
  3. MRMR feature selection
Dimensions: M ∼ 2000 → (steps 1 & 2) → M ∼ 120 → (step 3) → M = 30
§ Random Forest (or other data mining method)
§ Wanted: a model that recognizes structures appropriate for signal-background separation, learned from Monte Carlo
§ Important: Monte Carlo carries labels for signal and background
§ Example here: Random Forest
§ Prediction: estimation between 0 and 1 for the “signalness” of a single event
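The idea of a per-event “signalness” between 0 and 1 can be illustrated with a small ensemble trained on labelled toy Monte Carlo. The sketch below is deliberately simplified: it uses bagged one-level trees (decision stumps) as a stand-in for a full random forest, and it is not the implementation used in the analysis. All function names and the toy distributions are hypothetical.

```python
import random


def train_stump(events, labels, rng):
    """Fit a one-level tree on a bootstrap sample, using one random attribute."""
    n = len(events)
    sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
    feat = rng.randrange(len(events[0]))           # random attribute
    best = None
    for i in sample:
        thr = events[i][feat]                      # candidate threshold
        for sign in (1, -1):                       # cut direction
            errors = sum(
                int(sign * (events[j][feat] - thr) >= 0) != labels[j]
                for j in sample
            )
            if best is None or errors < best[0]:
                best = (errors, feat, thr, sign)
    return best[1:]  # (attribute index, threshold, direction)


def train_forest(events, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    return [train_stump(events, labels, rng) for _ in range(n_trees)]


def signalness(forest, event):
    """Fraction of trees voting 'signal': a number between 0 and 1."""
    votes = sum(sign * (event[feat] - thr) >= 0 for feat, thr, sign in forest)
    return votes / len(forest)


# Toy labelled Monte Carlo: label 1 = signal, label 0 = background.
toy_rng = random.Random(1)
signal = [(toy_rng.gauss(2.0, 0.5), toy_rng.gauss(1.0, 0.5)) for _ in range(50)]
background = [(toy_rng.gauss(-2.0, 0.5), toy_rng.gauss(-1.0, 0.5)) for _ in range(50)]
events = signal + background
labels = [1] * 50 + [0] * 50

forest = train_forest(events, labels)
s_like = signalness(forest, (2.0, 1.0))    # event in the signal region
b_like = signalness(forest, (-2.0, -1.0))  # event in the background region
```

The labels are available only because the events are simulated, which is why the model has to be trained on Monte Carlo and then applied to data.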
Random Forest
[Figure: random forest scheme; recoverable labels: attributes for the complete RF, attributes at node, signal/background relation at training]
Signal Selection
Error Estimation: Cross Validation
§ Test of “Stability” and “Overtraining”
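A k-fold cross validation can be sketched as follows. The interpretation is an assumption rather than something stated on the slide: “stability” is read as the fold-to-fold spread of a performance score, and “overtraining” as a gap between the score on the training folds and on the held-out fold. The function names are hypothetical.

```python
import random


def k_fold_indices(n_events, k, seed=0):
    """Yield (train, test) index lists; each event is tested exactly once."""
    idx = list(range(n_events))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def mean_and_spread(scores):
    """Mean and sample standard deviation of per-fold scores."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return mean, var ** 0.5


splits = list(k_fold_indices(n_events=10, k=3))
mean, spread = mean_and_spread([1.0, 2.0, 3.0])  # placeholder scores
```

In use, the classifier is retrained on each `train` list and scored on both `train` and `test`; a small spread across folds indicates a stable result, and similar training and test scores argue against overtraining.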
[Figure residue; recoverable labels: “Complete”, “Range”, “TeV”, “Data”]
§ 99.6 ± 0.2 % purity of muon-neutrino events
§ 99.9999 % background rejected