Bayesian perspective on QCD global analysis
Nobuo Sato (University of Connecticut / JLab)
In collaboration with: A. Accardi, E. Nocera, W. Melnitchouk
DIS18, Kobe, Japan, April 16-20, 2018
Bayesian methodology in a nutshell

In QCD global analysis, PDFs are parametrized at some scale Q_0, e.g.

f(x) = N x^a (1-x)^b (1 + c \sqrt{x} + d x + ...)
f(x) = N x^a (1-x)^b \, \mathrm{NN}(x; \{\theta, w_i\})

"Fitting" is essentially estimation of a = (N, a, b, c, d, ...):

E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a)
V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2

The probability density P is given by Bayes' theorem:

P(f|\mathrm{data}) = \frac{1}{Z} L(\mathrm{data}|f) \, \pi(f)
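As a running example for the sketches below, a minimal implementation of the three-parameter shape (the parameter values used later are illustrative, not fitted):

```python
import numpy as np

def f(x, params):
    """Three-parameter PDF shape at the input scale Q0: f(x) = N * x**a * (1 - x)**b."""
    N, a, b = params
    return N * x**a * (1 - x)**b
```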
Bayesian methodology in a nutshell

The likelihood function is not unique. A standard choice is the Gaussian likelihood

L(d|a) = \exp\left( -\frac{1}{2} \sum_i \left( \frac{d_i - \mathrm{thy}_i(a)}{\delta d_i} \right)^2 \right)

Priors are designed to veto unphysical regions in parameter space, e.g.

\pi(a) = \prod_i \theta(a_i - a_i^{\mathrm{min}}) \, \theta(a_i^{\mathrm{max}} - a_i)

How do we compute E[f], V[f]?
+ Maximum likelihood
+ Monte Carlo methods
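A minimal sketch of this likelihood and box prior in Python (the function names and interfaces are illustrative assumptions, not the analysis code):

```python
import numpy as np

def log_likelihood(params, data, errors, theory):
    """Gaussian log-likelihood: log L = -chi^2 / 2."""
    residuals = (data - theory(params)) / errors
    return -0.5 * np.sum(residuals**2)

def log_prior(params, lower, upper):
    """Product of step functions: flat inside the box [lower, upper], zero outside."""
    if np.all((params >= lower) & (params <= upper)):
        return 0.0
    return -np.inf

def log_posterior(params, data, errors, theory, lower, upper):
    """Unnormalized log posterior, log L + log pi (Bayes' theorem up to -log Z)."""
    lp = log_prior(params, lower, upper)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(params, data, errors, theory)
```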
Maximum Likelihood

Estimation of the expectation value:

E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a) \simeq f(a_0)

a_0 is estimated from an optimization algorithm,

\max[P(a|\mathrm{data})] = P(a_0|\mathrm{data})
\max[L(\mathrm{data}|a) \, \pi(a)] = L(\mathrm{data}|a_0) \, \pi(a_0)

or, equivalently, chi-squared minimization:

\min[-2 \log(L(\mathrm{data}|a) \, \pi(a))] = -2 \log(L(\mathrm{data}|a_0) \, \pi(a_0))
  = \sum_i \left( \frac{d_i - \mathrm{thy}_i(a_0)}{\delta d_i} \right)^2 - 2 \log \pi(a_0)
  = \chi^2(a_0) - 2 \log \pi(a_0)
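A hedged sketch of the chi-squared minimization for the f(x) = N x^a (1-x)^b shape; the synthetic data, error sizes, and starting point are assumptions for illustration (a flat prior contributes only a constant to -2 log(L pi)):

```python
import numpy as np
from scipy.optimize import minimize

def chi2(params, x, data, errors):
    """chi^2(a) = sum_i ((d_i - thy_i(a)) / delta d_i)^2 with thy = N x^a (1-x)^b."""
    N, a, b = params
    theory = N * x**a * (1 - x)**b
    return np.sum(((data - theory) / errors)**2)

rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.9, 30)
truth = (2.0, -0.5, 3.0)                 # hypothetical "true" parameters
errors = 0.05 * np.ones_like(x)
data = truth[0] * x**truth[1] * (1 - x)**truth[2] + errors * rng.standard_normal(x.size)

result = minimize(chi2, x0=(1.0, -0.3, 2.0), args=(x, data, errors), method="Nelder-Mead")
a0 = result.x                            # maximum-likelihood point, so E[f] ~ f(a0)
print("a0 =", a0, "chi2 =", result.fun)
```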
Maximum Likelihood

Estimation of the variance (Hessian method):

V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2
     \simeq \sum_k \left( \frac{f(t_k = 1) - f(t_k = -1)}{2} \right)^2

It relies on factorization of P(a|data) along eigen directions,

P(a|\mathrm{data}) \propto \exp\left( -\frac{1}{2} \sum_k t_k^2 \right) + O(\Delta a^3)

and a linear approximation of f(a):

(f(a) - E[f])^2 = \left( \sum_k \frac{\partial f}{\partial t_k} t_k \right)^2 + O(a^3)
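A sketch of the master formula under the stated assumptions (Gaussian posterior, linearized observable); the finite-difference step and the Delta chi^2 = 1 tolerance are illustrative choices:

```python
import numpy as np

def hessian_uncertainty(f_obs, chi2, a0, step=1e-4, delta_chi2=1.0):
    """Hessian error on an observable f_obs at the best fit a0:
    diagonalize the Hessian, displace along each rescaled eigen-direction
    to t_k = +/-1 (i.e. Delta chi^2 = delta_chi2), and sum the symmetric
    differences in quadrature: V[f] ~ sum_k ((f(t_k=1) - f(t_k=-1)) / 2)**2."""
    n = len(a0)
    H = np.zeros((n, n))
    for i in range(n):                    # finite-difference Hessian of chi^2 / 2
        for j in range(n):
            ei, ej = np.eye(n)[i] * step, np.eye(n)[j] * step
            H[i, j] = (chi2(a0 + ei + ej) - chi2(a0 + ei - ej)
                       - chi2(a0 - ei + ej) + chi2(a0 - ei - ej)) / (8 * step**2)
    vals, vecs = np.linalg.eigh(H)
    var = 0.0
    for k in range(n):
        # Flat directions (vals[k] <= 0) make this step blow up -- the
        # numerical instability mentioned among the cons on the next slide.
        shift = vecs[:, k] * np.sqrt(delta_chi2 / vals[k])   # t_k = 1 displacement
        var += ((f_obs(a0 + shift) - f_obs(a0 - shift)) / 2)**2
    return np.sqrt(var)
```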
Maximum Likelihood

pros
+ Very practical: most PDF groups use this method
+ Computationally inexpensive
+ f and its eigen directions can be precalculated/tabulated

cons
+ Assumes a local Gaussian approximation of the likelihood
+ Assumes a linear approximation of the observables O around a_0
+ The assumptions are strictly valid only for linear models
+ Computation of the Hessian matrix is numerically unstable if flat directions are present

examples (numerical check below)
→ if f(x) = a + b x + c x^2, then E[f(x)] = E[a] + E[b] x + E[c] x^2
→ but for f(x) = N x^a (1-x)^b, E[f(x)] ≠ E[N] x^{E[a]} (1-x)^{E[b]}
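A quick Monte Carlo check of the last point: for the nonlinear shape, f evaluated at the mean parameters differs from the mean of f (the Gaussian parameter ensemble is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical Gaussian parameter ensemble around (N, a, b) = (2, -0.5, 3)
samples = rng.normal(loc=[2.0, -0.5, 3.0], scale=[0.3, 0.1, 0.5], size=(100_000, 3))

x = 0.3
f_vals = samples[:, 0] * x**samples[:, 1] * (1 - x)**samples[:, 2]
Nm, am, bm = samples.mean(axis=0)

print(f_vals.mean())              # E[f(x)]: mean of f over the ensemble
print(Nm * x**am * (1 - x)**bm)   # f at the mean parameters: differs (nonlinear model)
```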
Monte Carlo Methods

Recall that we are interested in computing

E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a)
V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2

Any MC method attempts to do this using MC sampling,

E[f] \simeq \sum_k w_k \, f(a_k)
V[f] \simeq \sum_k w_k \, (f(a_k) - E[f])^2

i.e. to construct the sample distribution {w_k, a_k} of the parent distribution P(a|data).
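The weighted estimators in a few lines (a generic sketch, independent of how the sample {w_k, a_k} was produced):

```python
import numpy as np

def mc_moments(f_obs, params_k, weights_k):
    """Weighted Monte Carlo estimates of E[f] and V[f] from a sample {w_k, a_k}."""
    w = np.asarray(weights_k, dtype=float)
    w = w / w.sum()                                # normalize the weights
    f_vals = np.array([f_obs(a) for a in params_k])
    mean = np.sum(w * f_vals)                      # E[f] ~ sum_k w_k f(a_k)
    var = np.sum(w * (f_vals - mean)**2)           # V[f] ~ sum_k w_k (f(a_k) - E[f])^2
    return mean, var
```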
Monte Carlo Methods

+ Resampling + cross validation — Ethier et al (2017)
+ Nested Sampling (NS) — Lin et al (2018)
+ Hybrid Markov chain (HMC) — Gabin Gbedo, Mangin-Brinet (2017)

[Figures: JAM17 vs JAM15 polarized PDFs x\Delta u^+, x\Delta d^+, x(\Delta\bar{u} + \Delta\bar{d}), x(\Delta\bar{u} - \Delta\bar{d}), x\Delta s^+, x\Delta s^- compared with DSSV09; transversity h_1^{u,d} and Collins functions zH_1^{\perp(1)} (favored/unfavored) from SIDIS and SIDIS+lattice; normalized yields for g_T, \delta u, \delta d, including JAM17 + SU(3)]
Resampling + cross validation (R+CV)

+ Resample the data points within the quoted uncertainties using a Gaussian sampler:

d^{(\mathrm{pseudo})}_{k,i} = d^{(\mathrm{exp})}_i + \sigma^{(\mathrm{exp})}_i R_{k,i}

+ Fit each pseudo-data sample k = 1, ..., N to obtain parameter vectors a_k:

P(a|\mathrm{data}) \to \{ w_k = 1/N, \, a_k \}

+ For a large number of parameters, split the data into training and validation sets and find the a_k that best describes the validation sample.

[Flowchart: priors → pseudo-data (from original data) → training/validation split → minimization steps, monitored on the validation set → posteriors and fit statistics]
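A simplified sketch of R+CV, reusing the chi2 function from the earlier maximum-likelihood example; a real implementation monitors the validation chi^2 along the minimization trajectory and keeps the best iterate, which this sketch approximates by keeping the training minimizer:

```python
import numpy as np
from scipy.optimize import minimize

def resample_fits(x, data, errors, chi2, n_replicas=200, train_frac=0.5, seed=0):
    """Fit Gaussian pseudo-data replicas d_k = d + sigma * R_k; each replica fit
    returns one parameter vector a_k, carrying equal weight w_k = 1/N."""
    rng = np.random.default_rng(seed)
    replicas = []
    for k in range(n_replicas):
        pseudo = data + errors * rng.standard_normal(data.size)
        mask = rng.random(data.size) < train_frac    # random training/validation split
        res = minimize(chi2, x0=(1.0, -0.3, 2.0),
                       args=(x[mask], pseudo[mask], errors[mask]),
                       method="Nelder-Mead")
        replicas.append(res.x)                       # a_k for this replica
    return np.array(replicas)
```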
Nested Sampling (NS)

References: arXiv:astro-ph/0508461, arXiv:astro-ph/0701867, arXiv:1703.09701

The basic idea: compute the evidence

Z = \int L(\mathrm{data}|a) \, \pi(a) \, d^n a = \int_0^1 L(X) \, dX

+ The procedure collects samples from isolikelihood contours, weighted by their likelihood values
+ Insensitive to local minima → faithful conversion of P(a|\mathrm{data}) \to \{w_k, a_k\}
+ Multiple runs can be combined into one single run → the procedure can be parallelized

[Figures: L(data|a) in parameter space; L(X) in X space]
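A toy nested-sampling loop illustrating the idea; production samplers replace the naive rejection step with constrained sampling inside the isolikelihood contour, so this is a pedagogical sketch, not the algorithm used in the cited analyses:

```python
import numpy as np

def nested_sampling(loglike, prior_sample, n_live=100, n_iter=2000, seed=0):
    """Toy nested sampling: repeatedly replace the worst live point with a
    new prior draw above the current likelihood threshold (the isolikelihood
    contour). Each dead point a_k gets weight w_k = L_k * dX_k, with the
    prior volume shrinking geometrically, X_i ~ exp(-i / n_live)."""
    rng = np.random.default_rng(seed)
    live = [prior_sample(rng) for _ in range(n_live)]
    logL = np.array([loglike(a) for a in live])
    samples, weights = [], []
    X_prev = 1.0
    for i in range(n_iter):
        worst = np.argmin(logL)
        X = np.exp(-(i + 1) / n_live)                  # expected shrinkage of X
        samples.append(live[worst])
        weights.append(np.exp(logL[worst]) * (X_prev - X))  # w_k = L_k * dX_k
        X_prev = X
        threshold = logL[worst]
        while True:                                    # naive rejection from the prior;
            a_new = prior_sample(rng)                  # exponentially slow in practice
            if loglike(a_new) > threshold:
                break
        live[worst], logL[worst] = a_new, loglike(a_new)
    Z = np.sum(weights)                                # evidence Z = int L(X) dX
    return np.array(samples), np.array(weights) / Z, Z
```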
Comparison between the methods

Given a likelihood, does the evaluation of E[f] and V[f] depend on the method?
→ use a stress-testing numerical example

Setup:
+ Simulate synthetic data via rejection sampling (a generic sketch follows)
+ Estimate E[f] and V[f] using the different methods

[Figures: synthetic data points and fitted f(x) versus x on a logarithmic scale]
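A generic rejection-sampling sketch of the kind usable for drawing synthetic samples (the target/proposal interfaces are assumptions; log_M must bound log[p(a)/q(a)] from above):

```python
import numpy as np

def rejection_sample(log_target, proposal_draw, log_proposal, log_M, n, seed=0):
    """Rejection sampling: accept a draw a ~ q with probability p(a) / (M q(a));
    the accepted points are an exact sample from the target p."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        a = proposal_draw(rng)
        if np.log(rng.random()) < log_target(a) - log_proposal(a) - log_M:
            out.append(a)
    return np.array(out)
```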
Comparison between the methods

+ HESS, NS and R provide the same uncertainty
+ R+CV overestimates the uncertainty by roughly a factor of 2
+ The uncertainties also depend on the training fraction (tf)
+ The results are confirmed also within a neural-net parametrization

[Figures: f(x) and \delta f(x) for HESS, NS, R, RCV(50/50); ratio (\delta f/f) / (\delta f/f)_{NS} versus training fraction tf at x = 0.1, 0.3, 0.5, 0.7]
Beyond Gaussian likelihood

Gaussian likelihoods are not adequate to describe uncertainties in the presence of incompatible data sets.

Example:
+ Two measurements of a quantity m: (m_1, \delta m_1), (m_2, \delta m_2)
+ The expectation value and variance can be computed exactly:

E[m] = \frac{m_1 \, \delta m_2^2 + m_2 \, \delta m_1^2}{\delta m_1^2 + \delta m_2^2}
V[m] = \frac{\delta m_1^2 \, \delta m_2^2}{\delta m_1^2 + \delta m_2^2}

+ Note: V[m] is independent of |m_1 - m_2|

To obtain more realistic uncertainties, the likelihood function needs to be modified (e.g. a tolerance criterion). A numerical illustration follows.
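A numerical illustration of the last point, sketched with the standard inverse-variance combination: the combined uncertainty is blind to the tension between the two measurements.

```python
import numpy as np

def combine(m1, dm1, m2, dm2):
    """Gaussian combination of two measurements (inverse-variance weighting):
    E[m] = (m1*dm2^2 + m2*dm1^2) / (dm1^2 + dm2^2),
    V[m] = dm1^2 * dm2^2 / (dm1^2 + dm2^2)."""
    mean = (m1 * dm2**2 + m2 * dm1**2) / (dm1**2 + dm2**2)
    var = (dm1**2 * dm2**2) / (dm1**2 + dm2**2)
    return mean, np.sqrt(var)

# V[m] ignores |m1 - m2|: compatible and wildly incompatible pairs
# report the same combined error bar.
print(combine(1.0, 0.1, 1.05, 0.1))  # compatible measurements
print(combine(1.0, 0.1, 2.00, 0.1))  # 10-sigma apart, identical uncertainty
```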
Likelihood profile in CJ15

+ 24 parameters, 33 data sets
+ Scans of the likelihood and \Delta\chi^2 along Hessian eigen-directions, with and without incompatibilities, show which experiments pull in opposite directions (e.g. eigen-direction 12)
+ Parameters: valence (a1uv, a2uv, a4uv, a1dv, a2dv, a3dv, a4dv, a6dv), sea (a0ud, a1ud, a2ud, a4ud, a1du, a2du, a4du), gluon (a1g, a2g, a3g, a4g), offshell (off1, off2), and higher-twist (ht1, ht2, ht3) terms
+ Data sets: SLAC/BCDMS/NMC/JLab F2 (p, d), HERA NC/CC e±p, Drell-Yan (E866 pp, pd), W/Z asymmetries (CDF, D0), and jet/gamma+jet data (D0, CDF)

[Figures: likelihood and \Delta\chi^2 profiles along eigen-directions; projections of each data set's pull onto eigen-direction 12]