Likelihood-free gravitational-wave parameter estimation with neural networks



  1. Likelihood-free gravitational-wave parameter estimation with neural networks. Stephen R. Green, Albert Einstein Institute Potsdam. Based on arXiv:2002.07656 with C. Simpson and J. Gair. Gravity Seminar, University of Southampton, February 27, 2020.

  2. Outline
 1. Introduction to Bayesian inference for compact binaries
 2. Likelihood-free inference with neural networks: (a) basic approach, (b) normalizing flows, (c) variational autoencoders
 3. Results

  3. Introduction to parameter estimation
 • Bayesian inference for compact binaries: sample the posterior distribution for the system parameters θ (masses, spins, sky position, etc.) given detector strain data s.
 • Bayes' theorem:
   p(θ|s) = p(s|θ) p(θ) / p(s),
   where p(s|θ) is the likelihood, p(θ) is the prior, and p(s) is the evidence (a normalizing factor).
 • Once the likelihood and prior are defined, the right-hand side can be evaluated (up to normalization).
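In practice this evaluation is done in log space. A minimal sketch (the log_likelihood and log_prior functions here are hypothetical placeholders for whatever likelihood and prior are chosen):

```python
import numpy as np

def log_posterior_unnorm(theta, s, log_likelihood, log_prior):
    """Unnormalized log posterior: log p(s|theta) + log p(theta)."""
    lp = log_prior(theta)
    if not np.isfinite(lp):              # outside the prior support
        return -np.inf
    return log_likelihood(theta, s) + lp
```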

  4. Introduction to parameter estimation
 • The likelihood is based on the assumption that if the gravitational-wave signal were subtracted from s, then what remains must be noise.
 • The noise n is assumed to follow a stationary Gaussian distribution, i.e.,
   n ∼ p(n) ∝ exp( −(1/2)(n|n) ),
   where the noise-weighted inner product is
   (a|b) = 2 ∫_0^∞ [ â(f) b̂(f)* + â(f)* b̂(f) ] / S_n(f) df,
   with S_n(f) the detector noise power spectral density (PSD).
 • Summed over detectors I, this gives the likelihood
   p(s|θ) ∝ exp( −(1/2) Σ_I ( s_I − h_I(θ) | s_I − h_I(θ) ) ).
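A minimal frequency-domain sketch of these expressions, assuming the data, templates, and PSDs are given per detector on a common uniform frequency grid (all names are illustrative, not from the paper):

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Noise-weighted inner product (a|b) = 4 Re sum_f a(f) conj(b(f)) / S_n(f) df
    for one-sided frequency series a, b on a grid with spacing df."""
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def log_likelihood(strain, templates, psds, df):
    """Stationary-Gaussian log likelihood, summed over detectors I:
    log p(s|theta) = -1/2 sum_I (s_I - h_I | s_I - h_I) + const."""
    logl = 0.0
    for s_I, h_I, psd_I in zip(strain, templates, psds):
        r = s_I - h_I                    # residual, assumed to be pure noise
        logl += -0.5 * inner_product(r, r, psd_I, df)
    return logl
```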

  5. Introduction to parameter estimation
 • The prior p(θ) is based on beliefs about the system before looking at the data: e.g., uniform in m_1, m_2 over some range, uniform in spatial volume, etc.
 • With the prior and likelihood defined, the posterior can be evaluated up to normalization.
 • A method such as Markov chain Monte Carlo (MCMC) is used to obtain posterior samples: move around parameter space, and compare the strain data s against the waveform model h(θ). [Image: Abbott et al (2016)]
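For illustration only, a bare random-walk Metropolis sketch of that sampling loop; log_post would be the unnormalized log posterior from the earlier sketch, and production analyses use far more sophisticated samplers:

```python
import numpy as np

def metropolis(log_post, theta0, n_steps, step_size, seed=0):
    """Random-walk Metropolis: propose a Gaussian jump, accept with prob min(1, p'/p)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_post(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step_size * rng.standard_normal(theta.shape)
        logp_prop = log_post(prop)           # each call requires a waveform evaluation
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop    # accept the move
        samples.append(theta.copy())
    return np.array(samples)
```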

  6. Need for new methods
 • The standard method is expensive:
   • Many likelihood evaluations are required for each independent sample.
   • Each likelihood evaluation is slow, since it requires a waveform to be generated.
   • Various waveform models (EOBNR, Phenom, …) were created as faster alternatives to numerical relativity; reduced-order surrogate models give even faster evaluation.
   • Days to months for parameter estimation of a single event, depending on the type of event and the waveform model.
 • Goal of this work: develop deep learning methods to do parameter estimation much faster, by modeling the posterior distribution p(θ|s) with a neural network.

  7. Main result: very fast posterior sampling
 [Corner plot of posterior samples over m_1/M☉, m_2/M☉, φ_0, t_c/s, d_L/Mpc, χ_1z, χ_2z, θ_JN]
 Rest of this talk: How did we do this?

  8. Two key ideas
 1. A conditional probability distribution can be described by a neural network.
 2. The network can be trained to model a gravitational-wave posterior distribution without ever evaluating a likelihood. Instead, it only requires samples (θ, s) from the data-generating process.
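A sketch of such a data-generating process: draw θ from the prior, generate a waveform, and add stationary Gaussian noise consistent with the inner-product convention above. sample_prior and waveform are stand-ins for whatever prior and waveform model are used:

```python
import numpy as np

def simulate_pair(sample_prior, waveform, psd, df, rng=None):
    """One training example (theta, s) with s = h(theta) + n, n stationary Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    theta = sample_prior(rng)
    h = waveform(theta)                        # frequency-domain signal h(theta)
    sigma = np.sqrt(psd / (4.0 * df))          # per-bin standard deviation
    n = sigma * (rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape))
    return theta, h + n
```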

  9. Introduction to neural networks
 • Nonlinear functions constructed as a composition of mappings, e.g. from the input layer x ∈ ℝ^N to the first hidden layer h_1 ∈ ℝ^N_1. Each mapping consists of:
   1. a linear transformation, W_1 x + b_1;
   2. a simple element-wise nonlinear mapping, h_1 = σ_1(W_1 x + b_1), e.g. σ_1(x) = x for x ≥ 0 and σ_1(x) = 0 for x < 0.
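A tiny numpy illustration of one such layer (the dimensions are arbitrary):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)        # sigma(z) = z for z >= 0, else 0

N, N1 = 8, 16                              # input and first-hidden-layer sizes
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((N1, N)), np.zeros(N1)

x = rng.standard_normal(N)                 # input layer, x in R^N
h1 = relu(W1 @ x + b1)                     # first hidden layer, h1 in R^N1
```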

  10. Introduction to neural networks
 • Stacking layers gives the full network: input layer x ∈ ℝ^N, hidden layers h_1 = σ_1(W_1 x + b_1), h_2 = σ_2(W_2 h_1 + b_2), …, h_p, and output layer y = σ_out(W_out h_p + b_out), with y ∈ ℝ^N_out.
 • Training/test data consist of (x, y) pairs.
 • The network is trained by tuning the weights W and biases b to minimize a loss function L(y, y_out).
 • Stochastic gradient descent, combined with the chain rule ("backpropagation"), is used to adjust the weights and biases.
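A compact PyTorch sketch of such a stack of layers and a single SGD/backpropagation step, assuming generic (x, y) training pairs (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                     # x -> h1 -> h2 -> y_out
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),                      # output layer
)
loss_fn = nn.MSELoss()                     # L(y, y_out)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(x, y):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                        # backpropagation (chain rule)
    opt.step()                             # adjust weights W and biases b
    return loss.item()
```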

  11. Neural networks as probability distributions
 • Since conditional probability distributions can be parametrized by functions, and neural networks are functions, conditional probability distributions can be described by neural networks.
 • E.g., the multivariate normal distribution
   p(x|y) = N(μ(y), Σ(y))(x) = (2π)^{−n/2} |det Σ(y)|^{−1/2} exp( −(1/2) Σ_{i,j=1}^{n} (x_i − μ_i(y)) [Σ^{−1}(y)]_{ij} (x_j − μ_j(y)) ),
   where μ(y), Σ(y) = NN(y).
 • For this example, it is trivial to draw samples and evaluate the density.
 • More complex distributions may also be described by neural networks (later in the talk).
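A sketch of this construction in PyTorch, restricted to a diagonal covariance for brevity (a full Σ(y) could use torch.distributions.MultivariateNormal); the dimensions are placeholders:

```python
import torch
import torch.nn as nn

class GaussianConditional(nn.Module):
    """p(x|y) = N(mu(y), diag(sigma(y)^2)), with (mu, log sigma) = NN(y)."""
    def __init__(self, dim_y, dim_x, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim_x),
        )

    def forward(self, y):
        mu, log_sigma = self.net(y).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_sigma.exp())

dist = GaussianConditional(dim_y=32, dim_x=8)(torch.randn(5, 32))
x = dist.sample()                          # trivial to draw samples ...
log_p = dist.log_prob(x).sum(-1)           # ... and to evaluate the density
```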

  12. Likelihood-free inference with neural networks [first applied to GW by Chua and Vallisneri (2020), Gabbard et al (2019)]
 • The goal is to train the network to model the true posterior, as given by the prior and likelihood that we specify, i.e., p(θ|s) → p_true(θ|s).
 • Minimize the expectation value (over s) of the cross-entropy between the distributions:
   L = −∫ ds p_true(s) ∫ dθ p_true(θ|s) log p(θ|s).
   This is intractable without knowing the posterior for each s!
 • Bayes' theorem ⟹ p_true(s) p_true(θ|s) = p_true(θ) p_true(s|θ), so
   L = −∫ dθ p_true(θ) ∫ ds p_true(s|θ) log p(θ|s).
   This only requires samples from the likelihood, not from the posterior!
