Advances in using GPs with derivative observations
Gaussian Process approximations 2017 workshop, by Eero Siivola¹, joint work with Aki Vehtari¹, Juho Piironen¹, Javier González², Jarno Vanhatalo³ and Olli-Pekka Koistinen¹
¹ Aalto University, Finland · ² Amazon, Cambridge, UK · ³ University of Helsinki, Finland
Contents of this talk
- Theory behind GPs + derivatives
- GP-NEB
- Automatic monotonicity detection with GPs
- Bayesian optimization with derivative sign information
Theory: GP + derivative observations
How to use (partial) derivatives with GPs? We need to consider the following:
- Covariance function
- Likelihood function
- Posterior -> inference method
Covariance function
Nice property (see e.g. Papoulis [1991, ch. 10]):
$$\operatorname{cov}\left(\frac{\partial f^{(1)}}{\partial x^{(1)}_g},\, f^{(2)}\right) = \frac{\partial}{\partial x^{(1)}_g}\operatorname{cov}\left(f^{(1)}, f^{(2)}\right) = \frac{\partial k\left(x^{(1)}, x^{(2)}\right)}{\partial x^{(1)}_g}$$
and:
$$\operatorname{cov}\left(\frac{\partial f^{(1)}}{\partial x^{(1)}_g},\, \frac{\partial f^{(2)}}{\partial x^{(2)}_h}\right) = \frac{\partial^2}{\partial x^{(1)}_g \partial x^{(2)}_h}\operatorname{cov}\left(f^{(1)}, f^{(2)}\right) = \frac{\partial^2 k\left(x^{(1)}, x^{(2)}\right)}{\partial x^{(1)}_g \partial x^{(2)}_h}$$
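To make the first identity concrete, here is a minimal numpy sketch (my own illustration, not code from the talk) that checks the analytic partial derivative of a squared-exponential kernel against a finite-difference approximation:

```python
import numpy as np

def rbf(x1, x2, ell=1.0, sigma2=1.0):
    """Squared-exponential kernel k(x1, x2)."""
    d = x1 - x2
    return sigma2 * np.exp(-0.5 * np.dot(d, d) / ell**2)

def drbf_dx1(x1, x2, g, ell=1.0, sigma2=1.0):
    """Analytic partial derivative dk/dx1_g of the squared-exponential kernel."""
    return -(x1[g] - x2[g]) / ell**2 * rbf(x1, x2, ell, sigma2)

x1 = np.array([0.3, -0.7])
x2 = np.array([1.1, 0.4])
g, eps = 0, 1e-6

# Central finite difference in coordinate g of the first argument
e_g = np.eye(2)[g]
fd = (rbf(x1 + eps * e_g, x2) - rbf(x1 - eps * e_g, x2)) / (2 * eps)
print(drbf_dx1(x1, x2, g), fd)  # the two numbers should agree closely
```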
Let $X = \left[x^{(1)}, \ldots, x^{(n)}\right]^T$ and $\tilde{X} = \left[\tilde{x}^{(1)}, \ldots, \tilde{x}^{(m)}\right]^T$ be the points where we observe function values and partial derivative values, respectively. The covariance between the latent function values $\mathbf{f}_X = \left[f^{(1)}, \ldots, f^{(n)}\right]^T$ and the latent function derivative values $\tilde{\mathbf{f}}'_{\tilde{X}} = \left[\frac{\partial \tilde{f}^{(1)}}{\partial \tilde{x}^{(1)}_g}, \ldots, \frac{\partial \tilde{f}^{(m)}}{\partial \tilde{x}^{(m)}_g}\right]^T$ is:
$$K_{X,\tilde{X}} = K_{\tilde{X},X}^T = \begin{bmatrix} \operatorname{cov}\left(f^{(1)}, \frac{\partial \tilde{f}^{(1)}}{\partial \tilde{x}^{(1)}_g}\right) & \cdots & \operatorname{cov}\left(f^{(1)}, \frac{\partial \tilde{f}^{(m)}}{\partial \tilde{x}^{(m)}_g}\right) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}\left(f^{(n)}, \frac{\partial \tilde{f}^{(1)}}{\partial \tilde{x}^{(1)}_g}\right) & \cdots & \operatorname{cov}\left(f^{(n)}, \frac{\partial \tilde{f}^{(m)}}{\partial \tilde{x}^{(m)}_g}\right) \end{bmatrix}$$
And between the latent function derivative values $\tilde{\mathbf{f}}'_{\tilde{X}}$ and $\tilde{\mathbf{f}}'_{\tilde{X}}$:
$$K_{\tilde{X},\tilde{X}} = \begin{bmatrix} \frac{\partial^2}{\partial \tilde{x}^{(1)}_g \partial \tilde{x}^{(1)}_g}\operatorname{cov}\left(\tilde{f}^{(1)}, \tilde{f}^{(1)}\right) & \cdots & \frac{\partial^2}{\partial \tilde{x}^{(1)}_g \partial \tilde{x}^{(m)}_g}\operatorname{cov}\left(\tilde{f}^{(1)}, \tilde{f}^{(m)}\right) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial \tilde{x}^{(m)}_g \partial \tilde{x}^{(1)}_g}\operatorname{cov}\left(\tilde{f}^{(m)}, \tilde{f}^{(1)}\right) & \cdots & \frac{\partial^2}{\partial \tilde{x}^{(m)}_g \partial \tilde{x}^{(m)}_g}\operatorname{cov}\left(\tilde{f}^{(m)}, \tilde{f}^{(m)}\right) \end{bmatrix}$$
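The block structure is straightforward to assemble in practice. A minimal 1D sketch with the squared-exponential kernel (my own naming and toy points, not from the slides) that builds the joint covariance of [f_X, f'_X̃] and checks that it is positive semi-definite:

```python
import numpy as np

def k(x1, x2, ell=1.0, s2=1.0):
    return s2 * np.exp(-0.5 * (x1 - x2)**2 / ell**2)

def k_dx2(x1, x2, ell=1.0, s2=1.0):          # dk/dx2
    return (x1 - x2) / ell**2 * k(x1, x2, ell, s2)

def k_dx1_dx2(x1, x2, ell=1.0, s2=1.0):      # d^2 k / (dx1 dx2)
    return (1.0 / ell**2 - (x1 - x2)**2 / ell**4) * k(x1, x2, ell, s2)

X  = np.array([0.0, 1.0, 2.5])   # points with function-value observations
Xt = np.array([0.5, 2.0])        # points with derivative observations

K_ff = k(X[:, None], X[None, :])             # cov(f, f)
K_fd = k_dx2(X[:, None], Xt[None, :])        # cov(f, df/dx)
K_dd = k_dx1_dx2(Xt[:, None], Xt[None, :])   # cov(df/dx, df/dx)

# Joint covariance of [f_X, f'_Xt]: a valid covariance, so PSD up to round-off
K_joint = np.block([[K_ff, K_fd], [K_fd.T, K_dd]])
print(np.linalg.eigvalsh(K_joint).min())     # smallest eigenvalue ~ >= -1e-10
```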
Likelihood function
Observations are assumed independent given the latent function values:
$$p\left(\mathbf{y}, \tilde{\mathbf{y}}' \mid \mathbf{f}_X, \tilde{\mathbf{f}}'_{\tilde{X}}\right) = \prod_{i=1}^{n} p\left(y^{(i)} \mid f^{(i)}\right) \prod_{i=1}^{m} p\left(\frac{\partial \tilde{y}^{(i)}}{\partial \tilde{x}^{(i)}_g} \,\middle|\, \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{x}^{(i)}_g}\right)$$
How to select the likelihood of the derivatives?
- If direct derivative values can be observed: Gaussian likelihood
- If we only have a hint about the direction: probit likelihood with a tuning parameter ν (Riihimäki and Vehtari, 2010)
$$p\left(\frac{\partial \tilde{y}^{(i)}}{\partial \tilde{x}^{(i)}_g} \,\middle|\, \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{x}^{(i)}_g}\right) = \Phi\left(\frac{\partial \tilde{f}^{(i)}}{\partial \tilde{x}^{(i)}_g} \cdot \frac{1}{\nu}\right), \quad \text{where } \Phi(a) = \int_{-\infty}^{a} N(x \mid 0, 1)\, dx$$
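A small sketch of this probit link (my own illustration): as ν shrinks the likelihood approaches a hard step, i.e. an almost deterministic sign constraint on the latent derivative, while a larger ν softens the constraint.

```python
import numpy as np
from scipy.stats import norm

def probit_sign_likelihood(df, nu=1e-4):
    """P(derivative sign observation is positive | latent derivative value df)."""
    return norm.cdf(df / nu)

df = np.linspace(-3, 3, 7)
print(probit_sign_likelihood(df, nu=1e-4))  # nearly a 0/1 step at df = 0
print(probit_sign_likelihood(df, nu=1.0))   # a smooth, gradual transition
```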
Figure: Probit likelihood with ν = 1×10⁻⁴ (left) and ν = 1 (right); the y-axis is the likelihood value, the x-axis the latent derivative value.
Posterior distribution
Posterior distribution of the joint values:
$$p\left(\mathbf{f}, \tilde{\mathbf{f}}' \mid \mathbf{y}, \tilde{\mathbf{y}}', X, \tilde{X}\right) = \frac{p\left(\mathbf{f}, \tilde{\mathbf{f}}' \mid X, \tilde{X}\right) \prod_{i=1}^{n} p\left(y^{(i)} \mid f^{(i)}\right) \prod_{i=1}^{m} p\left(\frac{\partial \tilde{y}^{(i)}}{\partial \tilde{x}^{(i)}_g} \,\middle|\, \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{x}^{(i)}_g}\right)}{Z}$$
Different parts:
- $p(\mathbf{f}, \tilde{\mathbf{f}}' \mid X, \tilde{X})$ is Gaussian
- $p(y^{(i)} \mid f^{(i)})$ are Gaussian
- $p\left(\frac{\partial \tilde{y}^{(i)}}{\partial \tilde{x}^{(i)}_g} \mid \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{x}^{(i)}_g}\right)$ is Gaussian or probit
The posterior distribution is thus either Gaussian or of the same form as in classification problems:
- We might need posterior approximation methods
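When both likelihoods are Gaussian the posterior is available in closed form. A minimal 1D sketch (my own toy example, not from the slides) that conditions jointly on noisy function values and noisy derivative values and predicts the function at test points:

```python
import numpy as np

ell, s2, noise_f, noise_d = 1.0, 1.0, 1e-2, 1e-2

def k(a, b):       return s2 * np.exp(-0.5 * (a - b)**2 / ell**2)
def k_db(a, b):    return (a - b) / ell**2 * k(a, b)             # dk/db
def k_da_db(a, b): return (1/ell**2 - (a - b)**2 / ell**4) * k(a, b)

X  = np.array([-2.0, 0.0, 2.0])   # function-value observation locations
Xd = np.array([-1.0, 1.0])        # derivative observation locations
y  = np.sin(X)                    # toy observations of f(x) = sin(x)
yd = np.cos(Xd)                   # toy observations of f'(x)

# Joint covariance of the observed vector [y, y'] plus observation noise
K_ff = k(X[:, None], X[None, :])
K_fd = k_db(X[:, None], Xd[None, :])
K_dd = k_da_db(Xd[:, None], Xd[None, :])
K = np.block([[K_ff, K_fd], [K_fd.T, K_dd]])
K += np.diag(np.r_[np.full(len(X), noise_f), np.full(len(Xd), noise_d)])

# Cross-covariance between test function values and all observations
xs = np.linspace(-3, 3, 5)
K_star = np.hstack([k(xs[:, None], X[None, :]), k_db(xs[:, None], Xd[None, :])])

# Posterior mean of f at the test points (standard Gaussian conditioning)
mean = K_star @ np.linalg.solve(K, np.r_[y, yd])
print(np.round(mean, 3), np.round(np.sin(xs), 3))  # should roughly track sin
```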
Saddle point search using GPs + derivative observations
- The properties of the system can be described by an energy surface
- Finding a minimum energy path and the saddle point between two states is useful when determining properties of transitions
Nudged elastic band (NEB)
- Starting from an initial guess, the idea is to move the images downwards on the energy surface but keep them evenly spaced
- The images are moved along a force vector, which is the resultant of two components (see the sketch after this list):
  - the (negative) energy gradient component perpendicular to the path
  - a spring force parallel to the path, which tends to keep the images evenly spaced
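A minimal numpy sketch of that force for one interior image, using the simple central-difference tangent (actual NEB implementations use an improved tangent estimate; the names here are my own):

```python
import numpy as np

def neb_force(r_prev, r_i, r_next, true_force, k_spring=1.0):
    """NEB force on an interior image r_i given its two neighbouring images.

    true_force is the negative energy gradient -dE/dR evaluated at r_i.
    """
    tau = r_next - r_prev
    tau = tau / np.linalg.norm(tau)                  # unit tangent along the path
    # component of the true force perpendicular to the path
    f_perp = true_force - np.dot(true_force, tau) * tau
    # spring force along the tangent, keeping the images evenly spaced
    f_spring = k_spring * (np.linalg.norm(r_next - r_i)
                           - np.linalg.norm(r_i - r_prev)) * tau
    return f_perp + f_spring

# Toy 2D example on E(x, y) = x**2 + 2*y**2, so -dE/dR at (1, 1) is [-2, -4]
r_prev, r_i, r_next = np.array([0., 0.]), np.array([1., 1.]), np.array([2., 0.])
print(neb_force(r_prev, r_i, r_next, true_force=np.array([-2., -4.])))
```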
- The convergence of NEB may require hundreds or thousands of iterations
- Each iteration requires evaluation of the energy gradient for all images, which is often a time-consuming operation
Speedup of NEB
- Repeat until convergence:
  1. Evaluate the energy (and forces) at the images of the current path
  2. If the path has not converged, approximate the energy surface using machine learning based on the observations so far
  3. Find the predicted minimum energy path on the approximate surface and go to 1
- Details are in the paper by Peterson (2016)
Speedup of NEB with GP and derivatives
- Evaluate the energy (and forces) only at the image with the highest uncertainty
- Re-approximate the energy surface and find a new MEP guess after each image evaluation
- Convergence check:
  - If the magnitude of the force (accurate or approximated) is below the convergence limit for all images, we do not move the path but keep evaluating further images, until either the convergence limit is exceeded or all images have been evaluated
  - If we manage to evaluate all images without moving the path, we know for sure that the path has converged
- Details are in the paper by Koistinen, Maras, Vehtari and Jónsson (2016)
- When evaluating the transition rates, the Hessian at the minimum points needs to be evaluated at some phase
- This information can be used to improve the GP approximations, especially in the beginning, when there is little information
Comparison of methods in the heptamer case study
Automatic monotonicity detection
- Derivative sign information can be used to find monotonic input-output directions
- The basic idea:
  - Add derivative sign observations to the GP model
  - Check whether the additions affect the probability of the data
  - If they do not, the dimension is monotonic
- Details are in the paper by Siivola, Piironen and Vehtari (2016)
Theoretical background
Energy comparison:
$$E(\mathbf{y}, \tilde{\mathbf{y}}' \mid X, \tilde{X}_m) = -\log p(\mathbf{y}, \tilde{\mathbf{y}}' \mid X, \tilde{X}_m) = -\log\Big( p(\mathbf{y} \mid X)\, \underbrace{p(\tilde{\mathbf{y}}' \mid \mathbf{y}, X, \tilde{X}_m)}_{\approx\, 1} \Big) \approx E(\mathbf{y} \mid X).$$
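In the paper the virtual derivative sign observations use the probit likelihood and approximate inference. Purely as an illustration (my own toy data and naming), with a Gaussian likelihood on y the factor p(ỹ' | y, X, X̃_m) can be estimated by Monte Carlo from the exact Gaussian conditional of the latent derivatives, so the two energies can be compared directly:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal as mvn

ell, s2, noise, nu = 1.0, 1.0, 1e-2, 1e-4
rng = np.random.default_rng(0)

def k(a, b):       return s2 * np.exp(-0.5 * (a - b)**2 / ell**2)
def k_db(a, b):    return (a - b) / ell**2 * k(a, b)
def k_da_db(a, b): return (1/ell**2 - (a - b)**2 / ell**4) * k(a, b)

X  = np.linspace(0, 3, 8)
y  = np.log1p(X) + 0.05 * rng.standard_normal(len(X))   # monotone toy data
Xd = np.array([0.5, 1.5, 2.5])                           # virtual derivative points

# Energy of the data under the plain GP: E(y | X) = -log p(y | X)
K_ff = k(X[:, None], X[None, :]) + noise * np.eye(len(X))
E_plain = -mvn(np.zeros(len(X)), K_ff).logpdf(y)

# p(f' at Xd | y) is Gaussian; estimate p(all derivative signs positive | y) by MC
K_fd = k_db(X[:, None], Xd[None, :])
K_dd = k_da_db(Xd[:, None], Xd[None, :])
A = np.linalg.solve(K_ff, K_fd).T                        # K_df K_ff^{-1}
mu_d  = A @ y
cov_d = K_dd - A @ K_fd + 1e-10 * np.eye(len(Xd))        # jitter for stability
samples = rng.multivariate_normal(mu_d, cov_d, size=20000)
p_signs = np.mean(np.prod(norm.cdf(samples / nu), axis=1))

E_mono = E_plain - np.log(p_signs)
print(E_plain, E_mono)   # nearly equal energies -> the dimension looks monotonic
```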
Figure: Change in the energy of the data as a function of the number N of virtual derivative sign observations, for a GP with the monotonicity assumption compared to a regular GP (baseline energy E₀).
Using automatic monotonicity detection in modelling
- Monotonic dimensions can be detected from the data and used in modelling
- The method improves the modelling results especially near the borders of the data
Experiment
- Six different functions of varying monotonicity
- Different amounts of noise added to the training samples (signal-to-noise ratio (SNR) between 0 and 1)
- Measure the log predictive posterior density of samples from a hold-out set that resemble the 20% bordermost samples in the training data (a sketch of the Gaussian-predictive case follows this list):
$$\text{lppd} = \sum_{i=1}^{L} \log \int p(y_i \mid f)\, p_{\text{post}}(f \mid x_i)\, df$$
- Repeat this 200 times for three different models:
  - Use fixed monotonicity
  - Use monotonicity only if it does not change the energy (adaptive monotonicity)
  - Use a model without derivative observations
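When the posterior predictive at each input is Gaussian, the integral inside lppd has the closed form log N(y_i | μ_i, σ_i² + σ_noise²). A minimal sketch with toy numbers (my own naming, not from the talk):

```python
import numpy as np
from scipy.stats import norm

def lppd_gaussian(y, mu, var_f, noise_var):
    """lppd = sum_i log ∫ p(y_i | f) N(f | mu_i, var_f_i) df for a Gaussian likelihood."""
    return np.sum(norm.logpdf(y, loc=mu, scale=np.sqrt(var_f + noise_var)))

# Toy hold-out targets and GP posterior predictive moments at the same inputs
y_holdout = np.array([0.1, 0.8, 1.4])
mu_post   = np.array([0.0, 0.7, 1.5])
var_post  = np.array([0.05, 0.02, 0.10])
print(lppd_gaussian(y_holdout, mu_post, var_post, noise_var=0.01))
```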