Pattern recognition in nuclear fusion data by means of geometric methods in probabilistic spaces
Geert Verdoolaege
Department of Applied Physics, Ghent University, Ghent, Belgium
Laboratory for Plasma Physics, Royal Military Academy (LPP–ERM/KMS), Brussels, Belgium
ECEA 2017, November 21 – December 1, 2017
Overview
1. Stochastic uncertainty in fusion plasmas
2. Pattern recognition in probabilistic spaces
3. Geodesic least squares regression
4. Application in fusion science: edge-localized plasma instabilities
5. Application in astronomy: Tully-Fisher scaling
6. Conclusion
Fusion energy
- ‘Star on earth’
- Clean, safe, inexhaustible energy source
- Magnetic confinement fusion: tokamak, stellarator, ...
- Confine hot hydrogen isotope plasma with magnetic fields
- ITER: next-generation international tokamak
- Complex physical system, turbulent transport
- Difficult to probe → uncertainty in measurements and models
Uncertainty in fusion plasmas
Sources of statistical uncertainty:
- Fluctuation of system properties
- Measurement noise
Figures: edge-localized modes (MAST); plasma turbulence (PPPL); confinement time vs. density (JET)
Difference/distance between points Patterns ↔ distances
Zooming in...
Mahalanobis distance
Information geometry
- Family of probability distributions → differentiable manifold
- Parameters = coordinates
- Metric tensor: Fisher information matrix
- Parametric probability model $p(x \mid \theta)$, with $\theta$ an $m$-dimensional parameter vector, implies
  $$g_{\mu\nu}(\theta) = -E\left[\frac{\partial^2}{\partial\theta^\mu\,\partial\theta^\nu} \ln p(x \mid \theta)\right], \qquad \mu, \nu = 1, \ldots, m$$
- Line element: $ds^2 = g_{\mu\nu}\, d\theta^\mu\, d\theta^\nu$
- Minimum-length curve: geodesic
- Rao geodesic distance (GD)
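As a rough numerical check of the Fisher metric definition above, the following sketch estimates $g_{\mu\nu}$ for a univariate Gaussian with $\theta = (\mu, \sigma)$ by averaging the analytic second derivatives of $\ln p$ over random samples. The function name, sample size, and seed are illustrative assumptions and do not come from the talk.

```python
import random

def fisher_metric_gaussian(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of g = -E[Hessian of ln p] for N(mu, sigma^2).

    Coordinates are theta = (mu, sigma). The second derivatives of
    ln p = -ln(sigma) - (x - mu)^2 / (2 sigma^2) + const are written out by hand.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    h_mm = h_ms = h_ss = 0.0
    for _ in range(n_samples):
        d = rng.gauss(mu, sigma) - mu
        h_mm += -1.0 / sigma**2                        # d2/dmu2
        h_ms += -2.0 * d / sigma**3                    # d2/(dmu dsigma)
        h_ss += 1.0 / sigma**2 - 3.0 * d**2 / sigma**4  # d2/dsigma2
    n = float(n_samples)
    return [[-h_mm / n, -h_ms / n],
            [-h_ms / n, -h_ss / n]]

g = fisher_metric_gaussian(mu=1.0, sigma=2.0)
print(g)
```

Up to sampling noise, the estimate should approach diag(1/σ², 2/σ²), which is the metric behind the Gaussian line element used later in the talk.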
Pattern recognition in probabilistic spaces
Pattern recognition:
- Classification, clustering
- Regression analysis
- Dimensionality reduction, visualization
Observation/prediction (structureless number) → distribution (structured object)
More information, more flexibility
The univariate Gaussian manifold
- PDF: $p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$
- Line element: $ds^2 = \frac{d\mu^2}{\sigma^2} + 2\,\frac{d\sigma^2}{\sigma^2}$
- Hyperbolic geometry: Poincaré half-plane, Poincaré disk, Klein disk, ...
- Analytic geodesic distance
https://www.youtube.com/watch?v=i9IUzNxeH4o
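The analytic geodesic distance can be sketched in a few lines. Rescaling $\mu$ by $1/\sqrt{2}$ turns the line element above into $2$ times the Poincaré half-plane metric, so the standard half-plane distance formula applies with an overall factor $\sqrt{2}$. The function name below is a hypothetical choice made for this illustration.

```python
import math

def rao_geodesic_distance(mu1, sigma1, mu2, sigma2):
    """Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses ds^2 = dmu^2/sigma^2 + 2 dsigma^2/sigma^2. With u = mu / sqrt(2) this is
    2 * (du^2 + dsigma^2) / sigma^2, i.e. sqrt(2) times the half-plane distance.
    """
    num = 0.5 * (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (2.0 * sigma1 * sigma2))

# Same mean, different widths: the distance reduces to sqrt(2) * |ln(sigma2 / sigma1)|.
print(rao_geodesic_distance(0.0, 1.0, 0.0, math.e))
# Same width, nearby means: close to the standardized difference |mu1 - mu2| / sigma.
print(rao_geodesic_distance(0.0, 1.0, 0.01, 1.0))
```

For equal standard deviations the expression simplifies to $2\sqrt{2}\,\operatorname{asinh}\!\big(|\mu_1 - \mu_2| / (2\sqrt{2}\,\sigma)\big)$, a monotone function of $|\mu_1 - \mu_2|/\sigma$ that approaches it when the means are close.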
The pseudosphere (tractroid)
Figure panels: Original, Compressed
Geodesics on the Gaussian manifold
Data visualization with uncertainty
Plasma energy confinement time w.r.t. global plasma parameters
Figure panels: Euclidean, Geodesic
Challenges in regression analysis
- Data uncertainty: measurement error, fluctuations, ...
- Model uncertainty: missing variables, linear vs. nonlinear, Gaussian vs. non-Gaussian, ...
- Heterogeneous data and error bars
- Uncertainty on response ($y$) and predictor ($x_j$) variables
- Atypical observations (outliers)
- Near-collinearity of predictor variables
- Data transformations, e.g. $\ln(y) = \ln(\beta_0) + \beta_1 \ln(x_1) + \beta_2 \ln(x_2) + \ldots + \beta_p \ln(x_p)$
Least squares and maximum a posteriori
- Workhorse: ordinary least squares (OLS)
- Maximum likelihood (ML) / maximum a posteriori (MAP):
  $$p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right]$$
  e.g. $\mu_i = f_i(x_i, \theta) = \beta_0 + \beta_1 x_i$
- Need flexible and robust regression
- Parameter estimation → distance minimization: Expected ↔ Measured
Figure caption: Michigan, circa 1890s.
The minimum distance approach
- Minimum distance estimation (Wolfowitz, 1952): Which distribution does the model predict? vs. Which distribution do you observe?
- Gaussian case: different means and standard deviations
- Hellinger divergence (Beran, 1977)
- Empirical distribution: kernel density estimate
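To make the Gaussian case concrete, here is a minimal sketch of the squared Hellinger distance between two univariate Gaussians with different means and standard deviations. It assumes the convention $H^2 = 1 - \mathrm{BC}$, with BC the Bhattacharyya coefficient; other texts include an extra factor of 2. The function name is hypothetical.

```python
import math

def hellinger_squared_gaussians(mu1, sigma1, mu2, sigma2):
    """Squared Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Convention (assumption): H^2 = 1 - BC, so the result lies in [0, 1].
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Closed-form Bhattacharyya coefficient for two univariate Gaussians.
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    return 1.0 - bc

print(hellinger_squared_gaussians(0.0, 1.0, 0.0, 1.0))  # identical distributions
print(hellinger_squared_gaussians(0.0, 1.0, 2.0, 0.5))  # shifted and narrower
```

In the minimum distance setting, one argument pair would describe the distribution predicted by the model and the other the observed distribution.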
Modeled and observed distribution
Example: fluid turbulence
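Geodesic least squares
- Modeled distribution:
  $$\frac{1}{\sqrt{2\pi}\,\sigma_{\text{mod}}} \exp\left\{-\frac{1}{2}\,\frac{\left[y - \left(\beta_0 + \sum_{j=1}^{m} \beta_j x_{ij}\right)\right]^2}{\sigma_{\text{mod}}^2}\right\}, \qquad \sigma_{\text{mod}}^2 = \sigma_y^2 + \sum_{j=1}^{m} \beta_j^2 \sigma_{x,j}^2$$
- Observed distribution:
  $$\frac{1}{\sqrt{2\pi}\,\sigma_{\text{obs}}} \exp\left[-\frac{1}{2}\,\frac{(y - y_i)^2}{\sigma_{\text{obs}}^2}\right]$$
- Rao GD between the modeled and the observed distribution
- Model-based approach: regression on probabilistic manifold
- To be estimated: $\sigma_{\text{obs}}, \beta_0, \beta_1, \ldots, \beta_m$
- iid data: minimize sum of squared GDs $\Rightarrow$ geodesic least squares (GLS) regression
- If $\sigma_{\text{mod}} = \sigma_{\text{obs}}$ → Mahalanobis distance
G. Verdoolaege et al., Entropy 17, 4602, 2015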
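The GLS objective can be sketched for a single predictor as follows. This is an illustration under stated assumptions, not the author's implementation: the data values are invented, one common $\sigma_{\text{obs}}$ is estimated for all points, and a basic compass (coordinate) search stands in for whatever optimizer was actually used. All function names are hypothetical.

```python
import math

def rao_gd(mu1, s1, mu2, s2):
    """Rao geodesic distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    num = 0.5 * (mu1 - mu2) ** 2 + (s1 - s2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (2.0 * s1 * s2))

def gls_objective(params, x, y, sx, sy):
    """Sum of squared GDs between modeled and observed Gaussians (one predictor)."""
    b0, b1, log_s_obs = params
    s_obs = math.exp(log_s_obs)  # log parametrization keeps sigma_obs positive
    total = 0.0
    for xi, yi, sxi, syi in zip(x, y, sx, sy):
        mu_mod = b0 + b1 * xi
        s_mod = math.sqrt(syi ** 2 + (b1 * sxi) ** 2)  # sigma_mod from error bars
        total += rao_gd(mu_mod, s_mod, yi, s_obs) ** 2
    return total

def compass_search(f, start, step=0.5, tol=1e-8, max_iter=5000):
    """Derivative-free coordinate search; a simple stand-in optimizer (assumption)."""
    best = list(start)
    f_best = f(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                f_trial = f(trial)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step *= 0.5  # shrink the step when no neighbor improves
    return best, f_best

# Hypothetical data with error bars on both x and y.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
sx = [0.1] * 5
sy = [0.2] * 5

start = [0.0, 1.0, math.log(0.5)]  # beta0, beta1, ln(sigma_obs)
f_start = gls_objective(start, x, y, sx, sy)
fit, f_fit = compass_search(lambda p: gls_objective(p, x, y, sx, sy), start)
print("beta0, beta1, sigma_obs:", fit[0], fit[1], math.exp(fit[2]))
```

Estimating $\sigma_{\text{obs}}$ alongside the regression coefficients is what lets the fit absorb scatter that the stated error bars do not account for.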
Edge-localized modes (ELMs)
- Repetitive instabilities in plasma edge
- Magnetohydrodynamic origin
Figure: MAST, Culham Centre for Fusion Energy, UK
Analogy 1: Solar flares
Analogy 2: Cooking pot
Importance of ELMs
- Confinement loss
- Potential damaging effects
- Impurity outflux
→ ELM control/mitigation
$\text{Energy} \propto (\text{frequency})^{-1}$
Data extraction: waiting times
- 32 recent JET discharges
- Waiting time: time before ELM burst
Data extraction: energies Energy carried from the plasma by an ELM
Average waiting times and energies
Error bars on averages Standard deviation / √ n → error bars
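A minimal sketch of the error-bar rule above, assuming the sample standard deviation (with the $n - 1$ denominator) is meant. The waiting-time values are invented for illustration and are not JET data.

```python
import math
import statistics

def mean_with_error_bar(samples):
    """Return the sample mean and its error bar, stdev / sqrt(n)."""
    n = len(samples)
    mean = statistics.mean(samples)
    error_bar = statistics.stdev(samples) / math.sqrt(n)  # sample stdev (n - 1)
    return mean, error_bar

# Hypothetical ELM waiting times in seconds for one discharge.
waiting_times = [0.021, 0.025, 0.019, 0.030, 0.024, 0.027]
avg, err = mean_with_error_bar(waiting_times)
print(avg, err)
```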
Regression on averages
$E_{\text{ELM}} = \beta_0 + \beta_1\, \Delta t_{\text{ELM}}$, with $\sigma_{E,\text{obs}} \propto \mu_{E,\text{obs}}$
Regression results on pseudosphere
Projected regression results Multidimensional scaling:
Average vs. collective trend

Average:
Method   β0 (MJ)   β1 (MJ/s)
OLS       0.024    3.2
GLS      -0.022    4.2

Individual:
Method   β0 (MJ)   β1 (MJ/s)
OLS      -0.050    5.7
GLS      -0.021    4.6
Baryonic Tully-Fisher Relation (BTFR)
Simple, tight relation for disk galaxies: $M_b = \beta_0 V_f^{\beta_1}$
- $M_b$ = total (stellar + gaseous) baryonic mass ($M_\odot$)
- $V_f$ = rotational velocity (km s$^{-1}$)
Various purposes:
- Distance indicator
- Constraints on galaxy formation models
- Test for alternatives to $\Lambda$CDM cosmological model (slope and scatter)
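Taking logarithms turns the power law into a straight line, $\ln M_b = \ln \beta_0 + \beta_1 \ln V_f$, which ordinary least squares can fit directly. The sketch below shows only this loglinear OLS baseline on synthetic, noise-free values; the numbers and function name are illustrative assumptions and not the galaxy sample.

```python
import math

def fit_power_law_ols(v, m):
    """Fit m = beta0 * v**beta1 by OLS on (ln v, ln m); returns (beta0, beta1)."""
    lx = [math.log(vi) for vi in v]
    ly = [math.log(mi) for mi in m]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta1 = sxy / sxx                           # slope in log space = exponent
    beta0 = math.exp(mean_y - beta1 * mean_x)   # intercept back-transformed
    return beta0, beta1

# Synthetic example: an exact power law with made-up coefficients.
velocities = [50.0, 80.0, 120.0, 200.0]        # km/s, hypothetical
masses = [47.0 * v ** 4 for v in velocities]   # solar masses, hypothetical
beta0, beta1 = fit_power_law_ols(velocities, masses)
print(beta0, beta1)
```

With noisy data and error bars on both variables, this baseline ignores the predictor uncertainty, which is the situation the geodesic approach is meant to handle.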
Experiments
- 47 gas-rich galaxies (McGaugh, Astron. J. 143, 40, 2012)
- Loglinear ($\sigma_{\text{obs},i} \equiv s_{\text{obs}}$) and nonlinear ($\sigma_{\text{obs},i} = r_{\text{obs}} M_b$)
Benchmarking:
- Ordinary least squares (OLS)
- Bayesian: errors in all variables, marginalized standard deviations (Bayes)
- Geodesic least squares (GLS)
- Kullback-Leibler least squares (KLS)
Loglinear regression
Nonlinear regression
Parameter distributions
GLS uncertainty estimates
$r_{M_b} \approx 38\%$, $r_{\text{obs}} \approx 63\%$
Interpretation on pseudosphere
Conclusions
- Probabilistic modeling of stochastic system properties
- Information geometry: distance measure, geometrical intuition
- Pattern recognition in probabilistic spaces
- More information, more flexibility
- Geodesic least squares regression: flexible and robust
- Easy to use, fast optimization