Pattern recognition in nuclear fusion data by means of geometric methods in probabilistic spaces

Pattern recognition in nuclear fusion data by means of geometric methods in probabilistic spaces Geert Verdoolaege Department of Applied Physics, Ghent University, Ghent, Belgium Laboratory for Plasma Physics, Royal Military Academy


  1. Pattern recognition in nuclear fusion data by means of geometric methods in probabilistic spaces. Geert Verdoolaege, Department of Applied Physics, Ghent University, Ghent, Belgium; Laboratory for Plasma Physics, Royal Military Academy (LPP–ERM/KMS), Brussels, Belgium. ECEA 2017, November 21 – December 1, 2017.

  2. Overview: (1) Stochastic uncertainty in fusion plasmas; (2) Pattern recognition in probabilistic spaces; (3) Geodesic least squares regression; (4) Application in fusion science: edge-localized plasma instabilities; (5) Application in astronomy: Tully-Fisher scaling; (6) Conclusion.

  4. Fusion energy: 'Star on earth'. Clean, safe, inexhaustible energy source. Magnetic confinement fusion (tokamak, stellarator, ...): confine hot hydrogen isotope plasma with magnetic fields. ITER: next-generation international tokamak. Complex physical system with turbulent transport. Difficult to probe → uncertainty in measurements and models.

  5. Uncertainty in fusion plasmas. Sources of statistical uncertainty: fluctuation of system properties; measurement noise. Figures: edge-localized modes (MAST); plasma turbulence (PPPL); confinement time vs. density (JET).

  7. Difference/distance between points. Patterns ↔ distances.

  8. Zooming in...

  9. Mahalanobis distance

  10. Information geometry. Family of probability distributions → differentiable manifold. Parameters = coordinates. Metric tensor: Fisher information matrix. Parametric probability model $p(x|\theta)$ $\Rightarrow$ $g_{\mu\nu}(\theta) = -E\left[\frac{\partial^2}{\partial\theta^\mu \partial\theta^\nu} \ln p(x|\theta)\right]$, $\mu, \nu = 1, \dots, m$, where $\theta$ is an $m$-dimensional parameter vector. Line element: $ds^2 = g_{\mu\nu}\, d\theta^\mu\, d\theta^\nu$. Minimum-length curve: geodesic. Rao geodesic distance (GD).
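As a worked illustration of the Fisher information definition above, the following short derivation applies it to the univariate Gaussian with coordinates $(\mu, \sigma)$. It is a standard textbook calculation added here for clarity, not a reproduction of anything shown on the slide.

```latex
\begin{align*}
\ln p(x|\mu,\sigma) &= -\ln\sigma - \tfrac{1}{2}\ln(2\pi) - \frac{(x-\mu)^2}{2\sigma^2}, \\
\frac{\partial^2 \ln p}{\partial\mu^2} &= -\frac{1}{\sigma^2}
  &&\Rightarrow\quad g_{\mu\mu} = \frac{1}{\sigma^2}, \\
\frac{\partial^2 \ln p}{\partial\sigma^2} &= \frac{1}{\sigma^2} - \frac{3(x-\mu)^2}{\sigma^4}
  &&\Rightarrow\quad g_{\sigma\sigma} = -\left(\frac{1}{\sigma^2} - \frac{3}{\sigma^2}\right) = \frac{2}{\sigma^2}, \\
\frac{\partial^2 \ln p}{\partial\mu\,\partial\sigma} &= -\frac{2(x-\mu)}{\sigma^3}
  &&\Rightarrow\quad g_{\mu\sigma} = 0, \\
ds^2 &= \frac{d\mu^2}{\sigma^2} + \frac{2\,d\sigma^2}{\sigma^2}.
\end{align*}
```

The expectations use $E[x-\mu] = 0$ and $E[(x-\mu)^2] = \sigma^2$. The resulting line element is the one quoted for the univariate Gaussian manifold.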

  11. Pattern recognition in probabilistic spaces. Pattern recognition: classification, clustering; regression analysis; dimensionality reduction, visualization. Observation/prediction (structureless number) → distribution (structured object). More information, more flexibility.

  12. The univariate Gaussian manifold. PDF: $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. Line element: $ds^2 = \frac{d\mu^2}{\sigma^2} + 2\,\frac{d\sigma^2}{\sigma^2}$. Hyperbolic geometry: Poincaré half-plane, Poincaré disk, Klein disk, ... Analytic geodesic distance. https://www.youtube.com/watch?v=i9IUzNxeH4o
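The analytic geodesic distance mentioned above can be sketched in a few lines. The closed form used here is the commonly quoted expression for the Fisher-Rao metric with line element $d\mu^2/\sigma^2 + 2\,d\sigma^2/\sigma^2$; the function name `rao_gd_gaussian` is a hypothetical choice for this illustration and is not taken from the slides.

```python
import math

def rao_gd_gaussian(mu1, sigma1, mu2, sigma2):
    """Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses GD = 2*sqrt(2)*atanh(delta), with
    delta^2 = [(mu1-mu2)^2 + 2(sigma1-sigma2)^2] / [(mu1-mu2)^2 + 2(sigma1+sigma2)^2].
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    dmu2 = (mu1 - mu2) ** 2
    num = dmu2 + 2.0 * (sigma1 - sigma2) ** 2
    den = dmu2 + 2.0 * (sigma1 + sigma2) ** 2
    delta = math.sqrt(num / den)  # always in [0, 1)
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)

# Same mean, different widths: the distance reduces to sqrt(2)*|ln(sigma2/sigma1)|.
d_widths = rao_gd_gaussian(0.0, 1.0, 0.0, math.e)

# Same width, nearby means: the distance approaches |mu1 - mu2| / sigma.
d_means = rao_gd_gaussian(0.0, 1.0, 1e-4, 1.0)
```

The two limiting cases give a quick sanity check: along a line of constant mean the distance depends only on the ratio of standard deviations, and for equal standard deviations and small separations it behaves like a Mahalanobis distance.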

  13. The pseudosphere (tractroid). Panels: original; compressed.

  14. Geodesics on the Gaussian manifold

  15. Data visualization with uncertainty: plasma energy confinement time w.r.t. global plasma parameters. Panels: Euclidean; geodesic.

  17. Challenges in regression analysis. Data uncertainty: measurement error, fluctuations, ... Model uncertainty: missing variables, linear vs. nonlinear, Gaussian vs. non-Gaussian, ... Heterogeneous data and error bars. Uncertainty on response ($y$) and predictor ($x_j$) variables. Atypical observations (outliers). Near-collinearity of predictor variables. Data transformations, e.g. $\ln(y) = \ln(\beta_0) + \beta_1 \ln(x_1) + \beta_2 \ln(x_2) + \dots + \beta_p \ln(x_p)$.
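The logarithmic data transformation listed above can be illustrated with a minimal sketch: a power law $y = \beta_0 x^{\beta_1}$ becomes a straight line in $(\ln x, \ln y)$ and can then be fitted by ordinary least squares. The single-predictor restriction, the function name `fit_power_law_ols`, and the noise-free synthetic data are all assumptions made for this example.

```python
import math

def fit_power_law_ols(x, y):
    """Fit y = b0 * x**b1 by OLS on the log-transformed data; returns (b0, b1)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((a - mean_lx) ** 2 for a in lx)
    sxy = sum((a - mean_lx) * (b - mean_ly) for a, b in zip(lx, ly))
    b1 = sxy / sxx                          # slope in log space = exponent
    b0 = math.exp(mean_ly - b1 * mean_lx)   # intercept in log space = ln(b0)
    return b0, b1

# Synthetic, noise-free data generated from y = 3 * x**1.5 (illustrative only).
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0 * v ** 1.5 for v in x]
b0, b1 = fit_power_law_ols(x, y)
```

With noisy data the transformation also reshapes the error distribution, which is one reason the slide lists it among the challenges rather than as a fix.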

  18. Least squares and maximum a posteriori. Workhorse: ordinary least squares (OLS). Maximum likelihood (ML) / maximum a posteriori (MAP): $p(y_i|x_i,\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2\right]$, e.g. $\mu_i = f_i(x_i,\theta) = \beta_0 + \beta_1 x_i$. Image caption: Michigan, circa 1890s. Need flexible and robust regression. Parameter estimation → distance minimization: expected ↔ measured.

  19. The minimum distance approach. Minimum distance estimation (Wolfowitz, 1952): which distribution does the model predict vs. which distribution do you observe? Gaussian case: different means and standard deviations. Hellinger divergence (Beran, 1977). Empirical distribution: kernel density estimate.
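To make the idea of comparing a predicted and an observed distribution concrete, the sketch below evaluates the Hellinger distance for the Gaussian case, where a closed form exists. This is only the two-Gaussian special case; it does not implement a minimum Hellinger distance estimator with a kernel density estimate, and the function name `hellinger_gaussian` is hypothetical.

```python
import math

def hellinger_gaussian(mu1, sigma1, mu2, sigma2):
    """Hellinger distance (in [0, 1]) between two univariate Gaussians.

    H^2 = 1 - sqrt(2*s1*s2 / (s1^2 + s2^2)) * exp(-(mu1-mu2)^2 / (4*(s1^2 + s2^2)))
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    overlap = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - overlap))

h_same = hellinger_gaussian(0.0, 1.0, 0.0, 1.0)   # identical distributions
h_shift = hellinger_gaussian(0.0, 1.0, 2.0, 1.0)  # means two sigma apart
```

Unlike a distance between means alone, this measure responds to differences in both mean and standard deviation, which is the situation the slide describes.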

  20. Modeled and observed distribution

  21. Example: fluid turbulence

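  22. Geodesic least squares. Modeled distribution: $p_{\mathrm{mod}}(y|x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{mod}}} \exp\left\{-\frac{1}{2}\, \frac{\left[y - \left(\beta_0 + \sum_{j=1}^m \beta_j x_{ij}\right)\right]^2}{\sigma_{\mathrm{mod}}^2}\right\}$, with $\sigma_{\mathrm{mod}}^2 = \sigma_y^2 + \sum_{j=1}^m \beta_j^2 \sigma_{x,j}^2$. Observed distribution: $p_{\mathrm{obs}}(y|y_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{obs}}} \exp\left[-\frac{1}{2}\, \frac{(y - y_i)^2}{\sigma_{\mathrm{obs}}^2}\right]$. The two are compared through the Rao GD. Model-based approach: regression on probabilistic manifold. To be estimated: $\sigma_{\mathrm{obs}}, \beta_0, \beta_1, \dots, \beta_m$. iid data: minimize sum of squared GDs $\Rightarrow$ geodesic least squares (GLS) regression. If $\sigma_{\mathrm{mod}} = \sigma_{\mathrm{obs}}$ → Mahalanobis distance. G. Verdoolaege et al., Entropy 17, 4602, 2015.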
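The objective described above, a sum of squared Rao geodesic distances between the modeled and observed Gaussians, can be sketched as follows. This is an illustrative sketch restricted to a single predictor, not the implementation from the cited paper; the function names, the argument layout, and the synthetic data are assumptions, and the minimization step itself is left to any general-purpose optimizer.

```python
import math

def rao_gd(mu1, s1, mu2, s2):
    """Rao geodesic distance between two univariate Gaussians (closed form)."""
    dmu2 = (mu1 - mu2) ** 2
    delta = math.sqrt((dmu2 + 2.0 * (s1 - s2) ** 2) / (dmu2 + 2.0 * (s1 + s2) ** 2))
    return 2.0 * math.sqrt(2.0) * math.atanh(delta)

def gls_objective(params, x, y, sigma_x, sigma_y):
    """Sum of squared GDs for a linear model with one predictor.

    params = (beta0, beta1, sigma_obs); sigma_x and sigma_y hold the
    per-point error bars on the predictor and the response.
    """
    beta0, beta1, sigma_obs = params
    total = 0.0
    for xi, yi, sxi, syi in zip(x, y, sigma_x, sigma_y):
        mu_mod = beta0 + beta1 * xi                           # modeled mean
        sigma_mod = math.sqrt(syi ** 2 + beta1 ** 2 * sxi ** 2)  # modeled std
        total += rao_gd(mu_mod, sigma_mod, yi, sigma_obs) ** 2
    return total

# Synthetic data lying exactly on y = 1 + 2x (illustrative only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * v for v in x]
sigma_x = [0.05] * 4
sigma_y = [0.1] * 4
sigma_match = math.sqrt(0.1 ** 2 + 2.0 ** 2 * 0.05 ** 2)

obj_true = gls_objective((1.0, 2.0, sigma_match), x, y, sigma_x, sigma_y)
obj_off = gls_objective((1.0, 2.5, sigma_match), x, y, sigma_x, sigma_y)
```

Because $\sigma_{\mathrm{obs}}$ is a free parameter, the fit can absorb scatter that the stated error bars do not explain, rather than forcing the modeled and observed widths to coincide.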

  24. Edge-localized modes (ELMs): repetitive instabilities in the plasma edge; magnetohydrodynamic origin. Image: MAST, Culham Centre for Fusion Energy, UK.

  25. Analogy 1: Solar flares

  26. Analogy 2: Cooking pot

  27. Importance of ELMs: confinement loss; potential damaging effects; impurity outflux → ELM control/mitigation. Energy $\propto$ (frequency)$^{-1}$.

  28. Data extraction: waiting times. 32 recent JET discharges. Waiting time: time before ELM burst.

  29. Data extraction: energies. Energy carried from the plasma by an ELM.

  30. Average waiting times and energies

  31. Error bars on averages: standard deviation / $\sqrt{n}$ → error bars.
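The error-bar rule above is the standard error of the mean. A minimal sketch, using made-up sample values rather than any ELM data, and a hypothetical helper name `mean_with_error_bar`:

```python
import math
import statistics

def mean_with_error_bar(samples):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard deviation")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # sample std / sqrt(n)
    return mean, sem

# Made-up values for illustration only.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, sem = mean_with_error_bar(samples)
```

Here `statistics.stdev` is the sample standard deviation (with the n - 1 denominator), which is the usual choice when the error bar is estimated from the same data as the mean.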

  32. Regression on averages: $E_{\mathrm{ELM}} = \beta_0 + \beta_1 \Delta t_{\mathrm{ELM}}$, $\sigma_{E,\mathrm{obs}} \propto \mu_{E,\mathrm{obs}}$.

  33. Regression results on pseudosphere

  34. Projected regression results (multidimensional scaling)

  35. Average vs. collective trend. Regression coefficients:
      Data        Method  β0 (MJ)  β1 (MJ/s)
      Average     OLS      0.024   3.2
      Average     GLS     -0.022   4.2
      Individual  OLS     -0.050   5.7
      Individual  GLS     -0.021   4.6

  37. Baryonic Tully-Fisher Relation (BTFR). Simple, tight relation for disk galaxies: $M_b = \beta_0 V_f^{\beta_1}$, where $M_b$ = total (stellar + gaseous) baryonic mass ($M_\odot$) and $V_f$ = rotational velocity (km s$^{-1}$). Various purposes: distance indicator; constraints on galaxy formation models; test for alternatives to $\Lambda$CDM cosmological model (slope and scatter).

  38. Experiments. 47 gas-rich galaxies (McGaugh, Astron. J. 143, 40, 2012). Loglinear ($\sigma_{\mathrm{obs},i} \equiv s_{\mathrm{obs}}$) and nonlinear ($\sigma_{\mathrm{obs},i} = r_{\mathrm{obs}} M_b$) regression. Benchmarking: ordinary least squares (OLS); Bayesian: errors in all variables, marginalized standard deviations (Bayes); geodesic least squares (GLS); Kullback-Leibler least squares (KLS).

  39. Loglinear regression

  40. Nonlinear regression

  41. Parameter distributions

  42. GLS uncertainty estimates: $r_{M_b} \approx 38\%$, $r_{\mathrm{obs}} \approx 63\%$.

  43. Interpretation on pseudosphere

  45. Conclusions. Probabilistic modeling of stochastic system properties. Information geometry: distance measure, geometrical intuition. Pattern recognition in probabilistic spaces: more information, more flexibility. Geodesic least squares regression: flexible and robust; easy to use, fast optimization.
