Information Geometry: Historical Episodes and Future with Recent Developments


  1. IGAIA 4, Bohemia. Information Geometry: Historical Episodes and Future with Recent Developments. Shun-ichi Amari, RIKEN Brain Science Institute.

  2. Prehistory: Riemannian geometry. H. Hotelling (1929): Riemannian metric and Fisher information; location-scale model: constant curvature. P. C. Mahalanobis (1936): Euclidean distance (multivariate Gaussian). C. R. Rao (1945): Cramér-Rao theorem; Riemannian. H. Jeffreys (1946): Bayesian theory and the Jeffreys invariant prior.

  3. Dual geometry, invariance. N. Chentsov (1972): invariance, {g, T}, α-connection. B. Efron (1975) (A. P. Dawid): statistical curvature; higher-order asymptotics. O. Barndorff-Nielsen (1976): exponential family; Legendre transform. S. Amari (1982): duality; curvature and statistics (M. Kumon). H. Nagaoka and S. Amari (1982): duality, Pythagorean theorem.

  4. Amari's personal history. 1958: statistics seminar (master course at U Tokyo); S. Kullback, “Information Theory and Statistics”; Riemannian metric (suggested by S. Moriguti); Gaussian $N(\mu, \sigma^2)$: geodesic, constant curvature (Poincaré half-plane); beautiful structure, essential meaning? Mathematical engineering: graph and topology of networks: homology; non-Riemannian geometry of materials manifold: dislocations; information systems, learning and neural networks.

  5. Statistical curvature and higher-order inference. B. Efron (1975): Fisher's idea; exponential connection and mixture connection. A. P. Dawid (1975): e- and m-connections. S. Amari: α-geometry. Error expansion in powers of $1/n$, $1/n^2$, $1/n^3$, with terms involving $G$, $H^{e}$, $H^{m}$ and $K$ (Rao, Kano K? Fisher's dream). Amari and M. Kumon: higher-order power of statistical test.

  6. Amari paper: Ann. Statist. 1982; reviewers (S. Lauritzen and A. P. Dawid); Chentsov work (handwritten manuscript). H. Nagaoka and S. Amari 1982 (Technical Report): Ann. Probability Theory, 7 reviewers; Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete: geometry has nothing to do with statistics; IEEE Trans. Inf. Theory: Shannon Theory; now well-known.

  7. London Workshop: 1984 (D. Cox). Cox visited Japan in 1983; patron of information geometry. Rao, Efron, Dawid, Barndorff-Nielsen, Lauritzen, Kass, Eguchi, many others. Dodson, Critchley, Marriott, Komaki, Zhang, Ay, Pistone, Gibilisco, Nielsen, …

  8. Information Geometry: lucky naming. Application areas: statistics, time series and systems, machine learning, signal processing, optimization; brain theory, consciousness; physics, economics; mathematics (Banach manifold, affine differential geometry and beyond); quantum information, Tsallis entropy.

  9. International conferences: IGAIA series; GSIS series, … Many monographs; new journal (Jun Zhang); where to publish; mailing list and society. Still a small community; united and cooperative, blessed by all.

  10. My recent works: (1) Systems complexity and consciousness (IIT); (2) Geometry of score matching (Hyvärinen score); (3) Natural gradient descent and topology of deep learning; (4) Canonical divergence; (5) Multi-terminal statistical inference; (6) Information geometry and Wasserstein distance.

  11. Information integration and complexity of systems: stochastic approach. Diagram: $x_1, x_2 \to y_1, y_2$. $p(x, y) = p(x)\,p(y \mid x)$; $x$: state of the brain; $y$: next state of the brain.

  12. Integrated Information Theory (G. Tononi): Φ. Necessary condition; sufficient?

  13. Diagrams: $x_1, x_2 \to y_1, y_2$ (full model and disconnected model). Full model: $S_F = \{p(x, y)\}$. Disconnected model: $S_{dis} = \{q(x, y)\}$, $q(y \mid x) = \prod_i q(y_i \mid x_i)$. Measure of interaction: N. Ay; information integration: Tononi; Barrett and Seth; many others.

  14. Measure of information integration, Φ, or system complexity. Information geometry: N. Ay.

  15. Definition of Φ: postulates. 1) $\Phi = \min_{q \in S_{dis}} D[p(x, y) : q(x, y)]$. 2) $D = D_{KL}[p : q] = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{q(x, y)}$. 3) Disconnected model: Markov conditions.

  16. Markov condition. Diagram: $x_1, x_2 \to y_1, y_2$ with branch $x_1 \to y_2$ deleted. Markov condition $x_1 - x_2 - y_2$: $p(x_1, y_2 \mid x_2) = p(x_1 \mid x_2)\, p(y_2 \mid x_2)$. $X_1 \to X_2 \to Y_2$; $X_2 \to X_1 \to Y_1$. $S_{dis}$: all branches $x_i \to y_j$ ($i \neq j$) deleted.

  17. Why KL-divergence? 1) $D[p : q] \geq 0$, with $= 0$ when and only when $p = q$. 2) $D[p : q]$ is invariant under transformations of $x$. 3) $D[p : q] = \sum_i d[p(x_i), q(x_i)]$. 4) $D[p : q]$ induces a dually flat structure.
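
The first two properties can be checked numerically on a small discrete example. The sketch below is only an illustration: the distributions `p` and `q` and the relabelling map are hypothetical values chosen for the demonstration, and "transformation of x" is taken here to mean an invertible relabelling of outcomes.

```python
import math

def kl_divergence(p, q):
    """D_KL[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete distributions given as dicts."""
    total = 0.0
    for x, px in p.items():
        if px > 0.0:
            total += px * math.log(px / q[x])
    return total

# Two hypothetical distributions on three outcomes (illustrative values only).
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

d_pq = kl_divergence(p, q)  # strictly positive because p != q
d_pp = kl_divergence(p, p)  # zero because the two arguments coincide

# Invariance: relabel the outcomes with an invertible map and recompute.
relabel = {"a": "c", "b": "a", "c": "b"}
p_new = {relabel[x]: v for x, v in p.items()}
q_new = {relabel[x]: v for x, v in q.items()}
d_relabelled = kl_divergence(p_new, q_new)
```

The divergence is a plain sum of per-outcome terms, which is also what the decomposability property states.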

  18. Geometric degree of information integration: $\Phi_{geo} = \min_{q \in S_{dis}} D_{KL}[p(x, y) : q(x, y)]$.

  19. Gaussian case. Full model: $y = A x + e$, $\Sigma = E[e e^T]$. Disconnected model: $y = A' x + e'$, $\Sigma' = E[e' e'^T]$, $A'$: diagonal. $\Phi_{geo} = \log \frac{|\Sigma'|}{|\Sigma|}$.

  21. Many other definitions of Φ: full model; disconnected models.

  22. Full model $S_F$: graphical model. $p(x, y) = \exp\{\sum_i \theta^X_i x_i + \sum_i \theta^Y_i y_i + \theta^X_{12} x_1 x_2 + \theta^Y_{12} y_1 y_2 + \sum_{i,j} \theta^{XY}_{ij} x_i y_j + \text{higher-order terms} - \psi\}$. Exponential family: θ-coordinates; η-coordinates: $\eta^X_i = E[x_i]$, $\eta^{XY}_{ij} = E[x_i y_j]$. There are many disconnected models!

  23. Split model $S_H$ (Ay; Barrett & Seth): $q(x, y) = q(x) \prod_i q(y_i \mid x_i)$; $\theta^{XY}_{12} = \theta^{XY}_{21} = \theta^{Y}_{12} = 0$; $Y_1 - X_1 - X_2 - Y_2$. $\Phi_H = D_{KL}[p : S_H] = \min_{q \in S_H} D_{KL}[p : q]$. Minimizer: $\hat q(x) = p(x)$, $\hat q(y_i \mid x_i) = p(y_i \mid x_i)$. $\Phi_H = \sum_i H[Y_i \mid X_i] - H[Y \mid X]$.
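
A small numerical check may help here. For the split model, the KL divergence from $p(x, y)$ to $\hat q(x, y) = p(x) \prod_i p(y_i \mid x_i)$ equals $\sum_i H[Y_i \mid X_i] - H[Y \mid X]$, since $E_p[\log p(y \mid x)] = -H[Y \mid X]$ and $E_p[\log p(y_i \mid x_i)] = -H[Y_i \mid X_i]$. The sketch below computes both sides for a two-unit binary system. The joint distribution in `make_joint` is a hypothetical example (not from the slides), and all function names are illustrative.

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=2))  # (x1, x2) or (y1, y2)

def make_joint():
    """Hypothetical joint p(x, y): uniform p(x); each y_i is a noisy copy of a parent bit."""
    joint = {}
    for x in STATES:
        for y in STATES:
            # y1 tends to copy x2 and y2 tends to copy x1 (cross connections only)
            p_y1 = 0.9 if y[0] == x[1] else 0.1
            p_y2 = 0.8 if y[1] == x[0] else 0.2
            joint[(x, y)] = 0.25 * p_y1 * p_y2
    return joint

def marginal(joint, key):
    """Sum the joint over everything not selected by key(x, y)."""
    out = {}
    for (x, y), v in joint.items():
        k = key(x, y)
        out[k] = out.get(k, 0.0) + v
    return out

def cond_entropy(joint, target, given):
    """H[target | given] = -sum p(t, g) log p(t | g)."""
    p_tg = marginal(joint, lambda x, y: (target(x, y), given(x, y)))
    p_g = marginal(joint, given)
    return -sum(v * math.log(v / p_g[g]) for (t, g), v in p_tg.items() if v > 0)

def phi_h_entropy(joint):
    """sum_i H[Y_i | X_i] - H[Y | X]."""
    h_split = sum(
        cond_entropy(joint, lambda x, y, i=i: y[i], lambda x, y, i=i: x[i])
        for i in range(2)
    )
    return h_split - cond_entropy(joint, lambda x, y: y, lambda x, y: x)

def phi_h_kl(joint):
    """D_KL[p : q_hat] with q_hat(x, y) = p(x) * prod_i p(y_i | x_i)."""
    p_x = marginal(joint, lambda x, y: x)
    p_xi_yi = [marginal(joint, lambda x, y, i=i: (x[i], y[i])) for i in range(2)]
    p_xi = [marginal(joint, lambda x, y, i=i: x[i]) for i in range(2)]
    total = 0.0
    for (x, y), v in joint.items():
        if v > 0:
            q = p_x[x]
            for i in range(2):
                q *= p_xi_yi[i][(x[i], y[i])] / p_xi[i][x[i]]
            total += v * math.log(v / q)
    return total

joint = make_joint()
phi_entropy = phi_h_entropy(joint)
phi_kl = phi_h_kl(joint)
```

In this example all influence runs through the cross connections, so each $y_i$ looks like a fair coin given $x_i$ alone, and the two computed values agree at a strictly positive number.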

  24. $S_H$: $\theta^{XY}_{12} = \theta^{XY}_{21} = \theta^{Y}_{12} = 0$; projection $\hat q(y \mid x)$. Mixed coordinates: η-coordinates together with $\theta^{XY}_{12}, \theta^{XY}_{21}, \theta^{Y}_{12}$. Markovian condition: $Y_1 - X_1 - X_2 - Y_2$. Problem: when $I(X; Y) = 0$, that is $p(x, y) = p_{ind}(x, y) = p(x)\,p(y)$, one can still have $\Phi_H > 0$.

  25. Split model $S_{Gr}$: graphical model. $q(x, y)$ with $q_X(x)$, $q_Y(y)$, $q(y_i \mid x_i)$; $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$. $q(x_1, y_2 \mid x_2, y_1) = q(x_1 \mid x_2, y_1)\, q(y_2 \mid x_2, y_1)$. $I(X; Y) = 0$: $\hat q_X(x) = p_X(x)$, $\hat q_Y(y) = p_Y(y)$; $\hat q(y_i \mid x_i) = p(y_i \mid x_i)$.

  26. Problem: Gaussian channel. $p(x, y)$: $y = A x + e$; $\hat q \in S_{Gr}$: $y = \hat A x + \hat e$. $p(x, y) \propto \exp\{-\tfrac{1}{2}[x^T \Sigma_X^{-1} x + (y - A x)^T \Sigma^{-1} (y - A x)]\}$. $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$: $\hat A$ is not diagonal.

  27. Mismatched decoding model $S_M$. Diagram: $x_1, x_2 \to y_1, y_2$. Best mismatched decoding from $y$ to $x$. $S_M = \{q(x, y) \propto p(x)\,p(y) \prod_i p(y_i \mid x_i)^{\beta}\}$; $\Phi^* = D_{KL}[p : S_M]$.

  28. Relations among the models $S_H$, $S_{Gr}$, $S_{Geo}$, $S_M$, some dually flat and some not flat, and among the measures $\Phi_{Gr}$, $\Phi_H$, $\Phi_M$, $\Phi_{geo}$.

  29. Transfer entropy; Granger causality. $TE[x_i \to y_j] = \min_q D_{KL}[p : q]$, $q$: branch $(i \to j)$ disconnected; $TE[x_i \to y_j] = H[Y_j \mid X \setminus X_i] - H[Y_j \mid X]$. Diagram: $x_1, x_2 \to y_1, y_2$. Non-additive: $TE[x_i \to y_j;\ x_k \to y_m] \neq TE[x_i \to y_j] + TE[x_k \to y_m]$.
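
As a concrete illustration, for a two-unit system the entropy form reads $TE[x_1 \to y_2] = H[Y_2 \mid X_2] - H[Y_2 \mid X_1, X_2]$, which is the conditional mutual information $I(X_1; Y_2 \mid X_2)$ and vanishes exactly when $x_1$ and $y_2$ are conditionally independent given $x_2$. The sketch below evaluates it on a hypothetical binary system in which both outputs are noisy copies of $x_1$; the distribution and the helper names are assumptions made for the example.

```python
import itertools
import math

BITS = [0, 1]
X1, X2, Y1, Y2 = 0, 1, 2, 3  # positions of the variables in a state tuple

def make_joint():
    """Hypothetical joint over (x1, x2, y1, y2) with uniform, independent x1 and x2."""
    joint = {}
    for x1, x2, y1, y2 in itertools.product(BITS, repeat=4):
        p_y1 = 0.9 if y1 == x1 else 0.1  # y1 is a noisy copy of x1
        p_y2 = 0.8 if y2 == x1 else 0.2  # y2 is a noisy copy of x1 (branch x1 -> y2)
        joint[(x1, x2, y1, y2)] = 0.25 * p_y1 * p_y2
    return joint

def cond_entropy(joint, target_idx, given_idx):
    """H[target | given]; both arguments are tuples of variable positions."""
    p_tg, p_g = {}, {}
    for state, v in joint.items():
        t = tuple(state[i] for i in target_idx)
        g = tuple(state[i] for i in given_idx)
        p_tg[(t, g)] = p_tg.get((t, g), 0.0) + v
        p_g[g] = p_g.get(g, 0.0) + v
    return -sum(v * math.log(v / p_g[g]) for (t, g), v in p_tg.items() if v > 0)

def transfer_entropy(joint, source, target):
    """TE[source -> target] = H[target | other x] - H[target | x1, x2]."""
    other = X2 if source == X1 else X1
    return cond_entropy(joint, (target,), (other,)) - cond_entropy(joint, (target,), (X1, X2))

joint = make_joint()
te_x1_to_y2 = transfer_entropy(joint, X1, Y2)  # positive: y2 depends on x1
te_x2_to_y2 = transfer_entropy(joint, X2, Y2)  # zero up to rounding: y2 ignores x2
```

Here the branch $x_2 \to y_2$ carries nothing, so cutting it costs no divergence, while cutting $x_1 \to y_2$ does.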

  30. Hierarchy: transfer entropy; partition of $X$; cutting branches $X \to Y$; split models. $X = \cup_i X_i$, $X_i \cap X_j = \emptyset$; $Y = \cup_i Y_i$, $Y_i \cap Y_j = \emptyset$. Subadditivity.

  31. Information geometry of Hyvärinen game score, following Grünwald, Dawid, Parry, Lauritzen, Hyvärinen. $L(p, q) = E_p[l(x, a)]$, $a = q(x)$. $S(x, q) = l(x, q)$, with $l(x, q) = -\log q(x)$. S-entropy: $H_S(p : q) = E_p[S(x, q)]$. S-divergence: $D_S[p : q] = H_S(p : q) - H_S(p : p)$.

  32. Hyvärinen score: $S(x, q) = -\Delta l(x) + \tfrac{1}{2} |\nabla l(x)|^2$, with $l(x) = -\log q(x)$. $D_S[p : q] = \tfrac{1}{2} E_p\left[\left|\tfrac{d}{dx}\{\log p(x) - \log q(x)\}\right|^2\right]$. $D_S[p : cq] = D_S[p : q]$. $s(x, l(x), l'(x), l''(x), \ldots)$.
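
The invariance $D_S[p : cq] = D_S[p : q]$ follows because the divergence only involves derivatives of $\log q$, so a constant factor $c$ drops out. The sketch below checks this numerically in one dimension for Gaussians, using finite differences and a midpoint rule. The parameter values, the constant $c = 7$, and the function names are assumptions chosen for illustration; the closed form used for comparison is specific to the Gaussian case.

```python
import math

def log_gauss(x, mu, sigma, log_c=0.0):
    """Log of an unnormalised Gaussian: log c - (x - mu)^2 / (2 sigma^2)."""
    return log_c - (x - mu) ** 2 / (2.0 * sigma ** 2)

def gauss_pdf(x, mu, sigma):
    """Normalised Gaussian density, used only to take the expectation under p."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def d_dx(f, x, h=1e-4):
    """Central finite-difference derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def hyvarinen_divergence(mu_p, sigma_p, log_q, lo=-12.0, hi=12.0, n=24000):
    """0.5 * E_p[(d/dx log p - d/dx log q)^2], integrated by the midpoint rule."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        diff = d_dx(lambda t: log_gauss(t, mu_p, sigma_p), x) - d_dx(log_q, x)
        total += gauss_pdf(x, mu_p, sigma_p) * diff ** 2 * step
    return 0.5 * total

# p = N(0, 1); q has mean 1 and standard deviation 2 (hypothetical values).
d_plain = hyvarinen_divergence(0.0, 1.0, lambda t: log_gauss(t, 1.0, 2.0))
# Same q multiplied by an arbitrary constant c = 7: the divergence should not change.
d_scaled = hyvarinen_divergence(0.0, 1.0, lambda t: log_gauss(t, 1.0, 2.0, log_c=math.log(7.0)))
# Gaussian closed form: 0.5 * [(1/s_q^2 - 1/s_p^2)^2 s_p^2 + (mu_p - mu_q)^2 / s_q^4].
closed_form = 0.5 * ((1 / 2.0**2 - 1 / 1.0**2) ** 2 * 1.0**2 + (0.0 - 1.0) ** 2 / 2.0**4)
```

Because `log_q` never needs to be normalised, the same routine works for models whose normalising constant is unknown, which is the usual motivation for score matching.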
