IGAIA 4, Bohemia
Information Geometry: Historical Episodes and Future, with Recent Developments
Shun-ichi Amari, RIKEN Brain Science Institute
Prehistory: Riemannian Geometry
H. Hotelling (1929): Riemannian metric and Fisher information; the location-scale model has constant curvature
P. C. Mahalanobis (1936): Euclidean distance (multivariate Gaussian)
C. R. Rao (1945): Cramér-Rao theorem; Riemannian metric
H. Jeffreys (1946): Bayesian theory and the Jeffreys invariant prior
Dual Geometry, Invariance
N. Chentsov (1972): invariance, $\{g, T\}$, α-connection
B. Efron (1975) (A. P. Dawid): statistical curvature; higher-order asymptotics
O. Barndorff-Nielsen (1976): exponential family; Legendre transform
S. Amari (1982): duality; curvature and statistics (M. Kumon)
H. Nagaoka and S. Amari (1982): duality, Pythagorean theorem
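The Pythagorean theorem of Nagaoka and Amari can be checked numerically in the simplest dually flat setting. This is a minimal sketch, assuming a joint distribution on a 2x2 grid and taking the independence model (products of marginals) as the e-flat submanifold; the m-projection of p onto it is the product of its marginals p̂. The helper names `kl`, `marginals` and `product` are illustrative, not from the talk.

```python
import math

def kl(p, q):
    """KL divergence D[p:q] = sum p log(p/q) for dicts over the same finite support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def marginals(p):
    """Marginals of a joint distribution given as {(a, b): prob}."""
    p1, p2 = {}, {}
    for (a, b), v in p.items():
        p1[a] = p1.get(a, 0.0) + v
        p2[b] = p2.get(b, 0.0) + v
    return p1, p2

def product(r1, r2):
    """Product distribution r1(a) r2(b): a point of the independence model."""
    return {(a, b): r1[a] * r2[b] for a in r1 for b in r2}

# A correlated joint distribution on {0,1} x {0,1} (made-up numbers)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_hat = product(*marginals(p))                        # m-projection onto the independence model
r = product({0: 0.7, 1: 0.3}, {0: 0.25, 1: 0.75})     # any other independent distribution

lhs = kl(p, r)
rhs = kl(p, p_hat) + kl(p_hat, r)
print(abs(lhs - rhs) < 1e-12)  # True: D[p:r] = D[p:p_hat] + D[p_hat:r]
```

Any other product distribution r can be substituted; the identity should still hold up to rounding.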
Amari's personal history
1958: statistics seminar (master's course at U. Tokyo)
S. Kullback, "Information Theory and Statistics"
Riemannian metric (suggested by S. Moriguti)
Gaussian $N(\mu, \sigma^2)$: geodesics, constant curvature (Poincaré half-plane)
Beautiful structure; essential meaning?
Mathematical engineering: graph and topology of networks (homology); non-Riemannian geometry of materials (manifold with dislocations); information systems, learning and neural networks
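The constant-curvature structure of the Gaussian family $N(\mu, \sigma^2)$ can be made concrete. With the Fisher metric $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$, the substitution $u = \mu/\sqrt{2}$ turns it into twice the Poincaré half-plane metric, so the curvature is the constant $-1/2$ and geodesic distances have a closed form. The sketch below assumes univariate Gaussians with $\sigma > 0$; the function name is hypothetical.

```python
import math

def fisher_rao_gaussian(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao (geodesic) distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    The Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2 is twice the
    Poincare half-plane metric in the coordinates (u, sigma), u = mu / sqrt(2).
    """
    du2 = (mu1 - mu2) ** 2 / 2.0        # squared difference in u = mu / sqrt(2)
    ds2 = (sigma1 - sigma2) ** 2
    arg = 1.0 + (du2 + ds2) / (2.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(arg)

# Pure change of scale: the distance reduces to sqrt(2) * |log(sigma2 / sigma1)|
print(fisher_rao_gaussian(0.0, 1.0, 0.0, math.e))  # approximately 1.41421, i.e. sqrt(2)
```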
Statistical curvature and higher-order inference
B. Efron (1975): Fisher's idea; exponential connection and mixture connection
A. P. Dawid (1975): e- and m-connections
S. Amari: α-geometry
$$\mathrm{Error} = \frac{1}{n} G^{-1} + \frac{1}{2n^2}\left[(H^{e})^2 + (H^{m})^2 + (K^{m})^2\right] + O\!\left(\frac{1}{n^3}\right)$$
(Rao, Kano; Fisher's dream)
Amari and M. Kumon: higher-order power of statistical tests
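The e- and m-connections are easiest to picture through their geodesics (α = 1 and α = -1 in the α-family). A minimal sketch for distributions on a finite set, assuming strictly positive probabilities: the m-geodesic interpolates probabilities linearly, and the e-geodesic interpolates log-probabilities and then renormalizes. The function names are illustrative.

```python
import math

def m_geodesic(p, q, t):
    """Mixture (m-) geodesic: linear interpolation of probabilities."""
    return [(1 - t) * pi + t * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential (e-) geodesic: linear interpolation of log-probabilities, renormalized."""
    w = [math.exp((1 - t) * math.log(pi) + t * math.log(qi)) for pi, qi in zip(p, q)]
    z = sum(w)  # normalizing constant exp(psi(t))
    return [wi / z for wi in w]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(m_geodesic(p, q, 0.5))  # approximately [0.4, 0.25, 0.35]
print(e_geodesic(p, q, 0.5))  # proportional to sqrt(p_i * q_i)
```

The two midpoints differ, which is the simplest visible sign that the two connections are distinct affine structures on the same manifold.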
Amari paper: Ann. Statist. 1982; reviewers S. Lauritzen and A. P. Dawid
Chentsov's work (handwritten manuscript)
H. Nagaoka and S. Amari 1982 (Technical Report)
Ann. Probability Theory: 7 reviewers
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete
"Geometry has nothing to do with statistics"
IEEE Trans. Inf. Theory (Shannon Theory); now well-known
London Workshop, 1984 (D. Cox)
Cox visited Japan in 1983; patron of information geometry
Rao, Efron, Dawid, Barndorff-Nielsen, Lauritzen, Kass, Eguchi, and many others
Dodson, Critchley, Marriott, Komaki, Zhang, Ay, Pistone, Gibilisco, Nielsen, …
Information Geometry: a lucky naming
Application areas: statistics; time series and systems; machine learning; signal processing; optimization; brain theory, consciousness; physics; economics; mathematics (Banach manifolds, affine differential geometry and beyond); quantum information; Tsallis entropy
International Conferences
IGAIA series; GSIS series, …
Many monographs
New journal (Jun Zhang); where to publish
Mailing list and society
Still a small community: united and cooperative, blessed by all
My recent works
1. Systems complexity and consciousness (IIT)
2. Geometry of score matching (Hyvärinen score)
3. Natural gradient descent and topology of deep learning
4. Canonical divergence
5. Multi-terminal statistical inference
6. Information geometry and Wasserstein distance
Information Integration and Complexity of Systems
Stochastic approach: [Diagram: current state $(x_1, x_2)$ connected to next state $(y_1, y_2)$]
$p(x, y) = p(x)\, p(y \mid x)$
x: state of the brain; y: next state of the brain
Integrated Information Theory (G. Tononi)
Φ: a necessary condition for consciousness; is it sufficient?
[Diagrams: full model with all branches $x_i \to y_j$; disconnected model with only $x_i \to y_i$]
Full model: $S_F = \{p(x, y)\}$
Disconnected model: $S_{dis} = \{q(x, y) = q(x) \prod_i q(y_i \mid x_i)\}$
Measure of interaction: N. Ay; information integration: Tononi; Barrett and Seth; many others
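Because the disconnected model $q(x)\prod_i q(y_i \mid x_i)$ splits the log-likelihood into separate terms, the minimizing $\hat q$ simply copies $p(x)$ and each $p(y_i \mid x_i)$; the resulting divergence is $\sum_i H[Y_i \mid X_i] - H[Y \mid X]$, which corresponds to Ay's stochastic interaction. The sketch below assumes two binary inputs and two binary outputs and checks that the direct KL computation and the entropy formula agree; the example system and all function names are hypothetical.

```python
import itertools
import math

def marginal(p, idx):
    """Marginal distribution of p on the coordinates listed in idx."""
    out = {}
    for s, v in p.items():
        key = tuple(s[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

def entropy(p, idx):
    """Shannon entropy (nats) of the marginal of p on idx."""
    return -sum(v * math.log(v) for v in marginal(p, idx).values() if v > 0)

def phi_by_projection(p):
    """KL[p : q_hat] with q_hat(x, y) = p(x) p(y1|x1) p(y2|x2); states are (x1, x2, y1, y2)."""
    px = marginal(p, [0, 1])
    p_x1y1, p_x1 = marginal(p, [0, 2]), marginal(p, [0])
    p_x2y2, p_x2 = marginal(p, [1, 3]), marginal(p, [1])
    total = 0.0
    for (x1, x2, y1, y2), v in p.items():
        if v > 0:
            q_hat = (px[(x1, x2)]
                     * p_x1y1[(x1, y1)] / p_x1[(x1,)]
                     * p_x2y2[(x2, y2)] / p_x2[(x2,)])
            total += v * math.log(v / q_hat)
    return total

def phi_by_entropies(p):
    """sum_i H[Y_i|X_i] - H[Y|X], using H[A|B] = H[A,B] - H[B]."""
    h_y1_x1 = entropy(p, [0, 2]) - entropy(p, [0])
    h_y2_x2 = entropy(p, [1, 3]) - entropy(p, [1])
    h_y_x = entropy(p, [0, 1, 2, 3]) - entropy(p, [0, 1])
    return h_y1_x1 + h_y2_x2 - h_y_x

# Hypothetical cross-wired system: y1 copies x2 and y2 copies x1, each flipped with probability eps
eps = 0.1
p = {}
for x1, x2, y1, y2 in itertools.product([0, 1], repeat=4):
    f1 = 1 - eps if y1 == x2 else eps
    f2 = 1 - eps if y2 == x1 else eps
    p[(x1, x2, y1, y2)] = 0.25 * f1 * f2

print(phi_by_projection(p), phi_by_entropies(p))  # the two values agree
```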
Measure of information integration, or system complexity: information geometry (N. Ay)
Definition of Φ: postulates
1) $\Phi = \min_{q \in S_{dis}} D[p(x, y) : q(x, y)]$
2) $D_{KL}[p : q] = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{q(x, y)}$
3) Disconnected model: Markov conditions
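Markov Condition
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$; branch $x_1 \to y_2$ deleted]
Branch $(x_1 \to y_2)$ deleted: Markov condition
$p(x_1, y_2 \mid x_2) = p(x_1 \mid x_2)\, p(y_2 \mid x_2)$, i.e. $X_1 \to X_2 \to Y_2$
Similarly $X_2 \to X_1 \to Y_1$
$S_{dis}$: all branches $(x_i \to y_j)$, $i \neq j$, deleted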
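The Markov condition can be tested directly on a finite joint distribution. This sketch assumes binary variables indexed as (x1, x2, y1, y2) and uses the equivalent product form $p(x_1, y_2, x_2)\, p(x_2) = p(x_1, x_2)\, p(y_2, x_2)$, which avoids dividing by $p(x_2)$. The example distributions and function names are made up for illustration.

```python
import itertools

def marginal(p, idx):
    """Marginal of p on the coordinates listed in idx."""
    out = {}
    for s, v in p.items():
        key = tuple(s[t] for t in idx)
        out[key] = out.get(key, 0.0) + v
    return out

def satisfies_markov(p, i, j, k, tol=1e-12):
    """Check p(s_i, s_j | s_k) = p(s_i | s_k) p(s_j | s_k) for every value.

    Equivalent form used here: p(s_i, s_j, s_k) p(s_k) = p(s_i, s_k) p(s_j, s_k).
    """
    p_ijk, p_ik = marginal(p, [i, j, k]), marginal(p, [i, k])
    p_jk, p_k = marginal(p, [j, k]), marginal(p, [k])
    return all(abs(v * p_k[(c,)] - p_ik[(a, c)] * p_jk[(b, c)]) <= tol
               for (a, b, c), v in p_ijk.items())

def build_example(y2_reads_x1):
    """Hypothetical joint over (x1, x2, y1, y2); y1 always reads both x1 and x2."""
    p = {}
    for x1, x2, y1, y2 in itertools.product([0, 1], repeat=4):
        px = [[0.4, 0.1], [0.2, 0.3]][x1][x2]      # correlated inputs
        py1 = 0.8 if y1 == (x1 ^ x2) else 0.2
        source = x1 if y2_reads_x1 else x2          # which input drives y2
        py2 = 0.9 if y2 == source else 0.1
        p[(x1, x2, y1, y2)] = px * py1 * py2
    return p

# Coordinates: 0 = x1, 1 = x2, 2 = y1, 3 = y2; test the deleted branch x1 -> y2
print(satisfies_markov(build_example(False), 0, 3, 1))  # True
print(satisfies_markov(build_example(True), 0, 3, 1))   # False
```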
Why KL-divergence?
1) $D[p : q] \ge 0$, with equality if and only if $p = q$
2) […]
3) $D[p : q] = \sum_i d[p(x_i), q(x_i)]$
4) Induces a dually flat structure
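Property 3 can be illustrated numerically in the case where both p and q factorize over independent components: the KL divergence of the joint distributions is then the sum of the componentwise divergences. This is a minimal check, assuming finite distributions given as lists; it does not test the other properties, and the helper names are illustrative.

```python
import math

def kl(p, q):
    """D_KL[p:q] for two finite distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(*dists):
    """Joint distribution of independent components, flattened in lexicographic order."""
    joint = [1.0]
    for d in dists:
        joint = [a * b for a in joint for b in d]
    return joint

p1, q1 = [0.3, 0.7], [0.5, 0.5]
p2, q2 = [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]
lhs = kl(product(p1, p2), product(q1, q2))
rhs = kl(p1, q1) + kl(p2, q2)
print(abs(lhs - rhs) < 1e-12)  # True
```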
Geometric degree of information integration
$\Phi_{geo} = \min_{q \in S_{dis}} D_{KL}[p(x, y) : q(x, y)]$
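Gaussian case
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$]
Full model: $y = A x + \varepsilon$, $\Sigma = E[\varepsilon \varepsilon^T]$
Disconnected model: $y = A' x + \varepsilon'$, $\Sigma' = E[\varepsilon' \varepsilon'^T]$, $A'$ diagonal
$\Phi_{geo} = \frac{1}{2} \log \frac{|\Sigma'|}{|\Sigma|}$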
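In the Gaussian case the minimization over the disconnected noise covariance can be done in closed form: for a fixed diagonal $A'$, the best $\Sigma'$ is $\Sigma + (A - A')\Sigma_x(A - A')^T$, and the KL divergence reduces to $\frac{1}{2}\log(|\Sigma'|/|\Sigma|)$, assuming zero-mean $x$ with covariance $\Sigma_x$ and natural logarithms. The remaining minimization over the diagonal of $A'$ is done here by a crude grid search in two dimensions, so the grid minimum only approximates (from above) the true minimum. Grid range, resolution, and the function name are arbitrary choices of this sketch.

```python
import numpy as np

def phi_geo_gaussian(A, Sigma, Sigma_x, half_width=2.0, n_grid=401):
    """Grid-search sketch of Phi_geo for a 2-D Gaussian channel y = A x + e.

    Assumptions: x ~ N(0, Sigma_x), e ~ N(0, Sigma), KL in nats. For a given
    diagonal A', the best noise covariance is Sigma' = Sigma + D Sigma_x D^T with
    D = A - A', and the KL divergence becomes 1/2 log(|Sigma'| / |Sigma|).
    The minimum over diagonal A' is searched only on a grid around diag(A).
    """
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Sigma_x = np.asarray(Sigma_x, dtype=float)
    offsets = np.linspace(-half_width, half_width, n_grid)   # A'_ii - A_ii
    D = np.zeros((n_grid, n_grid, 2, 2))                    # D = A - A' at each grid point
    D[..., 0, 0] = -offsets[:, None]
    D[..., 1, 1] = -offsets[None, :]
    D[..., 0, 1] = A[0, 1]                                   # cross branches cannot be matched
    D[..., 1, 0] = A[1, 0]
    Sigma_prime = Sigma + D @ Sigma_x @ np.swapaxes(D, -1, -2)
    _, logdet_prime = np.linalg.slogdet(Sigma_prime)
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet_prime.min() - logdet)

Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_x = np.eye(2)
print(phi_geo_gaussian([[0.5, 0.0], [0.0, 0.5]], Sigma, Sigma_x))  # about 0: A is already diagonal
print(phi_geo_gaussian([[0.5, 0.4], [0.2, 0.5]], Sigma, Sigma_x))  # positive
```

A gradient-based optimizer could replace the grid; the grid is used only to keep the sketch dependent on NumPy alone.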
Many other definitions of Φ
[Diagrams: full model; disconnected models]
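Full model: graphical model $S_F$
$p(x, y) = \exp\{\sum_i \theta^X_i x_i + \sum_i \theta^Y_i y_i + \sum_{i,j} \theta^{XY}_{ij} x_i y_j + \theta^X_{12} x_1 x_2 + \theta^Y_{12} y_1 y_2 + \text{higher-order terms} - \psi\}$
Exponential family: θ-coordinates $(\theta^X_i, \theta^Y_i, \theta^{XY}_{ij}, \dots)$; η-coordinates $\eta^X_i = E[x_i]$, $\eta^{XY}_{ij} = E[x_i y_j]$, …
There are many disconnected models!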
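The η-coordinates are the expectations of the sufficient statistics, and they are obtained from the θ-coordinates as the gradient of the log-partition function ψ. A minimal sketch for four binary variables (x1, x2, y1, y2) with only first- and second-order terms (the higher-order terms of the slide are omitted); parameter values and the function name are hypothetical.

```python
import itertools
import math

def psi_and_eta(theta1, theta2):
    """Log-partition psi(theta) and eta-coordinates of a pairwise binary model.

    States are s = (x1, x2, y1, y2) in {0, 1}^4 (an assumption of this sketch).
    theta1: four first-order parameters (theta^X_1, theta^X_2, theta^Y_1, theta^Y_2).
    theta2: dict {(i, j): value} of pairwise parameters; e.g. (0, 3) is theta^XY_12,
            the coefficient of x1 * y2. Higher-order terms are left out.
    """
    states = list(itertools.product([0, 1], repeat=4))
    energies = [sum(t * s[i] for i, t in enumerate(theta1))
                + sum(t * s[i] * s[j] for (i, j), t in theta2.items())
                for s in states]
    z = sum(math.exp(e) for e in energies)
    probs = [math.exp(e) / z for e in energies]
    eta1 = [sum(pr * s[i] for pr, s in zip(probs, states)) for i in range(4)]
    eta2 = {(i, j): sum(pr * s[i] * s[j] for pr, s in zip(probs, states))
            for (i, j) in theta2}
    return math.log(z), eta1, eta2

theta1 = [0.2, -0.1, 0.3, 0.0]
theta2 = {(0, 1): 0.1, (2, 3): 0.2, (0, 2): 0.8, (1, 3): 0.5, (0, 3): 0.4, (1, 2): -0.3}
psi, eta1, eta2 = psi_and_eta(theta1, theta2)

# eta is the gradient of psi: compare eta^XY_12 with a central finite difference
h = 1e-5
plus, minus = dict(theta2), dict(theta2)
plus[(0, 3)] += h
minus[(0, 3)] -= h
grad = (psi_and_eta(theta1, plus)[0] - psi_and_eta(theta1, minus)[0]) / (2 * h)
print(grad, eta2[(0, 3)])  # nearly equal
```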
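Split Model (Ay; Barrett & Seth)
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$]
$S_H = \{q : q(y \mid x) = \prod_i q(y_i \mid x_i)\}$: $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$, $\theta^Y_{12} = 0$
$\Phi_H = \min_{q \in S_H} D_{KL}[p : q] = D_{KL}[p : \hat q]$, $\hat q(x, y) = p(x) \prod_i p(y_i \mid x_i)$
$\Phi_H = \sum_i H[Y_i \mid X_i] - H[Y \mid X]$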
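$S_H$: $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$, $\theta^Y_{12} = 0$
Mixed coordinates: $p = (\eta^X_i, \eta^X_{12}, \eta^Y_i, \eta^{XY}_{ii};\ \theta^{XY}_{12}, \theta^{XY}_{21}, \theta^Y_{12})$, and $\hat q$ keeps the same η-part with $\theta^{XY}_{12} = \theta^{XY}_{21} = \theta^Y_{12} = 0$
Markovian condition: $Y_1 - X_1 - X_2 - Y_2$
Problem: for $p_{ind}(x, y) = p(x)\, p(y)$, $I(X; Y) = 0$, but $\Phi_H > 0$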
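The problem can be reproduced with a small example. Assuming $p(x, y) = p(x)p(y)$ with uniform inputs and strongly correlated outputs $y_1, y_2$, the mutual information $I(X;Y)$ vanishes, while $\Phi_H = \sum_i H[Y_i \mid X_i] - H[Y \mid X]$ reduces to $\sum_i H[Y_i] - H[Y]$, which is positive whenever $y_1$ and $y_2$ are correlated. The numbers below are illustrative only.

```python
import math

def entropy(p, idx):
    """Shannon entropy (nats) of the marginal of p on the coordinates idx."""
    m = {}
    for s, v in p.items():
        key = tuple(s[i] for i in idx)
        m[key] = m.get(key, 0.0) + v
    return -sum(v * math.log(v) for v in m.values() if v > 0)

# Hypothetical independent case p(x, y) = p(x) p(y): uniform x, strongly correlated y1, y2
px = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
py = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
p = {x + y: px[x] * py[y] for x in px for y in py}   # states (x1, x2, y1, y2)

X, Y, ALL = [0, 1], [2, 3], [0, 1, 2, 3]
mutual_info = entropy(p, X) + entropy(p, Y) - entropy(p, ALL)   # I(X;Y)
phi_H = ((entropy(p, [0, 2]) - entropy(p, [0]))                 # H[Y1|X1]
         + (entropy(p, [1, 3]) - entropy(p, [1]))               # H[Y2|X2]
         - (entropy(p, ALL) - entropy(p, X)))                   # H[Y|X]
print(abs(mutual_info) < 1e-12, phi_H > 0)  # True True
```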
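Split Model: graphical model $S_{Gr}$
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$]
$q(x, y) \propto q_X(x)\, q_Y(y) \prod_i q_i(x_i, y_i)$, i.e. $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$
Markov condition: $q(x_1, y_2 \mid x_2, y_1) = q(x_1 \mid x_2, y_1)\, q(y_2 \mid x_2, y_1)$
$I(X; Y) = 0 \Rightarrow \Phi_{Gr} = 0$
Projection: $\hat q(x) = p(x)$, $\hat q(y) = p(y)$, $\hat q(y_i \mid x_i) = p(y_i \mid x_i)$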
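Problem: Gaussian channel
$p(x, y)$: $y = A x + \varepsilon$; $\hat q \in S_{Gr}$: $y = \hat A x + \hat\varepsilon$
$p(x, y) \propto \exp\{-\frac{1}{2}(y - A x)^T \Sigma^{-1} (y - A x) - \frac{1}{2} x^T \Sigma_X^{-1} x\}$
The cross term gives $\theta^{XY} = A^T \Sigma^{-1}$, so $\theta^{XY}_{12} = \theta^{XY}_{21} = 0$ makes $\hat A^T \hat\Sigma^{-1}$ diagonal: $\hat A$ is not diagonal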
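The mismatch comes from where A enters the exponent: the cross term $y^T \Sigma^{-1} A x$ gives the x-y interaction coefficients $\theta^{XY} = A^T \Sigma^{-1}$ (with $\theta^{XY}_{ij}$ the coefficient of $x_i y_j$). The sketch below, with a made-up $\Sigma$, builds a channel whose $\theta^{XY}$ is diagonal while A is not, so zeroing the off-diagonal θ-interactions does not cut the cross branches of A.

```python
import numpy as np

# Hypothetical correlated noise covariance Sigma and a diagonal matrix Lam
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
Lam = np.diag([0.5, 0.8])

# Choose A so that Sigma^{-1} A = Lam, i.e. A = Sigma @ Lam
A = Sigma @ Lam

# x-y interaction parameters of the Gaussian exponential family: theta^XY = A^T Sigma^{-1}
theta_XY = np.linalg.solve(Sigma, A).T   # (Sigma^{-1} A)^T, equal to A^T Sigma^{-1} since Sigma is symmetric

print(np.round(theta_XY, 12))  # diagonal: theta^XY_12 = theta^XY_21 = 0
print(A)                       # off-diagonal entries 0.48 and 0.3: x2 drives y1 and x1 drives y2
```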
Mismatched Decoding Model $S_M$
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$]
Best mismatched decoding from y to x
$S_M$: distributions $q(x, y)$ formed from $p(x)$, $p(y)$ and $\prod_i p(y_i \mid x_i)$
$\Phi^* = D_{KL}[p : S_M]$
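$S_H \subset S_{Gr}$, $S_H \subset S_{Geo}$; $S_{Gr}$, $S_H$: dually flat
$S_M \subset S_{Gr}$; $S_M$, $S_{Geo}$: not flat
$\Phi_{Gr} \le \Phi_H$, $\Phi_{Gr} \le \Phi_M$; $\Phi_{Geo} \le \Phi_H$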
Transfer Entropy; Granger causality
[Diagram: $x_1 \to y_1$, $x_2 \to y_2$]
$TE[x_i \to y_j] = \min_{q:\ (x_i \to y_j)\ \text{disconnected}} D_{KL}[p : q] = H[Y_j \mid X - X_i] - H[Y_j \mid X]$
Non-additive: $TE[x_i \to y_j;\ x_k \to y_m] \ne TE[x_i \to y_j] + TE[x_k \to y_m]$
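Transfer entropy from a set of sources to one target is the conditional mutual information $H[Y_j \mid X \text{ without the sources}] - H[Y_j \mid X]$. A standard example of non-additivity, assumed here, is $y_1 = x_1 \oplus x_2$ (XOR) with independent fair bits: cutting either branch alone costs one bit, and cutting both together also costs one bit, not two. Names and the example are illustrative; entropies are in bits.

```python
import itertools
import math

def H(p, idx):
    """Entropy (bits) of the marginal of p on the coordinates idx."""
    m = {}
    for s, v in p.items():
        key = tuple(s[i] for i in idx)
        m[key] = m.get(key, 0.0) + v
    return -sum(v * math.log2(v) for v in m.values() if v > 0)

def cond_H(p, target, given):
    """H[target | given] = H[target, given] - H[given]."""
    return H(p, target + given) - H(p, given)

def transfer_entropy(p, sources, target, x_idx):
    """TE[x_sources -> y_target] = H[Y_t | X without sources] - H[Y_t | X]."""
    rest = [i for i in x_idx if i not in sources]
    return cond_H(p, [target], rest) - cond_H(p, [target], x_idx)

# x1, x2 independent fair bits; y1 = x1 XOR x2 (states are (x1, x2, y1))
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in itertools.product([0, 1], repeat=2)}
X = [0, 1]
te1 = transfer_entropy(p, [0], 2, X)      # cut x1 -> y1 only
te2 = transfer_entropy(p, [1], 2, X)      # cut x2 -> y1 only
te12 = transfer_entropy(p, [0, 1], 2, X)  # cut both branches together
print(te1, te2, te12)  # 1.0 1.0 1.0: the joint value is not te1 + te2
```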
Hierarchy: transfer entropy
Partition of X; cutting branches from X to Y: split models
$x, y$: $X = \bigcup_i X_i$, $X_i \cap X_j = \emptyset$; $Y = \bigcup_i Y_i$, $Y_i \cap Y_j = \emptyset$
Subadditivity
Information Geometry of the Hyvärinen Game Score
Following Grünwald, Dawid, Parry, Lauritzen, Hyvärinen
Loss: $L(p, q) = E_{p(x)}[l(x, a)]$, $a = q(x)$
Score: $S(x, q) = l(x, q)$; e.g. the log score $l(x, q) = -\log q(x)$
S-entropy: $H_S(p, q) = E_p[S(x, q)]$, $H_S(p) = H_S(p, p)$
S-divergence: $D_S[p : q] = H_S(p, q) - H_S(p, p)$
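The S-entropy and S-divergence can be computed for any scoring rule on a finite set. This sketch assumes two rules: the log score from the slide, whose S-divergence is the KL divergence, and the quadratic (Brier) score, added here only as a second standard proper scoring rule, whose S-divergence is the squared Euclidean distance $\sum_k (p_k - q_k)^2$. Function names are hypothetical.

```python
import math

def entropy_S(p, q, score):
    """S-entropy H_S(p, q) = E_p[S(x, q)] on a finite set {0, ..., n-1}."""
    return sum(p[x] * score(x, q) for x in range(len(p)))

def divergence_S(p, q, score):
    """S-divergence D_S[p:q] = H_S(p, q) - H_S(p, p)."""
    return entropy_S(p, q, score) - entropy_S(p, p, score)

def log_score(x, q):
    return -math.log(q[x])

def brier_score(x, q):
    """Quadratic (Brier) score: sum_k (q_k - [k == x])^2."""
    return sum((qk - (1.0 if k == x else 0.0)) ** 2 for k, qk in enumerate(q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
sq = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
print(abs(divergence_S(p, q, log_score) - kl) < 1e-12)    # True: log score gives KL
print(abs(divergence_S(p, q, brier_score) - sq) < 1e-12)  # True: Brier score gives squared distance
```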
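Hyvärinen score
$S(x, q) = \frac{1}{2}\left(\frac{d}{dx}\log q(x)\right)^2 + \frac{d^2}{dx^2}\log q(x)$
$D_S[p : q] = \frac{1}{2} E_p\left[\left(\frac{d}{dx}\log p(x) - \frac{d}{dx}\log q(x)\right)^2\right]$
$D_S[p : cq] = D_S[p : q]$ for any constant $c > 0$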
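The scale invariance $D_S[p : cq] = D_S[p : q]$ holds because $\frac{d}{dx}\log(cq) = \frac{d}{dx}\log q$, which is why the Hyvärinen score needs no normalizing constant. The sketch below assumes one-dimensional densities evaluated on a uniform grid, with derivatives by finite differences; for two Gaussians $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$ it compares against the closed form $\frac{1}{2}[((\mu_1 - \mu_2)/\sigma_2^2)^2 + \sigma_1^2(1/\sigma_2^2 - 1/\sigma_1^2)^2]$, which follows from substituting $x = \mu_1 + \sigma_1 z$. Grid limits, parameters and the function name are arbitrary choices.

```python
import numpy as np

def hyvarinen_divergence(log_p, log_q, x):
    """D_S[p:q] = 1/2 E_p[(d/dx log p - d/dx log q)^2] on a uniform grid x.

    log_p must be a normalized log-density; log_q may be unnormalized.
    Derivatives use np.gradient and the expectation a Riemann sum.
    """
    dx = x[1] - x[0]
    p = np.exp(log_p)
    diff = np.gradient(log_p, dx) - np.gradient(log_q, dx)
    return 0.5 * np.sum(p * diff ** 2) * dx

x = np.linspace(-12.0, 12.0, 24001)
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
log_p = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi))
log_q = -0.5 * ((x - mu2) / s2) ** 2 - np.log(s2 * np.sqrt(2 * np.pi))

d = hyvarinen_divergence(log_p, log_q, x)
d_scaled = hyvarinen_divergence(log_p, log_q + np.log(5.0), x)  # q replaced by 5 q
closed_form = 0.5 * (((mu1 - mu2) / s2**2) ** 2 + s1**2 * (1 / s2**2 - 1 / s1**2) ** 2)
print(d, d_scaled, closed_form)
```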
Recommendations
More recommendations