
The Orlicz-Sobolev-Gauss Exponential Manifold, Giovanni Pistone



  1. IGAIA IV: Information Geometry and its Applications IV. The Orlicz-Sobolev-Gauss Exponential Manifold. Giovanni Pistone, www.giannidiorestino.it. Liblice, June 13, 2016.

  2. My four parts
     1. Amari's Information Geometry when the state space is not finite and the model is not parametric
     2. An example: computing the Wasserstein distance
     3. Gauss-Orlicz-Sobolev model spaces
     4. Second order geometry
     This talk is dedicated to Michel Métivier, my mentor in Rennes (1973-75).

  3. Part I: Amari's Information Geometry when the state space is not finite and the model is not parametric

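  4. In IG the velocity is the score
     • $\theta \mapsto p_\theta$ is a curve.
     • The score $\theta \mapsto \frac{d}{d\theta} \log p_\theta$ is an estimating function because
       $$\mathbb{E}_\theta\left[\frac{d}{d\theta} \log p_\theta\right] = 0 .$$
     • Fisher-Rao computation:
       $$\frac{d}{d\theta} \mathbb{E}_\theta[U] = \int U(x)\, \frac{d}{d\theta} p(x;\theta)\, \mu(dx) = \int_{p(x;\theta)>0} U(x)\, \frac{d}{d\theta} \log p(x;\theta)\; p(x;\theta)\, \mu(dx) = \left\langle U - \mathbb{E}_\theta[U],\ \frac{d}{d\theta} \log p_\theta \right\rangle_\theta .$$
     • $U - \mathbb{E}_\theta[U]$ is the statistical gradient of $\theta \mapsto \mathbb{E}_\theta[U]$.
     • Cf. recent work by Ay, Jost, Lê, Schwachhöfer on measure models.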
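The Fisher-Rao computation on this slide can be checked numerically. The following is a minimal sketch, assuming a finite sample space and a one-parameter exponential tilt as the curve; the helper names (`density`, `score`, `expect`) and all numbers are hypothetical and not taken from the talk.

```python
import math

def density(theta, weights, stat):
    """Curve p(x; theta) proportional to weights[x] * exp(theta * stat[x]) (assumed model)."""
    unnorm = [w * math.exp(theta * t) for w, t in zip(weights, stat)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def score(theta, weights, stat):
    """d/dtheta log p(x; theta), which equals stat(x) - E_theta[stat] for this tilt."""
    p = density(theta, weights, stat)
    m = expect(p, stat)
    return [t - m for t in stat]

# Illustrative data on a four-point sample space.
weights = [0.1, 0.2, 0.3, 0.4]
stat = [0.0, 1.0, 2.0, 3.0]
U = [1.0, -2.0, 0.5, 4.0]
theta, h = 0.3, 1e-6

p = density(theta, weights, stat)
s = score(theta, weights, stat)

# The score is an estimating function: its expectation vanishes.
mean_score = expect(p, s)

# Left-hand side: central finite difference of theta -> E_theta[U].
lhs = (expect(density(theta + h, weights, stat), U)
       - expect(density(theta - h, weights, stat), U)) / (2 * h)

# Right-hand side: <U - E_theta[U], score>_theta.
mean_U = expect(p, U)
rhs = expect(p, [(u - mean_U) * si for u, si in zip(U, s)])
```

Here `mean_score` should vanish up to rounding, and `lhs` and `rhs` should agree up to the finite-difference error.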

  5. IG is the geometry of the statistical bundle
     • $\mathcal{P}$ is a set of probabilities on a given sample space $(\Omega, \mathcal{F})$.
     • For each $p \in \mathcal{P}$, $B_p \hookrightarrow L^1_0(p)$.
     • A statistical bundle is $T\mathcal{P} = \{ (p, U) \mid p \in \mathcal{P},\ U \in B_p \}$.
     • We expect the fibers $B_p$ to be isomorphic and to express a tangent space at $p \in \mathcal{P}$.
     • A chart at $p$: $\sigma_p \colon (q, V) \mapsto (s_p(q), \dot{s}_p(V)) \in B_p \times B_p$.
     • S.-i. Amari and M. Kumon. Estimation in the presence of infinitely many nuisance parameters - geometry of estimating functions. Ann. Statist., 16(3):1044–1068, 1988
     • P. Gibilisco and G. Pistone. Connections on non-parametric statistical manifolds by Orlicz space geometry. IDAQP, 1(2):325–347, 1998
     • Cf. Otto, cf. Lê

  6. Fibers: $B_p = L^{(\cosh-1)}_0(p)$
     • The exponential space $L^{(\cosh-1)}(p)$ and the mixture space $L^{(\cosh-1)_*}(p)$ are the Orlicz spaces respectively defined by the conjugate Young functions $(\cosh-1)(x) = \cosh x - 1$ and $(\cosh-1)_*$,
     • with $xy \le (\cosh-1)(x) + (\cosh-1)_*(y)$.
     • The closed unit balls of the exponential and mixture space are, respectively,
       $$\left\{ f \,\middle|\, \|f\|_{L^{(\cosh-1)}(p)} \le 1 \right\} = \left\{ f \,\middle|\, \int (\cosh-1)(f(x))\, p(x)\, dx \le 1 \right\},$$
       $$\left\{ g \,\middle|\, \|g\|_{L^{(\cosh-1)_*}(p)} \le 1 \right\} = \left\{ g \,\middle|\, \int (\cosh-1)_*(g(x))\, p(x)\, dx \le 1 \right\}.$$
     • $B_p = \left\{ U \in L^{(\cosh-1)}(p) \,\middle|\, \mathbb{E}_p[U] = 0 \right\}$ is the dual of ${}^*B_p = \left\{ V \in L^{(\cosh-1)_*}(p) \,\middle|\, \mathbb{E}_p[V] = 0 \right\}$.
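The Young inequality and the unit-ball description on this slide can be explored numerically. The sketch below makes three assumptions that are not in the slides: the closed form $(\cosh-1)_*(y) = y \operatorname{arsinh}(y) - \sqrt{1+y^2} + 1$ (obtained from the Legendre transform, whose supremum is attained at $x = \operatorname{arsinh}(y)$), a finite sample space, and the Luxemburg norm as the Orlicz norm. All function names and numbers are hypothetical.

```python
import math

def phi(x):
    """Young function (cosh - 1)(x)."""
    return math.cosh(x) - 1.0

def phi_star(y):
    """Conjugate Young function sup_x (x*y - phi(x)), attained at x = asinh(y)."""
    return y * math.asinh(y) - math.sqrt(1.0 + y * y) + 1.0

def luxemburg_norm(f, p, young, tol=1e-12):
    """Smallest r > 0 with sum_x young(f(x) / r) * p(x) <= 1, found by bisection."""
    def modular(r):
        return sum(pi * young(fi / r) for pi, fi in zip(p, f))
    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:      # grow hi until it is feasible
        hi *= 2.0
    while hi - lo > tol:          # shrink the bracket around the norm
        mid = 0.5 * (lo + hi)
        if modular(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

# Young inequality x*y <= phi(x) + phi_star(y) on a grid of test points.
grid = [k / 4.0 for k in range(-12, 13)]
young_holds = all(x * y <= phi(x) + phi_star(y) + 1e-12 for x in grid for y in grid)

# Equality case of the Young inequality at x = asinh(y).
y0 = 1.5
x0 = math.asinh(y0)
equality_gap = x0 * y0 - phi(x0) - phi_star(y0)

# Norm of an illustrative function on a three-point space.
p = [0.2, 0.3, 0.5]
f = [1.0, -2.0, 0.5]
norm_f = luxemburg_norm(f, p, phi)
modular_at_norm = sum(pi * phi(fi / norm_f) for pi, fi in zip(p, f))
```

At the computed norm the modular `modular_at_norm` sits at the boundary value 1, which matches the slide's description of the closed unit ball.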

  7. $B_p$ is the space of scores
     $B_p$ is exactly the space of scores of Gibbs models through $p$.
     Theorem
     1. $U \in B_p$ iff $\mathbb{E}_p[U] = 0$ and $\mathbb{E}_p[(\cosh-1)(\rho U)] < \infty$ for some $\rho > 0$.
     2. $U \in B_p$ iff $\mathbb{E}_p[U] = 0$ and the moment generating function $\theta \mapsto \mathbb{E}_p\left[e^{\theta U}\right]$ is finite in a neighbourhood of 0.
     3. The Gibbs model $\theta \mapsto p_\theta = \frac{e^{\theta U}}{\mathbb{E}_p\left[e^{\theta U}\right]} \cdot p$ is defined in a neighbourhood of 0 and $\left. \frac{d}{d\theta} \mathbb{E}_p\left[e^{\theta U}\right] \right|_{\theta=0} = 0$.
     4. The score of the Gibbs model at 0 is $\left. \frac{d}{d\theta} \log p_\theta \right|_{\theta=0} = U$.
     This set-up applies to the set $\mathcal{P}_>$ of strictly positive densities.
     • G. Pistone and C. Sempi. An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Statist., 23(5):1543–1561, October 1995

  8. Isomorphism of the $L^{(\cosh-1)}(p)$ spaces
     Theorem. $L^{(\cosh-1)}(p) = L^{(\cosh-1)}(q)$ as Banach spaces if $\theta \mapsto \int p^{1-\theta} q^{\theta}\, d\mu$ is finite on an open neighbourhood $I$ of $[0,1]$, i.e. [...]. It is an equivalence relation $p \smile q$ and we denote by $\mathcal{E}(p)$ the class containing $p$.
     Proof. Assume $U \in L^{(\cosh-1)}(p)$ and consider the restrictions to the axes of the convex function
     $$(s, \theta) \mapsto \int e^{sU} p^{1-\theta} q^{\theta}\, d\mu = \int \exp\left( sU + \theta \log\frac{q}{p} \right) p\, d\mu .$$
     • G. Pistone and C. Sempi. An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Statist., 23(5):1543–1561, October 1995
     • A. Cena. Geometric structures on the non-parametric statistical manifold. PhD thesis, Dottorato in Matematica, Università di Milano, 2002

  9. Portmanteau theorem
     Theorem. The following statements are equivalent for $p, q \in \mathcal{P}_>$:
     • $q \in \mathcal{E}(p)$;
     • $p \smile q$;
     • $\mathcal{E}(p) = \mathcal{E}(q)$;
     • $L^{(\cosh-1)}(p) = L^{(\cosh-1)}(q)$;
     • $\log\frac{q}{p} \in L^{(\cosh-1)}(p) \cap L^{(\cosh-1)}(q)$;
     • $\frac{q}{p} \in L^{1+\epsilon}(p)$ and $\frac{p}{q} \in L^{1+\epsilon}(q)$ for some $\epsilon > 0$.
     • A. Cena and G. Pistone. Exponential statistical manifold. Ann. Inst. Statist. Math., 59(1):27–56, 2007
     • M. Santacroce, P. Siri, and B. Trivellato. New results on mixture and exponential models by Orlicz spaces. Bernoulli, 22(3):1431–1447, 2016

  10. Maximal exponential family
     • For each $p \in \mathcal{P}_>$, the moment generating functional is the positive lower semicontinuous convex function $G_p \colon B_p \ni U \mapsto \mathbb{E}_p\left[e^{U}\right]$, and
     • the cumulant generating functional is the non-negative lower semicontinuous convex function $K_p = \log G_p$.
     • The interior of the proper domain,
       $$\mathcal{S}_p = \left\{ U \in L^{(\cosh-1)}(p) \,\middle|\, G_p(U) < +\infty \right\}^{\circ},$$
       is an open convex set containing the open unit ball of $L^{(\cosh-1)}(p)$.
     • For each $p \in \mathcal{P}_>$, the maximal exponential family at $p$ is
       $$\mathcal{E}(p) = \left\{ e^{u - K_p(u)} \cdot p \,\middle|\, u \in \mathcal{S}_p \right\}.$$
     From now on the maximal exponential family of interest is the family of the Maxwell density on $\mathbb{R}^n$, $\mathcal{E}(M)$.

  11. e-chart at $p \in \mathcal{E}(M)$
     • For each $p \in \mathcal{E}(M)$ we define a chart $s_p \colon \mathcal{E}(M) \to \mathcal{S}_p \subset B_p$.
     • The chart is defined by
       $$s_p \colon q \mapsto \log\frac{q}{p} + D(p \,\|\, q) = \log\frac{q}{p} - \mathbb{E}_p\left[\log\frac{q}{p}\right].$$
     • The inverse of the chart, $e_p = s_p^{-1} \colon \mathcal{S}_p \to \mathcal{E}(M)$, is
       $$e_p(U) = \exp\left( U - K_p(U) \right) \cdot p .$$
     • $\{ s_p \mid p \in \mathcal{E}(M) \}$ is an affine atlas on $\mathcal{E}(M)$ that defines the exponential manifold.
     • The information closure of any $\mathcal{E}(M)$ is $\mathcal{P}_{\geq}$. The reverse information closure of any $\mathcal{E}(M)$ is $\mathcal{P}_>$.
     • I. Csiszár and F. Matúš. Information projections revisited. IEEE Trans. Inform. Theory, 49(6):1474–1490, 2003
     • D. Imparato and B. Trivellato. Geometry of extended exponential models. In Algebraic and geometric methods in statistics, pages 307–326. Cambridge Univ. Press, Cambridge, 2010
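The chart $s_p$ and its inverse $e_p$ can be tried out in a toy setting. This is a minimal sketch, assuming a finite sample space with strictly positive probability vectors (where every function is in the exponential Orlicz space); the function names `s_chart`, `e_chart`, `K`, `kl` and the example vectors are hypothetical.

```python
import math

def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def K(p, U):
    """Cumulant functional K_p(U) = log E_p[exp(U)]."""
    return math.log(expect(p, [math.exp(u) for u in U]))

def s_chart(p, q):
    """Chart s_p(q) = log(q/p) - E_p[log(q/p)], a p-centered random variable."""
    log_ratio = [math.log(qi / pi) for pi, qi in zip(p, q)]
    m = expect(p, log_ratio)
    return [l - m for l in log_ratio]

def e_chart(p, U):
    """Inverse chart e_p(U) = exp(U - K_p(U)) * p."""
    k = K(p, U)
    return [math.exp(u - k) * pi for u, pi in zip(U, p)]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]

U = s_chart(p, q)        # coordinate of q in the chart centered at p
q_back = e_chart(p, U)   # mapping back should recover q
```

In this setting `U` has zero mean under `p`, `q_back` reproduces `q`, and `K(p, U)` coincides with `kl(p, q)`, in line with the identity $s_p(q) = \log\frac{q}{p} + D(p \,\|\, q)$ on the slide.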

  12. e-chart at $(p, U) \in T\mathcal{E}(M)$
     • A curve $t \mapsto p(t)$, $p(0) = p$, in the exponential manifold $\mathcal{E}(M)$ is expressed in the chart $s_p$ as $p(t) = e^{U(t) - K_p(U(t))} \cdot p$.
     • The expression of the velocity at $t = 0$ is $\dot{U}(0) = \left. \frac{d}{dt} \log p(t) \right|_{t=0}$.
     • It follows that the exponential bundle $T\mathcal{E}(M) = \{ (p, U) \mid p \in \mathcal{E}(M),\ U \in B_p \}$ is the expression of the tangent bundle of the exponential manifold.
     • The transition map $s_{p_2} \circ e_{p_1} \colon \mathcal{S}_{p_1} \to \mathcal{S}_{p_2}$ is affine with derivative ${}^{e}\mathbb{U}_{p_1}^{p_2} \colon B_{p_1} \to B_{p_2}$ given by ${}^{e}\mathbb{U}_{p_1}^{p_2} U = U - \mathbb{E}_{p_2}[U]$.
     • We define an atlas of charts on $T\mathcal{E}(M)$ by
       $$\sigma_p(q, V) = \left( s_p(q),\ {}^{e}\mathbb{U}_{q}^{p} V \right).$$
     • G. Pistone. Nonparametric information geometry. In F. Nielsen and F. Barbaresco, editors, Geometric science of information, volume 8085 of Lecture Notes in Comput. Sci., pages 5–36. Springer, Heidelberg, 2013. First International Conference, GSI 2013, Paris, France, August 28-30, 2013, Proceedings
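The claim that the transition map is affine can be verified in a few lines. The derivation below is a sketch that uses only the chart and inverse chart from the previous slide; the constant $K_{p_1}(U)$ cancels between the two terms of the first line.

```latex
\begin{align*}
s_{p_2}\bigl(e_{p_1}(U)\bigr)
  &= \log\frac{e^{U - K_{p_1}(U)}\, p_1}{p_2}
     - \mathbb{E}_{p_2}\!\left[\log\frac{e^{U - K_{p_1}(U)}\, p_1}{p_2}\right] \\
  &= \bigl(U - \mathbb{E}_{p_2}[U]\bigr)
     + \left(\log\frac{p_1}{p_2} - \mathbb{E}_{p_2}\!\left[\log\frac{p_1}{p_2}\right]\right) \\
  &= {}^{e}\mathbb{U}_{p_1}^{p_2}\, U + s_{p_2}(p_1).
\end{align*}
```

The map is therefore the linear transport $U \mapsto U - \mathbb{E}_{p_2}[U]$ plus the constant $s_{p_2}(p_1)$, which is why its derivative is ${}^{e}\mathbb{U}_{p_1}^{p_2}$.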

  13. Cumulant functional
     • The r-divergence $q \mapsto D(p \,\|\, q)$ is represented in the chart centered at $p$ by $D(p \,\|\, e_p(U)) = K_p(U) = \log \mathbb{E}_p\left[e^{U}\right]$.
     • $K_p \colon B_p \to \mathbb{R}_{\geq} \cup \{+\infty\}$ is convex and its proper domain contains the open unit ball of $B_p$. It is infinitely Gâteaux-differentiable on the interior $\mathcal{S}_p$ of its proper domain and analytic on the unit ball of $B_p$.
     • For all $V, V_1, V_2, V_3 \in B_p$ the first derivatives, with $q = e_p(U)$, are:
       $$d K_p(U)[V] = \mathbb{E}_q[V],$$
       $$d^2 K_p(U)[V_1, V_2] = \operatorname{Cov}_q(V_1, V_2),$$
       $$d^3 K_p(U)[V_1, V_2, V_3] = \operatorname{Cov}_q(V_1, V_2, V_3).$$
     • G. Pistone and M. Rogantin. The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli, 5(4):721–760, August 1999
     • A. Cena and G. Pistone. Exponential statistical manifold. Ann. Inst. Statist. Math., 59(1):27–56, 2007
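The first two derivative formulas for $K_p$ can be checked by finite differences. The following is a minimal sketch, assuming a finite sample space and taking $q = e_p(U)$; the helper names and test vectors are hypothetical, and the step size `h` is an arbitrary choice.

```python
import math

def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def center(p, f):
    """Subtract the p-mean so that the result lies in the fiber B_p."""
    m = expect(p, f)
    return [fi - m for fi in f]

def K(p, U):
    """Cumulant functional K_p(U) = log E_p[exp(U)]."""
    return math.log(expect(p, [math.exp(u) for u in U]))

def e_chart(p, U):
    """Density e_p(U) = exp(U - K_p(U)) * p."""
    k = K(p, U)
    return [math.exp(u - k) * pi for u, pi in zip(U, p)]

def shifted(U, a, V1, b, V2):
    """The point U + a*V1 + b*V2."""
    return [u + a * v1 + b * v2 for u, v1, v2 in zip(U, V1, V2)]

p = [0.1, 0.2, 0.3, 0.4]
U = center(p, [0.5, -1.0, 0.3, 0.2])
V1 = center(p, [1.0, 0.0, -1.0, 2.0])
V2 = center(p, [0.0, 3.0, 1.0, -1.0])
q = e_chart(p, U)
h = 1e-4

# First derivative in direction V1, by central difference.
dK = (K(p, shifted(U, h, V1, 0.0, V2)) - K(p, shifted(U, -h, V1, 0.0, V2))) / (2 * h)

# Mixed second derivative in directions V1 and V2.
d2K = (K(p, shifted(U, h, V1, h, V2)) - K(p, shifted(U, h, V1, -h, V2))
       - K(p, shifted(U, -h, V1, h, V2)) + K(p, shifted(U, -h, V1, -h, V2))) / (4 * h * h)

# Closed forms from the slide, evaluated at q = e_p(U).
mean_q = expect(q, V1)
cov_q = expect(q, [a * b for a, b in zip(V1, V2)]) - expect(q, V1) * expect(q, V2)
```

Up to discretization error, `dK` should match `mean_q` and `d2K` should match `cov_q`.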

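  14. Pre-dual statistical bundle
     • Recall that $L^{(\cosh-1)_*}(M)$ is the pre-dual of $L^{(\cosh-1)}(M)$.
     • Define the pre-dual statistical bundle with fibers
       $${}^*B_p = \left\{ V \in L^{(\cosh-1)_*}(M) \,\middle|\, \mathbb{E}_p[V] = 0 \right\}.$$
     • Compute the adjoint of the transport ${}^{e}\mathbb{U}_{p}^{q}$. For $U \in B_p$ and $V \in {}^*B_q$,
       $$\left\langle {}^{e}\mathbb{U}_{p}^{q} U, V \right\rangle_q = \left\langle U - \mathbb{E}_q[U], V \right\rangle_q = \mathbb{E}_q[UV] = \mathbb{E}_p\left[\frac{q}{p}\, UV\right] = \left\langle U, \frac{q}{p} V \right\rangle_p = \left\langle U, {}^{m}\mathbb{U}_{q}^{p} V \right\rangle_p .$$
     • Define the charts on ${}^*T\mathcal{E}(M)$ by
       $$\sigma^*_p(q, W) = \left( s_p(q),\ {}^{m}\mathbb{U}_{q}^{p} W \right).$$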
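The duality between the exponential and mixture transports can be illustrated numerically. This is a minimal sketch, assuming a finite sample space with strictly positive probability vectors; the names `e_transport` and `m_transport` and the example vectors are hypothetical.

```python
def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def center(p, f):
    """Subtract the p-mean so that the result has zero expectation under p."""
    m = expect(p, f)
    return [fi - m for fi in f]

def e_transport(q, U):
    """Exponential transport from B_p to B_q: U -> U - E_q[U]."""
    return center(q, U)

def m_transport(p, q, V):
    """Mixture transport from *B_q to *B_p: V -> (q/p) * V."""
    return [(qi / pi) * v for pi, qi, v in zip(p, q, V)]

p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]
U = center(p, [1.0, -2.0, 0.5, 4.0])   # element of B_p
V = center(q, [0.3, 1.0, -1.5, 2.0])   # element of *B_q

# Pairing at q of the transported U with V.
lhs = expect(q, [a * b for a, b in zip(e_transport(q, U), V)])

# Pairing at p of U with the transported V.
mV = m_transport(p, q, V)
rhs = expect(p, [a * b for a, b in zip(U, mV)])

# The mixture transport keeps the zero-mean constraint, now with respect to p.
mean_mV = expect(p, mV)
```

The two pairings `lhs` and `rhs` agree, and `mean_mV` vanishes, so the transported vector lands in the fiber at `p`.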
