

  1. Seminar “Statistics for structures”: A graphical perspective on Gauss–Markov process priors. Moritz Schauer, University of Amsterdam

  2. Outline
     ◮ Midpoint displacement construction of a Brownian motion
     ◮ Corresponding Gaussian Markov random field
     ◮ Chordal graphs
     ◮ Sparse Cholesky decomposition
     ◮ Connection to inference of diffusion processes

  3. Mid-point displacement
     Lévy–Ciesielski construction of a Brownian motion $(W_t)_{t \in [0,1]}$ [1]

  4. Faber–Schauder basis
     Figure: Elements $\psi_{l,k}$, $1 \le l \le 3$, of the hierarchical (Faber–)Schauder basis

  5. Schauder basis functions
     A location and scale family based on the “hat” function
     $\Lambda(x) = 2x \, \mathbf{1}_{[0,\frac12)}(x) + 2(1-x) \, \mathbf{1}_{[\frac12,1]}(x)$
     $\psi_{j,k}(x) = \Lambda(2^{j-1} x - k), \qquad j \ge 1, \quad k = 0, \dots, 2^{j-1}-1$
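A minimal numpy sketch of these basis functions; the names `hat` and `psi` are mine, not from the slides:

```python
import numpy as np

def hat(x):
    """The 'hat' function: 2x on [0, 1/2), 2(1 - x) on [1/2, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 0.5), 2 * x,
                    np.where((x >= 0.5) & (x <= 1), 2 * (1 - x), 0.0))

def psi(j, k, x):
    """Faber-Schauder element psi_{j,k}(x) = hat(2^(j-1) * x - k)."""
    return hat(2.0 ** (j - 1) * np.asarray(x, dtype=float) - k)
```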

  6. Mid-point displacement II
     Start with a Brownian bridge $(W_t)_{t \in [0,1]}$:
     $W^J = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k} \, \psi_{j,k}$
     $W^J$ – truncated Faber–Schauder expansion
     $Z^J = \mathrm{vec}(Z_{j,k}, \ j \le J, \ 0 \le k < 2^{j-1})$ – independent zero-mean Gaussian random variables
     $Z_{j,k} = W_{2^{-j}(2k+1)} - \tfrac12 \big( W_{2^{-j+1} k} + W_{2^{-j+1}(k+1)} \big)$
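A sketch of the midpoint-displacement sampler this slide describes, on the dyadic grid $k/2^J$. The coefficient variance $\mathrm{Var}(Z_{j,k}) = 2^{-(j+1)}$ is the standard value for the Brownian bridge and is my addition, not stated on the slide:

```python
import numpy as np

def brownian_bridge_midpoint(J, rng=None):
    """Sample W^J, the level-J truncated expansion of a Brownian bridge,
    on the grid t = 0, 1/2^J, ..., 1, by midpoint displacement."""
    rng = np.random.default_rng(rng)
    W = np.zeros(2 ** J + 1)           # bridge: W_0 = W_1 = 0
    for j in range(1, J + 1):
        half = 2 ** (J - j)            # half the level-(j-1) spacing, in grid indices
        for k in range(2 ** (j - 1)):
            left, mid, right = 2 * half * k, 2 * half * k + half, 2 * half * (k + 1)
            # Z_{j,k} = W_mid - (W_left + W_right)/2  ~  N(0, 2^{-(j+1)})
            W[mid] = 0.5 * (W[left] + W[right]) + rng.normal(0.0, np.sqrt(2.0 ** -(j + 1)))
    return np.linspace(0.0, 1.0, 2 ** J + 1), W
```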

  7. Mid-point displacement II
     Start with a mean-zero Gauss–Markov process $(W_t)_{t \in [0,1]}$:
     $W^J = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k} \, \psi_{j,k}$
     $W^J$ – truncated Faber–Schauder expansion
     $Z^J = \mathrm{vec}(Z_{j,k}, \ j \le J, \ 0 \le k < 2^{j-1})$ – mean-zero Gaussian vector with precision matrix $\Gamma$
     $Z_{j,k} = W_{2^{-j}(2k+1)} - \tfrac12 \big( W_{2^{-j+1} k} + W_{2^{-j+1}(k+1)} \big)$

  8. Markov property
     Write $\iota := (j,k)$, $\iota' := (j',k')$. In general,
     $\Gamma_{\iota,\iota'} = 0$ if $Z_\iota \perp\!\!\!\perp Z_{\iota'} \mid Z_{\{\iota,\iota'\}^C}$.
     By the Markov property,
     $\Gamma_{\iota,\iota'} = 0$ if $\psi_\iota \cdot \psi_{\iota'} \equiv 0$.
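The support condition $\psi_\iota \cdot \psi_{\iota'} \equiv 0$ is easy to check directly, since $\psi_{j,k}$ is supported on the dyadic interval $(k\,2^{-(j-1)}, (k+1)\,2^{-(j-1)})$. A small helper (naming mine) that yields exactly the sparsity pattern of $\Gamma$:

```python
def supports_overlap(j1, k1, j2, k2):
    """True iff the open supports of psi_{j1,k1} and psi_{j2,k2} intersect,
    i.e. iff Gamma may have a nonzero entry at this index pair."""
    a1, b1 = k1 * 2.0 ** -(j1 - 1), (k1 + 1) * 2.0 ** -(j1 - 1)
    a2, b2 = k2 * 2.0 ** -(j2 - 1), (k2 + 1) * 2.0 ** -(j2 - 1)
    return max(a1, a2) < min(b1, b2)   # open intervals intersect
```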

  9. Gaussian Markov random field
     A Gaussian vector $(Z_1, \dots, Z_n)$ together with a graph $G(\{1,\dots,n\}, E)$ where there is no edge in $E$ between $\iota$ and $\iota'$ if $Z_\iota \perp\!\!\!\perp Z_{\iota'} \mid Z_{\{\iota,\iota'\}^C}$.

  10. Chordal graph / triangulated graph
     “A chordal graph is a graph in which all cycles of four or more vertices have a chord, which is an edge that is not part of the cycle but connects two vertices of the cycle.”

  11. Interval graph
     The open supports of the $\psi_{j,k}$ form an interval graph on the pairs $(j,k)$.¹ Interval graphs are chordal graphs. In red: a cycle of four vertices with a blue chord.
     ¹ An interval graph is the intersection graph of a family of intervals on the real line.

  12. Sampling from the prior
     ◮ Sample $J$
     ◮ Compute the factorization $S S' = \Gamma_J$
     ◮ Solve $S' Z = \mathrm{WN}$ by back-substitution, with $\mathrm{WN}$ standard white noise
     Hence: how to find sparse factors? (A dense sketch follows below.)
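A dense-matrix sketch of this sampling recipe, using scipy's generic triangular solve; the point of the later slides is that $S$ can instead be computed and applied in quasi-linear time:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_prior(Gamma, rng=None):
    """Draw Z ~ N(0, Gamma^{-1}) from the precision matrix Gamma:
    factor Gamma = S S', then back-substitute S' Z = WN."""
    rng = np.random.default_rng(rng)
    S = cholesky(Gamma, lower=True)            # Gamma = S S'
    wn = rng.standard_normal(Gamma.shape[0])   # standard white noise
    return solve_triangular(S.T, wn, lower=False)
```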

  13. Perfect elimination ordering
     “A perfect elimination ordering in a graph is an ordering of the vertices of the graph such that, for each vertex v, v and the neighbors of v that occur after v in the order form a clique.”
     Example: (3,0) (3,1) (3,2) (3,3) (2,0) (2,1) (1,0)
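For the Faber–Schauder interval graph, eliminating the finest level first gives such an ordering. A one-liner (function name mine) reproducing the slide's example:

```python
def elimination_order(J):
    """Perfect elimination ordering for the Faber-Schauder interval graph:
    all indices (j, k), finest level j = J first, root (1, 0) last."""
    return [(j, k) for j in range(J, 0, -1) for k in range(2 ** (j - 1))]

# elimination_order(3) == [(3,0), (3,1), (3,2), (3,3), (2,0), (2,1), (1,0)]
```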

  14. Ordering the columns and rows of $\Gamma$ according to the perfect elimination ordering of the chordal graph: $\tilde{S}$ is the sparse Cholesky factor of $\tilde{\Gamma}$.
     [Figure: sparsity patterns of $\tilde{\Gamma}$ and $\tilde{S}$ shown entrywise.]
     The Cholesky decomposition has no fill-in!

  15. Exploiting hierarchical structure
     Order the rows and columns of $\Gamma$ according to the location of the maxima of the $\psi_{j,k}$: (3,0), (2,0), (3,1), (1,0), (3,2), (2,1), (3,3).
     [Figure: the sparsity pattern of $\Gamma$ in this ordering, and of the factor $S$ in $\Gamma = S S'$.]

  16. Recursive sparsity pattern
     $S_1 = (s_{11}), \qquad S_J = \begin{pmatrix} S_{J-1}^{l} & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S_{J-1}^{r} \end{pmatrix}$
     with block sizes $2^{J-1}-1$, $1$, $2^{J-1}-1$.

  17. Hierarchical back-substitution
     A hierarchical back-substitution problem of the form
     $\begin{pmatrix} S_l & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S_r \end{pmatrix} \begin{pmatrix} X_l \\ x_c \\ X_r \end{pmatrix} = \begin{pmatrix} B_l \\ b_c \\ B_r \end{pmatrix}$, of size $(m+1+m) \times (m+1+m)$,
     can be solved recursively by solving the back-substitution problems $S_l X_l = B_l$, $S_r X_r = B_r$ and setting
     $x_c = s_{cc}^{-1} \, (b_c - S_{cl} X_l - S_{cr} X_r)$.
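A recursive numpy sketch of this solver, assuming $S$ is stored densely in the hierarchical ordering of slide 15, with the root row/column in the middle and $n = 2^J - 1$:

```python
import numpy as np

def hierarchical_backsub(S, b):
    """Solve S x = b where S has the recursive block structure
    [[S_l, 0, 0], [S_cl, s_cc, S_cr], [0, 0, S_r]] of size (m + 1 + m)."""
    n = S.shape[0]
    if n == 1:
        return b / S[0, 0]
    m = (n - 1) // 2
    x = np.empty_like(b, dtype=float)
    x[:m] = hierarchical_backsub(S[:m, :m], b[:m])                   # S_l X_l = B_l
    x[m + 1:] = hierarchical_backsub(S[m + 1:, m + 1:], b[m + 1:])   # S_r X_r = B_r
    # x_c = s_cc^{-1} (b_c - S_cl X_l - S_cr X_r)
    x[m] = (b[m] - S[m, :m] @ x[:m] - S[m, m + 1:] @ x[m + 1:]) / S[m, m]
    return x
```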

  18. Factorization in quasi-linear time
     $\begin{pmatrix} A_l & A_{cl}' & 0 \\ A_{cl} & a_{cc} & A_{cr} \\ 0 & A_{cr}' & A_r \end{pmatrix} = \begin{pmatrix} S_l & 0 & 0 \\ S_{cl} & s_{cc} & S_{cr} \\ 0 & 0 & S_r \end{pmatrix} \begin{pmatrix} S_l' & S_{cl}' & 0 \\ 0 & s_{cc} & 0 \\ 0 & S_{cr}' & S_r' \end{pmatrix} = \begin{pmatrix} S_l S_l' & S_l S_{cl}' & 0 \\ S_{cl} S_l' & s_{cc}^2 + S_{cl} S_{cl}' + S_{cr} S_{cr}' & S_{cr} S_r' \\ 0 & S_r S_{cr}' & S_r S_r' \end{pmatrix}$
     Here $A_l = S_l S_l'$ and $A_r = S_r S_r'$ are two hierarchical factorization problems of level $J-1$, $A_{cl} = S_{cl} S_l'$ and $A_{cr} = S_{cr} S_r'$ are hierarchical back-substitution problems, and
     $s_{cc} = \sqrt{a_{cc} - S_{cl} S_{cl}' - S_{cr} S_{cr}'}$.
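A matching recursive factorization sketch, with dense storage and the same ordering assumptions as above. Here `np.linalg.solve` stands in for the hierarchical back-substitution step; substituting the recursive solver is what gives the quasi-linear cost:

```python
import numpy as np

def hierarchical_cholesky(A):
    """Factor A = S S' for a symmetric A with the recursive sparsity
    [[A_l, A_cl', 0], [A_cl, a_cc, A_cr], [0, A_cr', A_r]], n = 2^J - 1."""
    n = A.shape[0]
    S = np.zeros_like(A, dtype=float)
    if n == 1:
        S[0, 0] = np.sqrt(A[0, 0])
        return S
    m = (n - 1) // 2
    S[:m, :m] = hierarchical_cholesky(A[:m, :m])                  # A_l = S_l S_l'
    S[m + 1:, m + 1:] = hierarchical_cholesky(A[m + 1:, m + 1:])  # A_r = S_r S_r'
    S[m, :m] = np.linalg.solve(S[:m, :m], A[:m, m])               # S_l S_cl' = A_cl'
    S[m, m + 1:] = np.linalg.solve(S[m + 1:, m + 1:], A[m + 1:, m])
    # s_cc = sqrt(a_cc - S_cl S_cl' - S_cr S_cr')
    S[m, m] = np.sqrt(A[m, m] - S[m, :m] @ S[m, :m] - S[m, m + 1:] @ S[m, m + 1:])
    return S
```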

  19. Approximate sparse inversion using nested dissection [2]

  20. Application: nonparametric inference for a diffusion process
     $\mathrm{d}X_t = b_0(X_t)\,\mathrm{d}t + \mathrm{d}W_t \qquad (1)$
     Prior: $P(J \ge j) \ge C \exp(-2^j)$ and $b = \sum_{j=1}^{J} \sum_{k=0}^{2^{j-1}-1} Z_{j,k} \, \psi_{j,k}$
     $M\,\Xi_J \ge_{pd} \Gamma_J \ge_{pd} m\,\Xi_J$, where $\Xi_J = \mathrm{diagm}(2^{-2(j-1)\alpha}, \ 1 \le j \le J, \ 0 \le k < 2^{j-1})$ and $\alpha = 1$.

  21. Gaussian inverse problem
     Likelihood
     $p(X \mid b) = \exp\Big( \int_0^T b(X_t)\,\mathrm{d}X_t - \tfrac12 \int_0^T b^2(X_t)\,\mathrm{d}t \Big)$
     $\mu^J_\iota = \int_0^T \psi_\iota(X_t)\,\mathrm{d}X_t, \qquad \iota = 1, \dots, 2^J - 1$
     $G^J_{\iota,\iota'} = \int_0^T \psi_\iota(X_t)\,\psi_{\iota'}(X_t)\,\mathrm{d}t, \qquad \iota, \iota' = 1, \dots, 2^J - 1$
     $\Gamma^J$ and $G^J$ have the same sparsity pattern.
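A discretized sketch of these statistics from an observed path $(X_{t_0}, \dots, X_{t_n})$, using left-point Itô sums. It reuses the `psi` helper from the basis-function sketch after slide 5 and assumes the path stays where the basis is defined:

```python
import numpy as np

def sufficient_statistics(t, X, J):
    """Approximate mu_i = int psi_i(X_t) dX_t and G_ij = int psi_i(X_t) psi_j(X_t) dt
    by left-point Riemann/Ito sums over a discretely observed path."""
    dX, dt = np.diff(X), np.diff(t)
    idx = [(j, k) for j in range(1, J + 1) for k in range(2 ** (j - 1))]
    Psi = np.array([psi(j, k, X[:-1]) for j, k in idx])  # shape (2^J - 1, n)
    mu = Psi @ dX                    # stochastic integrals against dX
    G = (Psi * dt) @ Psi.T           # time integrals of products
    return mu, G
```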

  22. Conjugate posterior
     For fixed level $J$, $Z^J \mid J, X \sim N(\Sigma_J \mu^J, \Sigma_J)$ where $\Sigma_J = (\Gamma_J + G_J)^{-1}$. To move across levels $J$, a reversible jump algorithm can be used.
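For fixed $J$ the conjugate update is a few lines. A dense sketch; since $\Gamma_J$ and $G_J$ share their sparsity pattern, the hierarchical factorization above applies to the posterior precision as well:

```python
import numpy as np

def sample_posterior(Gamma, G, mu, rng=None):
    """Draw Z^J | J, X ~ N(Sigma mu, Sigma) with Sigma = (Gamma + G)^{-1}."""
    rng = np.random.default_rng(rng)
    P = Gamma + G                           # posterior precision matrix
    S = np.linalg.cholesky(P)               # P = S S'
    mean = np.linalg.solve(P, mu)           # posterior mean Sigma mu
    wn = rng.standard_normal(len(mu))
    return mean + np.linalg.solve(S.T, wn)  # mean plus N(0, Sigma) noise
```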

  23. Posterior contraction rates (periodic case)
     Besov-type norm and supremum norm, for $f = \sum_j \sum_k z_{j,k} \, \psi_{j,k}$:
     $\|f\|_\alpha = \sup_{j \ge 1, k} 2^{(j-1)\alpha} |z_{j,k}|, \qquad \|f\|_\infty \le \sum_{j} \max_k |z_{j,k}|$
     Sieves:
     $B_{L,M} = \Big\{ \sum_{j=1}^{L} \sum_{k=0}^{2^{j-1}-1} z_{j,k} \, \psi_{j,k} : 2^{\alpha(j-1)} |z_{j,k}| \le M \ \text{for all } j, k \Big\}$
     Rate: $T^{-\beta/(1+2\beta)} \log(T)^{\beta/(1+2\beta)}$, for $\beta \ge \alpha$.
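The $\|\cdot\|_\alpha$ norm is elementary to evaluate on a finite coefficient array; a small helper (naming mine), taking `Z` as a list of per-level coefficient sequences:

```python
def besov_norm(Z, alpha):
    """sup over j, k of 2^((j-1) * alpha) * |z_{j,k}|, with Z[j-1] holding level j."""
    return max(2.0 ** ((j - 1) * alpha) * abs(z)
               for j, level in enumerate(Z, start=1) for z in level)
```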

  24. Anderson’s lemma
     If $X \sim N(0, \Sigma_X)$ and $Y \sim N(0, \Sigma_Y)$ are independent with $\Sigma_X \le_{pd} \Sigma_Y$ positive definite, then for all symmetric convex sets $C$,
     $P(Y \in C) \le P(X \in C)$.

  25. Summary
     ◮ Midpoint displacement construction of Gauss–Markov processes
     ◮ Corresponding Gaussian Markov random field
     ◮ Chordal graphs and perfect elimination orderings
     ◮ Sparse Cholesky decomposition
     ◮ Rates for the randomly truncated prior

  26. Image sources
     [1] http://math.stackexchange.com/questions/251856/area-enclosed-by-2-dimensional-random-curve
     [2] http://kartoweb.itc.nl/geometrics/reference%20surfaces/body.htm
