Structure learning for CTBNs



  1. Structure learning for CTBNs
     Blazej Miasojedow, Institute of Applied Mathematics and Mechanics, University of Warsaw. 05 June 2020.
     Based on joint works with Wojciech Niemiro (Warsaw/Torun), Wojciech Rejchel (Torun) and Maryia Shpak (Lublin).

  2. Outline
     1. CTBN
     2. Structure learning
        - Full observations
        - Partial observations

  3. CTBN

  4. Continuous time Bayesian networks
     $X(t)$ is a multivariate Markov jump process on the state space $\mathcal{X} = \prod_{v \in V} \mathcal{X}_v$, where:
     - $(V, E)$ is a directed graph, with possible cycles, describing the dependence structure;
     - $\mathcal{X}_v$ is the space of possible values at node $v$, assumed to be discrete.
     The intensity matrix $Q$ is given by the conditional intensities
     $$Q(x, x') = \begin{cases} Q_v(x_{\mathrm{pa}(v)}; x_v, x'_v) & \text{if } x_{-v} = x'_{-v} \text{ and } x_v \neq x'_v \text{ for some } v, \\ 0 & \text{if } x_{-v} \neq x'_{-v} \text{ for all } v, \end{cases}$$
     where $\mathrm{pa}(v)$ denotes the set of parents of node $v$ in the graph $(V, E)$.
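To make the generative model concrete, here is a minimal simulation sketch. It is not from the talk: the Gillespie-style sampler, the binary state spaces, and the toy chain A -> B -> C are all illustrative assumptions; only the idea that each node jumps at a rate determined by its parents' configuration comes from the slide above.

```python
import numpy as np

# A minimal sketch of sampling a CTBN trajectory, assuming binary nodes and a
# user-supplied function cond_intensity(v, x) returning the flip rate of node
# v given the full configuration x (only the parents of v should matter).

def simulate_ctbn(cond_intensity, x0, T, seed=None):
    rng = np.random.default_rng(seed)
    x = list(x0)
    t, path = 0.0, [(0.0, tuple(x0))]
    while True:
        rates = np.array([cond_intensity(v, x) for v in range(len(x))])
        t += rng.exponential(1.0 / rates.sum())        # time to the next jump anywhere
        if t >= T:
            break
        v = rng.choice(len(x), p=rates / rates.sum())  # which node jumps first
        x[v] = 1 - x[v]                                # binary flip
        path.append((t, tuple(x)))
    return path

# Hypothetical chain A -> B -> C: each node's flip rate doubles when its
# parent is in state 1.
def cond_intensity(v, x):
    return 0.5 * (2.0 if v > 0 and x[v - 1] == 1 else 1.0)

path = simulate_ctbn(cond_intensity, x0=(0, 0, 0), T=10.0, seed=0)
```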

  5. Example
     [Figure: an example CTBN graph.]

  6. Probability densities of CTBNs
     The density can be expressed as a product of conditional densities
     $$p(X) = \nu(x(0)) \prod_{v \in V} p(X_v \mid X_{\mathrm{pa}(v)}),$$
     with
     $$p(X_v \mid X_{\mathrm{pa}(v)}) = \prod_{c \in \mathcal{X}_{\mathrm{pa}(v)}} \prod_{a \in \mathcal{X}_v} \prod_{\substack{a' \in \mathcal{X}_v \\ a' \neq a}} Q_v(c; a, a')^{\,n_v^T(c;\, a,\, a')} \; \prod_{c \in \mathcal{X}_{\mathrm{pa}(v)}} \prod_{a \in \mathcal{X}_v} \exp\left(-Q_v(c; a)\, t_v^T(c; a)\right),$$
     where $n_v^T(c; a, a')$ is the number of jumps from $a$ to $a'$ at node $v$ that occurred while the parent configuration was $c$, and $t_v^T(c; a)$ is the length of time during which the state of node $v$ was $a$ and the configuration of its parents was $c$.
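The statistics $n_v^T$ and $t_v^T$ are simple functionals of the trajectory, so they can be accumulated in a single pass. A sketch under toy assumptions: the `(time, configuration)` layout of `path` and the `parents` map are hypothetical conventions, not the talk's notation.

```python
from collections import defaultdict

# One-pass accumulation of n_v^T(c; a, a') and t_v^T(c; a) from a trajectory.

def sufficient_stats(path, parents, T):
    n = defaultdict(int)    # n[(v, c, a, b)]: jumps a -> b at node v under parent config c
    t = defaultdict(float)  # t[(v, c, a)]:    time node v spent in state a under config c
    for (t0, x0), (t1, x1) in zip(path, path[1:] + [(T, path[-1][1])]):
        for v in parents:
            c = tuple(x0[u] for u in parents[v])
            t[(v, c, x0[v])] += t1 - t0              # occupation time on [t0, t1)
            if t1 < T and x1[v] != x0[v]:
                n[(v, c, x0[v], x1[v])] += 1         # a jump at node v at time t1
    return n, t

path = [(0.0, (0, 0, 0)), (1.3, (1, 0, 0)), (2.8, (1, 1, 0))]  # toy trajectory
parents = {0: (), 1: (0,), 2: (1,)}                            # chain A -> B -> C
n, t = sufficient_stats(path, parents, T=10.0)
```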

  7. Structure learning

  8. Structure learning
     Based on the observations, we want to reconstruct the structure of the graph and then estimate the conditional intensity matrices. We consider two cases:
     1. The full trajectory is observed.
     2. We observe the trajectory only at fixed time points $t^{\mathrm{obs}}_1, \ldots, t^{\mathrm{obs}}_k$, with some noise.

  9. Connections with standard Bayesian networks
     - Bayesian networks: consist of independent observations, but the graph needs to be acyclic.
     - CTBN: dependent observations (a Markov process), with no restrictions on the graph.
     - It is easier to formulate the structure learning problem for CTBNs, since no acyclicity restrictions are required.
     - The analysis of the methods is more demanding for CTBNs: we need to deal with Markov jump processes.

  10. Existing approaches
      - Search-and-score strategies, based on the full Bayesian model: Nodelman (2007); Acerbi et al. (2014).
      - Mean-field approximation combined with variational inference: Linzner and Koeppl (2018).
      - Estimating the parameters of the full graph in a Bayesian setting and removing edges based on marginal posterior probabilities: Linzner et al. (2019).

  11. Full observations
      Idea:
      1. Start with the full model.
      2. Express $\log Q_v(c, a, a') = \beta^T Z(c)$, where $\beta$ is a vector of unknown parameters and $Z(c)$ is a vector of dummy variables encoding the configuration of all nodes except $v$.
      3. Estimate a sparse $\beta$ by the Lasso:
      $$\hat\beta = \arg\min_\beta \{ -\ell(\beta) + \lambda \|\beta\|_1 \},$$
      where $\ell$ is the log-likelihood given by
      $$\ell(\beta) = \sum_{w \in V} \sum_{c \in \mathcal{X}_{-w}} \sum_{s \in \mathcal{X}_w} \sum_{\substack{s' \in \mathcal{X}_w \\ s' \neq s}} \left[ n_w(c; s, s')\, \beta_{s,s'}^{w\top} Z_w(c) - t_w(c; s) \exp\left(\beta_{s,s'}^{w\top} Z_w(c)\right) \right]. \quad (1)$$
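Since $\ell$ in (1) is a sum of linear and negative-exponential terms, $-\ell$ is convex, so the penalized problem can be attacked with proximal gradient descent and soft-thresholding. A minimal sketch for one node $w$ and one transition pair $(s, s')$; the `data` layout of triples $(n(c), t(c), Z(c))$, the step size, and the iteration count are ad hoc assumptions, not the talk's method.

```python
import numpy as np

def grad_neg_loglik(beta, data):
    # d/dbeta of -l(beta) = -sum_c [ n(c) - t(c) exp(beta^T Z(c)) ] Z(c)
    g = np.zeros_like(beta)
    for n_c, t_c, z in data:
        g -= (n_c - t_c * np.exp(beta @ z)) * z
    return g

def soft_threshold(b, a):
    return np.sign(b) * np.maximum(np.abs(b) - a, 0.0)

def lasso_ctbn(data, dim, lam, step=0.01, iters=5000):
    beta = np.zeros(dim)
    for _ in range(iters):
        beta = soft_threshold(beta - step * grad_neg_loglik(beta, data), step * lam)
    return beta

data = [(3, 2.0, np.array([1.0, 0.0])), (5, 1.5, np.array([1.0, 1.0]))]  # hypothetical (n, t, Z)
beta_hat = lasso_ctbn(data, dim=2, lam=0.5)
```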

  12. Example
      We consider a binary CTBN with three nodes $A$, $B$ and $C$. For node $A$ we define the function $Z_A$ as
      $$Z_A(b, c) = \left[1, \mathbb{I}(b = 1), \mathbb{I}(c = 1)\right]^\top,$$
      and $\beta$ is defined as follows:
      $$\beta = \left[\beta^A_{0,1}, \beta^A_{1,0}, \beta^B_{0,1}, \beta^B_{1,0}, \beta^C_{0,1}, \beta^C_{1,0}\right]^\top.$$
      With a slight abuse of notation, the vector $\beta^A_{0,1}$ is given as
      $$\beta^A_{0,1} = \left[\beta^A_{0,1}(1), \beta^A_{0,1}(B), \beta^A_{0,1}(C)\right]^\top.$$
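Written out in code, the example's design vector and a coefficient block look like this; the numeric values of $\beta^A_{0,1}$ are made up for illustration.

```python
import numpy as np

# The slide's design vector: Z_A(b, c) = [1, I(b = 1), I(c = 1)]^T.
def Z_A(b, c):
    return np.array([1.0, float(b == 1), float(c == 1)])

beta_A_01 = np.array([-0.2, 0.7, 0.0])  # [beta(1), beta(B), beta(C)]; beta(C) = 0: no edge C -> A
rate_0_to_1 = np.exp(beta_A_01 @ Z_A(b=1, c=0))  # Q_A(b=1, c=0; 0, 1)
```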

  13. Connection between the parametrization and the structure
      In our setting, identifying the edges of the graph is equivalent to finding the non-zero elements of $\beta$:
      $$\beta^w_{0,1}(u) \neq 0 \ \text{or}\ \beta^w_{1,0}(u) \neq 0 \iff \text{the edge } u \to w \text{ exists}.$$
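This equivalence makes reading the estimated graph off a fitted $\hat\beta$ a short loop. A sketch with a hypothetical nested-dict layout `beta_hat[w][(s, s')][u]` for the coefficients; the rule itself is the slide's.

```python
# u -> w is declared an edge iff some beta^w_{s,s'}(u) is non-zero.
def edges_from_beta(beta_hat, tol=1e-8):
    edges = set()
    for w, by_pair in beta_hat.items():
        for coeffs in by_pair.values():
            for u, value in coeffs.items():
                if abs(value) > tol:
                    edges.add((u, w))
    return edges

beta_hat = {"A": {(0, 1): {"B": 0.7, "C": 0.0}, (1, 0): {"B": 0.0, "C": 0.0}}}
print(edges_from_beta(beta_hat))  # {('B', 'A')}
```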

  14. Notation and assumptions
      - $d_0 = |\mathrm{supp}(\beta)|$, $S = \mathrm{supp}(\beta)$,
      - $C(\xi, S) = \{\theta : |\theta_{S^c}|_1 \leq \xi |\theta_S|_1\}$ for some $\xi > 1$,
      - $\beta_{\min} = \min_{k \in S} |\beta_k|$,
      $$F(\xi) = \inf_{0 \neq \theta \in C(\xi, S)} \frac{\sum_{w \in V} \sum_{s' \neq s} \sum_{c_{S_w} \in \mathcal{X}_{S_w}} \exp\left(\beta^{w\top}_{s,s'} Z_w(c_{S_w}, 0)\right)\left(\theta^{w\top}_{s,s'} Z_w(c_{S_w}, 0)\right)^2}{|\theta_S|_1\, |\theta|_\infty}. \quad (2)$$
      - We assume that $F(\xi) > 0$ for some $\xi > 1$.
      - $\Delta = \max_{s \neq s'} Q(s, s')$.

  15. Main result

      Theorem 1 (Shpak, Rejchel, Miasojedow 2020). Let $\varepsilon \in (0, 1)$ and $\xi > 1$ be arbitrary. Suppose that $F(\xi)$ defined in (2) is positive and
      $$T > \frac{36\left[\left(\max_{w \in V} |S_w| + 1\right)\log 2 + \log\left(d \|\nu\|_2 / \varepsilon\right)\right]}{\rho_1 \min\limits_{w \in V,\, s \in \mathcal{X}_w,\, c_{S_w} \in \mathcal{X}_{S_w}} \pi^2(s, c_{S_w}, 0)}. \quad (3)$$
      We also assume that $T\Delta \geq 2$ and
      $$2\Delta\, \frac{\xi + 1}{\xi - 1}\sqrt{\frac{\log(K/\varepsilon)}{T}} \leq \lambda \leq \frac{\zeta F(\xi)}{e(\xi + 1)|S|}, \quad (4)$$
      where $K = 2(2 + e^2)\, d(d - 1)$ and $\zeta = \min\limits_{w \in V,\, s \in \mathcal{X}_w,\, c_{S_w} \in \mathcal{X}_{S_w}} \pi(s, c_{S_w}, 0)/2$. Then with probability at least $1 - 2\varepsilon$ we have
      $$|\hat\beta - \beta|_\infty \leq \frac{2 e \xi \lambda}{(\xi + 1)\zeta F(\xi)}. \quad (5)$$

  16. Consistency of model selection

      Corollary 2. Let $R$ denote the right-hand side of inequality (5). Consider the thresholded Lasso estimator with the set of non-zero coordinates $\hat{S}$: the set $\hat{S}$ contains only those coefficients of the Lasso estimator whose absolute value is larger than a pre-specified threshold $\delta$. If $\beta_{\min}/2 > \delta \geq R$, then
      $$P\left(\hat{S} = S\right) \geq 1 - 2\varepsilon.$$
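The thresholding step itself is a one-liner. A sketch; the threshold $\delta$ is user-chosen and must satisfy $\beta_{\min}/2 > \delta \geq R$ for the guarantee to apply, and the numbers below are illustrative.

```python
import numpy as np

# Thresholded Lasso from Corollary 2: keep only coefficients whose absolute
# value exceeds delta.
def thresholded_support(beta_hat, delta):
    return set(np.flatnonzero(np.abs(beta_hat) > delta))

beta_hat = np.array([0.9, -0.02, 0.0, 0.4])  # hypothetical Lasso output
print(thresholded_support(beta_hat, delta=0.1))  # {0, 3}
```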
