Exact inference and learning for cumulative distribution functions on loopy graphs
Jim C. Huang, Nebojsa Jojic and Christopher Meek
NIPS 2010
Presented by Jenny Lam
Previous work
◮ Cumulative distribution networks and the derivative-sum-product algorithm. Huang and Frey, UAI 2008.
◮ Cumulative distribution networks: Inference, estimation and applications of graphical models for cumulative distribution functions. Huang, Ph.D. thesis, 2009.
◮ Maximum-likelihood learning of cumulative distribution functions on graphs. Huang and Jojic, Journal of Machine Learning Research, 2010.
Cumulative Distribution Network: definition
A CDN $G$ is a bipartite graph $(V, S, E)$ where
◮ $V$ is the set of variable nodes,
◮ $S$ is the set of function nodes, each $\phi \in S$ being a CDF $\phi : \mathbb{R}^{|N(\phi)|} \to [0, 1]$,
◮ $E$ is the set of edges, connecting functions to their variables.
[Figure: example CDN with variable nodes and function nodes]
The joint CDF of this CDN is $F(\mathbf{x}) = \prod_{\phi \in S} \phi(\mathbf{x}_{N(\phi)})$.
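A minimal sympy sketch of this construction, assuming a chain-structured CDN with bivariate Gumbel-copula function nodes (an illustrative choice, not necessarily the paper's parameterization):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
theta = sp.Symbol('theta', positive=True)  # theta >= 1 for a valid copula

# One possible function node: a bivariate Gumbel copula over Gumbel
# marginals, which is itself a CDF mapping R^2 -> [0, 1].
def phi(u, v):
    return sp.exp(-(sp.exp(-theta * u) + sp.exp(-theta * v)) ** (1 / theta))

# Chain CDN  x1 -- phi_a -- x2 -- phi_b -- x3:
# the joint CDF is simply the product of the function nodes.
F = phi(x1, x2) * phi(x2, x3)
```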
CDNs: what are they for?
◮ PDF models must enforce a normalization constraint.
◮ PDFs are made more tractable by restricting to, e.g., Gaussians.
◮ Many non-Gaussian distributions are conveniently parametrized as CDFs.
◮ CDNs can be used to model heavy-tailed distributions, which are important in climatology and epidemiology.
Inference from joint CDF
Conditional CDF:
$$F(\mathbf{x}_B \mid \mathbf{x}_A) = \frac{\partial_{\mathbf{x}_A} F(\mathbf{x}_A, \mathbf{x}_B)}{\partial_{\mathbf{x}_A} F(\mathbf{x}_A)}$$
Likelihood:
$$P(\mathbf{x} \mid \theta) = \partial_{\mathbf{x}} F(\mathbf{x} \mid \theta)$$
For MLE, we need the gradient of the log-likelihood:
$$\nabla_\theta \log P(\mathbf{x} \mid \theta) = \frac{1}{P(\mathbf{x} \mid \theta)} \, \nabla_\theta P(\mathbf{x} \mid \theta)$$
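Both operations are just differentiation of the joint CDF; a toy sympy sketch (the independent-logistic joint CDF is an illustrative assumption):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# Toy joint CDF: independent logistic marginals (illustrative only).
F = 1 / (1 + sp.exp(-x1)) * 1 / (1 + sp.exp(-x2))

# Likelihood: the mixed derivative of F w.r.t. all variables.
p = sp.diff(F, x1, x2)

# Conditional CDF F(x2 | x1): differentiate w.r.t. the conditioning
# variable, then normalize by the derivative of the marginal CDF.
F_x1 = sp.limit(F, x2, sp.oo)                            # marginal F(x1)
F_cond = sp.simplify(sp.diff(F, x1) / sp.diff(F_x1, x1))
# For this independent toy model F_cond recovers the marginal CDF of x2.
```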
Mixed derivative of a product
$$\partial_{\mathbf{x}}[f \cdot g] = \sum_{U \subseteq \mathbf{x}} \partial_{\mathbf{x}_U} f \cdot \partial_{\mathbf{x}_{\mathbf{x} \setminus U}} g,$$
which has $2^{|\mathbf{x}|}$ terms. More generally,
$$\partial_{\mathbf{x}} \prod_{i=1}^k f_i = \sum_{U_1, \ldots, U_k} \prod_{i=1}^k \partial_{\mathbf{x}_{U_i}} f_i,$$
where we sum over all partitions $U_1, \ldots, U_k$ of $\mathbf{x}$ into $k$ (possibly empty) subsets. There are $k^{|\mathbf{x}|}$ terms in this sum.
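The $2^{|\mathbf{x}|}$-term expansion is easy to spot-check symbolically; a minimal sympy sketch (the particular $f$ and $g$ are arbitrary smooth choices):

```python
import sympy as sp
from itertools import combinations

x = list(sp.symbols('x1 x2 x3'))
f = sp.exp(-sp.exp(-x[0] - x[1]))        # any smooth functions will do
g = 1 / (1 + sp.exp(-(x[1] + x[2])))

def dmix(expr, vs):
    # mixed partial derivative w.r.t. the variables in vs (identity if empty)
    return sp.diff(expr, *vs) if vs else expr

# Right-hand side: sum over all 2^{|x|} subsets U of x.
rhs = sum(dmix(f, U) * dmix(g, [v for v in x if v not in U])
          for r in range(len(x) + 1) for U in combinations(x, r))

lhs = sp.diff(f * g, *x)                 # mixed derivative of the product
assert sp.simplify(lhs - rhs) == 0
```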
Mixed derivative over a separation
Partition the functions of a CDN into $M_1$ and $M_2$
◮ with variable sets $C_1$ and $C_2$, and separator $S_{1,2} = C_1 \cap C_2$,
◮ and $G_1$ and $G_2$ the products of the functions in $M_1$ and $M_2$.
Then
$$\partial_{\mathbf{x}}[G_1 G_2] = \partial_{\mathbf{x}_{C_1 \setminus S_{1,2}}} \Big[ \sum_{A \subseteq S_{1,2}} \partial_{\mathbf{x}_A} G_1 \cdot \partial_{\mathbf{x}_{C_2 \setminus S_{1,2}}} \partial_{\mathbf{x}_{S_{1,2} \setminus A}} G_2 \Big]$$
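As a worked instance of the identity: with $C_1 = \{x_1, x_2\}$, $C_2 = \{x_2, x_3\}$ and separator $S_{1,2} = \{x_2\}$,
$$\partial_{x_1 x_2 x_3}[G_1 G_2] = \partial_{x_1}\big[\, G_1 \cdot \partial_{x_2}\partial_{x_3} G_2 \;+\; \partial_{x_2} G_1 \cdot \partial_{x_3} G_2 \,\big],$$
so the sum has only $2^{|S_{1,2}|} = 2$ terms rather than the $2^3 = 8$ terms of the naive product expansion.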
Junction Tree: definition
Let $G = (V, S, E)$ be a CDN. A tree $T = (\mathcal{C}, \mathcal{E})$ is a junction tree for $G$ if
1. $\mathcal{C}$ is a cover for $V$: each $C_j \in \mathcal{C}$ is a subset of $V$ and $\bigcup_j C_j = V$,
2. family preservation holds: for each $\phi \in S$, there is a $C_j \in \mathcal{C}$ such that $\mathrm{scope}(\phi) \subseteq C_j$,
3. the running intersection property holds: if $C_i \in \mathcal{C}$ is on the path between $C_j$ and $C_k$, then $C_j \cap C_k \subseteq C_i$.
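These conditions are mechanical to verify; a minimal Python sketch checking the running intersection property (`tree_path` and `running_intersection_holds` are hypothetical helpers, not from the paper):

```python
from itertools import combinations

def tree_path(adj, i, j):
    """Return the unique path from node i to node j in a tree (DFS)."""
    stack = [(i, [i])]
    while stack:
        node, path = stack.pop()
        if node == j:
            return path
        stack.extend((nb, path + [nb]) for nb in adj[node] if nb not in path)
    raise ValueError("nodes are not connected")

def running_intersection_holds(clusters, adj):
    """clusters: {node: set of variables}; adj: {node: list of neighbours}."""
    for j, k in combinations(clusters, 2):
        sep = clusters[j] & clusters[k]
        if not all(sep <= clusters[i] for i in tree_path(adj, j, k)):
            return False
    return True

# Toy chain junction tree: {x1,x2} - {x2,x3} - {x3,x4}
clusters = {0: {'x1', 'x2'}, 1: {'x2', 'x3'}, 2: {'x3', 'x4'}}
adj = {0: [1], 1: [0, 2], 2: [1]}
assert running_intersection_holds(clusters, adj)
```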
Junction Tree: example
[Figure (b): junction tree constructed for the example CDN]
Construction of the junction tree
In the implementation, the authors
◮ greedily eliminate variables with the min-fill heuristic,
◮ construct the elimination subsets for the junction-tree nodes using the MATLAB Bayes Net Toolbox (Murphy, 2001).
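For intuition, a minimal pure-Python sketch of the greedy min-fill heuristic (the paper uses the MATLAB Bayes Net Toolbox; `min_fill_order` is an illustrative stand-in, operating on the graph in which two variables are adjacent iff they share a function node):

```python
from itertools import combinations

def min_fill_order(neighbors):
    """Greedy min-fill elimination.
    neighbors: {variable: set of adjacent variables}.
    Returns the elimination order and the elimination cliques,
    which become the clusters of the junction tree."""
    nb = {v: set(s) for v, s in neighbors.items()}
    order, cliques = [], []
    while nb:
        # fill-in cost of v: edges to add among its remaining neighbours
        def fill(v):
            return sum(1 for a, b in combinations(nb[v], 2) if b not in nb[a])
        v = min(nb, key=fill)
        cliques.append(nb[v] | {v})
        for a, b in combinations(nb[v], 2):   # connect v's neighbours
            nb[a].add(b)
            nb[b].add(a)
        for u in nb[v]:                       # remove v from the graph
            nb[u].discard(v)
        del nb[v]
        order.append(v)
    return order, cliques

# 4-cycle CDN: functions over {x1,x2}, {x2,x3}, {x3,x4}, {x4,x1}
g = {'x1': {'x2', 'x4'}, 'x2': {'x1', 'x3'},
     'x3': {'x2', 'x4'}, 'x4': {'x3', 'x1'}}
order, cliques = min_fill_order(g)
```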
Decomposition of the joint CDF
Partitioning the functions of $S$ into sets $M_j$, one per cluster, the joint CDF is
$$F(\mathbf{x}) = \prod_{C_j \in \mathcal{C}} \psi_j(\mathbf{x}_{C_j}), \qquad \text{where } \psi_j \equiv \prod_{\phi \in M_j} \phi.$$
Let $r$ be a chosen root of the junction tree. Then
$$F(\mathbf{x}) = \psi_r(\mathbf{x}_{C_r}) \prod_{k \in \mathcal{E}_r} T^r_k(\mathbf{x}), \qquad \text{where } T^r_k(\mathbf{x}) = \prod_{j \in \tau^r_k} \psi_j(\mathbf{x}_{C_j})$$
and $\tau^r_k$ is the subtree rooted at $k$.
Derivative of the joint CDF
$$\begin{aligned}
\partial_{\mathbf{x}} F(\mathbf{x}) &= \partial_{\mathbf{x}} \Big[ \psi_r(\mathbf{x}_{C_r}) \prod_{k \in \mathcal{E}_r} T^r_k(\mathbf{x}) \Big] \\
&= \partial_{\mathbf{x}_{C_r}} \partial_{\mathbf{x} \setminus C_r} \Big[ \psi_r(\mathbf{x}_{C_r}) \prod_{k \in \mathcal{E}_r} T^r_k(\mathbf{x}) \Big] \\
&= \partial_{\mathbf{x}_{C_r}} \Big[ \psi_r(\mathbf{x}_{C_r}) \, \partial_{\mathbf{x} \setminus C_r} \prod_{k \in \mathcal{E}_r} T^r_k(\mathbf{x}) \Big] \\
&= \partial_{\mathbf{x}_{C_r}} \Big[ \psi_r(\mathbf{x}_{C_r}) \prod_{k \in \mathcal{E}_r} \partial_{\mathbf{x}_{\tau^r_k \setminus C_r}} T^r_k(\mathbf{x}) \Big]
\end{aligned}$$
The last equality follows from the running intersection property: each variable outside $C_r$ appears in exactly one subtree $\tau^r_k$.
Messages to the root of the junction tree
Message from child $k$ to root $r$, for $A \subseteq C_r$:
$$m_{k \to r}(A) \equiv \partial_{\mathbf{x}_A} \partial_{\mathbf{x}_{\tau^r_k \setminus C_r}} T^r_k(\mathbf{x})$$
In particular,
$$m_{k \to r}(\emptyset) = \partial_{\mathbf{x}_{\tau^r_k \setminus C_r}} T^r_k(\mathbf{x})$$
At the root, for $U_r \subseteq \mathcal{E}_r$ and $A \subseteq C_r$:
$$m_r(A, U_r) \equiv \partial_{\mathbf{x}_A} \Big[ \psi_r(\mathbf{x}_{C_r}) \prod_{k \in U_r} m_{k \to r}(\emptyset) \Big]$$
Messages in the rest of the junction tree
$$m_i(A, U_i) \equiv \partial_{\mathbf{x}_A} \Big[ \psi_i(\mathbf{x}_{C_i}) \prod_{j \in U_i} m_{j \to i}(\emptyset) \Big],$$
where $A \subseteq C_i$ and $U_i \subseteq \mathcal{E}_i$, and
$$m_{j \to i}(A) \equiv \partial_{\mathbf{x}_A} \partial_{\mathbf{x}_{\tau^i_j \setminus S_{i,j}}} T^i_j(\mathbf{x}),$$
where $A \subseteq S_{i,j}$.
Messages in the rest of the junction tree
In terms of messages, for any $k \in U_i$:
$$m_i(A, U_i) = \partial_{\mathbf{x}_A} \Big[ \psi_i(\mathbf{x}_{C_i}) \, m_{k \to i}(\emptyset) \prod_{j \in U_i \setminus \{k\}} m_{j \to i}(\emptyset) \Big] = \sum_{B \subseteq A \cap S_{i,k}} m_{k \to i}(B) \; m_i(A \setminus B, \, U_i \setminus \{k\})$$
$$m_{j \to i}(A) = \partial_{\mathbf{x}_{A \cup (C_j \setminus S_{i,j})}} \Big[ \psi_j(\mathbf{x}_{C_j}) \prod_{l \in \mathcal{E}_j \setminus \{i\}} T^j_l(\mathbf{x}) \Big] = m_j\big(A \cup (C_j \setminus S_{i,j}), \; \mathcal{E}_j \setminus \{i\}\big)$$
Gradient of the likelihood
Likelihood:
$$P(\mathbf{x} \mid \theta) = \partial_{\mathbf{x}} F(\mathbf{x} \mid \theta) = m_r(C_r, \mathcal{E}_r)$$
The gradient $\nabla_\theta m_r(C_r, \mathcal{E}_r)$ decomposes over the junction tree in the same way as $m_r(C_r, \mathcal{E}_r)$, with
◮ $g_i \equiv \nabla_\theta m_i$
◮ $g_{j \to i} \equiv \nabla_\theta m_{j \to i}$
JDiff algorithm: outline
For each cluster, from the leaves to the root:
1. compute derivatives within the cluster
2. combine messages from the children
3. send messages to the parent
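A minimal runnable sketch of this recursion for the special case of a chain-shaped junction tree $C_1 - C_2 - \cdots - C_n$, using sympy for the per-cluster derivatives (general trees combine messages from several children as on the previous slides; `jdiff_chain`, `dmix` and `subsets` are hypothetical names, and consecutive separators are assumed disjoint):

```python
import sympy as sp
from itertools import combinations

def subsets(s):
    s = sorted(s, key=str)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def dmix(expr, vs):
    # mixed partial derivative w.r.t. the variables in vs (identity if empty)
    return sp.diff(expr, *sorted(vs, key=str)) if vs else expr

def jdiff_chain(psis, clusters, seps):
    """JDiff on a chain junction tree.
    psis[i]     : sympy expression for the cluster potential psi_i
    clusters[i] : set of variables C_i
    seps[i]     : separator C_i ∩ C_{i+1} (consecutive separators disjoint)
    Returns P(x) = ∂_x F(x) for F = prod(psis)."""
    # leaf message: m(A) = ∂_{x_A} ∂_{C_1 \ S_12} psi_1 for each A ⊆ S_12
    msg = {A: dmix(psis[0], (clusters[0] - seps[0]) | A)
           for A in subsets(seps[0])}
    # interior clusters: absorb the incoming message, emit a new one
    for j in range(1, len(psis) - 1):
        msg = {A: sum(msg[B] * dmix(psis[j], ((clusters[j] - seps[j]) - B) | A)
                      for B in subsets(seps[j - 1]))
               for A in subsets(seps[j])}
    # root: P(x) = sum_B m(B) * ∂_{C_n \ B} psi_n
    n = len(psis) - 1
    return sum(msg[B] * dmix(psis[n], clusters[n] - B)
               for B in subsets(seps[n - 1]))

# Sanity check against brute-force differentiation on a 2-cluster chain.
x1, x2, x3 = sp.symbols('x1 x2 x3')
p1 = sp.exp(-sp.exp(-x1) - sp.exp(-x2))
p2 = sp.exp(-sp.exp(-x2) - sp.exp(-x3))
P = jdiff_chain([p1, p2], [{x1, x2}, {x2, x3}], [{x2}])
assert sp.simplify(P - sp.diff(p1 * p2, x1, x2, x3)) == 0
```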
Complexity of JDiff
O-notation for the number of steps/terms in each inner loop, for fixed $j$:
1. $\displaystyle\sum_{k=1}^{|C_j|} \binom{|C_j|}{k} |M_j|^k = (|M_j| + 1)^{|C_j|} - 1$
2. $\displaystyle(|\mathcal{E}_j| - 1) \max_{k \in \mathcal{E}_j} 2^{|C_j \setminus S_{j,k}|} \sum_{l=0}^{|S_{j,k}|} \binom{|S_{j,k}|}{l} 2^l$
3. $2^{|S_{j,k}|}$
Total: exponential in the treewidth of the graph,
$$O\Big( \max_j \, (|M_j| + 1)^{|C_j|} + \max_{(j,k) \in \mathcal{E}} \, (|\mathcal{E}_j| - 1) \, 2^{|C_j \setminus S_{j,k}|} \, 3^{|S_{j,k}|} \Big)$$
Application: symbolic differentiation on graphs
Computation of $\partial_{\mathbf{x}} F(\mathbf{x})$ on CDNs:
◮ Grids: 3×3 to 9×9
◮ Cycles: 10 to 20 nodes
[Figure: runtime comparison for computing the mixed derivative]
Application: modeling heavy-tailed data
◮ Rainfall: 61 daily measurements of rainfall at 22 sites in China
◮ H1N1: 29 weekly mortality rates in 11 cities in the Northeastern US during the 2008-2009 epidemic
[Figure panels (b)-(d)]
Application: modeling heavy-tailed data
Average test log-likelihoods under leave-one-out cross-validation
[Figure: average test log-likelihoods for the rainfall data and the H1N1 mortality data]
Future work
◮ Develop compact (bounded-treewidth) models for applications in other areas (e.g., seismology)
◮ Study the connection between CDNs and other copula-based algorithms
◮ Develop faster approximate algorithms