Scoring Bayesian Networks of Mixed Variables
Bryan Andrews, MS; Joseph Ramsey, PhD; and Greg Cooper, MD, PhD
August 14, 2017
Learning Bayesian Networks (BNs)
● BNs constitute a widely used graphical framework for representing probabilistic relationships
● Many applications in Bayesian inference and causal discovery
● Learning structure is crucial
  – Limited work has been done in the presence of both discrete and continuous variables
Goal: Provide scalable solutions for learning BNs in the presence of both discrete and continuous variables
Outline
● Bayesian Information Criterion (BIC)
● Mixed Variable Polynomial (MVP) score
● Conditional Gaussian (CG) score
● Adaptations
● Simulations and empirical results
The Bayesian Information Criterion
Let M be a model and D be a dataset
BIC approximates log p(M | D) up to sign, scale, and an additive constant (assuming a uniform prior over models):
$-2 \log p(M \mid D) \approx -2\,\mathrm{lik} + \mathrm{dof} \log n + \mathrm{const}$
where lik is the maximized log likelihood, dof is the number of degrees of freedom, and n is the number of samples
Scores a BN as the sum of the BIC terms for each node given its parents
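The decomposition of the network score into per-node BIC terms can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names (`gaussian_loglik`, `bic`, `network_bic`) and the `(lik, dof)` dictionary layout are hypothetical, and the sign convention follows the slide, so lower values indicate a better fit.

```python
import math


def gaussian_loglik(values):
    """Maximized log likelihood of a univariate Gaussian fit to `values`."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # MLE variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def bic(lik, dof, n):
    """BIC in the slide's convention: -2 * lik + dof * log(n)."""
    return -2.0 * lik + dof * math.log(n)


def network_bic(local_terms, n):
    """Sum the per-node BIC terms.

    `local_terms` maps each node name to the (lik, dof) pair of that node
    given its parents (a hypothetical layout chosen for this sketch).
    """
    return sum(bic(lik, dof, n) for lik, dof in local_terms.values())
```

For example, `network_bic({"X": (-10.0, 2), "Y": (-5.0, 1)}, n=100)` adds the two local terms, so a search procedure only needs to rescore the nodes whose parent sets change.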
The Mixed Variable Polynomial (MVP) score
● Use higher-order polynomials to estimate relationships between variables
  – Allows for nonlinear relationships between continuous variables
  – Allows for complicated PMFs for discrete variables (approximates logistic regression)
● Calculate a log likelihood and degrees of freedom for BIC
Modeling a Continuous Child
● Partition according to the discrete parents
  – Splits the data into subsets
● Perform regression with the continuous parents for each partition
  – Calculate a log likelihood and degrees of freedom for each subset
● Aggregate the log likelihood and degrees of freedom terms from each subset together
● Score continuous child using BIC
Modeling a Continuous Child
● Let X, Y be continuous
● Let A be discrete (|A| = 3)
● Want: $\mathrm{lik}_{X \mid Y, A}$, $\mathrm{dof}_{X \mid Y, A}$
[Figure: graph in which Y and A are the parents of X]
[Figure: the data split into three partitions, one per value of A, with terms (lik_1, dof_1), (lik_2, dof_2), (lik_3, dof_3)]
$\mathrm{lik}_{X \mid Y, A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
$\mathrm{dof}_{X \mid Y, A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$
Score: $-2\,\mathrm{lik}_{X \mid Y, A} + \mathrm{dof}_{X \mid Y, A} \log n$
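The partition, regress, and aggregate steps for a continuous child can be sketched as below, for one continuous parent Y and one discrete parent A. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `mvp_continuous_child` is hypothetical, the residuals are assumed Gaussian with a separate variance per partition, and counting only the polynomial coefficients as degrees of freedom is an assumption the slides do not confirm.

```python
import numpy as np


def mvp_continuous_child(x, y, a, degree=2):
    """Return (lik, dof, score) for continuous child x given continuous
    parent y and discrete parent a, fitting one polynomial per partition."""
    x, y, a = np.asarray(x, float), np.asarray(y, float), np.asarray(a)
    n = len(x)
    lik, dof = 0.0, 0
    for level in np.unique(a):           # partition by the discrete parent
        mask = a == level
        xs, ys = x[mask], y[mask]
        m = len(xs)
        design = np.vander(ys, degree + 1)            # columns y^degree ... y, 1
        coef, *_ = np.linalg.lstsq(design, xs, rcond=None)
        resid = xs - design @ coef
        var = max(float(resid @ resid) / m, 1e-12)    # MLE residual variance
        lik += -0.5 * m * (np.log(2 * np.pi * var) + 1)
        dof += degree + 1   # assumption: count polynomial coefficients only
    score = -2.0 * lik + dof * np.log(n)              # BIC as on the slide
    return lik, dof, score
```

With |A| = 3 and `degree=2`, the aggregated `dof` is 9, matching the sum dof_1 + dof_2 + dof_3 over the three partitions.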
Modeling a Discrete Child
● Binarize the child A into d (0, 1) variables where d = |A|
● Partition according to the discrete parents
  – Splits the data into subsets
● Perform regression with the continuous parents for each partition
  – Treat the regression lines as components of the PMFs for A
● Calculate a log likelihood and degrees of freedom for each subset
● Aggregate the log likelihood and degrees of freedom terms from each subset together
● Score discrete child using BIC
Modeling a Discrete Child
● Let X be continuous
● Let A be discrete (|A| = 3)
● Want: $\mathrm{lik}_{A \mid X}$, $\mathrm{dof}_{A \mid X}$
[Figure: graph in which X is the parent of A]
[Figure: plot with three components labeled 1, 2, 3]
$\sum_{a \in \{0, 1, 2\}} p(A = a \mid X = x) = 1 \quad \forall x$
  – True for the proposed method
$p(A = a \mid X = x) \ge 0 \quad \forall a, x$
  – True in the large-sample limit given some assumptions
Define a procedure to shrink illegal distributions back into the domain of probabilities
[Figure: plot with three components labeled 1, 2, 3, yielding $\mathrm{lik}_{A \mid X}$ and $\mathrm{dof}_{A \mid X}$]
Score: $-2\,\mathrm{lik}_{A \mid X} + \mathrm{dof}_{A \mid X} \log n$
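The discrete-child steps can be sketched as below, for a child A with a single continuous parent X and no discrete parents. Several pieces are assumptions made for illustration: the names `fit_pmf` and `mvp_discrete_child` are hypothetical, the shrinking step (clip to a small `eps`, then renormalize) is one simple choice and not necessarily the procedure the authors define, and the degrees-of-freedom count of (d - 1) polynomials is likewise assumed.

```python
import numpy as np


def fit_pmf(a, x, degree=2, eps=1e-6):
    """Regress each indicator of A on a polynomial in X.

    Returns the integer codes of A, the raw fitted values, and the fitted
    values shrunk back into valid probabilities.
    """
    levels, codes = np.unique(a, return_inverse=True)
    indicators = np.eye(len(levels))[codes]           # binarize A: n x d
    design = np.vander(np.asarray(x, float), degree + 1)
    coef, *_ = np.linalg.lstsq(design, indicators, rcond=None)
    raw = design @ coef                  # rows sum to 1 but may go negative
    shrunk = np.clip(raw, eps, None)     # assumed shrinking step: clip ...
    shrunk /= shrunk.sum(axis=1, keepdims=True)       # ... then renormalize
    return codes, raw, shrunk


def mvp_discrete_child(a, x, degree=2):
    """Return (lik, dof, score) for discrete child a given continuous x."""
    codes, _, probs = fit_pmf(a, x, degree)
    n, d = probs.shape
    lik = float(np.log(probs[np.arange(n), codes]).sum())
    dof = (d - 1) * (degree + 1)         # assumption: last PMF is implied
    return lik, dof, -2.0 * lik + dof * np.log(n)
```

The raw fitted values sum to one for every sample because the indicator columns sum to the constant column of the design matrix, which is consistent with the first constraint holding for the method; only non-negativity needs the shrinking step.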
The Conditional Gaussian (CG) score
● Move all the continuous variables to the left and all the discrete variables to the right of the conditioning bar
  – Calculate the desired probability using partitioned Gaussian and Multinomial distributions
● Calculate a log likelihood and degrees of freedom for BIC
Modeling a Continuous Child
Let X, Y be continuous
Let A be discrete
Assume Y, A are parents of X
[Figure: graph in which Y and A are the parents of X]
$p(X \mid Y, A) = \frac{p(X, Y, A)}{p(Y, A)} = \frac{p(X, Y \mid A)\, p(A)}{p(Y \mid A)\, p(A)} = \frac{p(X, Y \mid A)}{p(Y \mid A)}$
Both $p(X, Y \mid A)$ and $p(Y \mid A)$ are partitioned Gaussians
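Modeling a Continuous Child
● Want:
  – $\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$ for $p(X, Y \mid A)$
  – $\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$ for $p(Y \mid A)$
[Figure: graph in which Y and A are the parents of X]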
$\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$
[Figure: the data split into three partitions, one per value of A, with terms (lik_1, dof_1), (lik_2, dof_2), (lik_3, dof_3)]
$\mathrm{lik}_{X, Y \mid A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
$\mathrm{dof}_{X, Y \mid A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$
$\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$
[Figure: the data split into three partitions, one per value of A, with terms (lik_1, dof_1), (lik_2, dof_2), (lik_3, dof_3)]
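Because p(X | Y, A) = p(X, Y | A) / p(Y | A), the conditional log likelihood is the difference of two partitioned Gaussian log likelihoods. The sketch below illustrates that for one continuous parent Y and one discrete parent A. The function names are hypothetical, and combining the degrees of freedom by the same subtraction is an assumption, since the slides shown here stop before stating how the two dof terms are combined.

```python
import numpy as np


def gaussian_lik_dof(data):
    """Maximized log likelihood and parameter count of a Gaussian fit to
    `data` (rows are samples, columns are variables)."""
    m, d = data.shape
    cov = np.atleast_2d(np.cov(data, rowvar=False, bias=True))  # MLE covariance
    _, logdet = np.linalg.slogdet(cov)
    lik = -0.5 * m * (d * np.log(2 * np.pi) + logdet + d)
    dof = d + d * (d + 1) // 2           # mean entries plus covariance entries
    return lik, dof


def partitioned_lik_dof(data, a):
    """Sum Gaussian lik and dof terms over the partitions induced by a."""
    lik, dof = 0.0, 0
    for level in np.unique(a):
        part_lik, part_dof = gaussian_lik_dof(data[a == level])
        lik += part_lik
        dof += part_dof
    return lik, dof


def cg_continuous_child(x, y, a):
    """Return (lik, dof, score) for continuous child x given parents y, a."""
    x, y, a = np.asarray(x, float), np.asarray(y, float), np.asarray(a)
    lik_xy, dof_xy = partitioned_lik_dof(np.column_stack([x, y]), a)
    lik_y, dof_y = partitioned_lik_dof(y[:, None], a)
    lik = lik_xy - lik_y                 # log of p(X, Y | A) / p(Y | A)
    dof = dof_xy - dof_y                 # assumption: dof combine the same way
    return lik, dof, -2.0 * lik + dof * np.log(len(x))
```

Under these assumptions each partition contributes 5 - 2 = 3 degrees of freedom, and the resulting `lik` matches the Gaussian log likelihood of a linear regression of X on Y fit separately within each partition.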