Scoring Bayesian Networks of Mixed Variables

  1. Scoring Bayesian Networks of Mixed Variables
     Bryan Andrews, MS; Joseph Ramsey, PhD; and Greg Cooper, MD, PhD
     August 14, 2017

  3. Learning Bayesian Networks (BNs)
     ● BNs constitute a widely used graphical framework for representing probabilistic relationships
     ● Many applications in Bayesian inference and causal discovery
     ● Learning structure is crucial
       – Limited work has been done in the presence of both discrete and continuous variables
     Goal: Provide scalable solutions for learning BNs in the presence of both discrete and continuous variables

  4. Outline
     ● Bayesian Information Criterion (BIC)
     ● Mixed Variable Polynomial (MVP) score
     ● Conditional Gaussian (CG) score
     ● Adaptations
     ● Simulations and empirical results

  8. The Bayesian Information Criterion
     Let $M$ be a model and $D$ be a dataset.
     BIC approximates $-2 \log p(M \mid D)$, up to an additive constant under a uniform prior over models:
     $$-2 \log p(M \mid D) \approx -2\,\mathrm{lik} + \mathrm{dof} \cdot \log n$$
     where $\mathrm{lik}$ is the maximized log-likelihood, $\mathrm{dof}$ is the number of degrees of freedom, and $n$ is the number of samples.
     A BN is scored as the sum of the BIC values of each node given its parents.
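The BIC expression on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `bic` and `score_network` and the numbers in `terms` are hypothetical, not taken from the presentation.

```python
import math


def bic(lik: float, dof: int, n: int) -> float:
    """BIC in the form used on the slide: -2 * lik + dof * log(n)."""
    return -2.0 * lik + dof * math.log(n)


def score_network(node_terms: dict, n: int) -> float:
    """Sum the per-node BIC values; node_terms maps node -> (lik, dof)."""
    return sum(bic(lik, dof, n) for lik, dof in node_terms.values())


# Hypothetical per-node (lik, dof) pairs, for illustration only.
terms = {"X": (-120.5, 3), "A": (-80.0, 2)}
total = score_network(terms, n=100)
```

Written this way, a smaller value indicates a better trade-off between fit and complexity.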

  11. The Mixed Variable Polynomial (MVP) score
      ● Use higher-order polynomials to estimate relationships between variables
        – Allows for nonlinear relationships between continuous variables
        – Allows for complicated PMFs for discrete variables (approximates logistic regression)
      ● Calculate a log-likelihood and degrees of freedom for BIC

  15. Modeling a Continuous Child
      ● Partition according to the discrete parents
        – Splits the data into subsets
      ● Perform regression with the continuous parents for each partition
        – Calculate a log-likelihood and degrees of freedom for each subset
      ● Aggregate the log-likelihood and degrees-of-freedom terms from each subset
      ● Score the continuous child using BIC
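The four steps above can be sketched with NumPy for one continuous parent and one discrete parent. This is a minimal sketch, not the authors' implementation: the function names, the polynomial degree, the Gaussian residual model, and the choice to count one degree of freedom per regression coefficient are all assumptions made for illustration.

```python
import numpy as np


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood of a set of regression residuals."""
    m = len(residuals)
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)  # guard against log(0)
    return -0.5 * m * (np.log(2.0 * np.pi * sigma2) + 1.0)


def continuous_child_bic(x, y, a, degree=2):
    """Score continuous child x given continuous parent y and discrete parent a."""
    n = len(x)
    lik, dof = 0.0, 0
    for value in np.unique(a):                 # partition by the discrete parent
        mask = a == value
        xs, ys = x[mask], y[mask]
        # Polynomial design matrix with columns 1, y, ..., y**degree.
        design = np.vander(ys, degree + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(design, xs, rcond=None)
        lik += gaussian_loglik(xs - design @ coef)
        dof += degree + 1                      # assumed: one per coefficient
    return -2.0 * lik + dof * np.log(n), lik, dof


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=300)
y = rng.normal(size=300)
x = a + y ** 2 + rng.normal(scale=0.1, size=300)
score, lik, dof = continuous_child_bic(x, y, a, degree=2)
```

With three categories and a quadratic fit, the sketch accumulates three coefficients per partition, so `dof` is 9 here.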

  16. Modeling a Continuous Child
      ● Let $X$, $Y$ be continuous
      ● Let $A$ be discrete ($|A| = 3$)
      ● Want: $\mathrm{lik}_{X \mid Y, A}$, $\mathrm{dof}_{X \mid Y, A}$
      (Figure: graph with Y and A as parents of X.)

  21. (Figure: the data split into three partitions, labeled $\mathrm{lik}_1, \mathrm{dof}_1$ through $\mathrm{lik}_3, \mathrm{dof}_3$.)
      $\mathrm{lik}_{X \mid Y, A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
      $\mathrm{dof}_{X \mid Y, A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$
      Score: $-2\,\mathrm{lik}_{X \mid Y, A} + \mathrm{dof}_{X \mid Y, A} \log n$

  27. Modeling a Discrete Child
      ● Binarize the child $A$ into $d$ binary (0/1) variables, where $d = |A|$
      ● Partition according to the discrete parents
        – Splits the data into subsets
      ● Perform regression with the continuous parents for each partition
        – Treat the regression lines as components of PMFs for $A$
      ● Calculate a log-likelihood and degrees of freedom for each subset
      ● Aggregate the log-likelihood and degrees-of-freedom terms from each subset
      ● Score the discrete child using BIC
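The steps above can be sketched as follows for a discrete child with one continuous parent and one discrete parent. The function name, the polynomial degree, the degrees-of-freedom count, and the clip-and-renormalize step that keeps the fitted values usable as probabilities are all assumptions for illustration; the presentation does not spell out these details.

```python
import numpy as np


def discrete_child_bic(a, x, b, degree=1, floor=1e-6):
    """Score discrete child a given continuous parent x and discrete parent b."""
    n = len(a)
    levels = np.unique(a)
    # Binarize the child: one 0/1 indicator column per category.
    indicators = (a[:, None] == levels[None, :]).astype(float)
    lik, dof = 0.0, 0
    for value in np.unique(b):                 # partition by the discrete parent
        mask = b == value
        design = np.vander(x[mask], degree + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(design, indicators[mask], rcond=None)
        pmf = design @ coef                    # each row sums to 1 (intercept term)
        # Assumed stand-in for handling illegal values: clip, then renormalize.
        pmf = np.clip(pmf, floor, None)
        pmf /= pmf.sum(axis=1, keepdims=True)
        observed = np.argmax(indicators[mask], axis=1)
        lik += float(np.sum(np.log(pmf[np.arange(len(observed)), observed])))
        dof += (len(levels) - 1) * (degree + 1)  # assumed parameter count
    return -2.0 * lik + dof * np.log(n), lik, dof


# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=400)
b = rng.integers(0, 2, size=400)
a = np.digitize(x + rng.normal(scale=0.5, size=400), [-0.5, 0.5])
score, lik, dof = discrete_child_bic(a, x, b)
```

Because the design matrix contains an intercept and the indicator columns sum to one in every row, the least-squares fitted values also sum to one in every row; only non-negativity needs separate handling.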

  28. Modeling a Discrete Child
      ● Let $X$ be continuous
      ● Let $A$ be discrete ($|A| = 3$)
      ● Want: $\mathrm{lik}_{A \mid X}$, $\mathrm{dof}_{A \mid X}$
      (Figure: graph with X as the parent of A.)

  33. (Figure with three components labeled 1, 2, and 3.)
      ● $\sum_{a \in \{0, 1, 2\}} p(A = a \mid X = x) = 1 \quad \forall x$
        – True for the proposed method
      ● $p(A = a \mid X = x) \geq 0 \quad \forall a, x$
        – True in the sample limit given some assumptions
      Define a procedure to shrink illegal distributions back into the domain of probabilities
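The slide does not say how the shrinking is done. As one possible illustration, and purely an assumption, a vector that sums to one but has negative entries can be mixed with the uniform distribution just enough to make every entry at least a small `eps`. The function name `shrink_to_simplex` is hypothetical.

```python
import numpy as np


def shrink_to_simplex(p, eps=1e-6):
    """Mix p with the uniform distribution until every entry is at least eps.

    Assumes p sums to 1 and eps < 1 / len(p). This is an illustrative
    stand-in, not the procedure used in the presentation.
    """
    p = np.asarray(p, dtype=float)
    d = p.size
    lowest = p.min()
    if lowest >= eps:
        return p                       # already a legal distribution
    # Smallest mixing weight t with (1 - t) * lowest + t / d == eps.
    t = (eps - lowest) / (1.0 / d - lowest)
    return (1.0 - t) * p + t / d


# Hypothetical fitted values for three categories at one value of x.
q = shrink_to_simplex([1.2, -0.1, -0.1])
```

Mixing with the uniform distribution keeps the sum at one, so only the non-negativity constraint is being repaired.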

  37. (Figure with three components labeled 1, 2, and 3, annotated with $\mathrm{lik}_{A \mid X}$ and $\mathrm{dof}_{A \mid X}$.)
      Score: $-2\,\mathrm{lik}_{A \mid X} + \mathrm{dof}_{A \mid X} \log n$

  40. The Conditional Gaussian (CG) score
      ● Move all the continuous variables to the left and all the discrete variables to the right of the conditioning bar
        – Calculate the desired probability using partitioned Gaussian and multinomial distributions
      ● Calculate a log-likelihood and degrees of freedom for BIC

  44. Modeling a Continuous Child
      Let $X$, $Y$ be continuous. Let $A$ be discrete. Assume $Y$, $A$ are parents of $X$.
      (Figure: graph with Y and A as parents of X.)
      $$p(X \mid Y, A) = \frac{p(X, Y, A)}{p(Y, A)} = \frac{p(X, Y \mid A)\, p(A)}{p(Y \mid A)\, p(A)} = \frac{p(X, Y \mid A)}{p(Y \mid A)}$$
      Both $p(X, Y \mid A)$ and $p(Y \mid A)$ are partitioned Gaussians.

  45. Modeling a Continuous Child
      ● Want:
        – $\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$ for $p(X, Y \mid A)$
        – $\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$ for $p(Y \mid A)$
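Taking logs of the ratio $p(X, Y \mid A) / p(Y \mid A)$ turns it into a difference of two partitioned Gaussian log-likelihoods. The sketch below illustrates that idea with NumPy. The function names, the parameter count (means plus covariance entries), and the choice to subtract the degrees of freedom in the same way as the log-likelihoods are assumptions, not details confirmed by the presentation.

```python
import numpy as np


def gaussian_loglik(data):
    """Maximized log-likelihood of an (m x k) sample under a k-variate Gaussian."""
    m, k = data.shape
    cov = np.atleast_2d(np.cov(data, rowvar=False, bias=True))  # MLE covariance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * m * (k * np.log(2.0 * np.pi) + logdet + k)


def gaussian_dof(k):
    """Assumed parameter count: k means plus k * (k + 1) / 2 covariance entries."""
    return k + k * (k + 1) // 2


def cg_continuous_child(x, y, a):
    """Score continuous child x given continuous parent y and discrete parent a."""
    n = len(x)
    lik_xy = lik_y = 0.0
    dof_xy = dof_y = 0
    for value in np.unique(a):                 # one Gaussian per category of a
        mask = a == value
        joint = np.column_stack([x[mask], y[mask]])
        lik_xy += gaussian_loglik(joint)
        lik_y += gaussian_loglik(joint[:, 1:])
        dof_xy += gaussian_dof(2)
        dof_y += gaussian_dof(1)
    lik = lik_xy - lik_y                       # log of p(X, Y | A) / p(Y | A)
    dof = dof_xy - dof_y                       # assumed to subtract likewise
    return -2.0 * lik + dof * np.log(n), lik, dof


# Synthetic data, for illustration only.
rng = np.random.default_rng(2)
a = rng.integers(0, 3, size=300)
y = rng.normal(size=300)
x = 2.0 * y + a + rng.normal(size=300)
score, lik, dof = cg_continuous_child(x, y, a)
```

Under these assumptions each partition contributes 5 - 2 = 3 degrees of freedom, which matches an intercept, a slope, and a residual variance for a linear regression of X on Y.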

  49. $\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$
      (Figure: the data split into three partitions, labeled $\mathrm{lik}_1, \mathrm{dof}_1$ through $\mathrm{lik}_3, \mathrm{dof}_3$.)
      $\mathrm{lik}_{X, Y \mid A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
      $\mathrm{dof}_{X, Y \mid A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$

  52. $\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$
      (Figure: the data split into three partitions, labeled $\mathrm{lik}_1, \mathrm{dof}_1$ through $\mathrm{lik}_3, \mathrm{dof}_3$.)

