
Bayesian networks: basics
Machine Intelligence
Thomas D. Nielsen



  1. Bayesian networks: basics. Machine Intelligence. Thomas D. Nielsen. September 2008.

  2. Basics: Random/Chance Variables
  A random variable has a name and a state space:
  Weather: {sunny, cloudy, rain}
  Blood Pressure: {high, normal, low}
  Grade: {−3, 00, 02, 4, 7, 10, 12}
  Annual income: {1 DKK, 2 DKK, 3 DKK, 4 DKK, ...}
  Weight: x ∈ R
  and a probability distribution on the state space:
  sunny: 0.3, cloudy: 0.5, rain: 0.2
  Occurrence of k events within a time interval: P(k) = e^(−λ) λ^k / k! (Poisson distribution)
  [Figure: Poisson pmf plotted for k = 0, ..., 20]
  Continuous distribution: x ∼ N(µ, σ) (Gaussian distribution)
  Notation: sp(A) denotes the state space of random variable A.
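As a quick sketch (not from the slides), a finite random variable can be represented as a dict mapping states to probabilities, and the Poisson pmf is a one-liner; the names `weather` and `poisson_pmf` are hypothetical:

```python
import math

# A discrete random variable: a distribution over its state space.
weather = {"sunny": 0.3, "cloudy": 0.5, "rain": 0.2}
assert abs(sum(weather.values()) - 1.0) < 1e-9  # probabilities must sum to 1

def poisson_pmf(k, lam):
    """P(k events in an interval) = e^(-lam) * lam^k / k!  (Poisson)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(poisson_pmf(2, 3.0))  # ≈ 0.224
```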



  5. Basics: Joint Distribution
  Usually we are interested in the joint distribution of several variables, e.g. the probability that Weather is sunny and Grade is 10.
  Notation: we also write sp(A, B) for the joint state space of two (or more) variables.
  Example: sp(Weather, Grade) = {(sunny, −3), (sunny, 00), ..., (rain, 12)}
  Conditional Probabilities
  A joint distribution defines conditional probabilities:
  P(A = a | B = b) := P(A = a, B = b) / P(B = b)
  This is also known as the fundamental rule (when read as a theorem, not a definition).
  Bayes' Rule
  From the definition of conditional probability:
  P(B = b | A = a) = P(A = a | B = b) · P(B = b) / P(A = a)
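The fundamental rule and Bayes' rule can be checked numerically on a toy joint distribution; the variables, states, and numbers below are illustrative, not taken from the slides:

```python
# Joint distribution over two binary-ish variables A, B, keyed by (a, b).
joint = {
    ("a1", "b1"): 0.20, ("a1", "b2"): 0.10,
    ("a2", "b1"): 0.30, ("a2", "b2"): 0.40,
}

def marginal_a(a):
    return sum(p for (aa, _), p in joint.items() if aa == a)

def marginal_b(b):
    return sum(p for (_, bb), p in joint.items() if bb == b)

def cond_a_given_b(a, b):
    # Fundamental rule: P(A=a | B=b) = P(A=a, B=b) / P(B=b)
    return joint[(a, b)] / marginal_b(b)

def bayes_b_given_a(b, a):
    # Bayes' rule: P(B=b | A=a) = P(A=a | B=b) P(B=b) / P(A=a)
    return cond_a_given_b(a, b) * marginal_b(b) / marginal_a(a)

# Bayes' rule agrees with conditioning on A directly.
assert abs(bayes_b_given_a("b1", "a1") - joint[("a1", "b1")] / marginal_a("a1")) < 1e-12
```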



  8. Basics: Generalization
  If an equality (like Bayes' rule) is true for all possible values a, b of the random variables A, B, one simply writes it in the form
  P(B | A) = P(A | B) P(B) / P(A)
  Conditioning on context
  A probabilistic law remains valid when all probabilities are conditioned on a common "context" variable C. E.g. Bayes' rule:
  P(B | A, C) = P(A | B, C) P(B | C) / P(A | C)
  Chain rule
  For any set of random variables V1, V2, ..., Vn:
  P(V1, ..., Vn) = P(V1, ..., Vn−1) P(Vn | V1, ..., Vn−1)
                 = P(V1, ..., Vn−2) P(Vn−1 | V1, ..., Vn−2) P(Vn | V1, ..., Vn−1)
                 ...
                 = P(V1) P(V2 | V1) · · · P(Vi | V1, ..., Vi−1) · · · P(Vn | V1, ..., Vn−1)
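For two variables the chain rule reads P(V1, V2) = P(V1) P(V2 | V1), which holds for every configuration; a minimal numerical check on a made-up joint (names and numbers are hypothetical):

```python
from itertools import product

# Toy joint over V1 in {x, w} and V2 in {y, z}.
joint = {("x", "y"): 0.12, ("x", "z"): 0.28,
         ("w", "y"): 0.18, ("w", "z"): 0.42}

def p_v1(v1):
    return sum(p for (a, _), p in joint.items() if a == v1)

def p_v2_given_v1(v2, v1):
    return joint[(v1, v2)] / p_v1(v1)

# Chain rule: P(V1, V2) = P(V1) * P(V2 | V1) for every configuration.
for v1, v2 in product(("x", "w"), ("y", "z")):
    assert abs(joint[(v1, v2)] - p_v1(v1) * p_v2_given_v1(v2, v1)) < 1e-12
```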

  9. Basics: Conditional Independence
  A is conditionally independent of B given C if one of the following equivalent conditions holds:
  P(A, B | C) = P(A | C) P(B | C)
  P(A | B, C) = P(A | C)
  P(B | A, C) = P(B | C)
  This extends to sets of random variables. E.g.: A1, A2, A3 is independent of B1, B2 given C1, C2, C3 if
  P(A1, A2, A3, B1, B2 | C1, C2, C3) = P(A1, A2, A3 | C1, C2, C3) P(B1, B2 | C1, C2, C3)
  A conditional independence relation does not necessarily remain true under conditioning on an additional context variable:
  P(A, B | C) = P(A | C) P(B | C) does not imply P(A, B | C, D) = P(A | C, D) P(B | C, D)
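The first condition, P(A, B | C) = P(A | C) P(B | C), can be tested by brute force over a small joint; the sketch below builds a joint that satisfies it by construction (all numbers are illustrative):

```python
# Joint over binary A, B, C constructed so that A ⊥ B | C holds.
def joint(a, b, c):
    p_c = {0: 0.4, 1: 0.6}[c]
    p_a_c = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}[(a, c)]
    p_b_c = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}[(b, c)]
    return p_c * p_a_c * p_b_c

def cond_indep(joint_fn):
    """Check P(A, B | C) = P(A | C) P(B | C) for every configuration."""
    for c in (0, 1):
        p_c = sum(joint_fn(a, b, c) for a in (0, 1) for b in (0, 1))
        for a in (0, 1):
            for b in (0, 1):
                p_ab_c = joint_fn(a, b, c) / p_c
                p_a_c = sum(joint_fn(a, bb, c) for bb in (0, 1)) / p_c
                p_b_c = sum(joint_fn(aa, b, c) for aa in (0, 1)) / p_c
                if abs(p_ab_c - p_a_c * p_b_c) > 1e-9:
                    return False
    return True

assert cond_indep(joint)  # A is conditionally independent of B given C
```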

  10. Basics: Chain Rule + Conditional Independence → Factorization
  Chain rule again:
  P(V1, ..., Vn) = P(V1) P(V2 | V1) · · · P(Vi | V1, ..., Vi−1) · · · P(Vn | V1, ..., Vn−1)
  Now suppose that for each i there is a set pa(Vi) ⊆ {V1, ..., Vi−1} such that
  P(Vi | V1, ..., Vi−1) = P(Vi | pa(Vi))
  (i.e. Vi is conditionally independent of {V1, ..., Vi−1} \ pa(Vi) given pa(Vi)).
  This gives the factorization of P(V1, ..., Vn):
  P(V1, ..., Vn) = ∏_{i=1}^{n} P(Vi | pa(Vi))

  11. Basics: Factorization → Bayesian Networks
  [Figure: a directed acyclic graph over variables A, B, C, D, E, with a conditional probability table attached to each node]
  A Bayesian network for the (discrete) random variables V = V1, ..., Vn is defined by
  a directed acyclic graph (V, →), and
  for each Vi a conditional probability table P(Vi | pa(Vi)) specifying the conditional distribution of Vi given its parents in the graph.
  The Bayesian network defines a joint distribution of V as:
  P(V1, ..., Vn) = ∏_{i=1}^{n} P(Vi | pa(Vi))
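A minimal sketch of this definition: a network A → C ← B represented as dicts of CPT entries, whose product is a proper joint distribution. The CPT numbers are hypothetical (the slide's own tables did not survive extraction):

```python
from itertools import product

# CPTs for the network A -> C <- B over binary variables.
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.3, 1: 0.7}
p_c1_given_ab = {  # P(C = 1 | A = a, B = b)
    (0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9,
}

def joint(a, b, c):
    """P(A, B, C) = P(A) P(B) P(C | A, B): the BN factorization."""
    pc1 = p_c1_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1.0 - pc1)

# The product of the CPTs defines a proper joint distribution: it sums to 1.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```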

  12. Basics: Elementary Conditional Independence Property
  Vi: node in a Bayesian network
  desc(Vi): descendants of Vi
  rest(Vi): nondescendants of Vi, excluding pa(Vi) and Vi itself
  [Figure: Vi with pa(Vi) above it, desc(Vi) below it, and rest(Vi) to the side]
  P(Vi | pa(Vi), rest(Vi)) = P(Vi | pa(Vi))
  "Vi is independent of its nondescendants, given its parents"

  13. Basics: The d-Separation Relation
  Let (V, →) be a directed acyclic graph and A, B, C ⊆ V disjoint subsets of nodes. C d-separates A from B if every undirected path that connects a node A ∈ A with a node B ∈ B satisfies at least one of the following two conditions:
  1. the path contains a node C ∈ C, and the edges that connect C are serial (... → C → ...) or diverging (... ← C → ...);
  2. the path contains a node U such that the edges that connect U are converging (... → U ← ...) and ({U} ∪ desc(U)) ∩ C = ∅.
  [Figure: serial, diverging, and converging connections]
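A brute-force sketch of the two blocking conditions: enumerate all undirected paths in a small DAG and test each triple on the path. The graph A → C ← B, C → D and all function names are hypothetical:

```python
# d-separation by exhaustive path checking on a small DAG.
edges = [("A", "C"), ("B", "C"), ("C", "D")]

parents, children = {}, {}
for u, v in edges:
    parents.setdefault(v, set()).add(u)
    children.setdefault(u, set()).add(v)

def descendants(u):
    out, stack = set(), [u]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(src, dst):
    """All simple src-dst paths in the underlying undirected graph."""
    def step(path):
        if path[-1] == dst:
            yield path
            return
        for n in parents.get(path[-1], set()) | children.get(path[-1], set()):
            if n not in path:
                yield from step(path + [n])
    yield from step([src])

def blocked(path, Z):
    """Condition 1: a serial/diverging node on the path lies in Z.
    Condition 2: a converging node U has ({U} ∪ desc(U)) ∩ Z = ∅."""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents.get(mid, set()) and nxt in parents.get(mid, set()):
            if mid not in Z and not (descendants(mid) & Z):
                return True  # condition 2 (converging)
        elif mid in Z:
            return True  # condition 1 (serial or diverging)
    return False

def d_separated(x, y, Z):
    return all(blocked(p, set(Z)) for p in undirected_paths(x, y))

# C is a collider on the only A-B path: A and B are d-separated by the
# empty set, but observing C (or its descendant D) opens the path.
assert d_separated("A", "B", set())
assert not d_separated("A", "B", {"C"})
assert not d_separated("A", "B", {"D"})
```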

  14. Basics: pa(A) d-separates A from rest(A)
  [Figure: a node A whose parents block every path connecting A to its nondescendants]

  15. Basics: d-Separation Theorem
  Let (V, →, {P(Vi | pa(Vi)) | i = 1, ..., n}) be a Bayesian network that defines the joint distribution P. Then for all pairwise disjoint A, B, C ⊆ V:
  If C d-separates A from B in (V, →), then P(A | B, C) = P(A | C).
  [The Elementary Conditional Independence Property is a special case.]
  A proof can be found in Verma & Pearl (1990).


  17. Basics: Basic Inference Problems
  Given a Bayesian network (V, →, {P(Vi | pa(Vi)) | i = 1, ..., n}).
  (a) Computation of a-posteriori distributions: Given E1, ..., Ek ∈ V and evidence values ei ∈ sp(Ei). Wanted: for all A ∈ V \ E, the conditional distribution of A given (the "evidence") E = e:
  P(A | E1 = e1, ..., Ek = ek)
  (b) Computation of most likely configurations (most probable explanations, MPE): Evidence E = e as in (a); let A := V \ E. Wanted:
  a_max = arg max_{a ∈ sp(A)} P(A = a | E = e)
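Both problems can be solved by brute-force enumeration on a tiny network; a sketch reusing a hypothetical A → C ← B network with made-up CPTs (not the slides' example):

```python
from itertools import product

# CPTs for A -> C <- B over binary variables (illustrative numbers).
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.3, 1: 0.7}
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(C=1 | a, b)

def joint(a, b, c):
    pc1 = p_c1[(a, b)]
    return p_a[a] * p_b[b] * (pc1 if c == 1 else 1.0 - pc1)

# (a) a-posteriori distribution of A given evidence C = 1, by enumeration.
evidence_mass = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
posterior_a = {a: sum(joint(a, b, 1) for b in (0, 1)) / evidence_mass
               for a in (0, 1)}

# (b) MPE: the jointly most probable configuration (a, b) given C = 1.
mpe = max(product((0, 1), repeat=2), key=lambda ab: joint(ab[0], ab[1], 1))

print(posterior_a[1], mpe)  # ≈ 0.568, and mpe == (1, 1)
```

Note that the MPE maximizes the joint configuration of all unobserved variables at once, which in general differs from picking the mode of each marginal posterior separately.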
