Bayesian networks

Petr Pošík
Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics

Significant parts of this material come from the lectures on Bayesian networks which are part of the Artificial Intelligence course by Pieter Abbeel and Dan Klein. The original lectures can be found at http://ai.berkeley.edu

© 2017 P. Pošík, Artificial Intelligence
Introduction
Uncertainty

Probabilistic reasoning is one of the frameworks that allow us to maintain our beliefs and knowledge in uncertain environments.

Usual scenario:
■ Observed variables (evidence): known things related to the state of the world; often imprecise, noisy (info from sensors, symptoms of a patient, etc.).
■ Unobserved, hidden variables: unknown, but important aspects of the world; we need to reason about them (what the position of an object is, whether a disease is present, etc.).
■ Model: describes the relations among hidden and observed variables; allows us to reason.

Models (including probabilistic ones)
■ describe how (a part of) the world works.
■ are always approximations or simplifications:
    ■ They cannot account for everything (they would be as complex as the world itself).
    ■ They represent only a chosen subset of variables and interactions between them.
    ■ “All models are wrong; some are useful.” (George E. P. Box)

A probabilistic model is a joint distribution over a set of random variables.
Notation

Random variables (start with capital letters):
    $X$, $Y$, $\mathit{Weather}$, ...

Values of random variables (start with lower-case letters):
    $x_1$, $e_i$, $\mathit{rainy}$, ...

Probability distribution of a random variable:
    $P(X)$ or $P_X$

Probability of a random event:
    $P(X = x_1)$ or $P_X(x_1)$

Shorthand for a probability of a random event (if there is no chance of confusion):
    $P(+r)$ meaning $P(\mathit{Rainy} = \mathit{true})$, or $P(r)$ meaning $P(\mathit{Weather} = \mathit{rainy})$
Probability cheatsheet

Conditional probability:
$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$$

Product rule:
$$P(X, Y) = P(X \mid Y)\, P(Y)$$

Bayes rule:
$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)} = \frac{P(y \mid x)\, P(x)}{\sum_i P(y \mid x_i)\, P(x_i)}$$

Chain rule:
$$P(X_1, X_2, \ldots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdot \ldots = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$

Independence: $X \perp\!\!\!\perp Y$ ($X$ and $Y$ are independent) iff
$$\forall x, y: P(x, y) = P(x)\, P(y)$$

Conditional independence: $X \perp\!\!\!\perp Y \mid Z$ ($X$ and $Y$ are conditionally independent given $Z$) iff
$$\forall x, y, z: P(x, y \mid z) = P(x \mid z)\, P(y \mid z)$$
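The Bayes rule from the cheatsheet can be tried out numerically. The following is a minimal sketch, not part of the original lecture: the function name `bayes_posterior`, the two hypotheses and all the numbers are assumptions chosen only for illustration. It computes the numerator P(y | x_i) P(x_i) for every x_i and divides by their sum, which is P(y).

```python
def bayes_posterior(prior, likelihood):
    """Return P(x_i | y) for every x_i using Bayes rule.

    prior[x_i]      = P(x_i)
    likelihood[x_i] = P(y | x_i) for the single observed value y
    """
    # Numerator of Bayes rule for each hypothesis: P(y | x_i) P(x_i)
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    # Denominator: P(y) = sum_i P(y | x_i) P(x_i)
    evidence = sum(unnormalized.values())
    return {x: p / evidence for x, p in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.9, "healthy": 0.1}  # P(positive test | x)
posterior = bayes_posterior(prior, likelihood)
```

With these assumed numbers, `posterior["disease"]` is 0.009 / 0.108, i.e. 1/12: even after a positive test the disease remains unlikely, because the prior is small.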
Joint probability distribution

A joint distribution over a set of variables $X_1, \ldots, X_n$ (here discrete) assigns a probability to each combination of values:
$$P(X_1 = x_1, \ldots, X_n = x_n) = P(x_1, \ldots, x_n)$$

For a proper probability distribution:
$$\forall x_1, \ldots, x_n: P(x_1, \ldots, x_n) \geq 0 \quad \text{and} \quad \sum_{x_1, \ldots, x_n} P(x_1, \ldots, x_n) = 1$$

Probabilistic inference
■ Compute a desired probability from other known probabilities (e.g. marginal or conditional from joint).
■ Conditional probabilities turn out to be the most interesting ones:
    ■ They represent our or the agent’s beliefs given the evidence (measured values of observable variables).
    ■ P(bus on time | rush hour) = 0.8
■ Probabilities change with new evidence:
    ■ P(bus on time) = 0.95
    ■ P(bus on time | rush hour) = 0.8
    ■ P(bus on time | rush hour, dry roads) = 0.85
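Computing a marginal or a conditional probability from a joint distribution can be sketched in a few lines. The joint table below is an assumption, not taken from the lecture: it uses a hypothetical P(rush hour) = 0.2 and is constructed so that the results agree with the bus example, P(bus on time) = 0.95 and P(bus on time | rush hour) = 0.8. The function names are illustrative as well.

```python
# Joint table P(Rush, OnTime). The individual entries are assumed values,
# chosen so that the marginal and conditional match the bus example.
joint = {
    ("rush", "on_time"): 0.16,
    ("rush", "late"): 0.04,
    ("no_rush", "on_time"): 0.79,
    ("no_rush", "late"): 0.01,
}


def marginal_on_time(joint, value):
    """P(OnTime = value): sum the joint over all values of Rush."""
    return sum(p for (_, on_time), p in joint.items() if on_time == value)


def conditional_on_time(joint, value, rush_value):
    """P(OnTime = value | Rush = rush_value) = P(rush_value, value) / P(rush_value)."""
    p_rush = sum(p for (rush, _), p in joint.items() if rush == rush_value)
    return joint[(rush_value, value)] / p_rush


p_on_time = marginal_on_time(joint, "on_time")
p_on_time_given_rush = conditional_on_time(joint, "on_time", "rush")
```

Up to floating-point rounding, `p_on_time` is 0.95 and `p_on_time_given_rush` is 0.8: observing the evidence "rush hour" lowers the belief that the bus is on time.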
Contents
■ What is a Bayesian network?
■ How does it encode the joint probability distribution?
■ What independence assumptions does it encode?
■ How to perform reasoning using a BN?
Bayesian networks
What’s wrong with the joint distribution?

How many free parameters $n_{\text{params}}$ does a probability distribution over $n$ variables have, if each variable has at least $d$ possible values?
■ For all variables binary ($d = 2$): $n_{\text{params}} = 2^n - 1$
■ In general: $n_{\text{params}} \geq d^n - 1$
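The counts above follow from the size of the joint table: with $d$ values per variable it has $d^n$ entries, and because the entries must sum to 1, one of them is determined by the others. A small sketch (the function name `joint_free_parameters` is hypothetical) makes the exponential growth concrete:

```python
def joint_free_parameters(n, d=2):
    """Free parameters of a full joint table over n variables with d values each.

    The table has d**n entries; they must sum to 1, so one entry is
    determined by the remaining ones.
    """
    return d**n - 1


# Number of free parameters for 2, 10 and 30 binary variables.
counts = {n: joint_free_parameters(n) for n in (2, 10, 30)}
```

Here `counts` is `{2: 3, 10: 1023, 30: 1073741823}`: already 30 binary variables require more than a billion numbers to specify the joint distribution directly.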