15-780 Graduate Artificial Intelligence: Probabilistic inference
J. Zico Kolter (this lecture) and Nihar Shah
Carnegie Mellon University, Spring 2020
Outline
Probabilistic graphical models
Probabilistic inference
Exact inference
Sample-based inference
A brief look at deep generative models
Probabilistic graphical models
Probabilistic graphical models are all about representing distributions p(X), where X represents some large set of random variables.
Example: suppose X ∈ {0,1}^n (an n-dimensional binary random variable); it would take 2^n − 1 parameters to describe the full joint distribution.
Graphical models offer a way to represent these same distributions more compactly, by exploiting conditional independencies in the distribution.
Note: I'm going to use "probabilistic graphical model" and "Bayesian network" interchangeably, even though there are differences.
Bayesian networks
A Bayesian network is defined by:
1. A directed acyclic graph G = (V = {X_1, ..., X_n}, E)
2. A set of conditional distributions p(X_i | Parents(X_i))
It defines the joint probability distribution
p(X) = ∏_{i=1}^n p(X_i | Parents(X_i))
Equivalently: each node is conditionally independent of all non-descendants given its parents.
Example Bayesian network
(Figure: chain X_1 → X_2 → X_3 → X_4.)
Conditional independencies let us simplify the joint distribution:
p(X_1, X_2, X_3, X_4) = p(X_1) p(X_2 | X_1) p(X_3 | X_1, X_2) p(X_4 | X_1, X_2, X_3)
                      = p(X_1) p(X_2 | X_1) p(X_3 | X_2) p(X_4 | X_3)
The full joint distribution needs 2^4 − 1 = 15 parameters (assuming binary variables); the factored form needs only 1 + 2 + 2 + 2 = 7 parameters (1 for p(X_1) and 2 for each conditional). A code sketch of this factorization follows below.
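A minimal sketch (not from the lecture) of the factorization in code: each conditional probability table of the chain is stored explicitly, and the joint is just their product. All numerical values are made-up illustrative parameters.

    # Hypothetical CPTs for the chain X1 -> X2 -> X3 -> X4 (binary variables).
    p1 = {1: 0.6, 0: 0.4}                                 # p(X1)
    p2 = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.2, 0: 0.8}}       # p2[x1][x2] = p(X2 = x2 | X1 = x1)
    p3 = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.4, 0: 0.6}}       # p(X3 | X2)
    p4 = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.1, 0: 0.9}}       # p(X4 | X3)

    def joint(x1, x2, x3, x4):
        # p(x1, x2, x3, x4) = p(x1) p(x2 | x1) p(x3 | x2) p(x4 | x3)
        return p1[x1] * p2[x1][x2] * p3[x2][x3] * p4[x3][x4]

    print(joint(1, 1, 0, 1))

Note that the tables store both p and 1 − p for convenience; only 7 of these numbers are free parameters.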
Poll: Simple Bayesian network
What conditional independencies exist in the following Bayesian network? (Figure: network over X_1, X_2, X_3, X_4.)
1. X_1 and X_2 are marginally independent
2. X_4 is conditionally independent of X_1 given X_3
3. X_1 is conditionally independent of X_4 given X_3
4. X_1 is conditionally independent of X_2 given X_3
Generative model
Can also describe the probability distribution as a sequential "story"; this is called a generative model. For the chain X_1 → X_2 → X_3 → X_4:
X_1 ~ Bernoulli(φ^1)
X_2 | X_1 = x_1 ~ Bernoulli(φ^2_{x_1})
X_3 | X_2 = x_2 ~ Bernoulli(φ^3_{x_2})
X_4 | X_3 = x_3 ~ Bernoulli(φ^4_{x_3})
"First sample X_1 from a Bernoulli distribution with parameter φ^1, then sample X_2 from a Bernoulli distribution with parameter φ^2_{x_1}, where x_1 is the value we sampled for X_1, then sample X_3 from a Bernoulli …"
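A minimal sketch (not from the lecture) of following this story forward, i.e. ancestral sampling; the φ values are again made-up illustrative numbers.

    import random

    phi1 = 0.6
    phi2 = {0: 0.2, 1: 0.7}      # phi^2_{x1}
    phi3 = {0: 0.4, 1: 0.9}      # phi^3_{x2}
    phi4 = {0: 0.1, 1: 0.5}      # phi^4_{x3}

    def bern(p):
        return 1 if random.random() < p else 0

    def sample_chain():
        # Sample each variable in order, conditioning on its parent's sampled value.
        x1 = bern(phi1)
        x2 = bern(phi2[x1])
        x3 = bern(phi3[x2])
        x4 = bern(phi4[x3])
        return x1, x2, x3, x4

    print(sample_chain())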
More general generative models
This notion of a "sequential story" (generative model) is extremely powerful for describing very general distributions.
Naive Bayes:
Y ~ Bernoulli(φ)
X_i | Y = y ~ Categorical(θ_y)
Gaussian mixture model:
Z ~ Categorical(φ)
X | Z = z ~ N(μ_z, Σ_z)
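A minimal sketch (not from the lecture) of the Gaussian mixture story: sample the component, then sample the observation from that component's Gaussian. The weights, means, and standard deviations are illustrative.

    import random

    weights = [0.3, 0.7]     # Categorical parameters for Z
    means   = [-2.0, 3.0]    # mu_z for each component (1-D case)
    stds    = [1.0, 0.5]

    def sample_gmm():
        z = random.choices([0, 1], weights=weights)[0]   # Z ~ Categorical(weights)
        x = random.gauss(means[z], stds[z])              # X | Z = z ~ N(mu_z, sigma_z^2)
        return z, x

    print(sample_gmm())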
More general generative models
Linear regression:
Y | X = x ~ N(θ^T x, σ^2)
Changepoint model:
T ~ Uniform(0, 1)
Y | X = x ~ N(μ_1, σ^2) if x < t, N(μ_2, σ^2) if x ≥ t
Latent Dirichlet Allocation: M documents, K topics, N_i words per document:
θ_i ~ Dirichlet(α) (topic distribution per document)
φ_k ~ Dirichlet(β) (word distribution per topic)
z_{i,j} ~ Categorical(θ_i) (topic of the j-th word in document i)
w_{i,j} ~ Categorical(φ_{z_{i,j}}) (j-th word in document i)
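A minimal sketch (not from the lecture) of the LDA generative story for a tiny corpus; the vocabulary size, document lengths, and hyperparameters are made up.

    import numpy as np

    rng = np.random.default_rng(0)
    M, K, V = 3, 2, 5                 # documents, topics, vocabulary size
    N = [4, 6, 5]                     # words per document
    alpha, beta = 0.5, 0.1            # Dirichlet hyperparameters

    phi = rng.dirichlet(beta * np.ones(V), size=K)      # word distribution per topic
    docs = []
    for i in range(M):
        theta_i = rng.dirichlet(alpha * np.ones(K))     # topic distribution for document i
        words = []
        for j in range(N[i]):
            z_ij = rng.choice(K, p=theta_i)             # topic of the j-th word
            w_ij = rng.choice(V, p=phi[z_ij])           # word drawn from that topic's distribution
            words.append(int(w_ij))
        docs.append(words)
    print(docs)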
Outline
Probabilistic graphical models
Probabilistic inference
Exact inference
Sample-based inference
A brief look at deep generative models
The inference problem
Given observations (i.e., knowing the value of some of the variables in a model), what is the distribution over the other (hidden) variables?
A relatively "easy" problem if we observe variables at the "beginning" of chains in a Bayesian network:
• If we observe the value of X_1, then X_2, X_3, X_4 have the same distribution as before, just with X_1 "fixed"
• But if we observe X_4, what is the distribution over X_1, X_2, X_3?
(Figure: chain X_1 → X_2 → X_3 → X_4, shown once with X_1 observed and once with X_4 observed.)
Many types of inference problems
Marginal inference: given a generative distribution p(X) over X = {X_1, ..., X_n}, determine p(X_A) for A ⊆ {1, ..., n}
MAP inference: determine the assignment with the maximum probability
Conditional variants: solve either of the two variants conditioned on some observed variables, e.g. p(X_A | X_B = x_B)
Approaches to inference
There are three categories of common approaches to inference (more exist, but these are the most common):
1. Exact methods: Bayes' rule or variable elimination methods
2. Sampling approaches: draw samples from the distribution over hidden variables, without constructing it explicitly
3. Approximate variational approaches: approximate the distributions over hidden variables using "simple" distributions, minimizing the difference between these distributions and the true distributions
Outline
Probabilistic graphical models
Probabilistic inference
Exact inference
Sample-based inference
A brief look at deep generative models
Exact inference example
Mixture of Gaussians model:
Z ~ Categorical(φ)
X | Z = z ~ N(μ_z, Σ_z)
Task: compute p(Z | x). (Figure: Z → X.)
In this case, we can solve inference exactly with Bayes' rule:
p(z | x) = p(x | z) p(z) / Σ_{z'} p(x | z') p(z')
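A minimal sketch (not from the lecture) of this computation for a 1-D, two-component mixture, with the same illustrative parameter values as the sampling sketch above.

    import math

    weights = [0.3, 0.7]
    means   = [-2.0, 3.0]
    stds    = [1.0, 0.5]

    def normal_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def posterior_z(x):
        # Bayes' rule: p(z | x) is proportional to p(x | z) p(z), normalized over z.
        unnorm = [normal_pdf(x, means[z], stds[z]) * weights[z] for z in range(2)]
        total = sum(unnorm)
        return [u / total for u in unnorm]

    print(posterior_z(0.5))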
Exact inference in graphical models
In some cases, it's possible to exploit the structure of the graphical model to develop efficient exact inference methods.
Example: how can I compute p(X_4) in the chain X_1 → X_2 → X_3 → X_4?
p(X_4) = Σ_{x_1, x_2, x_3} p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(X_4 | x_3)
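A minimal sketch (not from the lecture) of exploiting the chain structure: rather than summing the joint over all 2^3 hidden assignments, push each sum in as far as it will go and eliminate one variable at a time with small matrix-vector products. The conditional probability tables are illustrative.

    import numpy as np

    p_x1 = np.array([0.4, 0.6])                    # [p(X1 = 0), p(X1 = 1)]
    P21 = np.array([[0.8, 0.2], [0.3, 0.7]])       # P21[x1, x2] = p(X2 = x2 | X1 = x1)
    P32 = np.array([[0.6, 0.4], [0.1, 0.9]])       # p(X3 | X2)
    P43 = np.array([[0.9, 0.1], [0.5, 0.5]])       # p(X4 | X3)

    # Eliminate X1, then X2, then X3; cost grows linearly with the chain length
    # instead of exponentially in the number of variables.
    p_x2 = p_x1 @ P21        # sum_x1 p(x1) p(x2 | x1)
    p_x3 = p_x2 @ P32        # sum_x2 p(x2) p(x3 | x2)
    p_x4 = p_x3 @ P43        # sum_x3 p(x3) p(x4 | x3)
    print(p_x4)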
Need for approximate inference
In most cases, the exact distribution over hidden variables cannot be computed; it would require representing an exponentially large distribution over the hidden variables (or an infinite one, in the continuous case).
Z_i ~ Bernoulli(φ_i), i = 1, ..., n
X | Z = z ~ N(θ^T z, σ^2)
(Figure: Z_1, Z_2, ..., Z_n all pointing to X.)
The distribution p(Z | x) is a full distribution over n binary random variables.
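A minimal sketch (not from the lecture) that makes the blow-up concrete: representing p(Z | x) exactly means a table with 2^n entries, one per binary configuration. The parameter values are illustrative, and n = 10 is already near the limit of comfortable enumeration.

    import math
    from itertools import product

    n = 10
    phi = [0.3] * n                 # Bernoulli parameters for each Z_i
    theta = [0.5] * n               # weights in the mean of X | Z
    sigma, x_obs = 1.0, 2.0

    def unnorm_posterior(z):
        # p(z) * p(x_obs | z): proportional to p(z | x_obs)
        prior = math.prod(phi[i] if z[i] else 1 - phi[i] for i in range(n))
        mean = sum(theta[i] * z[i] for i in range(n))
        return prior * math.exp(-0.5 * ((x_obs - mean) / sigma) ** 2)

    table = {z: unnorm_posterior(z) for z in product((0, 1), repeat=n)}
    total = sum(table.values())
    posterior = {z: v / total for z, v in table.items()}
    print(len(posterior))           # 2**n = 1024 entries, doubling with every extra Z_i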
Outline
Probabilistic graphical models
Probabilistic inference
Exact inference
Sample-based inference
A brief look at deep generative models
Sample-based inference
If we can draw samples from a posterior distribution, then we can approximate arbitrary probabilistic queries about that distribution.
A naive strategy (rejection sampling): draw samples from the generative model and keep only those that match the observed data; the retained values of the hidden variables are then samples from their distribution given the observed variables.
As we get more complex models, and more observed variables, the probability that a sample matches our exact observations goes to zero.
(Figure: chain X_1 → X_2 → X_3 → X_4.)
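A minimal sketch (not from the lecture) of rejection sampling for p(X_1, X_2, X_3 | X_4 = 1) in the chain model, with the same made-up parameter values as the earlier sampling sketch.

    import random

    phi1 = 0.6
    phi2 = {0: 0.2, 1: 0.7}
    phi3 = {0: 0.4, 1: 0.9}
    phi4 = {0: 0.1, 1: 0.5}

    def bern(p):
        return 1 if random.random() < p else 0

    def rejection_sample(x4_obs=1):
        # Run the generative story until the sampled X4 matches the observation.
        while True:
            x1 = bern(phi1); x2 = bern(phi2[x1]); x3 = bern(phi3[x2]); x4 = bern(phi4[x3])
            if x4 == x4_obs:
                return x1, x2, x3

    samples = [rejection_sample() for _ in range(1000)]
    print(sum(s[0] for s in samples) / len(samples))    # estimate of p(X1 = 1 | X4 = 1)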
Markov Chain Monte Carlo
Let's consider a generic technique for generating samples from a distribution p(X) (suppose the distribution is complex, so that we cannot directly compute it or sample from it).
Our strategy is going to be to generate samples x^t via some conditional distribution q(X^{t+1} | X^t), constructed to guarantee that p(X^t) → p(X).
Metropolis-Hastings Algorithm
One of the workhorses of modern probabilistic methods:
1. Pick some x^0 (e.g., completely randomly)
2. For t = 1, 2, …
   Sample a proposal: x̃^{t+1} ~ q(X' | X = x^t)
   Set:
   x^{t+1} = x̃^{t+1} with probability min(1, [p(x̃^{t+1}) q(x^t | x̃^{t+1})] / [p(x^t) q(x̃^{t+1} | x^t)])
   x^{t+1} = x^t otherwise
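A minimal sketch (not from the lecture) of this loop for a 1-D continuous target with a Gaussian random-walk proposal; because the proposal is symmetric, the q terms cancel in the acceptance ratio. The target density and step size are made up.

    import math
    import random

    def p_tilde(x):
        # Unnormalized target density: two bumps (the normalizer is never needed).
        return math.exp(-0.5 * (x + 2) ** 2) + 0.5 * math.exp(-0.5 * (x - 3) ** 2)

    def metropolis_hastings(n_steps=10000, step=1.0, x0=0.0):
        x, samples = x0, []
        for _ in range(n_steps):
            x_prop = random.gauss(x, step)                    # proposal q(x' | x)
            if random.random() < min(1.0, p_tilde(x_prop) / p_tilde(x)):
                x = x_prop                                    # accept, otherwise keep x
            samples.append(x)
        return samples

    samples = metropolis_hastings()
    print(sum(samples) / len(samples))                        # rough mean under the target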
Notes on MH
We choose q(X' | X) so that we can easily sample from it; e.g., for continuous distributions it's common to choose q(X' | X = x) = N(x'; x, λ), a Gaussian centered at the current point.
Note that even if we cannot compute the probabilities p(x^t) and p(x̃^{t+1}), we can often compute their ratio p(x̃^{t+1}) / p(x^t) (this requires only being able to compute the unnormalized probabilities). E.g., consider conditioning on X_4 in the chain X_1 → X_2 → X_3 → X_4: the posterior p(x_1, x_2, x_3 | x_4) is known only up to the normalizer p(x_4), but the unnormalized value p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_3) is easy to evaluate.
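A minimal sketch (not from the lecture) of that idea: MH over the hidden variables (x_1, x_2, x_3) of the chain given X_4 = 1, using a symmetric "flip one coordinate" proposal and only the unnormalized posterior. The parameter values are the same illustrative ones as in the rejection-sampling sketch.

    import random

    phi1 = 0.6
    phi2 = {0: 0.2, 1: 0.7}
    phi3 = {0: 0.4, 1: 0.9}
    phi4 = {0: 0.1, 1: 0.5}
    x4_obs = 1

    def bern_pmf(x, p):
        return p if x == 1 else 1 - p

    def unnorm(z):
        # p(x1, x2, x3, x4_obs): proportional to p(x1, x2, x3 | x4_obs)
        x1, x2, x3 = z
        return (bern_pmf(x1, phi1) * bern_pmf(x2, phi2[x1])
                * bern_pmf(x3, phi3[x2]) * bern_pmf(x4_obs, phi4[x3]))

    def mh(n_steps=20000):
        z, samples = (0, 0, 0), []
        for _ in range(n_steps):
            i = random.randrange(3)                                      # flip one random coordinate
            z_prop = tuple(1 - v if j == i else v for j, v in enumerate(z))
            if random.random() < min(1.0, unnorm(z_prop) / unnorm(z)):   # q terms cancel (symmetric)
                z = z_prop
            samples.append(z)
        return samples

    samples = mh()
    print(sum(s[0] for s in samples) / len(samples))   # estimate of p(X1 = 1 | X4 = 1)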