Bayesian networks Lecture 24 David Sontag New York - PowerPoint PPT Presentation

Bayesian networks
Lecture 24
David Sontag
New York University

Hidden Markov models
• We can represent a hidden Markov model with a graph:
  [Figure: chain of hidden states X_1 → X_2 → … → X_6, with an edge from each X_t to its observation Y_t. Shading denotes observed variables.]
$$\Pr(x_1, \ldots, x_n, y_1, \ldots, y_n) = \Pr(x_1)\,\Pr(y_1 \mid x_1) \prod_{t=2}^{n} \Pr(x_t \mid x_{t-1})\,\Pr(y_t \mid x_t)$$
• There is a 1-1 mapping between the graph structure and the factorization of the joint distribution
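The factorization on this slide can be evaluated term by term. The following is a minimal sketch, assuming a hypothetical two-state weather HMM; the state names, observation symbols, and every probability in it are illustrative assumptions, not values from the lecture.

```python
from itertools import product

# Hypothetical HMM parameters (assumptions for illustration only).
INITIAL = {"rain": 0.5, "sun": 0.5}                      # Pr(x_1)
TRANSITION = {"rain": {"rain": 0.7, "sun": 0.3},         # Pr(x_t | x_{t-1})
              "sun": {"rain": 0.2, "sun": 0.8}}
EMISSION = {"rain": {"umbrella": 0.9, "none": 0.1},      # Pr(y_t | x_t)
            "sun": {"umbrella": 0.2, "none": 0.8}}


def hmm_joint(xs, ys):
    """Pr(x_1..x_n, y_1..y_n) as the product of the HMM factors."""
    p = INITIAL[xs[0]] * EMISSION[xs[0]][ys[0]]
    for t in range(1, len(xs)):
        p *= TRANSITION[xs[t - 1]][xs[t]] * EMISSION[xs[t]][ys[t]]
    return p


p = hmm_joint(["rain", "rain", "sun"], ["umbrella", "umbrella", "none"])

# Sanity check: the joint sums to 1 over all state and observation sequences.
states, symbols = list(INITIAL), ["umbrella", "none"]
total = sum(hmm_joint(x, y)
            for x in product(states, repeat=3)
            for y in product(symbols, repeat=3))
```

Here `p` multiplies one initial factor, two transition factors, and three emission factors, exactly the terms of the product for n = 3.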

Bayesian networks
• A Bayesian network is specified by a directed acyclic graph G = (V, E) with:
  – One node i for each random variable X_i
  – One conditional probability distribution (CPD) per node, $p(x_i \mid x_{\mathrm{Pa}(i)})$, specifying the variable's probability conditioned on its parents' values
• Corresponds 1-1 with a particular factorization of the joint distribution:
$$p(x_1, \ldots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$

Example
• Consider the following Bayesian network:
  [Figure: Difficulty → Grade ← Intelligence, Intelligence → SAT, Grade → Letter]
  Difficulty:      d0     d1
                   0.6    0.4
  Intelligence:    i0     i1
                   0.7    0.3
  Grade:           g1     g2     g3
    i0, d0         0.3    0.4    0.3
    i0, d1         0.05   0.25   0.7
    i1, d0         0.9    0.08   0.02
    i1, d1         0.5    0.3    0.2
  SAT:             s0     s1
    i0             0.95   0.05
    i1             0.2    0.8
  Letter:          l0     l1
    g1             0.1    0.9
    g2             0.4    0.6
    g3             0.99   0.01
• What is its joint distribution?
$$p(x_1, \ldots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$
$$p(d, i, g, s, l) = p(d)\,p(i)\,p(g \mid i, d)\,p(s \mid i)\,p(l \mid g)$$
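To make the factorization concrete, here is a small sketch that evaluates p(d, i, g, s, l) from the five CPDs. It assumes the grade table rows are ordered (i0, d0), (i0, d1), (i1, d0), (i1, d1) and the letter table rows are g1, g2, g3; the dictionary layout and function name are illustrative choices.

```python
from itertools import product

P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # keyed by (intelligence, difficulty); assumed row order, see lead-in
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}
P_L = {"g1": {"l0": 0.1, "l1": 0.9},
       "g2": {"l0": 0.4, "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}


def joint(d, i, g, s, l):
    """p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]


# One entry of the joint: 0.6 * 0.3 * 0.08 * 0.8 * 0.4
example = joint("d0", "i1", "g2", "s1", "l0")

# The 48 entries of the joint must sum to 1.
total = sum(joint(d, i, g, s, l)
            for d, i, g, s, l in product(P_D, P_I, ["g1", "g2", "g3"],
                                         ["s0", "s1"], ["l0", "l1"]))
```

The full joint has 2 · 2 · 3 · 2 · 2 = 48 entries, yet the network stores it with five small tables.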

More examples
$$p(x_1, \ldots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$
Will my car start this morning?
Heckerman et al., Decision-Theoretic Troubleshooting, 1995

More examples
$$p(x_1, \ldots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$
What is the differential diagnosis?
Beinlich et al., The ALARM Monitoring System, 1989

Conditional independencies
[Figure: the Difficulty, Intelligence, Grade, SAT, Letter network with its CPD tables]
The network structure implies several conditional independence statements:
  D ⊥ I
  G ⊥ S | I
  D ⊥ L | G
  L ⊥ S | G
  L ⊥ S | I
  D ⊥ S
If two variables are (conditionally) independent, structure has no edge between them
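These statements can be checked numerically by brute-force enumeration of the joint. The sketch below is one way to do that, assuming the standard reading of the CPD tables (grade rows ordered (i0, d0), (i0, d1), (i1, d0), (i1, d1); letter rows g1, g2, g3). The helper names `prob` and `independent` are hypothetical, and the test used is p(x, y, z) p(z) = p(x, z) p(y, z) for every value combination.

```python
from itertools import product

NAMES = ["D", "I", "G", "S", "L"]
VALS = {"D": [0, 1], "I": [0, 1], "G": [0, 1, 2], "S": [0, 1], "L": [0, 1]}
P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
P_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],   # keyed by (i, d)
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}
P_S = [[0.95, 0.05], [0.2, 0.8]]                             # P_S[i][s]
P_L = [[0.1, 0.9], [0.4, 0.6], [0.99, 0.01]]                 # P_L[g][l]


def joint(a):
    """Joint probability of a full assignment given as a dict."""
    return (P_D[a["D"]] * P_I[a["I"]] * P_G[(a["I"], a["D"])][a["G"]]
            * P_S[a["I"]][a["S"]] * P_L[a["G"]][a["L"]])


def prob(event):
    """Marginal probability that the variables in `event` take those values."""
    total = 0.0
    for values in product(*(VALS[n] for n in NAMES)):
        a = dict(zip(NAMES, values))
        if all(a[k] == v for k, v in event.items()):
            total += joint(a)
    return total


def independent(x, y, given=(), tol=1e-9):
    """True if x and y are independent given `given` under the joint."""
    for xv, yv in product(VALS[x], VALS[y]):
        for zv in product(*(VALS[z] for z in given)):
            z = dict(zip(given, zv))
            lhs = prob({x: xv, y: yv, **z}) * prob(z)
            rhs = prob({x: xv, **z}) * prob({y: yv, **z})
            if abs(lhs - rhs) > tol:
                return False
    return True


listed = [independent("D", "I"), independent("G", "S", ("I",)),
          independent("D", "L", ("G",)), independent("L", "S", ("G",)),
          independent("L", "S", ("I",)), independent("D", "S")]
grade_vs_sat = independent("G", "S")             # dependent through Intelligence
explaining_away = independent("D", "I", ("G",))  # dependent once Grade is known
```

All six listed statements hold, while G and S are dependent when I is not observed, and D and I become dependent once their common child G is observed.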

Inference in Bayesian networks
• Computing marginal probabilities in tree structured Bayesian networks is easy
  – The algorithm called “belief propagation” generalizes what we showed for hidden Markov models to arbitrary trees
• Wait… this isn’t a tree! What can we do?
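For a chain, belief propagation reduces to the forward recursion for hidden Markov models. The sketch below computes the marginal probability of an observation sequence with forward messages and compares it against brute-force summation over all hidden sequences. The two-state HMM and all of its numbers are hypothetical assumptions, not taken from the lecture.

```python
from itertools import product

# Hypothetical HMM parameters (assumptions for illustration only).
STATES = ["rain", "sun"]
INITIAL = {"rain": 0.5, "sun": 0.5}
TRANSITION = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun": {"rain": 0.2, "sun": 0.8}}
EMISSION = {"rain": {"umbrella": 0.9, "none": 0.1},
            "sun": {"umbrella": 0.2, "none": 0.8}}


def likelihood_forward(ys):
    """Pr(y_1..y_n) via forward messages alpha_t(x) = Pr(y_1..y_t, x_t = x)."""
    alpha = {x: INITIAL[x] * EMISSION[x][ys[0]] for x in STATES}
    for y in ys[1:]:
        alpha = {x: EMISSION[x][y] * sum(alpha[prev] * TRANSITION[prev][x]
                                         for prev in STATES)
                 for x in STATES}
    return sum(alpha.values())


def likelihood_brute_force(ys):
    """Pr(y_1..y_n) by summing the joint over every hidden state sequence."""
    total = 0.0
    for xs in product(STATES, repeat=len(ys)):
        p = INITIAL[xs[0]] * EMISSION[xs[0]][ys[0]]
        for t in range(1, len(ys)):
            p *= TRANSITION[xs[t - 1]][xs[t]] * EMISSION[xs[t]][ys[t]]
        total += p
    return total


observations = ["umbrella", "umbrella", "none", "umbrella"]
fast = likelihood_forward(observations)
slow = likelihood_brute_force(observations)
```

The forward pass costs time linear in the sequence length, while the brute-force sum grows exponentially; both return the same value.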

Inference in Bayesian networks
• In some cases (such as this) we can transform this into what is called a “junction tree”, and then run belief propagation

Approximate inference
• There is also a wealth of approximate inference algorithms that can be applied to Bayesian networks such as these
• Markov chain Monte Carlo algorithms repeatedly sample assignments for estimating marginals
• Variational inference algorithms (which are deterministic) attempt to fit a simpler distribution to the complex distribution, and then compute marginals for the simpler distribution
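As one illustration of the Markov chain Monte Carlo idea, the sketch below runs a Gibbs sampler, a particular MCMC algorithm chosen here as an example rather than one named on the slide. It estimates p(I = i1 | G = g3) in the Difficulty/Intelligence/Grade fragment of the earlier example network and compares the estimate with the exact answer. The CPD values follow the standard reading of that example's tables; the sample count, burn-in length, and seed are arbitrary assumptions.

```python
import random

P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
P_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],   # keyed by (i, d)
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}


def sample_index(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r < acc:
            return k
    return len(weights) - 1


def gibbs_p_i1_given_g(g, n_samples=20000, burn_in=1000, seed=0):
    """Estimate p(I = 1 | G = g) by alternately resampling D and I."""
    rng = random.Random(seed)
    d, i = 0, 0
    hits = 0
    for step in range(burn_in + n_samples):
        # p(d | i, g) is proportional to p(d) p(g | i, d)
        d = sample_index([P_D[dv] * P_G[(i, dv)][g] for dv in (0, 1)], rng)
        # p(i | d, g) is proportional to p(i) p(g | i, d)
        i = sample_index([P_I[iv] * P_G[(iv, d)][g] for iv in (0, 1)], rng)
        if step >= burn_in:
            hits += i
    return hits / n_samples


def exact_p_i1_given_g(g):
    """Exact posterior by summing out D."""
    w = [P_I[iv] * sum(P_D[dv] * P_G[(iv, dv)][g] for dv in (0, 1))
         for iv in (0, 1)]
    return w[1] / sum(w)


estimate = gibbs_p_i1_given_g(g=2)   # g=2 is the index of grade g3
exact = exact_p_i1_given_g(g=2)
```

On a network this small exact inference is trivial; the point is only that the sampled frequency approaches the exact posterior as the number of samples grows.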

Dimensionality reduction of text data
• The problem with using a bag of words representation:
  Document A: auto, engine, bonnet, tyres, lorry, boot
  Document B: car, emissions, hood, make, model, trunk
  Document C: make, hidden, Markov, model, emissions, normalize
  – Synonymy (A vs. B): large distance, but related
  – Polysemy (B vs. C): small distance, but not related
[Example from Lillian Lee]
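The problem can be reproduced with the three word lists from this example. The sketch below uses cosine similarity between word-count vectors as the closeness measure, which is an assumption: the slide speaks only of "distance" without naming a metric.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-words count vectors."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)


british_cars = "auto engine bonnet tyres lorry boot".split()
american_cars = "car emissions hood make model trunk".split()
markov_models = "make hidden Markov model emissions normalize".split()

# Synonymy: same subject, no shared words, so similarity is zero.
synonymy = cosine_similarity(british_cars, american_cars)

# Polysemy: different subjects, but "make", "model", "emissions" are shared.
polysemy = cosine_similarity(american_cars, markov_models)
```

The two car documents score 0 because they share no words, while the car document and the Markov model document score about 0.5 because three of their six words coincide.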

Probabilistic Topic Models
• A probabilistic version of SVD (called LSA when applied to text data)
• Originated in domain of statistics & machine learning
  – (e.g., Hofmann, 2001; Blei, Ng, Jordan, 2003)
• Extracts topics from large collections of text
• Topics are interpretable, unlike the arbitrary dimensions of LSA

Model is Generative
[Figure: DATA (corpus of text: word counts for each document) and Topic Model. Find parameters that “reconstruct” data.]

Document generation as a probabilistic process
1. For each document, choose a mixture of topics
2. For every word slot, sample a topic [1..T] from the mixture
3. Sample a word from the topic
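The three steps can be sketched as a short sampler. Everything concrete here is a hypothetical assumption for illustration: the two toy topics, their word probabilities, the mixture [0.8, 0.2], and the seed. Step 1 is represented by passing the document's topic mixture in as an argument.

```python
import random

# Hypothetical topics: each maps words to probabilities (illustrative values).
TOPICS = [
    {"money": 0.4, "loan": 0.3, "bank": 0.3},
    {"river": 0.4, "stream": 0.3, "bank": 0.3},
]


def generate_document(topic_mixture, n_words, rng):
    """Generate words and their topic assignments for one document."""
    words, assignments = [], []
    for _ in range(n_words):
        # Step 2: sample a topic index from the document's mixture.
        z = rng.choices(range(len(TOPICS)), weights=topic_mixture)[0]
        # Step 3: sample a word from the chosen topic's distribution.
        topic = TOPICS[z]
        w = rng.choices(list(topic), weights=list(topic.values()))[0]
        words.append(w)
        assignments.append(z)
    return words, assignments


rng = random.Random(0)
# Step 1: this document's mixture puts most of its weight on the first topic.
words, assignments = generate_document([0.8, 0.2], n_words=16, rng=rng)
```

Note that "bank" can be produced by either topic, so the word alone does not reveal which topic generated it.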

Example
[Figure: two topics generate two documents; the digit after each word is the topic that generated it]
TOPIC 1: money, loan, bank
TOPIC 2: river, stream, bank
DOCUMENT 1 (mixture weights: .8 on topic 1, .2 on topic 2): money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 money1 stream2 bank1 money1 bank1 bank1 loan1 river2 stream2 bank1 money1 river2 bank1 money1 bank1 loan1 bank1 money1 stream2
DOCUMENT 2 (mixture weights: .3 on topic 1, .7 on topic 2): river2 stream2 bank2 stream2 bank2 money1 loan1 river2 stream2 loan1 bank2 river2 bank2 bank1 stream2 river2 loan1 bank2 stream2 bank2 money1 loan1 river2 stream2 bank2 stream2 bank2 money1 river2 stream2 loan1 bank2 river2 bank2 money1 bank1 stream2 river2 bank2 stream2 bank2 money1
Bayesian approach: use priors
  Mixture weights ~ Dirichlet(α)
  Mixture components ~ Dirichlet(β)
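Drawing from the two Dirichlet priors can be sketched with the standard construction of normalizing independent Gamma draws. The symmetric concentration values α = 0.5 and β = 0.5, the five-word vocabulary, and the seed are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(concentration, rng):
    """Sample from a Dirichlet by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]


rng = random.Random(0)
VOCAB = ["money", "loan", "bank", "river", "stream"]
N_TOPICS = 2
ALPHA, BETA = 0.5, 0.5   # hypothetical symmetric concentration parameters

# Mixture components: one distribution over the vocabulary per topic.
topics = [sample_dirichlet([BETA] * len(VOCAB), rng) for _ in range(N_TOPICS)]

# Mixture weights: one distribution over topics for a single document.
theta = sample_dirichlet([ALPHA] * N_TOPICS, rng)
```

Concentration values below 1 tend to give sparse draws, so each topic favors a few words and each document favors a few topics.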

Latent Dirichlet allocation
[Figure with three panels: Topics; Documents; Topic proportions and assignments]
• Topics β_1, …, β_T, for example:
  – gene 0.04, dna 0.02, genetic 0.01, …
  – life 0.02, evolve 0.01, organism 0.01, …
  – brain 0.04, neuron 0.02, nerve 0.01, …
  – data 0.02, number 0.02, computer 0.01, …
• Topic proportions θ_d and assignments z_1d, …, z_Nd
(Blei, Ng, Jordan JMLR ‘03)

Latent Dirichlet allocation
[Figure: graphical model for Latent Dirichlet Allocation (LDA), with a Dirichlet prior on the topic-word distributions]
• Topic-word distributions:
  – β_1: pna .0100, cough .0095, pneumonia .0090, cxr .0085, levaquin .0060, …
  – β_2: sore throat .05, swallow .0092, voice .0080, fevers .0075, ear .0016, …
  – β_T: cellulitis .0105, swelling .0100, redness .0055, lle .0050, fevers .0045, …
• Inference on a triage note gives θ_d: Pneumonia 0.50, Common cold 0.49, Diabetes 0.01
• Low dimensional representation: distribution of topics for the note
(Blei, Ng, Jordan JMLR ‘03)

Inverting the model (learning)
[Figure: the same two documents, now with the topics (mixture components), the mixture weights, and every word's topic assignment unknown, each marked “?”]
TOPIC 1: ?
TOPIC 2: ?
DOCUMENT 1: money? bank? bank? loan? river? stream? bank? money? river? bank? money? bank? loan? money? stream? bank? money? bank? bank? loan? river? stream? bank? money? river? bank? money? bank? loan? bank? money? stream?
DOCUMENT 2: river? stream? bank? stream? bank? money? loan? river? stream? loan? bank? river? bank? bank? stream? river? loan? bank? stream? bank? money? loan? river? stream? bank? stream? bank? money? river? stream? loan? bank? river? bank? money? bank? stream? river? bank? stream? bank? money?

Example of learned representation
Paraphrased note: “Patient has URI [upper respiratory infection] symptoms like cough, runny nose, ear pain. Denies fevers. History of seasonal allergies”
Inferred Topic Distribution
[Figure: chart of the inferred topic distribution, with labels Allergy, Cold/URI, and Other]
