Bayesian Networks - Alan Ritter - PowerPoint PPT Presentation

Bayesian Networks
Alan Ritter

Problem: Non-IID Data
• Most real-world data is not IID
  – (unlike coin flips, which are IID)
• Multiple correlated variables
• Examples:
  – Pixels in an image
  – Words in a document
  – Genes in a microarray
• We saw one example of how to deal with this
  – Markov Models + Hidden Markov Models

Questions
• How can we compactly represent $P(X \mid \theta)$?
• How can we use this distribution to infer one set of variables given another?
• How can we learn the parameters with a reasonable amount of data?

The Chain Rule of Probability
$P(x_{1:N}) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)\,P(x_4 \mid x_1, x_2, x_3) \cdots P(x_N \mid x_{1:N-1})$
Problem: for N binary variables this factorization has $2^N - 1$ parameters
• Can represent any joint distribution this way
• Using any ordering of the variables…
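Worked count, assuming every $x_i$ is binary: the factor $P(x_i \mid x_{1:i-1})$ needs one number for each of the $2^{i-1}$ settings of its conditioning variables, so the last factor alone needs $2^{N-1}$ and the whole product needs

$$\sum_{i=1}^{N} 2^{\,i-1} = 1 + 2 + 4 + \cdots + 2^{N-1} = 2^{N} - 1$$

For N = 5 this gives 1 + 2 + 4 + 8 + 16 = 31, the same as the number of free entries in a full joint table over five binary variables.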

Conditional Independence
• This is the key to representing large joint distributions
• X and Y are conditionally independent given Z
  – if and only if the conditional joint can be written as a product of the conditional marginals
$X \perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$
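The definition can be checked directly on a small discrete joint table. The sketch below assumes all three variables are discrete and the table lists every (x, y, z) combination; the CPT numbers and the helper name `is_cond_independent` are hypothetical, chosen only to illustrate the test.

```python
from itertools import product

def is_cond_independent(joint, tol=1e-9):
    """Test X ⊥ Y | Z for a complete table joint[(x, y, z)] = P(x, y, z)."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        pz = sum(joint[(x, y, z)] for x in xs for y in ys)
        if pz == 0:
            continue  # P(X, Y | Z = z) is undefined when P(Z = z) = 0
        for x, y in product(xs, ys):
            p_xy = joint[(x, y, z)] / pz
            p_x = sum(joint[(x, y2, z)] for y2 in ys) / pz
            p_y = sum(joint[(x2, y, z)] for x2 in xs) / pz
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True

# Hypothetical CPTs in which Z is a common cause of X and Y.
p_z = {0: 0.3, 1: 0.7}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}

# Built as P(z) P(x|z) P(y|z), so the independence holds by construction.
factored = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
            for x, y, z in product((0, 1), repeat=3)}
print(is_cond_independent(factored))  # True

# Move 0.05 of mass between two cells with Z = 0: the total stays 1,
# but P(X, Y | Z = 0) no longer factorizes.
coupled = dict(factored)
coupled[(0, 0, 0)] += 0.05
coupled[(0, 1, 0)] -= 0.05
print(is_cond_independent(coupled))   # False
```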

Graphical Models
• First-order Markov assumption is useful for 1d sequence data
  – Sequences of words in a sentence or document
• Q: What about 2d images, 3d video?
  – Or, in general, arbitrary collections of variables
    • Gene pathways, etc…

Graphical Models
• A way to represent a joint distribution by making conditional independence assumptions
• Nodes represent variables
• (Lack of) edges represent conditional independence assumptions
• Better name: "conditional independence diagrams"
  – Doesn't sound as cool
[Figure: example graph over nodes 1 to 5, shown twice]

Graph Terminology
• Graph (V, E) consists of
  – A set of nodes or vertices V = {1, ..., V}
  – A set of edges E = {(s, t) : s, t ∈ V}
• Child (for directed graph)
• Ancestors (for directed graph)
• Descendants (for directed graph)
• Neighbors (for any graph)
• Cycle (directed vs. undirected)
• Tree (no cycles)
• Clique / Maximal Clique
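Several of these terms reduce to simple traversals of an edge list. The sketch below uses the five-node DAG from the Example slide (edges read off its factorization); the helper names are hypothetical, and the graph is assumed to be given as a list of directed (s, t) pairs.

```python
from collections import defaultdict

# Example DAG: the five-node graph from the "Example" slide, with edges read off
# its factorization P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x3).
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]

def adjacency(edges, reverse=False):
    """Map each node to its children (or to its parents when reverse=True)."""
    adj = defaultdict(set)
    for s, t in edges:
        if reverse:
            adj[t].add(s)
        else:
            adj[s].add(t)
    return adj

def reachable(start, adj):
    """Nodes reachable from `start` via one or more edges (depth-first search)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def children(node, edges):
    return adjacency(edges)[node]

def descendants(node, edges):
    return reachable(node, adjacency(edges))

def ancestors(node, edges):
    return reachable(node, adjacency(edges, reverse=True))

def neighbors(node, edges):
    """Neighbors ignore edge direction."""
    return adjacency(edges)[node] | adjacency(edges, reverse=True)[node]

print(sorted(descendants(1, EDGES)))  # [2, 3, 4, 5]
print(sorted(ancestors(4, EDGES)))    # [1, 2, 3]
print(sorted(neighbors(3, EDGES)))    # [1, 4, 5]
```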

Directed Graphical Models
• Graphical model whose graph is a DAG
  – Directed acyclic graph
  – No cycles!
• A.K.A. Bayesian Networks
  – Nothing inherently Bayesian about them
    • Just a way of defining conditional independences
    • Just sounds cooler I guess…

Directed Graphical Models
• Key property: nodes can be ordered so that parents come before children
  – Topological ordering
  – Can be constructed from any DAG
• Ordered Markov Property:
  – Generalization of the first-order Markov property to general DAGs
  – A node depends only on its parents (not on its other predecessors)
$x_s \perp x_{\mathrm{pred}(s) \setminus \mathrm{parents}(s)} \mid x_{\mathrm{parents}(s)}$
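One standard way to construct a topological ordering is Kahn's algorithm: repeatedly emit a node with no remaining incoming edges. The sketch below is that algorithm (not necessarily the one used in the course), followed by a helper that lists pred(s) minus parents(s) for each node, i.e. the variables the ordered Markov property says $x_s$ is independent of given its parents. The function names and the example edge list are assumptions for illustration.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: order nodes so every parent comes before its children.

    Raises ValueError if the graph contains a directed cycle (i.e. it is not a DAG).
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for s, t in edges:
        children[s].append(t)
        indegree[t] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order

def nonparent_predecessors(order, edges):
    """pred(s) - parents(s) for each node s, given a topological order."""
    parents = {n: {s for s, t in edges if t == n} for n in order}
    return {n: set(order[:i]) - parents[n] for i, n in enumerate(order)}

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]
order = topological_order(nodes, edges)
print(order)                                    # [1, 2, 3, 4, 5]
print(nonparent_predecessors(order, edges)[5])  # {1, 2, 4}
```

Read the last line as: given its parent $x_3$, $x_5$ is independent of $x_1$, $x_2$ and $x_4$.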

Example
$P(x_{1:5}) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2)\,P(x_4 \mid x_1, x_2, x_3)\,P(x_5 \mid x_1, x_2, x_3, x_4)$
$= P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_3)$
[Figure: DAG over nodes 1 to 5 with edges 1→2, 1→3, 2→4, 3→4, 3→5]
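To see what the DAG buys, one can count CPT entries for both factorizations. This sketch assumes every variable is discrete with the same number of values (`card`, default 2 for binary); the function name `num_params` is hypothetical.

```python
def num_params(parents, card=2):
    """Free parameters of a discrete Bayes net where every variable has `card` values.

    Each CPT P(x_s | x_parents(s)) needs (card - 1) numbers per joint setting of
    the parents, i.e. (card - 1) * card ** |parents(s)| numbers.
    """
    return sum((card - 1) * card ** len(ps) for ps in parents.values())

# Parent sets from the two factorizations on the slide.
full_chain = {1: [], 2: [1], 3: [1, 2], 4: [1, 2, 3], 5: [1, 2, 3, 4]}
dag = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [3]}

print(num_params(full_chain))  # 31
print(num_params(dag))         # 11
```

The gap widens quickly: the full chain grows exponentially in N, while the DAG's cost is governed by the largest parent set.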

Naïve Bayes
(Same as Gaussian Mixture Model w/ Diagonal Covariance)
[Figure: class node Y with children X_1, X_2, X_3, X_4]
$P(y, x_{1:D}) = P(y) \prod_{j=1}^{D} P(x_j \mid y)$
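Inference in this model is a direct use of the factorization: evaluate $P(y) \prod_j P(x_j \mid y)$ for each class and normalize. The sketch below assumes binary features with a Bernoulli $P(x_j \mid y)$ (rather than the Gaussian case mentioned on the slide); all probabilities are hypothetical. Working in log space avoids underflow when D is large.

```python
import math

# Hypothetical model: binary class y and three binary features.
prior = {0: 0.6, 1: 0.4}
# p_feature[y][j] = P(x_j = 1 | y)
p_feature = {0: [0.2, 0.5, 0.1], 1: [0.7, 0.5, 0.6]}

def log_joint(y, x):
    """log P(y, x_1:D) = log P(y) + sum_j log P(x_j | y)."""
    total = math.log(prior[y])
    for xj, pj in zip(x, p_feature[y]):
        total += math.log(pj if xj == 1 else 1.0 - pj)
    return total

def posterior(x):
    """P(y | x) by normalizing the joint over the class values."""
    logs = {y: log_joint(y, x) for y in prior}
    m = max(logs.values())  # subtract the max before exponentiating, for stability
    unnorm = {y: math.exp(v - m) for y, v in logs.items()}
    z = sum(unnorm.values())
    return {y: u / z for y, u in unnorm.items()}

print(posterior([1, 0, 1]))  # P(y=1 | x) = 0.084 / 0.09 ≈ 0.933
```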

Markov Models
First-order Markov model
[Figure: chain x_1 → x_2 → x_3 → ...]
$P(x_{1:N}) = P(x_1) \prod_{i=2}^{N} P(x_i \mid x_{i-1})$
Second-order Markov model
[Figure: chain x_1, x_2, x_3, x_4, ... in which each node depends on the two before it]
$P(x_{1:N}) = P(x_1, x_2) \prod_{i=3}^{N} P(x_i \mid x_{i-1}, x_{i-2})$
Hidden Markov Model
[Figure: hidden chain z_1 → z_2 → ... → z_T, with each z_i emitting an observation x_i]
$P(x_{1:N}, z_{1:N}) = P(z_1)\,P(x_1 \mid z_1) \prod_{i=2}^{N} P(z_i \mid z_{i-1})\,P(x_i \mid z_i)$
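The HMM formula gives the joint over observations and hidden states; the likelihood $P(x_{1:N})$ needs the hidden states summed out. The sketch below does this with the standard forward recursion and checks it against brute-force summation over all $K^N$ state sequences. The two-state parameters are hypothetical.

```python
from itertools import product

# Hypothetical 2-state HMM with 2 observation symbols.
init = [0.5, 0.5]                 # P(z_1)
trans = [[0.7, 0.3], [0.4, 0.6]]  # trans[i][j] = P(z_t = j | z_{t-1} = i)
emit = [[0.9, 0.1], [0.2, 0.8]]   # emit[i][k] = P(x_t = k | z_t = i)

def hmm_joint(z, x):
    """P(x_{1:N}, z_{1:N}) = P(z_1) P(x_1|z_1) prod_{i>=2} P(z_i|z_{i-1}) P(x_i|z_i)."""
    p = init[z[0]] * emit[z[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[z[i - 1]][z[i]] * emit[z[i]][x[i]]
    return p

def forward(x):
    """P(x_{1:N}) via the forward recursion: O(N K^2) instead of O(K^N)."""
    K = len(init)
    alpha = [init[k] * emit[k][x[0]] for k in range(K)]  # alpha[k] = P(x_1, z_1 = k)
    for obs in x[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(K)) * emit[j][obs]
                 for j in range(K)]
    return sum(alpha)

x = [0, 1, 1, 0]
brute = sum(hmm_joint(z, x) for z in product(range(len(init)), repeat=len(x)))
print(abs(forward(x) - brute) < 1e-12)  # True
```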

Example: Medical Diagnosis
The Alarm Network
[Figure: the Alarm network; nodes include MinVolset, Disconnect, VentMach, Intubation, VentTube, KinkedTube, PulmEmbolus, PAP, Shunt, VentLung, FIO2, Hypovolemia, Anaphylaxis, MinVol, VentAlv, StrokeVolume, PVSAT, ArtCO2, SAO2, TPR, LvFailure, Catechol, CO, ExpCO2, History, HR, CVP, HRBP, HRSAT, BP, HrEKG, PCWP, ErrCauter, among others]

Another medical diagnosis example: QMR network
[Figure: bipartite network with disease nodes h_1, h_2, h_3 (Diseases) connected to symptom nodes v_1, ..., v_5 (Symptoms)]

Compact Conditional Distributions (contd.)
Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1 \ldots U_k$ include all causes (can add a leak node)
2) Independent failure probability $q_i$ for each cause alone
⇒ $P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
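The whole table follows from the three single-cause failure probabilities read off its "one cause present" rows ($q_{\text{Cold}} = 0.6$, $q_{\text{Flu}} = 0.2$, $q_{\text{Malaria}} = 0.1$). A minimal sketch of the noisy-OR rule without a leak term, regenerating the table rows in the same order; the function name `p_fever` is hypothetical.

```python
from itertools import product

# Failure probabilities q_i: each cause alone fails to produce fever with probability q_i.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active):
    """Noisy-OR (no leak): P(Fever | active causes) = 1 - prod of q_i over active causes."""
    p_no_fever = 1.0
    for cause in active:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

# Rows in the table's order: (Cold, Flu, Malaria) from F F F to T T T.
for flags in product([False, True], repeat=3):
    active = [c for c, on in zip(q, flags) if on]
    print(flags, round(p_fever(active), 3))
```

Three numbers reproduce all eight rows, which is the "linear in number of parents" point: a full CPT would need $2^k$ rows.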

Probabilistic Inference
• Graphical models provide a compact way to represent complex joint distributions
• Q: Given a joint distribution, what can we do with it?
• A: Main use = probabilistic inference
  – Estimate unknown variables from known ones

Examples of Inference
• Predict the most likely cluster for $X \in \mathbb{R}^n$ given a set of mixture components
  – This is what you did in HW #1
• Viterbi algorithm, forward/backward (HMMs)
  – Estimate words from a speech signal
  – Estimate parts of speech given a sequence of words in a text
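As a concrete instance of the Viterbi bullet, here is a minimal sketch of the algorithm for a toy two-state HMM: it finds $\arg\max_{z_{1:N}} P(z_{1:N}, x_{1:N})$ by dynamic programming with back-pointers. The parameters are hypothetical; a brute-force search over all state sequences is included to confirm the maximum score.

```python
from itertools import product

# Hypothetical 2-state HMM with 2 observation symbols (toy numbers).
init = [0.6, 0.4]                 # P(z_1)
trans = [[0.7, 0.3], [0.4, 0.6]]  # trans[i][j] = P(z_t = j | z_{t-1} = i)
emit = [[0.5, 0.5], [0.1, 0.9]]   # emit[i][k] = P(x_t = k | z_t = i)

def joint(z, x):
    """P(z_{1:N}, x_{1:N}) under the HMM factorization."""
    p = init[z[0]] * emit[z[0]][x[0]]
    for t in range(1, len(x)):
        p *= trans[z[t - 1]][z[t]] * emit[z[t]][x[t]]
    return p

def viterbi(x):
    """Most likely hidden sequence and its joint probability, by dynamic programming."""
    K = len(init)
    delta = [init[k] * emit[k][x[0]] for k in range(K)]  # best score of paths ending in k
    back = []                                            # back[t-1][j]: best predecessor of j
    for obs in x[1:]:
        step, new_delta = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: delta[i] * trans[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][obs])
        back.append(step)
        delta = new_delta
    state = max(range(K), key=lambda k: delta[k])
    path = [state]
    for step in reversed(back):  # follow back-pointers from the last step to the first
        state = step[state]
        path.append(state)
    return path[::-1], max(delta)

x = [0, 1, 1, 0]
path, score = viterbi(x)
best = max(product(range(len(init)), repeat=len(x)), key=lambda z: joint(z, x))
print(path, score, joint(best, x))
```

Replacing `max` with a sum over predecessors turns the same recursion into the forward algorithm.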

General Form of Inference
• We have:
  – A correlated set of random variables
  – Joint distribution: $P(x_{1:V} \mid \theta)$
• Assumption: parameters are known
• Partition variables into:
  – Visible: $x_v$
  – Hidden: $x_h$
• Goal: compute unknowns from knowns
$P(x_h \mid x_v, \theta) = \frac{P(x_h, x_v \mid \theta)}{P(x_v \mid \theta)} = \frac{P(x_h, x_v \mid \theta)}{\sum_{x'_h} P(x'_h, x_v \mid \theta)}$
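For small discrete models the goal formula can be applied literally: keep the joint entries consistent with the evidence, then divide by their sum, which is $P(x_v \mid \theta)$. A minimal sketch, assuming the full joint table is available as a dict (feasible only for a handful of variables); the variable names and numbers are hypothetical.

```python
def posterior(joint, names, evidence):
    """P(x_h | x_v, theta) from a full joint table, by enumeration.

    joint:    dict mapping complete assignments (tuples ordered like `names`)
              to P(x_h, x_v | theta)
    evidence: dict {variable name: observed value} for the visible variables x_v
    Returns a dict from hidden assignments (tuples) to posterior probabilities.
    """
    hidden = [i for i, n in enumerate(names) if n not in evidence]
    unnorm = {}
    for assignment, p in joint.items():
        if all(assignment[names.index(n)] == v for n, v in evidence.items()):
            key = tuple(assignment[i] for i in hidden)
            unnorm[key] = unnorm.get(key, 0.0) + p
    z = sum(unnorm.values())  # P(x_v | theta) = sum over x_h' of P(x_h', x_v | theta)
    if z == 0:
        raise ValueError("evidence has zero probability")
    return {k: p / z for k, p in unnorm.items()}

# Hypothetical two-variable joint P(Rain, WetGrass).
names = ("Rain", "WetGrass")
joint = {(0, 0): 0.72, (0, 1): 0.08, (1, 0): 0.02, (1, 1): 0.18}
print(posterior(joint, names, {"WetGrass": 1}))  # Rain=1 gets 0.18 / 0.26
```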

Nuisance Variables
• Partition hidden variables into:
  – Query variables: $x_q$
  – Nuisance variables: $x_u$
$P(x_q \mid x_v, \theta) = \sum_{x_u} P(x_q, x_u \mid x_v, \theta)$
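Summing out a nuisance variable does not require building the full joint table: with a factorized model the sum can be pushed inside the product. The sketch below assumes a hypothetical chain A → B → C of binary variables, queries $P(A \mid C = c)$ with B as the nuisance variable, and checks the result against explicit enumeration of the joint.

```python
from itertools import product

# Hypothetical chain A -> B -> C with binary variables.
pA = {0: 0.7, 1: 0.3}
pB_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # pB_given_A[a][b]
pC_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}  # pC_given_B[b][c]

def query_A_given_C(c):
    """P(A | C = c): the nuisance variable B is summed out inside the product."""
    unnorm = {a: pA[a] * sum(pB_given_A[a][b] * pC_given_B[b][c] for b in (0, 1))
              for a in (0, 1)}
    z = sum(unnorm.values())  # P(C = c)
    return {a: u / z for a, u in unnorm.items()}

def brute_force(c):
    """Same query by enumerating every (a, b) in the full joint table."""
    unnorm = {0: 0.0, 1: 0.0}
    for a, b in product((0, 1), repeat=2):
        unnorm[a] += pA[a] * pB_given_A[a][b] * pC_given_B[b][c]
    z = sum(unnorm.values())
    return {a: u / z for a, u in unnorm.items()}

print(query_A_given_C(1))  # P(A=1 | C=1) = 0.228 / 0.417
print(brute_force(1))
```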

Inference vs. Learning
• Inference:
  – Compute $P(x_h \mid x_v, \theta)$
  – Parameters are assumed to be known
• Learning
  – Compute MAP estimate of the parameters
$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log P(x_{i,v} \mid \theta) + \log P(\theta)$
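A minimal worked case of the MAP objective, assuming fully observed binary data with $\theta = P(x = 1)$ and a Beta(a, b) prior with a, b > 1 (an illustrative choice, not taken from the slide). The objective then has the closed-form maximizer $(N_1 + a - 1)/(N + a + b - 2)$; a grid search over the same objective confirms it. Data and prior values are hypothetical.

```python
import math

def log_objective(theta, data, a, b):
    """sum_i log P(x_i | theta) + log P(theta), up to the Beta normalizer (constant in theta)."""
    n1 = sum(data)
    n0 = len(data) - n1
    return (n1 + a - 1) * math.log(theta) + (n0 + b - 1) * math.log(1 - theta)

def map_estimate(data, a, b):
    """Closed-form maximizer for a Bernoulli likelihood with a Beta(a, b) prior, a, b > 1."""
    n1 = sum(data)
    return (n1 + a - 1) / (len(data) + a + b - 2)

data = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical observations: N1 = 6, N = 8
a, b = 2.0, 2.0
theta_hat = map_estimate(data, a, b)
grid = [i / 10000 for i in range(1, 10000)]
theta_grid = max(grid, key=lambda t: log_objective(t, data, a, b))
print(theta_hat, theta_grid)  # both 0.7 = (6 + 1) / (8 + 2)
```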

Bayesian Learning
• Parameters are treated as hidden variables
  – No distinction between inference and learning
• Main distinction between inference and learning:
  – # hidden variables grows with size of dataset
  – # parameters is fixed
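Treating $\theta$ as a hidden variable means learning is just posterior inference over $\theta$. A minimal sketch under an assumed Beta-Bernoulli model (a conjugate choice made here for illustration): the posterior stays a Beta distribution described by two numbers no matter how much data arrives, and predictions integrate $\theta$ out instead of plugging in a point estimate. The data and prior values are hypothetical.

```python
def beta_posterior(data, a, b):
    """Update a Beta(a, b) belief about theta = P(x = 1) with binary observations.

    Conjugacy makes the posterior Beta(a + N1, b + N0): two numbers, however large N is.
    """
    n1 = sum(data)
    return a + n1, b + (len(data) - n1)

def predictive_prob_one(data, a, b):
    """P(x_new = 1 | data) = E[theta | data], i.e. theta is integrated out, not fixed."""
    a_post, b_post = beta_posterior(data, a, b)
    return a_post / (a_post + b_post)

data = [1, 0, 1, 1, 0, 1, 1, 1]             # hypothetical observations
print(beta_posterior(data, 2.0, 2.0))       # (8.0, 4.0)
print(predictive_prob_one(data, 2.0, 2.0))  # 8 / 12 ≈ 0.667

# Feeding the data one point at a time gives the same posterior (batch == sequential).
a_seq, b_seq = 2.0, 2.0
for x in data:
    a_seq, b_seq = beta_posterior([x], a_seq, b_seq)
print((a_seq, b_seq) == beta_posterior(data, 2.0, 2.0))  # True
```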
