  • COMP90051 Statistical Machine Learning, Semester 2, 2016. Lecturer: Trevor Cohn. 21. Independence in PGMs; Example PGMs

  • Independence: PGMs encode assumptions of statistical independence between variables. These are critical to understanding the capabilities of a model, and for efficient inference.

  • Recall: Directed PGM
    * Nodes: random variables
    * Edges (acyclic): conditional dependence
      - Node table: Pr(child | parents)
      - A child directly depends on its parents
    * Joint factorisation: Pr(X_1, X_2, ..., X_k) = ∏_{i=1}^{k} Pr(X_i | X_j ∈ parents(X_i))
    * Graph encodes: independence assumptions, and the parameterisation of the CPTs (conditional probability tables)
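A minimal sketch of this factorisation, assuming a hypothetical three-node network X1 → X3 ← X2 with made-up CPT values; the joint probability of a full assignment is simply the product of each node's conditional given its parents.

```python
# Sketch: joint probability of a directed PGM as a product of CPT entries.
# The network and all CPT values are hypothetical, for illustration only.
# Graph: X1 -> X3 <- X2   (X1 and X2 have no parents).
cpts = {
    "X1": {(): {0: 0.7, 1: 0.3}},          # P(X1)
    "X2": {(): {0: 0.4, 1: 0.6}},          # P(X2)
    "X3": {                                # P(X3 | X1, X2)
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}
parents = {"X1": (), "X2": (), "X3": ("X1", "X2")}

def joint(assignment):
    """Pr(X_1, ..., X_k) = prod_i Pr(X_i | parents(X_i))."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpts[var][pa_vals][assignment[var]]
    return p

print(joint({"X1": 1, "X2": 0, "X3": 1}))  # 0.3 * 0.4 * 0.6 = 0.072
```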

  • Independence relations (D-separation)
    * Important independence relations between RVs:
      - Marginal independence: P(X, Y) = P(X) P(Y)
      - Conditional independence: P(X, Y | Z) = P(X | Z) P(Y | Z)
    * Notation A ⊥ B | C:
      - RVs in set A are independent of RVs in set B, when given the values of RVs in C
      - Symmetric: can swap the roles of A and B
      - A ⊥ B denotes marginal independence, i.e., C = ∅
    * Independence is captured in the graph structure
      - Caveat: the graph encodes independence assumptions; when the graph does not imply that X and Y are independent, it does not follow in general that they are dependent

  • Marginal Independence
    * Consider a graph fragment with nodes X and Y, not connected by any edge
    * What [marginal] independence relations hold?
      - X ⊥ Y? Yes: P(X, Y) = P(X) P(Y)
      - What about X ⊥ Z, where Z is connected to both X and Y? (next fragment)

  • Marginal Independence
    * Graph fragment: X → Z ← Y (marginal independence is denoted X ⊥ Y)
    * What [marginal] independence relations hold?
      - X ⊥ Z? No: P(X, Z) = ∑_Y P(X) P(Y) P(Z | X, Y), which does not reduce to P(X) P(Z) in general
      - X ⊥ Y? Yes: P(X, Y) = ∑_Z P(X) P(Y) P(Z | X, Y) = P(X) P(Y)

  • Marginal Independence
    * Now consider X ← Z → Y (tail-to-tail) and X → Z → Y (head-to-tail)
    * Are X and Y marginally independent? (X ⊥ Y?)
      - Tail-to-tail: P(X, Y) = ∑_Z P(Z) P(X | Z) P(Y | Z) ... does not factorise: No
      - Head-to-tail: P(X, Y) = ∑_Z P(X) P(Z | X) P(Y | Z) ... does not factorise: No

  • Marginal Independence
    * Marginal independence can be read off the graph
      - however, must account for edge directions
      - relates (loosely) to causality: if edges encode causal links, can X affect (cause) Y?
    * General rules, where X and Y are linked by:
      - no edges, in any direction → independent
      - an intervening node with incoming edges from both X and Y (aka head-to-head) → independent
      - head-to-tail or tail-to-tail → not (necessarily) independent
    * ... generalises to longer chains of intermediate nodes (coming); a brute-force numerical check of these rules is sketched below
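The head-to-head versus chain distinction can be checked numerically by summing out Z. A sketch assuming hypothetical CPT values: for the collider X → Z ← Y the sum collapses to P(X) P(Y), while for the chain X → Z → Y it generally does not.

```python
import itertools

# Brute-force check of the marginal-independence rules by summing out Z.
# All CPT numbers are hypothetical, chosen only for illustration.
px = {0: 0.7, 1: 0.3}                                          # P(X)
py = {0: 0.4, 1: 0.6}                                          # P(Y)
pz_xy = {(0, 0): {0: 0.5, 1: 0.5}, (0, 1): {0: 0.3, 1: 0.7},
         (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.4, 1: 0.6}}   # collider: P(Z | X, Y)
pz_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}              # chain: P(Z | X)
py_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}}              # chain: P(Y | Z)

def collider_pxy(x, y):
    # X -> Z <- Y:  P(X, Y) = sum_z P(X) P(Y) P(Z | X, Y) = P(X) P(Y)
    return sum(px[x] * py[y] * pz_xy[(x, y)][z] for z in (0, 1))

def chain_pxy(x, y):
    # X -> Z -> Y:  P(X, Y) = sum_z P(X) P(Z | X) P(Y | Z), no factorisation in general
    return sum(px[x] * pz_x[x][z] * py_z[z][y] for z in (0, 1))

for x, y in itertools.product((0, 1), repeat=2):
    chain_marginal = (sum(chain_pxy(x, v) for v in (0, 1)) *
                      sum(chain_pxy(u, y) for u in (0, 1)))
    print(f"x={x} y={y}  collider: {collider_pxy(x, y):.4f} = {px[x] * py[y]:.4f}   "
          f"chain: {chain_pxy(x, y):.4f} != {chain_marginal:.4f}")
```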

  • Conditional independence
    * What if we know the value of some RVs? How does this affect the in/dependence relations?
    * Consider whether X ⊥ Y | Z in the three canonical graphs: X ← Z → Y, X → Z → Y, and X → Z ← Y
      - Test by trying to show P(X, Y | Z) = P(X | Z) P(Y | Z)

  • Conditional independence
    * Tail-to-tail, X ← Z → Y:
      P(X, Y | Z) = P(Z) P(X | Z) P(Y | Z) / P(Z) = P(X | Z) P(Y | Z)
    * Head-to-tail, X → Z → Y:
      P(X, Y | Z) = P(X) P(Z | X) P(Y | Z) / P(Z) = P(X | Z) P(Z) P(Y | Z) / P(Z) = P(X | Z) P(Y | Z)

  • Conditional independence
    * So far, this looks like simple graph separation... Not so fast!
      - the factorisation cannot be pushed through for the last canonical graph, the head-to-head X → Z ← Y
    * Known as explaining away: the value of Z can give information linking X and Y
      - E.g., X and Y are binary coin flips, and Z is whether they land the same side up. Given Z, X and Y become completely dependent (deterministic).
      - A.k.a. Berkson's paradox
    * N.b., marginal independence does not imply conditional independence!

  • Explaining away
    * The washing has fallen off the line (W). Was it aliens (A) playing? Or next door's dog (D)? Graph: A → W ← D.
    * Node tables:
      - P(A=1) = 0.001, P(A=0) = 0.999
      - P(D=1) = 0.1,  P(D=0) = 0.9
      - P(W=1 | A, D): A=0,D=0 → 0.1; A=0,D=1 → 0.3; A=1,D=0 → 0.5; A=1,D=1 → 0.8
    * Results in the conditional posteriors:
      - P(A=1 | W=1) = 0.004
      - P(A=1 | D=1, W=1) = 0.003
      - P(A=1 | D=0, W=1) = 0.005
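These posteriors follow from Bayes' rule applied to the joint P(A) P(D) P(W | A, D). A short sketch that reproduces them by enumerating the joint, using exactly the table values above:

```python
import itertools

# Reproduce the explaining-away posteriors by enumerating the joint
# P(A, D, W) = P(A) P(D) P(W | A, D), with the table values from the slide.
p_a = {0: 0.999, 1: 0.001}
p_d = {0: 0.9, 1: 0.1}
p_w1 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.8}   # P(W=1 | A, D)

def joint(a, d, w):
    pw = p_w1[(a, d)] if w == 1 else 1.0 - p_w1[(a, d)]
    return p_a[a] * p_d[d] * pw

def posterior_a1(evidence):
    """P(A=1 | evidence), where evidence maps a subset of {'D', 'W'} to values."""
    num = den = 0.0
    for a, d, w in itertools.product((0, 1), repeat=3):
        if any({"D": d, "W": w}[k] != v for k, v in evidence.items()):
            continue                       # assignment inconsistent with evidence
        p = joint(a, d, w)
        den += p
        if a == 1:
            num += p
    return num / den

print(round(posterior_a1({"W": 1}), 4))           # ~0.0044  (slide: 0.004)
print(round(posterior_a1({"D": 1, "W": 1}), 4))   # ~0.0027  (slide: 0.003)
print(round(posterior_a1({"D": 0, "W": 1}), 4))   # ~0.0050  (slide: 0.005)
```

Observing that the dog was out (D=1) lowers the probability that aliens were responsible, even though A and D are marginally independent: the dog "explains away" the fallen washing.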

  • Explaining away II
    * Explaining away also occurs for observed children of the head-to-head node W, e.g., the graph A → W ← D with a child W → G
    * Attempt to factorise, to test A ⊥ D | G:
      P(A, D | G) ∝ ∑_W P(A) P(D) P(W | A, D) P(G | W) = P(A) P(D) P(G | A, D)
      which does not split into a term in A times a term in D, so A and D are not conditionally independent given G

  • "D-separation" summary
    * Marginal and conditional independence can be read off the graph structure
      - marginal independence relates (loosely) to causality: if edges encode causal links, can X affect (cause or be caused by) Y?
      - conditional independence is less intuitive
    * How to apply this to larger graphs?
      - based on paths separating nodes, i.e., do they contain nodes with head-to-head, head-to-tail or tail-to-tail links?
      - can all [undirected!] paths connecting two nodes be blocked by an independence relation?

  • D-separation in a larger PGM
    * Consider a pair of nodes in the example graph over CTL, FG, FA, GRL, AS: does FA ⊥ FG hold?
    * Paths: FA – CTL – GRL – FG and FA – AS – GRL – FG
    * Paths can be blocked by independence
    * More formally, see the "Bayes Ball" algorithm, which formalises the notion of d-separation as reachability in the graph, subject to specific traversal rules
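Bayes Ball is one way to mechanise this check. An equivalent and easier-to-code test forms the ancestral graph of the query nodes, moralises it, deletes the observed nodes, and then looks for any surviving path. A sketch under that formulation; the edge directions of the CTL/FG/FA/GRL/AS graph are not given in this text, so the DAG below is hypothetical and chosen only to exercise the code.

```python
from itertools import combinations

def d_separated(dag, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated given zs in a DAG.

    Uses the ancestral-graph + moralisation construction, which gives the
    same answer as Bayes Ball reachability.  dag: node -> set of parents."""
    # 1. Keep only the query nodes and all of their ancestors.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(dag.get(n, set()))
    # 2. Moralise: marry co-parents, then drop edge directions.
    undirected = {n: set() for n in relevant}
    for child in relevant:
        ps = dag.get(child, set()) & relevant
        for p in ps:
            undirected[child].add(p)
            undirected[p].add(child)
        for p, q in combinations(ps, 2):
            undirected[p].add(q)
            undirected[q].add(p)
    # 3. Delete the observed nodes zs and test whether xs can still reach ys.
    seen, stack = set(), [n for n in xs if n not in zs]
    while stack:
        n = stack.pop()
        if n in ys:
            return False            # an unblocked path survives
        if n in seen:
            continue
        seen.add(n)
        stack.extend(undirected[n] - zs)
    return True

# Hypothetical directions: FA -> CTL, FA -> AS, CTL -> GRL, AS -> GRL, GRL -> FG.
dag = {"FA": set(), "CTL": {"FA"}, "AS": {"FA"}, "GRL": {"CTL", "AS"}, "FG": {"GRL"}}
print(d_separated(dag, {"FA"}, {"FG"}, set()))      # False: both paths are open
print(d_separated(dag, {"FA"}, {"FG"}, {"GRL"}))    # True: observing GRL blocks both paths
```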

  • What's the point of d-separation?
    * Designing the graph
      - understand what independence assumptions are being made, not just the obvious ones
      - informs the trade-off between expressiveness and complexity
    * Inference with the graph
      - computation of conditional / marginal distributions must respect the in/dependences between RVs
      - affects the complexity (space, time) of inference

  • Markov Blanket
    * For an RV, what is the minimal set of other RVs that makes it conditionally independent from the rest of the graph?
      - i.e., which conditioning variables can be safely dropped from P(X_j | X_1, X_2, ..., X_{j-1}, X_{j+1}, ..., X_n)?
    * Solve using the d-separation rules from the graph
    * Important for predictive inference (e.g., in pseudolikelihood, Gibbs sampling, etc.)
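For a directed PGM the d-separation rules give the familiar answer: the Markov blanket of a node is its parents, its children, and its children's other parents (co-parents). A small sketch, using the same parent-map representation as above and the A → W ← D, W → G graph from the explaining-away slides:

```python
def markov_blanket(dag, node):
    """Markov blanket of `node` in a directed PGM: parents, children,
    and the children's other parents.  dag: node -> set of parents."""
    parents = set(dag.get(node, set()))
    children = {c for c, ps in dag.items() if node in ps}
    co_parents = set().union(*(dag[c] for c in children)) - {node} if children else set()
    return parents | children | co_parents

dag = {"A": set(), "D": set(), "W": {"A", "D"}, "G": {"W"}}
print(markov_blanket(dag, "W"))   # {'A', 'D', 'G'}
print(markov_blanket(dag, "A"))   # {'D', 'W'}  -- D enters as a co-parent of W
```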

  • Undirected PGMs: the undirected variant of the PGM, parameterised by arbitrary positive-valued functions of the variables, with global normalisation. A.k.a. Markov Random Field.

  • Undirected vs directed
    * Undirected PGM
      - Graph: edges undirected
      - Probability: each node is a r.v.; each clique C has a "factor" ψ_C(X_j : j ∈ C) ≥ 0; joint ∝ product of factors
    * Directed PGM
      - Graph: edges directed
      - Probability: each node is a r.v.; each node has a conditional p(X_i | X_j ∈ parents(X_i)); joint = product of conditionals
    * Key difference = normalisation

  • Undirected PGM formulation
    * Based on the notion of cliques:
      - Clique: a set of fully connected nodes (e.g., A-D, C-D, C-D-F)
      - Maximal clique: the largest cliques in the graph (not C-D, as it is contained in C-D-F)
    * Joint probability defined as
        P(a, b, c, d, e, f) = (1/Z) ψ1(a, b) ψ2(b, c) ψ3(a, d) ψ4(d, c, f) ψ5(d, e)
      where each ψ is a positive function and Z is the normalising "partition" function
        Z = ∑_{a,b,c,d,e,f} ψ1(a, b) ψ2(b, c) ψ3(a, d) ψ4(d, c, f) ψ5(d, e)
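For small sets of binary variables, Z can be computed by brute-force enumeration of all assignments. A sketch for the factorisation above, with hypothetical factor values (any positive function of each clique's variables would do):

```python
import itertools

# Brute-force partition function for the example U-PGM over binary a..f.
# The factor definitions are hypothetical, for illustration only.
def psi1(a, b): return 1.0 + a + b              # clique {A, B}
def psi2(b, c): return 2.0 if b == c else 0.5   # clique {B, C}
def psi3(a, d): return 1.5 + a * d              # clique {A, D}
def psi4(d, c, f): return 1.0 + d + c + f       # maximal clique {C, D, F}
def psi5(d, e): return 3.0 if d == e else 1.0   # clique {D, E}

def unnormalised(a, b, c, d, e, f):
    return psi1(a, b) * psi2(b, c) * psi3(a, d) * psi4(d, c, f) * psi5(d, e)

# Z = sum over all 2^6 joint assignments of the product of factors.
Z = sum(unnormalised(*v) for v in itertools.product((0, 1), repeat=6))

def prob(a, b, c, d, e, f):
    """P(a, b, c, d, e, f) = (1/Z) * product of clique factors."""
    return unnormalised(a, b, c, d, e, f) / Z

print(Z, prob(1, 0, 1, 0, 1, 1))
```

The same enumeration grows exponentially in the number of variables, which is exactly why computing Z is intractable in general (a con of U-PGMs noted later).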

  • d-separation in U-PGMs
    * Good news! Simpler dependence semantics
      - conditional independence relations = graph connectivity
      - if all paths between nodes in set X and nodes in set Y pass through observed nodes Z, then X ⊥ Y | Z
    * For example, in the graph above (nodes A–F): B ⊥ D | {A, C}
    * Markov blanket of a node = its immediate neighbours
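Because conditional independence in a U-PGM is just graph separation, the test is a reachability search that is not allowed to pass through observed nodes. A sketch on the example graph above (edges read off the cliques A-B, B-C, A-D, C-D-F, D-E):

```python
def separated(neighbours, xs, ys, zs):
    """U-PGM test: X ⊥ Y | Z iff every path from xs to ys passes through zs.
    neighbours: node -> set of adjacent nodes."""
    seen, stack = set(), [n for n in xs if n not in zs]
    while stack:
        n = stack.pop()
        if n in ys:
            return False                  # reached ys without crossing zs
        if n in seen:
            continue
        seen.add(n)
        stack.extend(neighbours[n] - zs)  # observed nodes block the search
    return True

neighbours = {
    "A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D", "F"},
    "D": {"A", "C", "E", "F"}, "E": {"D"}, "F": {"C", "D"},
}
print(separated(neighbours, {"B"}, {"D"}, {"A", "C"}))  # True:  B ⊥ D | {A, C}
print(separated(neighbours, {"B"}, {"D"}, {"A"}))       # False: the path B-C-D survives
```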

  • Directed to undirected
    * A directed PGM is formulated as
        P(X_1, X_2, ..., X_k) = ∏_{i=1}^{k} Pr(X_i | X_{π_i})
      where π_i indexes the parents of X_i
    * Equivalent to a U-PGM in which
      - each conditional probability term is included in one factor function ψ_c
      - the clique structure links each node with its parents, i.e., {{X_i} ∪ X_{π_i}, ∀ i}
      - the normalisation term is trivial, Z = 1

  • Directed to undirected: conversion steps
    1. copy the nodes
    2. copy the edges, made undirected
    3. "moralise" the parent nodes: connect (marry) any parents that share a child
    Applied to the example directed graph over CTL, FG, FA, GRL, AS, this yields the corresponding moralised undirected graph.

  • Why U-PGM?
    * Pros
      - generalisation of D-PGM
      - simpler means of modelling, without the need for per-factor normalisation
      - general inference algorithms use the U-PGM representation (supporting both types of PGM)
    * Cons
      - (slightly) weaker independence semantics
      - calculating the global normalisation term (Z) is intractable in general (but tractable for chains/trees, e.g., CRFs)

  • Summary
    * Notion of independence, "d-separation"
      - marginal vs conditional independence
      - explaining away, Markov blanket
      - undirected PGMs and their relation to directed PGMs
    * Directed and undirected PGMs share common training and prediction algorithms (coming up next!)