Natural Language Processing (CSE 517): Graphical Models, Noah Smith

  1. Natural Language Processing (CSE 517): Graphical Models. Noah Smith, © 2016 University of Washington, nasmith@cs.washington.edu. February 8–10, 2016.

  2. Notation. Let $V = \langle V_1, V_2, \ldots, V_\ell \rangle$ be a collection of random variables (not necessarily a sequence). $\mathrm{Val}(V)$ denotes the set of values of a r.v. $V$. $V_I$ denotes the subset of the r.v.s in $V$ with indices $i \in I$, and $V_{\neg I} = V \setminus V_I$. Recall:
     ◮ $p(V) = \prod_{i=1}^{\ell} p(V_i \mid V_1, \ldots, V_{i-1})$ (always true, for any ordering)
     ◮ $p(V_I, V_J \mid V_K) = p(V_I \mid V_K) \cdot p(V_J \mid V_K)$ if and only if $V_I \perp V_J \mid V_K$ (conditional independence)
     ◮ $p(V_I = v_I) = \sum_{v_{\neg I} \in \mathrm{Val}(V_{\neg I})} p(V_I = v_I, V_{\neg I} = v_{\neg I})$ (marginalization)
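The chain rule and marginalization identities can be checked by brute force on a small joint table. A minimal sketch, assuming binary variables and a hypothetical joint distribution (the numbers are made up, not from the slides); `marginal`, `conditional`, and `chain_rule` are illustrative helper names, not an existing API.

```python
from itertools import product

# Hypothetical joint p(V1, V2, V3) over binary variables; positions 0, 1, 2 are V1, V2, V3.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.15, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}


def marginal(joint, keep):
    """p(V_I = v_I): sum the joint over all values v_{not I} of the other variables."""
    result = {}
    for assignment, prob in joint.items():
        v_I = tuple(assignment[i] for i in keep)
        result[v_I] = result.get(v_I, 0.0) + prob
    return result


def conditional(joint, target, given):
    """p(V_target | V_given), keyed by (values of given..., value of target)."""
    num = marginal(joint, list(given) + [target])
    den = marginal(joint, list(given))
    return {key: p / den[key[:-1]] for key, p in num.items()}


def chain_rule(joint, order, v):
    """prod_i p(V_order[i] | V_order[0], ..., V_order[i-1]) evaluated at assignment v."""
    total = 1.0
    for i, var in enumerate(order):
        cond = conditional(joint, var, order[:i])
        key = tuple(v[j] for j in order[:i]) + (v[var],)
        total *= cond[key]
    return total


# The chain rule recovers the joint under more than one variable ordering.
for order in [(0, 1, 2), (2, 0, 1)]:
    for v in product([0, 1], repeat=3):
        assert abs(chain_rule(joint, order, v) - joint[v]) < 1e-9

print(marginal(joint, [0]))  # approximately {(0,): 0.4, (1,): 0.6}
```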

  3. Factor Graphs. Two kinds of vertices:
     ◮ Random variables (denoted by circles, “$V_i$”)
     ◮ Factors (denoted by squares, “$f_j$”)
     The graph is bipartite; every edge connects some variable to some factor. Let $I_j \subseteq \{1, \ldots, \ell\}$ be the set of variables $f_j$ is connected to. Factor $f_j$ defines a map $\mathrm{Val}(V_{I_j}) \to \mathbb{R}_{\ge 0}$. The graph and factors define a probability distribution: $p(V = v) \propto \prod_j f_j(v_{I_j})$.
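A factor graph can be stored directly as a list of factors, each pairing its index set $I_j$ with a table. The sketch below assumes that representation (hypothetical helper names `unnormalized` and `distribution`) and normalizes by enumerating all of $\mathrm{Val}(V)$, which is only feasible for a handful of variables; the two-variable example and its factor values are made up for illustration.

```python
from itertools import product


def unnormalized(factors, v):
    """prod_j f_j(v_{I_j}) for a full assignment v (a tuple indexed by variable)."""
    score = 1.0
    for scope, table in factors:          # scope plays the role of the index set I_j
        score *= table[tuple(v[i] for i in scope)]
    return score


def distribution(factors, domains):
    """p(V = v) by brute-force normalization over every v in Val(V)."""
    scores = {v: unnormalized(factors, v) for v in product(*domains)}
    z = sum(scores.values())              # normalizing constant
    return {v: s / z for v, s in scores.items()}


# Hypothetical factor graph over binary V_1, V_2 (indices 0 and 1):
# f_1 touches V_1 only, f_2 touches both.
factors = [
    ((0,), {(0,): 1.0, (1,): 3.0}),
    ((0, 1), {(0, 0): 2.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 3.0}),
]
p = distribution(factors, [(0, 1), (0, 1)])
print(p[(1, 1)])  # 0.5625
```

Here the unnormalized scores of the four assignments are 2, 2, 3, and 9, so the normalizer is 16 and $p(V_1 = 1, V_2 = 1) = 9/16$.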

  4. Factor Graphs We’ve Seen Before. Hidden Markov model: [Figure: factor graph over $y_0, \ldots, y_5$ and $x_1, \ldots, x_4$.] General first-order sequence model: [Figure: factor graph over $y_0, \ldots, y_5$ and $x$.]

  5. Two Kinds of Factors. Conditional probability tables, e.g., if $I_j = \{1, 2, 3\}$: $f_j(v_1, v_2, v_3) = p(V_3 = v_3 \mid V_1 = v_1, V_2 = v_2)$. These lead to Bayesian networks (with some constraints). Potential functions (arbitrary nonnegative values) lead to Markov random fields (a.k.a. Markov networks).

  6. Yucky Bayesian Network. [Figure: Bayesian network over Influenza, Allergies, Sinus Inflammation, Runny Nose, and Headache.] Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

  7. Yucky Factor Graph. [Figure: the same five variables (Influenza, Allergies, Sinus Inflammation, Runny Nose, Headache) drawn as a factor graph.] Sinus inflammation is caused by flu, but also by allergies. Runny nose and headache are both caused by sinus inflammation.

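  8. Yucky Factor Graph. [Figure: factor graph over Influenza, Allergies, Sinus Inflammation, Runny Nose, and Headache.] Each factor is a table with one row per assignment of its binary variables (no values are shown): $f_{S,I,A}(s, i, a)$ with 8 rows; $f_{R,S}(r, s)$ and $f_{H,S}(h, s)$ with 4 rows each; $f_I(i)$ and $f_A(a)$ with 2 rows each.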

  9. Yucky Factor Graph. [Figure: the Bayesian network and the factor graph over Influenza, Allergies, Sinus Inflammation, Runny Nose, and Headache, side by side, with the factor tables of slide 8.]
     $p(i, a, s, r, h) = f_I(i) \cdot f_A(a) \cdot f_{S,I,A}(s, i, a) \cdot f_{R,S}(r, s) \cdot f_{H,S}(h, s) = p(i) \cdot p(a) \cdot p(s \mid i, a) \cdot p(r \mid s) \cdot p(h \mid s)$
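When every factor is a conditional probability table, the product of the factors is already a normalized distribution. The slides give no numbers for this network, so the CPT entries below are hypothetical; the sketch only checks that the product of the five factors sums to 1 over all $2^5$ assignments. `cpt` and `score` are illustrative names.

```python
from itertools import product


def cpt(p_one):
    """Factor table for a binary child: p_one maps parent values to p(child = 1 | parents).
    Keys of the returned table are (child value, *parent values)."""
    table = {}
    for parents, p in p_one.items():
        table[(1,) + parents] = p
        table[(0,) + parents] = 1.0 - p
    return table


I, A, S, R, H = range(5)  # positions: influenza, allergies, sinus, runny nose, headache
factors = [
    ((I,), cpt({(): 0.1})),                        # f_I(i) = p(i)
    ((A,), cpt({(): 0.2})),                        # f_A(a) = p(a)
    ((S, I, A), cpt({(0, 0): 0.05, (0, 1): 0.6,    # f_{S,I,A}(s, i, a) = p(s | i, a)
                     (1, 0): 0.7, (1, 1): 0.9})),
    ((R, S), cpt({(0,): 0.1, (1,): 0.8})),         # f_{R,S}(r, s) = p(r | s)
    ((H, S), cpt({(0,): 0.05, (1,): 0.6})),        # f_{H,S}(h, s) = p(h | s)
]


def score(v):
    """Product of all factors at the full assignment v = (i, a, s, r, h)."""
    result = 1.0
    for scope, table in factors:
        result *= table[tuple(v[k] for k in scope)]
    return result


# Because every factor is a CPT, the product is already normalized: Z = 1.
z = sum(score(v) for v in product([0, 1], repeat=5))
print(abs(z - 1.0) < 1e-9)  # True
```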

  10. Naughty Markov Random Field. [Figure: Markov network over Adrian (A), Brook (B), Chris (C), and Dana (D).] Independencies: $A \perp C \mid B, D$; $B \perp D \mid A, C$; $\neg(A \perp C)$; $\neg(B \perp D)$.

  11. Naughty Factor Graph. [Figure: factor graph over Adrian, Brook, Chris, and Dana with pairwise factors $f_{A,B}$, $f_{B,C}$, $f_{C,D}$, $f_{D,A}$; each factor table lists the four assignments of its two binary variables, with no values shown.]
      $p(a, b, c, d) = \frac{f_{A,B}(a, b) \cdot f_{B,C}(b, c) \cdot f_{C,D}(c, d) \cdot f_{D,A}(d, a)}{\sum_{a' \in \mathrm{Val}(A)} \sum_{b' \in \mathrm{Val}(B)} \sum_{c' \in \mathrm{Val}(C)} \sum_{d' \in \mathrm{Val}(D)} f_{A,B}(a', b') \cdot f_{B,C}(b', c') \cdot f_{C,D}(c', d') \cdot f_{D,A}(d', a')}$

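  12. Assignment Probabilities: Examples. Factor values for the network over Adrian (A), Brook (B), Chris (C), and Dana (D); each factor lists its values for the assignments (0,0), (0,1), (1,0), (1,1) of its two variables, in that order:
      ◮ $f_{A,B}(a, b)$: 30, 5, 1, 10
      ◮ $f_{B,C}(b, c)$: 100, 1, 1, 100
      ◮ $f_{C,D}(c, d)$: 1, 100, 100, 1
      ◮ $f_{D,A}(d, a)$: 100, 1, 1, 100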

  13. Assignment Probabilities: Examples. With the factor values of slide 12, the normalizing constant is
      $\sum_{a' \in \mathrm{Val}(A)} \sum_{b' \in \mathrm{Val}(B)} \sum_{c' \in \mathrm{Val}(C)} \sum_{d' \in \mathrm{Val}(D)} f_{A,B}(a', b') \cdot f_{B,C}(b', c') \cdot f_{C,D}(c', d') \cdot f_{D,A}(d', a') = 7{,}201{,}840$

  14. Assignment Probabilities: Examples. With the factor values of slide 12:
      $p(A = 0, B = 1, C = 1, D = 0) = \frac{f_{A,B}(0, 1) \cdot f_{B,C}(1, 1) \cdot f_{C,D}(1, 0) \cdot f_{D,A}(0, 0)}{7{,}201{,}840} = \frac{5 \cdot 100 \cdot 100 \cdot 100}{7{,}201{,}840} = \frac{5{,}000{,}000}{7{,}201{,}840} \approx 0.69$

  15. Assignment Probabilities: Examples. With the factor values of slide 12:
      $p(A = 1, B = 1, C = 0, D = 0) = \frac{f_{A,B}(1, 1) \cdot f_{B,C}(1, 0) \cdot f_{C,D}(0, 0) \cdot f_{D,A}(0, 1)}{7{,}201{,}840} = \frac{10 \cdot 1 \cdot 1 \cdot 1}{7{,}201{,}840} \approx 0.0000014$
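The numbers on slides 13 to 15 can be reproduced by enumerating all $2^4$ assignments. A minimal sketch, assuming the factor values from slide 12 (with $A$, $B$, $C$, $D$ for Adrian, Brook, Chris, Dana); `score` and `p` are illustrative names.

```python
from itertools import product

# Factor values from slide 12 (A = Adrian, B = Brook, C = Chris, D = Dana).
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}


def score(a, b, c, d):
    """Unnormalized score f_AB(a,b) * f_BC(b,c) * f_CD(c,d) * f_DA(d,a)."""
    return f_AB[a, b] * f_BC[b, c] * f_CD[c, d] * f_DA[d, a]


# Normalizing constant: sum of the score over all 2^4 assignments.
Z = sum(score(*v) for v in product([0, 1], repeat=4))


def p(a, b, c, d):
    """Normalized probability of one full assignment."""
    return score(a, b, c, d) / Z


print(Z)              # 7201840
print(p(0, 1, 1, 0))  # about 0.694 (5,000,000 / 7,201,840)
print(p(1, 1, 0, 0))  # about 1.4e-06 (10 / 7,201,840)
```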

  16. Structure and Independence.
      ◮ Bayesian networks: a variable is conditionally independent of its non-descendants given its parents.
      ◮ Markov networks: conditional independencies are derived from “Markov blanket” and separation properties.
      Local configurations of the graph can be used to check all conditional independence questions; there is almost no need to look at the values in the factors!
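The slide's point is that independencies follow from graph structure alone. As a numerical sanity check only, the sketch below (hypothetical helper names `marg` and `independent`) uses the factor values of slide 12 and tests the four statements from slide 10 by comparing $p(x, y \mid g)$ with $p(x \mid g) \cdot p(y \mid g)$ for every assignment, up to a floating-point tolerance. A check like this speaks only about these particular factor values, not about the graph in general.

```python
from itertools import product

# Factor values from slide 12 (A = Adrian, B = Brook, C = Chris, D = Dana).
f_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
f_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
f_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
f_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

# Normalized joint p(a, b, c, d), keyed by the tuple (a, b, c, d).
scores = {v: f_AB[v[0], v[1]] * f_BC[v[1], v[2]] * f_CD[v[2], v[3]] * f_DA[v[3], v[0]]
          for v in product([0, 1], repeat=4)}
Z = sum(scores.values())
joint = {v: s / Z for v, s in scores.items()}


def marg(keep):
    """Marginal table over the variable positions in keep (0 = A, 1 = B, 2 = C, 3 = D)."""
    out = {}
    for v, prob in joint.items():
        key = tuple(v[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


def independent(x, y, given, tol=1e-9):
    """True if p(x, y | given) equals p(x | given) * p(y | given) for every assignment."""
    p_xyg = marg([x, y] + given)
    p_xg, p_yg, p_g = marg([x] + given), marg([y] + given), marg(given)
    for (vx, vy, *rest), prob in p_xyg.items():
        g = tuple(rest)
        lhs = prob / p_g[g]
        rhs = (p_xg[(vx,) + g] / p_g[g]) * (p_yg[(vy,) + g] / p_g[g])
        if abs(lhs - rhs) > tol:
            return False
    return True


print(independent(0, 2, [1, 3]), independent(1, 3, [0, 2]))  # True True
print(independent(0, 2, []), independent(1, 3, []))          # False False
```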

  17. Independence “Spectrum”. At one end, $\prod_{i=1}^{\ell} f_{V_i}(V_i)$: everything is independent; minimal expressive power; fewer parameters. At the other end, $f_V(V)$: everything can be interdependent; arbitrary expressive power; more parameters.
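For $\ell$ variables that each take $k$ values, the two ends of the spectrum differ as $\ell \cdot k$ versus $k^\ell$ table entries. A small sketch counting table entries (not free parameters, since normalization constraints are ignored); `table_sizes` is an illustrative name.

```python
def table_sizes(num_vars, num_values):
    """Table entries at the two ends of the spectrum (entries, not free parameters)."""
    fully_independent = num_vars * num_values   # one table f_{V_i}(V_i) per variable
    fully_joint = num_values ** num_vars        # one table f_V(V) over all variables
    return fully_independent, fully_joint


for ell in (5, 10, 20):
    print(ell, table_sizes(ell, 2))
# 5 (10, 32)
# 10 (20, 1024)
# 20 (40, 1048576)
```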

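  18. Operations on Factors: Multiplication. Given two factors $f_U$ and $f_V$, we can create a new “product” factor such that $f_{U \cup V}(u \cup v) = f_U(u) \cdot f_V(v)$ for all $u \in \mathrm{Val}(U)$ and all $v \in \mathrm{Val}(V)$ that agree on the variables shared by $U$ and $V$. Example:
      ◮ $f_{A,B}(a, b)$: 30, 5, 1, 10 for $(a, b) = (0,0), (0,1), (1,0), (1,1)$
      ◮ $f_{B,C}(b, c)$: 100, 1, 1, 100 for $(b, c)$ in the same order
      ◮ $f_{A,B,C}(a, b, c) = f_{A,B}(a, b) \cdot f_{B,C}(b, c)$: 3,000; 30; 5; 500; 100; 1; 10; 1,000 for $(a, b, c) = (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)$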

  19. Operations on Factors: Multiplication (same definition and example as slide 18). This might remind you of a join operation on a database.

  20. Operations on Factors: Multiplication (same definition and example as slide 18). What happens if you multiply out all the factors in a factor graph?
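Factor multiplication behaves like a natural join on the shared variables. A minimal sketch, assuming a factor is a pair (scope, table) with the table keyed by assignments in scope order and every variable binary by default; `factor_product` is an illustrative name. It reproduces the $f_{A,B,C}$ table from slide 18.

```python
from itertools import product


def factor_product(f, g, values=(0, 1)):
    """Product of two factors, each a pair (scope, table).

    The result's scope is the union of the two scopes; each entry multiplies the
    entries of f and g that agree on the shared variables (like a natural join).
    Assumes every variable takes values in `values` (binary by default)."""
    scope_f, table_f = f
    scope_g, table_g = g
    scope = scope_f + tuple(x for x in scope_g if x not in scope_f)
    table = {}
    for assignment in product(values, repeat=len(scope)):
        value_of = dict(zip(scope, assignment))
        table[assignment] = (table_f[tuple(value_of[x] for x in scope_f)]
                             * table_g[tuple(value_of[x] for x in scope_g)])
    return scope, table


f_AB = (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
f_BC = (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})
scope, f_ABC = factor_product(f_AB, f_BC)
print(scope)              # ('A', 'B', 'C')
print(f_ABC[(0, 1, 1)])   # 500, i.e. f_AB(0, 1) * f_BC(1, 1) = 5 * 100
```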

  21. Operations on Factors: Maximization. Given a factor $f_{U,V}$ over the variables $U \cup \{V\}$, where $V \notin U$, we can transform it into $f_U$ by $f_U(u) = \max_{v \in \mathrm{Val}(V)} f_{U,V}(u, v)$ for all $u \in \mathrm{Val}(U)$. Example, maximizing $B$ out of $f_{A,B,C}$ from slide 18:
      ◮ $f_{A,C}(0, 0) = \max(3{,}000, 5) = 3{,}000$ (at $B = 0$)
      ◮ $f_{A,C}(0, 1) = \max(30, 500) = 500$ (at $B = 1$)
      ◮ $f_{A,C}(1, 0) = \max(100, 10) = 100$ (at $B = 0$)
      ◮ $f_{A,C}(1, 1) = \max(1, 1{,}000) = 1{,}000$ (at $B = 1$)
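Maximizing a variable out keeps, for each assignment of the remaining variables, the largest entry; recording which value attained it gives the $B = 0$ / $B = 1$ annotations above. A sketch under the same (scope, table) representation assumption; `max_out` is an illustrative name, and ties go to the first value encountered.

```python
def max_out(factor, var):
    """Maximize var out of factor = (scope, table): f_U(u) = max_v f_{U,V}(u, v).
    Also returns, for each u, the value of var that attains the maximum (first on ties)."""
    scope, table = factor
    k = scope.index(var)
    new_scope = scope[:k] + scope[k + 1:]
    best, argmax = {}, {}
    for assignment, value in table.items():
        u = assignment[:k] + assignment[k + 1:]
        if u not in best or value > best[u]:
            best[u] = value
            argmax[u] = assignment[k]
    return (new_scope, best), argmax


f_ABC = (("A", "B", "C"), {
    (0, 0, 0): 3000, (0, 0, 1): 30, (0, 1, 0): 5, (0, 1, 1): 500,
    (1, 0, 0): 100, (1, 0, 1): 1, (1, 1, 0): 10, (1, 1, 1): 1000,
})
(scope, f_AC), best_B = max_out(f_ABC, "B")
print(scope, f_AC)  # ('A', 'C') {(0, 0): 3000, (0, 1): 500, (1, 0): 100, (1, 1): 1000}
print(best_B)       # {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
```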

  22. Operations on Factors: Marginalization. Given a factor $f_{U,V}$ over the variables $U \cup \{V\}$, where $V \notin U$, we can transform it into $f_U$ by $f_U(u) = \sum_{v \in \mathrm{Val}(V)} f_{U,V}(u, v)$ for all $u \in \mathrm{Val}(U)$. Example, summing $B$ out of $f_{A,B,C}$ from slide 18:
      ◮ $f_{A,C}(0, 0) = 3{,}000 + 5 = 3{,}005$
      ◮ $f_{A,C}(0, 1) = 30 + 500 = 530$
      ◮ $f_{A,C}(1, 0) = 100 + 10 = 110$
      ◮ $f_{A,C}(1, 1) = 1 + 1{,}000 = 1{,}001$

  23. Operations on Factors: Marginalization (same definition and example as slide 22). If you multiply out all the factors in a factor graph, then sum out each variable, one by one, until none are left, what do you get?
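Summing out follows the same pattern as maximizing out, with $+$ in place of $\max$. The sketch below (illustrative names `factor_product` and `sum_out`, binary variables assumed) reproduces the $f_{A,C}$ table from slide 22 and then applies the multiply-then-sum-out procedure from the question to the four factors of slide 12; in this example the final value matches the normalizing constant 7,201,840 from slide 13.

```python
from functools import reduce
from itertools import product


def factor_product(f, g, values=(0, 1)):
    """Product of factors (scope, table); entries agree on shared variables. Binary by default."""
    (scope_f, table_f), (scope_g, table_g) = f, g
    scope = scope_f + tuple(x for x in scope_g if x not in scope_f)
    table = {}
    for assignment in product(values, repeat=len(scope)):
        value_of = dict(zip(scope, assignment))
        table[assignment] = (table_f[tuple(value_of[x] for x in scope_f)]
                             * table_g[tuple(value_of[x] for x in scope_g)])
    return scope, table


def sum_out(factor, var):
    """Marginalize var out of factor: f_U(u) = sum_v f_{U,V}(u, v)."""
    scope, table = factor
    k = scope.index(var)
    out = {}
    for assignment, value in table.items():
        u = assignment[:k] + assignment[k + 1:]
        out[u] = out.get(u, 0) + value
    return scope[:k] + scope[k + 1:], out


f_AB = (("A", "B"), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10})
f_BC = (("B", "C"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})
f_CD = (("C", "D"), {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1})
f_DA = (("D", "A"), {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100})

# Slide 22's example: sum B out of f_{A,B} * f_{B,C}.
scope, f_AC = sum_out(factor_product(f_AB, f_BC), "B")
print(scope, f_AC)  # ('A', 'C') {(0, 0): 3005, (0, 1): 530, (1, 0): 110, (1, 1): 1001}

# Multiply out all four factors, then sum out each variable until none are left.
result = reduce(factor_product, [f_AB, f_BC, f_CD, f_DA])
for var in ("A", "B", "C", "D"):
    result = sum_out(result, var)
print(result)  # ((), {(): 7201840})
```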

  24. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$

  25. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$

  26. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$
      ◮ Sums are commutative: $\sum_X \sum_Y f = \sum_Y \sum_X f$

  27. Factors are like numbers.
      ◮ Products are commutative: $f_1 \cdot f_2 = f_2 \cdot f_1$
      ◮ Products are associative: $(f_1 \cdot f_2) \cdot f_3 = f_1 \cdot (f_2 \cdot f_3)$
      ◮ Sums are commutative: $\sum_X \sum_Y f = \sum_Y \sum_X f$
      ◮ Maximizations are commutative: $\max_X \max_Y f = \max_Y \max_X f$
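The commutativity of sums and of maximizations can be checked on the $f_{A,B,C}$ factor from slide 18 by eliminating $A$ and $C$ in both orders. A sketch with a hypothetical `eliminate` helper that takes the combining operation (`operator.add` for sums, the built-in `max` for maximizations) as a parameter.

```python
from operator import add

f_ABC = (("A", "B", "C"), {
    (0, 0, 0): 3000, (0, 0, 1): 30, (0, 1, 0): 5, (0, 1, 1): 500,
    (1, 0, 0): 100, (1, 0, 1): 1, (1, 1, 0): 10, (1, 1, 1): 1000,
})


def eliminate(factor, var, combine):
    """Remove var from factor = (scope, table), combining entries that agree on the rest."""
    scope, table = factor
    k = scope.index(var)
    out = {}
    for assignment, value in table.items():
        u = assignment[:k] + assignment[k + 1:]
        out[u] = combine(out[u], value) if u in out else value
    return scope[:k] + scope[k + 1:], out


sum_AC = eliminate(eliminate(f_ABC, "A", add), "C", add)  # sum over A, then over C
sum_CA = eliminate(eliminate(f_ABC, "C", add), "A", add)  # sum over C, then over A
max_AC = eliminate(eliminate(f_ABC, "A", max), "C", max)  # max over A, then over C
max_CA = eliminate(eliminate(f_ABC, "C", max), "A", max)  # max over C, then over A
print(sum_AC == sum_CA, sum_AC)  # True (('B',), {(0,): 3131, (1,): 1515})
print(max_AC == max_CA, max_AC)  # True (('B',), {(0,): 3000, (1,): 1000})
```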
