Advanced Dynamic Programming in CL: Theory, Algorithms, and Applications
Liang Huang (University of Pennsylvania)

A Little Bit of History...


  1. Monotonicity

  Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A. We say K is monotonic if for all a, b, c ∈ A:
    (a ≤ b) ⇒ (a ⊗ c ≤ b ⊗ c)
    (a ≤ b) ⇒ (c ⊗ a ≤ c ⊗ b)

  • monotonicity is the algebraic statement of optimal substructure in dynamic programming: if subproblem B improves (b′ ≤ b), then any solution A built on it improves too (b′ ⊗ c ≤ b ⊗ c)
  • idempotent ⇒ monotone (from distributivity):
    (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c); if a ≤ b, then (a ⊗ c) = (a ⊗ c) ⊕ (b ⊗ c)
    and by the definition of the comparison, a ⊗ c ≤ b ⊗ c
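As a quick illustration (not from the slides), monotonicity can be spot-checked numerically for the tropical semiring (R ∪ {+∞}, min, +, +∞, 0). The representation of a semiring as a dict and the helper names `leq` and `check_monotonic` are invented for this sketch:

```python
# A minimal sketch: a semiring as its operations, plus a brute-force
# monotonicity check over a few sample elements.
INF = float("inf")

# tropical semiring (R ∪ {+inf}, min, +, +inf, 0)
tropical = dict(plus=min, times=lambda a, b: a + b, zero=INF, one=0.0)

def leq(semiring, a, b):
    # natural order of an idempotent semiring: a <= b iff a = a (+) b
    return semiring["plus"](a, b) == a

def check_monotonic(semiring, samples):
    times = semiring["times"]
    for a in samples:
        for b in samples:
            if not leq(semiring, a, b):
                continue
            for c in samples:
                # both sides of the monotonicity implication
                if not leq(semiring, times(a, c), times(b, c)):
                    return False
                if not leq(semiring, times(c, a), times(c, b)):
                    return False
    return True

print(check_monotonic(tropical, [0.0, 1.5, 2.0, 7.0, INF]))  # True
```

Of course a finite check is only a sanity test, not a proof; for idempotent semirings the proof is the distributivity argument on the slide.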

  3. DP on Graphs

  • optimization problems on graphs ⇒ the generic shortest-path problem
  • weighted directed graph G = (V, E) with a function w that assigns each edge a weight from a semiring
  • compute the best weight of the target vertex t
  • generic update along edge (u, v): d(v) ⊕= d(u) ⊗ w(u, v), i.e., d(v) ← d(v) ⊕ (d(u) ⊗ w(u, v))
  • how to avoid cyclic updates? only update when d(u) is fixed

  4. Two Dimensional Survey

                                              traversing order
  search space                            topological (acyclic)   best-first (superior)
  graphs with semirings (e.g., FSMs)      Viterbi                 Dijkstra
  hypergraphs with weight functions       Generalized Viterbi     Knuth
    (e.g., CFGs)

  5. Viterbi Algorithm for DAGs

  1. topological sort
  2. visit each vertex v in sorted order and do updates:
     • for each incoming edge (u, v) in E, use d(u) to update d(v): d(v) ⊕= d(u) ⊗ w(u, v)
  • key observation: d(u) is already fixed to its optimal value at this time
  • time complexity: O(V + E)
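A sketch of this sweep in the tropical (min, +) semiring. To keep it short, vertices are assumed to be numbered 0..V−1 in topological order already, so step 1 is implicit; the function name and edge format are this sketch's own conventions:

```python
# Viterbi algorithm for DAGs in the tropical semiring (min, +).
INF = float("inf")

def viterbi_dag(num_vertices, edges, source=0):
    """edges: list of (u, v, weight) with u < v (topological order)."""
    incoming = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        incoming[v].append((u, w))
    d = [INF] * num_vertices
    d[source] = 0.0                      # the semiring's 1 (identity of +)
    for v in range(num_vertices):        # visit in sorted order
        for u, w in incoming[v]:         # d(v) ⊕= d(u) ⊗ w(u, v)
            d[v] = min(d[v], d[u] + w)
    return d

# toy DAG: 0 -> 1 (1), 0 -> 2 (4), 1 -> 2 (2)
print(viterbi_dag(3, [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0)]))  # [0.0, 1.0, 3.0]
```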

  6. Variant 1: Forward Update

  1. topological sort
  2. visit each vertex v in sorted order and do updates:
     • for each outgoing edge (v, u) in E, use d(v) to update d(u): d(u) ⊕= d(v) ⊗ w(v, u)
  • key observation: d(v) is already fixed to its optimal value at this time
  • time complexity: O(V + E)
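The forward-update variant, sketched under the same assumptions (vertices pre-sorted topologically): identical result, but each vertex pushes its already-fixed value along its outgoing edges instead of pulling along incoming ones.

```python
# Forward-update Viterbi on a DAG in the tropical semiring.
INF = float("inf")

def viterbi_forward(num_vertices, edges, source=0):
    """edges: list of (u, v, weight) with u < v (topological order)."""
    outgoing = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        outgoing[u].append((v, w))
    d = [INF] * num_vertices
    d[source] = 0.0
    for v in range(num_vertices):        # d(v) is fixed when we reach it
        if d[v] == INF:                  # unreachable: nothing to push
            continue
        for u, w in outgoing[v]:         # d(u) ⊕= d(v) ⊗ w(v, u)
            d[u] = min(d[u], d[v] + w)
    return d

print(viterbi_forward(3, [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0)]))  # [0.0, 1.0, 3.0]
```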

  13. Examples

  • [Number of Paths in a DAG]
    • just use the counting semiring (N, +, ×, 0, 1)
    • note: this is not an optimization problem!
  • [Longest Path in a DAG]
    • just use the semiring (R ∪ {−∞}, max, +, −∞, 0)
  • [Part-of-Speech Tagging with a Hidden Markov Model]
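The first two examples fall out of a single DAG sweep parameterized by the semiring. A sketch (vertices again assumed pre-sorted topologically; `semiring_dag` and the toy DAG are this sketch's inventions):

```python
# One sweep, two semirings: counting (N, +, ×, 0, 1) counts paths,
# max-plus (R ∪ {-inf}, max, +, -inf, 0) finds longest paths.
NEG_INF = float("-inf")

def semiring_dag(num_vertices, edges, plus, times, zero, one, source=0):
    incoming = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        incoming[v].append((u, w))
    d = [zero] * num_vertices
    d[source] = one
    for v in range(num_vertices):
        for u, w in incoming[v]:         # d(v) ⊕= d(u) ⊗ w(u, v)
            d[v] = plus(d[v], times(d[u], w))
    return d

# toy DAG with every edge weight 1: 0 -> 1, 0 -> 2, 1 -> 2
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
counts = semiring_dag(3, edges, lambda a, b: a + b, lambda a, b: a * b, 0, 1)
longest = semiring_dag(3, edges, max, lambda a, b: a + b, NEG_INF, 0.0)
print(counts)    # [1, 1, 2]  -- two distinct paths reach vertex 2
print(longest)   # [0.0, 1.0, 2.0]
```

The counting run shows the slide's point: nothing here is being optimized, yet the same recurrence applies because (N, +, ×) is a semiring.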

  14. Example: Speech Alignment

  • time complexity: O(n²)
  • the same DP is also used in edit distance and biological sequence alignment
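The classic instance of this O(n²) alignment DP is edit distance; a self-contained sketch:

```python
# Edit distance: the O(n*m) monotonic-alignment DP in the tropical
# semiring (min over substitution / deletion / insertion).
def edit_distance(s, t):
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                       # delete all of s[:i]
    for j in range(m + 1):
        d[0][j] = j                       # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (s[i - 1] != t[j - 1])
            d[i][j] = min(sub,            # substitute (or match)
                          d[i - 1][j] + 1,  # delete s[i-1]
                          d[i][j - 1] + 1)  # insert t[j-1]
    return d[n][m]

print(edit_distance("kitten", "sitting"))  # 3
```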

  15. Example: Word Alignment

  • key difference from sequence/speech alignment: reorderings in translation! (e.g., "I love you ." ↔ "Je t'aime .")
  • sequence/speech alignment is always monotonic: O(n²)
  • word alignment under an HMM is O(n³): for every (i, j), enumerate all (i−1, k)

  21. Chinese Word Segmentation

  下 雨 天 地 面 积 水
  xia yu tian di mian ji shui

  • ambiguity: 民主 min-zhu (literally "people-dominate") means "democracy", but the same character 民 segments differently in 江泽民 主席 jiang-ze-min zhu-xi, "President Jiang Zemin"
  • (this was 5 years ago; now Google is good at segmentation!)
  • segmentation can be solved as graph search
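As graph search: vertices are the positions between characters, edges are dictionary words, and Viterbi picks the best path. A sketch with a made-up toy dictionary and made-up log-probabilities (the function name `segment` and all scores are this sketch's assumptions):

```python
# Word segmentation as Viterbi over a DAG of positions 0..n, using the
# (max, +) semiring over log-probabilities. Dictionary is a toy.
import math

def segment(sent, dict_logprob, max_word_len=4):
    n = len(sent)
    best = [-math.inf] * (n + 1)
    best[0] = 0.0
    back = [0] * (n + 1)                 # backpointer to word start
    for j in range(1, n + 1):
        for i in range(max(0, j - max_word_len), j):
            w = sent[i:j]
            if w in dict_logprob and best[i] + dict_logprob[w] > best[j]:
                best[j] = best[i] + dict_logprob[w]
                back[j] = i
    out, j = [], n                       # recover the best segmentation
    while j > 0:
        out.append(sent[back[j]:j])
        j = back[j]
    return out[::-1]

toy = {"下雨": -1.0, "天": -2.0, "下": -3.0, "雨天": -2.5, "地面": -1.5,
       "积水": -1.2, "地": -3.0, "面": -3.0, "积": -3.0, "水": -3.0}
print(segment("下雨天地面积水", toy))  # ['下雨', '天', '地面', '积水']
```

With these toy scores the DP picks "rainy day / ground / accumulates water" over the competing "rainy / sky-and-earth / surface-area / water" style splits, which is exactly the disambiguation the slide's example is about.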

  22. Phrase-based Decoding

  与 沙龙 举行 了 会谈
  yu Shalong juxing le huitan

  candidate phrase translations: "held a talk", "with Sharon", "Sharon held talks with", "with Sharon held a talk", ...

  • source side: a coverage vector (e.g., _ _ ● ● ●) records which source words have been translated
  • target side: hypotheses grow strictly left-to-right
  • space: O(2ⁿ), time: O(2ⁿ n²) -- cf. the traveling salesman problem

  (Huang and Chiang, Forest Rescoring)

  29. Traveling Salesman Problem & MT

  • a classical NP-hard problem: visit each city once and only once
  • exponential-time dynamic programming (Held and Karp, 1962): state = the set of cities visited so far (a bit-vector); search in this O(2ⁿ) transformed graph
  • MT connection (Knight, 1999): each city is a source-language word
  • restrictions in reordering can reduce complexity ⇒ distortion limit; ⇒ syntax-based MT
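A sketch of the Held-Karp bit-vector DP, which is exactly the coverage-vector idea from the decoding slide: the state is (set of cities visited, last city). Distances here are a toy symmetric matrix invented for the example:

```python
# Held-Karp O(2^n n^2) dynamic program for TSP.
def held_karp(dist):
    n = len(dist)
    INF = float("inf")
    # best[mask][j]: cheapest path from city 0 visiting exactly the
    # cities in `mask`, ending at city j
    best = [[INF] * n for _ in range(1 << n)]
    best[1][0] = 0.0
    for mask in range(1 << n):
        for j in range(n):
            if best[mask][j] == INF:
                continue
            for k in range(n):
                if mask & (1 << k):      # k already visited
                    continue
                m2 = mask | (1 << k)
                cand = best[mask][j] + dist[j][k]
                if cand < best[m2][k]:
                    best[m2][k] = cand
    full = (1 << n) - 1
    return min(best[full][j] + dist[j][0] for j in range(n))  # close the tour

dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0]]
print(held_karp(dist))  # 7.0 (tour 0-1-2-3-0)
```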

  34. Adding a Bigram Model

  • "refined" graph: states annotated with language-model words, e.g., (_ _ ● ● ●, ... Sharon) vs. (_ _ ● ● ●, ... Shalong); each transition scores a bigram
  • still dynamic programming, just a larger search space
  • space: O(2ⁿ) ⇒ O(2ⁿ V^(m−1)); time: O(2ⁿ n²) ⇒ O(2ⁿ V^(m−1) n²) for m-gram language models

  35. Two Dimensional Survey

                                              traversing order
  search space                            topological (acyclic)   best-first (superior)
  graphs with semirings (e.g., FSMs)      Viterbi                 Dijkstra
  hypergraphs with weight functions       Generalized Viterbi     Knuth
    (e.g., CFGs)

  39. Dijkstra Algorithm

  • Dijkstra does not require acyclicity
  • instead of topological order, we use best-first order
  • but this requires superiority of the semiring:

  Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A. We say K is superior if for all a, b ∈ A: a ≤ a ⊗ b and b ≤ a ⊗ b.

  • intuition: combination always gets worse (contrast with monotonicity: combination preserves order)
  • superior examples: the Boolean ({0, 1}, ∨, ∧, 0, 1), the probability ([0, 1], max, ×, 0, 1), and the tropical (R⁺ ∪ {+∞}, min, +, +∞, 0) semirings; but not (R ∪ {+∞}, min, +, +∞, 0), where negative weights break superiority

  43. Dijkstra Algorithm

  • keep a cut (S : V − S) where the vertices in S are fixed
  • maintain a priority queue Q of the V − S vertices
  • each iteration: choose the best vertex v from Q, move v to S, and use d(v) to forward-update the others: d(u) ⊕= d(v) ⊗ w(v, u)
  • time complexity: O((V + E) log V) with a binary heap; O(V log V + E) with a Fibonacci heap
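A sketch in the tropical semiring with a binary heap. Python's `heapq` has no decrease-key, so this uses the standard lazy-deletion variant (re-push, skip stale entries), which keeps the O((V + E) log V) bound:

```python
# Dijkstra in the tropical semiring (min, +) with a binary heap.
import heapq

def dijkstra(num_vertices, edges, source=0):
    adj = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adj[u].append((v, w))
    INF = float("inf")
    d = [INF] * num_vertices
    d[source] = 0.0
    fixed = [False] * num_vertices       # the set S of fixed vertices
    pq = [(0.0, source)]
    while pq:
        dv, v = heapq.heappop(pq)        # best vertex in V - S
        if fixed[v]:
            continue                     # stale queue entry
        fixed[v] = True                  # move v into S
        for u, w in adj[v]:              # forward-update: d(u) ⊕= d(v) ⊗ w(v, u)
            if dv + w < d[u]:
                d[u] = dv + w
                heapq.heappush(pq, (d[u], u))
    return d

# a cyclic graph: 0->1 (2), 1->2 (2), 2->0 (1), 0->2 (5)
print(dijkstra(3, [(0, 1, 2.0), (1, 2, 2.0), (2, 0, 1.0), (0, 2, 5.0)]))  # [0.0, 2.0, 4.0]
```

Note the cycle 2 → 0: no topological order exists, yet best-first order still fixes each vertex exactly once because the weights are superior (non-negative).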

  50. Viterbi vs. Dijkstra

  • structural vs. algebraic constraints: Viterbi needs an acyclic search space; Dijkstra needs a superior semiring, so it applies only to monotonic optimization problems
  • many NLP problems satisfy both
  • Viterbi but not Dijkstra: forward-backward (inside semiring: not an optimization), non-probabilistic models (not superior)
  • Dijkstra but not Viterbi: cyclic FSMs/grammars

  51. What if both fail?

  • generalized Bellman-Ford (CLR, 1990; Mohri, 2002)
  • or: first decompose into strongly-connected components (SCC), which gives a DAG; use Viterbi globally on this SCC-DAG, and Bellman-Ford locally within each SCC

  52. What if both work?

  • full Dijkstra is slower than Viterbi: O((V + E) log V) vs. O(V + E)
  • but Dijkstra can finish as early as the target vertex is popped: if that happens after a fraction a of the work, the comparison becomes a(V + E) log V vs. V + E
  • Q: how to (magically) reduce a?

  53. A* Search: Intuition

  • Dijkstra is "blind" about how far the target is, and may get "trapped" by obstacles
  • can we be more intelligent about the future?
  • idea: prioritize by (s-to-v distance) + (estimate of v-to-t distance)

  56. A* Heuristic

  • h(v): the true distance from v to the target t; ĥ(v): an estimate of it
  • ĥ(v) must be an optimistic estimate of h(v): ĥ(v) ≤ h(v)
  • Dijkstra is the special case ĥ(v) = 1̄ (i.e., 0 for distances)
  • now prioritize the queue by d(v) ⊗ ĥ(v)
  • we can stop when the target is popped -- why? optimal subpaths pop earlier than non-optimal ones:
    d(v) ⊗ ĥ(v) ≤ d(v) ⊗ h(v) ≤ d(t) ≤ (weight of any non-optimal path to t)
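A sketch of A* over an explicit graph. The graph and heuristic values below are toy data invented for the example; the heuristic is admissible by construction (each ĥ(v) never exceeds the true remaining distance), which is what licenses returning as soon as the target pops:

```python
# A*: priority = d(v) ⊗ ĥ(v); stop when the target is popped.
import heapq

def a_star(adj, h, source, target):
    INF = float("inf")
    d = {source: 0.0}
    pq = [(h[source], source)]           # priority: d(v) + ĥ(v)
    done = set()
    while pq:
        _, v = heapq.heappop(pq)
        if v == target:
            return d[v]                  # safe: optimal subpaths pop first
        if v in done:
            continue
        done.add(v)
        for u, w in adj.get(v, []):
            nd = d[v] + w
            if nd < d.get(u, INF):
                d[u] = nd
                heapq.heappush(pq, (nd + h[u], u))
    return INF

adj = {"s": [("v", 1.0), ("u", 2.0)], "v": [("t", 3.0)], "u": [("t", 1.0)]}
h = {"s": 3.0, "v": 3.0, "u": 1.0, "t": 0.0}  # admissible toy estimates
print(a_star(adj, h, "s", "t"))  # 3.0, via s-u-t
```

Setting every ĥ(v) to 0 recovers plain Dijkstra, matching the special case on the slide.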

  58. How to design a heuristic?

  • more of an art than a science
  • basic idea: projection into a coarser space
  • cluster vertices: w′(U, V) = min { w(u, v) | u ∈ U, v ∈ V }
  • the exact cost in the coarser graph is an estimate for the finer graph (Raphael, 2001)

  60. Viterbi or A*?

  • A* intuition: d(t) ⊗ ĥ(t) ranks higher among the d(v) ⊗ ĥ(v), so it can finish early if lucky
  • actually, d(t) ⊗ ĥ(t) = d(t) ⊗ h(t) = d(t) ⊗ 1̄ = d(t)
  • but we pay the price of maintaining a priority queue: an O(log V) factor
  • Q: how early is early enough? if the rank of d(t) in the priority pool is r, A* is better when (r/V) log V < 1, i.e., r < V / log V

  61. Two Dimensional Survey

                                              traversing order
  search space                            topological (acyclic)   best-first (superior)
  graphs with semirings (e.g., FSMs)      Viterbi                 Dijkstra
  hypergraphs with weight functions       Generalized Viterbi     Knuth
    (e.g., CFGs)

  63. Background: CFG and Parsing

  [figure: parsing as deduction over spans of the input w₀ w₁ ... w₍n−1₎, with goal item (S, 0, n)]
  67. (Directed) Hypergraphs

  • a generalization of graphs: edge ⇒ hyperedge, from several tail vertices to one head vertex
  • e = (T(e), h(e), f_e), with arity |e| = |T(e)|
  • a totally-ordered weight set R; we borrow the ⊕ operator for comparison
  • each hyperedge carries a weight function f_e : R^|e| → R, generalizing the ⊗ operator of semirings
  • simple case: f_e(a, b) = a ⊗ b ⊗ w(e)
  • generic update, with tails u₁, u₂ and head v: d(v) ⊕= f_e(d(u₁), d(u₂))
  • example: an item X_{i,k} built from Y_{i,j} and Z_{j,k}
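The Viterbi sweep generalizes directly: a hyperedge fires once all of its tails are fixed, and updates its head through f_e. A sketch in the (max, ×) semiring over probabilities; the hyperedge encoding, axiom dict, and all weights are toy assumptions of this sketch (vertices pre-numbered in topological order):

```python
# Generalized Viterbi on an acyclic hypergraph: d(head) ⊕= f_e(d(tails)).
def viterbi_hypergraph(num_vertices, hyperedges, axioms):
    """hyperedges: list of (tails, head, f); axioms: {vertex: weight}.
    Semiring here is (max, ×) over probabilities in [0, 1]."""
    d = [0.0] * num_vertices             # 0 of the (max, ×) semiring
    for v, w in axioms.items():
        d[v] = w
    for v in range(num_vertices):        # visit heads in sorted order
        for tails, head, f in hyperedges:
            if head == v:                # all tails u < v are fixed
                d[v] = max(d[v], f(*[d[u] for u in tails]))
    return d

# toy "parse forest": axioms 0, 1, 2; f_e(a, b) = a * b * rule_prob
edges = [((0, 1), 3, lambda a, b: a * b * 0.5),
         ((1, 2), 4, lambda a, b: a * b * 0.4),
         ((0, 4), 5, lambda a, b: a * b * 0.9),
         ((3, 2), 5, lambda a, b: a * b * 0.8)]
d = viterbi_hypergraph(6, edges, {0: 1.0, 1: 1.0, 2: 1.0})
print(d[5])  # best of the two competing derivations of the goal item
```

Vertex 5 has two incoming hyperedges, i.e., two competing derivations (via 3 and via 4); the ⊕ = max resolves the competition, which is exactly how a packed forest is decoded.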

  69. Hypergraphs and Deduction

  • deduction step (Nederhof, 2003):
      (B, i, k) : a    (C, k, j) : b    [A → B C]
      -------------------------------------------
      (A, i, j) : a × b × Pr(A → B C)
  • hyperedge view: tails u₁ = (B, i, k) and u₂ = (C, k, j) are the antecedents; the head v = (A, i, j) is the consequent; the weight function f_e gives v the weight f_e(a, b)

  70. Related Formalisms

  • AND/OR graphs: each vertex v is an OR-node over its incoming hyperedges; each hyperedge e is an AND-node over its tail OR-nodes u₁, u₂

  72. Packed Forests

  • a compact representation of many parses, by sharing common sub-derivations
  • a polynomial-space encoding of an exponentially large set
  • a forest is a hypergraph: parse items are the nodes, instantiated rules are the hyperedges
  • example sentence: 0 I 1 saw 2 him 3 with 4 a 5 mirror 6
  • (Klein and Manning, 2001; Huang and Chiang, 2005)

  73. Weight Functions and Semirings

  [figure: a hyperedge e with tails u₁, ..., u_k and head v, computing f_e(a₁, ..., a_k)]
