Monotonicity
• monotonicity: Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A. We say K is monotonic if for all a, b, c ∈ A:
  (a ≤ b) ⇒ (a ⊗ c ≤ b ⊗ c)
  (a ≤ b) ⇒ (c ⊗ a ≤ c ⊗ b)
• optimal substructure in dynamic programming
  [figure: item A is built from subitems B and C; if B's value improves from b to b′ ≤ b, then A's value improves from b ⊗ c to b′ ⊗ c ≤ b ⊗ c]
• idempotent ⇒ monotone (from distributivity)
  • (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c); if a ≤ b, then a = a ⊕ b, so a ⊗ c = (a ⊗ c) ⊕ (b ⊗ c)
  • by def. of the comparison, a ⊗ c ≤ b ⊗ c
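Spelled out, the idempotent ⇒ monotone argument is a one-line chain (a sketch, assuming the natural order of an idempotent semiring, a ≤ b iff a ⊕ b = a, which is the comparison the slide refers to):

```latex
a \le b \;\Longleftrightarrow\; a \oplus b = a
\;\Longrightarrow\;
a \otimes c = (a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)
\quad\text{(distributivity)}
\;\Longrightarrow\;
a \otimes c \le b \otimes c .
```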
DP on Graphs
• optimization problems on graphs ⇒ the generic shortest-path problem
• weighted directed graph G = (V, E) with a function w that assigns each edge a weight from a semiring
• compute the best weight of the target vertex t
• generic update along edge (u, v):  d(v) ⊕= d(u) ⊗ w(u, v), i.e., d(v) ← d(v) ⊕ (d(u) ⊗ w(u, v))
  [figure: edge (u, v) with weight w(u, v)]
• how to avoid cyclic updates? only update when d(u) is fixed
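For reference, here is the generic update d(v) ⊕= d(u) ⊗ w(u, v) written as code (a minimal sketch, not from the slides; the `Semiring` container and the tropical instance are illustrative choices):

```python
# Generic edge relaxation over a pluggable semiring (illustrative sketch).
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # ⊕: summarize alternative derivations
    times: Callable[[Any, Any], Any]  # ⊗: extend a derivation along an edge
    zero: Any                         # identity of ⊕ (initial d value)
    one: Any                          # identity of ⊗ (d value of the source)

# tropical semiring (R ∪ {+∞}, min, +, +∞, 0): shortest paths
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

def relax(d, u, v, w_uv, K):
    """d(v) ⊕= d(u) ⊗ w(u, v)"""
    d[v] = K.plus(d[v], K.times(d[u], w_uv))
```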
Two Dimensional Survey
                                          traversing order
  search space                    topological (acyclic)    best-first (superior)
  graphs with semirings           Viterbi                  Dijkstra
    (e.g., FSMs)
  hypergraphs with weight         Generalized Viterbi      Knuth
    functions (e.g., CFGs)
Viterbi Algorithm for DAGs
1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each incoming edge (u, v) in E, use d(u) to update d(v):  d(v) ⊕= d(u) ⊗ w(u, v)
  • key observation: d(u) is already fixed to its optimal value at this time
  [figure: edge (u, v) with weight w(u, v)]
• time complexity: O(V + E)
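A compact sketch of this algorithm (my own rendering, not the slides' code; the graph is assumed to be given as incoming-edge lists `pred[v] = [(u, w_uv), ...]`, and `plus`/`times`/`zero`/`one` are the semiring operations):

```python
from collections import deque

def viterbi_dag(vertices, pred, source, plus, times, zero, one):
    # build successor lists and in-degrees for a Kahn-style topological sort
    succ = {v: [] for v in vertices}
    indeg = {v: len(pred.get(v, [])) for v in vertices}
    for v in vertices:
        for u, _ in pred.get(v, []):
            succ[u].append(v)
    queue = deque(v for v in vertices if indeg[v] == 0)
    d = {v: zero for v in vertices}
    d[source] = one
    while queue:
        v = queue.popleft()
        # every predecessor of v has already been popped, so d(u) is fixed
        for u, w_uv in pred.get(v, []):
            d[v] = plus(d[v], times(d[u], w_uv))   # d(v) ⊕= d(u) ⊗ w(u, v)
        for x in succ[v]:
            indeg[x] -= 1
            if indeg[x] == 0:
                queue.append(x)
    return d

# e.g., shortest path with the tropical semiring:
# viterbi_dag(["s", "a", "t"], {"a": [("s", 1)], "t": [("a", 2), ("s", 5)]},
#             "s", min, lambda a, b: a + b, float("inf"), 0)   ->  d["t"] == 3
```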
Variant 1: forward-update
1. topological sort
2. visit each vertex v in sorted order and do updates
  • for each outgoing edge (v, u) in E, use d(v) to update d(u):  d(u) ⊕= d(v) ⊗ w(v, u)
  • key observation: d(v) is already fixed to its optimal value at this time
  [figure: edge (v, u) with weight w(v, u)]
• time complexity: O(V + E)
Examples
• [Number of Paths in a DAG]
  • just use the counting semiring (N, +, ×, 0, 1)
  • note: this is not an optimization problem!
• [Longest Path in a DAG]
  • just use the semiring (R ∪ {−∞}, max, +, −∞, 0)
• [Part-of-Speech Tagging with a Hidden Markov Model]
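To make the "just swap the semiring" point concrete, here is a toy sketch (the three-vertex DAG and its weights are hypothetical; the HMM tagging example would plug in the Viterbi semiring in the same way):

```python
# Same DAG, same recurrence, two different semirings (toy example).
edges_into = {"a": [("s", 1)], "t": [("a", 2), ("s", 5)]}   # paths s→a→t and s→t
topo = ["s", "a", "t"]                                       # topological order

def solve(plus, times, zero, one, edge_value):
    d = {"s": one, "a": zero, "t": zero}
    for v in topo[1:]:
        for u, w in edges_into[v]:
            d[v] = plus(d[v], times(d[u], edge_value(w)))
    return d["t"]

# counting semiring (N, +, ×, 0, 1): every edge contributes 1 => number of paths
print(solve(lambda a, b: a + b, lambda a, b: a * b, 0, 1, lambda w: 1))    # 2
# longest-path semiring (R ∪ {−∞}, max, +, −∞, 0)
print(solve(max, lambda a, b: a + b, float("-inf"), 0, lambda w: w))       # 5
```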
Example: Speech Alignment
• time complexity: O(n²)
• also used in: edit distance, biological sequence alignment
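Since edit distance is named here as the same O(n²) dynamic program, a minimal sketch of that instance (standard Levenshtein recurrence; unit insert/delete/substitute costs are an assumption):

```python
# Edit distance as a DAG shortest path: vertex (i, j) = "first i chars of x
# aligned with first j chars of y"; edges are insert / delete / substitute.
def edit_distance(x, y):
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete x[i-1]
                          d[i][j - 1] + 1,          # insert y[j-1]
                          d[i - 1][j - 1] + sub)    # match / substitute
    return d[n][m]
```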
Example: Word Alignment
• key difference: reorderings in translation!
• sequence/speech alignment is always monotonic
• complexity under an HMM alignment model:
  • word alignment is O(n³): for every (i, j), enumerate all (i−1, k)
  • sequence alignment is O(n²)
[figure: alignment grid between “I love you .” and “Je t’aime .”]
Chinese Word Segmentation
下 雨 天 地 面 积 水
xia yu tian di mian ji shui
• 民主  min-zhu  (people-dominate)  “democracy”
• 江泽民 主席  jiang-ze-min zhu-xi  (...-...-people dominate-podium)  “President Jiang Zemin”
• (this was 5 years ago; now Google is good at segmentation!)
• segmentation as graph search
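A sketch of the graph-search view (my own illustration: vertices are the gaps between characters, and there is an edge (i, j) for every dictionary word chars[i:j]; the `lexicon` set and `score` function are hypothetical):

```python
def segment(chars, lexicon, score):
    """Best segmentation of chars into lexicon words (higher score = better);
    assumes at least one segmentation exists."""
    n = len(chars)
    best = [float("-inf")] * (n + 1)   # best[j]: best score of segmenting chars[:j]
    back = [0] * (n + 1)               # backpointer: start position of the last word
    best[0] = 0.0
    for j in range(1, n + 1):          # left-to-right = topological order
        for i in range(j):
            word = chars[i:j]
            if word in lexicon and best[i] + score(word) > best[j]:
                best[j] = best[i] + score(word)
                back[j] = i
    words, j = [], n
    while j > 0:                       # follow backpointers to recover the words
        words.append(chars[back[j]:j])
        j = back[j]
    return list(reversed(words))

# e.g., segment("下雨天地面积水",
#               {"下雨", "下雨天", "天", "地", "地面", "面积", "积水", "水"},
#               lambda w: len(w))      # toy score favoring longer words
```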
Phrase-based Decoding
与 沙龙 举行 了 会谈
yu Shalong juxing le huitan
[figure: phrase pairs “held a talk with Sharon”, “Sharon held talks with”, “with Sharon”, “held a talk”, and partial hypotheses marked with coverage vectors such as _ _ ● ● ●]
• source-side: coverage vector
• target-side: grow hypotheses strictly left-to-right
• space: O(2^n), time: O(2^n n²) — cf. the traveling salesman problem
Traveling Salesman Problem & MT
• a classical NP-hard problem
• goal: visit each city once and only once
• exponential-time dynamic programming
  • state: cities visited so far (bit-vector)
  • search in this O(2^n) transformed graph
• MT: each city is a source-language word
• restrictions in reordering can reduce complexity
  • ⇒ distortion limit
  • ⇒ syntax-based MT
(Held and Karp, 1962; Knight, 1999)
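A sketch of the bit-vector dynamic program (the classical Held–Karp recurrence for a TSP tour; `dist` is a hypothetical distance matrix). The decoder's state is analogous: a coverage bit-vector plus the most recent target words instead of "set of cities plus last city".

```python
def held_karp(dist):
    """Shortest tour visiting all cities exactly once; O(2^n · n^2) time."""
    n = len(dist)
    INF = float("inf")
    # d[S][j]: best cost of a path that starts at city 0, visits exactly the
    # cities in bit-set S (which must contain 0 and j), and ends at city j
    d = [[INF] * n for _ in range(1 << n)]
    d[1][0] = 0.0
    for S in range(1 << n):
        if not (S & 1):
            continue                      # every state includes the start city 0
        for j in range(n):
            if d[S][j] == INF:
                continue
            for k in range(n):
                if (S >> k) & 1:
                    continue              # k already visited
                T = S | (1 << k)
                if d[S][j] + dist[j][k] < d[T][k]:
                    d[T][k] = d[S][j] + dist[j][k]
    full = (1 << n) - 1
    return min(d[full][j] + dist[j][0] for j in range(1, n))
```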
Adding a Bigram Model
• “refined” graph: states annotated with language-model words
• still dynamic programming, just a larger search space
[figure: hypothesis graph whose states pair a coverage vector with the last English words, e.g., (_ _ ● ● ●, ... Shalong), (_ _ ● ● ●, ... Sharon), (_ _ ● ● ●, ... talks), (● ● ● ● ●, ... meeting), (● ● ● ● ●, ... talk); extending a hypothesis with “with Sharon” adds a bigram score]
• space: O(2^n) ⇒ O(2^n V^{m−1}); time: O(2^n n²) ⇒ O(2^n V^{m−1} n²) for m-gram language models
Two Dimensional Survey (roadmap, revisited)
• same table as above; turning to best-first (superior) traversal of graphs with semirings: Dijkstra
Dijkstra Algorithm
• Dijkstra does not require acyclicity
• instead of topological order, we use best-first order
• but this requires superiority of the semiring:
  Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A. We say K is superior if for all a, b ∈ A: a ≤ a ⊗ b and b ≤ a ⊗ b.
• intuition: combination always gets worse
• contrast with monotonicity: combination preserves order
• examples: ({0, 1}, ∨, ∧, 0, 1), ([0, 1], max, ×, 0, 1), (R⁺ ∪ {+∞}, min, +, +∞, 0); note that (R ∪ {+∞}, min, +, +∞, 0) is not superior once negative weights are allowed, which is why Dijkstra cannot handle negative edges
[figure: extending d(u) along edge e gives d(u) ⊗ w(e)]
Dijkstra Algorithm (continued)
• keep a cut (S : V − S) where the S vertices are fixed
• maintain a priority queue Q of the V − S vertices
• each iteration: choose the best vertex v from Q
• move v to S, and use d(v) to forward-update the others:  d(u) ⊕= d(v) ⊗ w(v, u)
[figure: source s, fixed set S, frontier vertex v with outgoing edge w(v, u) to u in V − S]
• time complexity: O((V + E) log V) with a binary heap, O(V log V + E) with a Fibonacci heap
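The same loop as code (a sketch in the concrete tropical case, i.e., textbook Dijkstra with non-negative weights; `succ[u]` lists outgoing edges `(v, w_uv)`):

```python
import heapq

def dijkstra(succ, source):
    d = {source: 0.0}
    fixed = set()                   # S: vertices whose d() value is final
    queue = [(0.0, source)]         # priority queue over V − S
    while queue:
        du, u = heapq.heappop(queue)
        if u in fixed:
            continue                # stale entry left over from an earlier update
        fixed.add(u)                # move u across the cut
        for v, w in succ.get(u, []):
            cand = du + w           # d(u) ⊗ w(u, v) in the tropical semiring
            if cand < d.get(v, float("inf")):
                d[v] = cand         # d(v) ⊕= d(u) ⊗ w(u, v)
                heapq.heappush(queue, (cand, v))
    return d
```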
Viterbi vs. Dijkstra
• structural vs. algebraic constraints
• Dijkstra is only applicable to (monotonic) optimization problems
[figure: Venn diagram — within monotonic optimization problems, the acyclic cases are handled by Viterbi and the superior cases by Dijkstra; many NLP problems fall in the intersection; forward–backward (the Inside semiring) is not an optimization problem, non-probabilistic models need not be superior, and cyclic FSMs/grammars are not acyclic]
What if both fail?
[figure: the same Venn diagram — the case that is neither acyclic nor superior]
• generalized Bellman–Ford (CLR, 1990; Mohri, 2002)
• or, first decompose into strongly connected components (SCCs), which gives a DAG; use Viterbi globally on this SCC-DAG, and Bellman–Ford locally within each SCC
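For the cyclic, non-superior case, here is a sketch of plain (tropical) Bellman–Ford — relax every edge repeatedly until nothing changes, at most |V| − 1 rounds when there is no negative cycle; the generalized and SCC-based variants cited on the slide refine this idea:

```python
def bellman_ford(vertices, edges, source):
    """edges: iterable of (u, v, w); allows negative weights but no negative cycles."""
    INF = float("inf")
    d = {v: INF for v in vertices}
    d[source] = 0.0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:            # d(v) ⊕= d(u) ⊗ w(u, v), tropical case
                d[v] = d[u] + w
                changed = True
        if not changed:
            break                          # fixpoint reached early
    return d
```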
What if both work?
[figure: the same Venn diagram — the intersection: both acyclic and superior]
• full Dijkstra is slower than Viterbi: O((V + E) log V) vs. O(V + E)
• but Dijkstra can finish as early as the target vertex is popped: a(V + E) log V vs. V + E, where a is the fraction of the graph explored before the target pops
• Q: how to (magically) reduce a?
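In the Dijkstra sketch above, "finishing early" is just returning as soon as the target leaves the queue — e.g., inserting `if u == target: return du` right after `fixed.add(u)` (with a hypothetical `target` argument). A*, next, is a way to make that happen after fewer pops.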
A* Search: Intuition
• Dijkstra is “blind” to how far away the target is
• it may get “trapped” by obstacles
• can we be more intelligent about the future?
• idea: prioritize by (s-to-v distance) + (v-to-t estimate)
[figure: source s, target t, and explored vertices u, v around an obstacle]
A* Heuristic
[figure: path s → v → t, with d(v) the s-to-v cost, h(v) the true v-to-t cost, and ĥ(v) the estimate]
• h(v): the distance from v to the target t
• ĥ(v) must be an optimistic estimate of h(v): ĥ(v) ≤ h(v)
• Dijkstra is the special case where ĥ(v) = 1̄ (i.e., 0 for distances)
• now, prioritize the queue by d(v) ⊗ ĥ(v)
• we can stop when the target gets popped — why?
  • optimal subpaths pop earlier than non-optimal ones:
    d(v) ⊗ ĥ(v) ≤ d(v) ⊗ h(v) ≤ d(t) ≤ (weight of any non-optimal path to t)
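A sketch of the resulting search (again the tropical case; `h_hat` is assumed optimistic, ĥ(v) ≤ h(v), which is what makes returning at the target pop safe). Setting `h_hat = lambda v: 0.0` recovers the Dijkstra sketch above.

```python
import heapq

def astar(succ, source, target, h_hat):
    INF = float("inf")
    d = {source: 0.0}
    queue = [(h_hat(source), source)]       # prioritize by d(v) + ĥ(v)
    while queue:
        prio, u = heapq.heappop(queue)
        if prio > d.get(u, INF) + h_hat(u):
            continue                        # stale entry: d(u) has since improved
        if u == target:
            return d[u]                     # optimal, because ĥ never overestimates
        for v, w in succ.get(u, []):
            cand = d[u] + w
            if cand < d.get(v, INF):
                d[v] = cand
                heapq.heappush(queue, (cand + h_hat(v), v))
    return INF                              # target unreachable
```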
How to design a heuristic?
• more of an art than a science
• basic idea: projection into a coarser space
• cluster vertices: w′(U, V) = min { w(u, v) | u ∈ U, v ∈ V }
• the exact cost in the coarser graph is an estimate for the finer graph
[figure: a fine graph and its coarse projection with clusters U and V]
(Raphael, 2001)
Viterbi or A*?
• A* intuition: d(t) ⊗ ĥ(t) ranks higher among the d(v) ⊗ ĥ(v) values, so it can finish early if lucky
• actually, d(t) ⊗ ĥ(t) = d(t) ⊗ h(t) = d(t) ⊗ 1̄ = d(t)
• but we pay the price of maintaining a priority queue: O(log V) per operation
• Q: how early? is it worth the price?
• if the target's rank is r, then A* is better when (r/V) log V < 1, i.e., r < V / log V
[figure: the pool of d(v) values (Dijkstra) vs. the pool of d(v) ⊗ ĥ(v) values (A*), with d(t) at rank r out of V]
Two Dimensional Survey (roadmap, revisited)
• same table as above; turning to the second row: hypergraphs with weight functions (e.g., CFGs) — Generalized Viterbi and Knuth
Background: CFG and Parsing
[figure: parsing the input w0 w1 ... wn−1 with a context-free grammar; the goal item is (S, 0, n), i.e., an S spanning the whole sentence]
(Directed) Hypergraphs
• a generalization of graphs: an edge becomes a hyperedge, connecting several tail vertices to one head vertex
• e = (T(e), h(e), f_e), with arity |e| = |T(e)|
• a totally-ordered weight set R; we borrow the ⊕ operator as the comparison
• weight function f_e: R^|e| → R, generalizing the ⊗ operator of semirings
  • simple case: f_e(a, b) = a ⊗ b ⊗ w(e)
• generic update: d(v) ⊕= f_e(d(u1), d(u2))
[figure: a hyperedge e with two tails (u1, u2) and one head v; e.g., tails X over span (i, k) and Z over span (k, j), head Y over span (i, j)]
Hypergraphs and Deduction
• a hyperedge corresponds to a deduction step:
  (B, i, k): a    (C, k, j): b
  ─────────────────────────────  A → B C
  (A, i, j): f_e(a, b) = a × b × Pr(A → B C)
• tails = antecedents, head = consequent
(Nederhof, 2003)
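As a concrete instance of this deduction rule, here is a small Viterbi-CKY sketch for a PCFG in Chomsky normal form (the grammar encoding — `lex_rules[word] = [(A, p)]` and `bin_rules = [(A, B, C, p)]` — is my own illustrative format):

```python
from collections import defaultdict

def viterbi_cky(words, lex_rules, bin_rules, start="S"):
    n = len(words)
    best = defaultdict(float)                  # item (A, i, j) -> best probability
    for i, w in enumerate(words):              # axioms: (A, i, i+1) from A -> w
        for A, p in lex_rules.get(w, []):
            best[A, i, i + 1] = max(best[A, i, i + 1], p)
    for span in range(2, n + 1):               # topological order: small spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for A, B, C, p in bin_rules:   # hyperedge (B,i,k), (C,k,j) -> (A,i,j)
                    a, b = best[B, i, k], best[C, k, j]
                    if a > 0.0 and b > 0.0:
                        best[A, i, j] = max(best[A, i, j], a * b * p)
    return best[start, 0, n]                   # goal item (S, 0, n)
```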
Related Formalisms
[figure: a hypergraph drawn as an AND/OR graph — the head v is an OR-node, the hyperedge e an AND-node, and the tails u1, u2 are OR-nodes]
Packed Forests
• a compact representation of many parses, by sharing common sub-derivations
• a polynomial-space encoding of an exponentially large set
• a packed forest is a hypergraph: nodes and hyperedges
[figure: packed forest over “0 I 1 saw 2 him 3 with 4 a 5 mirror 6”]
(Klein and Manning, 2001; Huang and Chiang, 2005)
Weight Functions and Semirings
[figure: a hyperedge with tails u1, u2, ..., uk and head v, whose value is computed by the weight function f_e(a1, ..., ak)]