
Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language
Jason Eisner, Eric Goldlust, Noah A. Smith
HLT-EMNLP, October 2005


  1. An Anecdote from ACL'05 (talk by Michael Jordan)

     Conclusions to draw from that talk:
     1. Mike & his students are great.
     2. Graphical models are great (because they're flexible).
        Just draw a model that actually makes sense for your problem.
     3. Gibbs sampling is great (because it works with nearly any graphical model).
        Just do Gibbs sampling. Um, it's only 6 lines in Matlab…
     4. Matlab is great (because it frees up Mike and his students to doodle all day and then execute their doodles).

     Parts of it already are … Toolkits are available, so you don't have to be an expert:
     - Language modeling
     - Binary classification (e.g., SVMs)
     - Finite-state transductions
     - Linear-chain graphical models

     But other parts aren't …
     - Context-free and beyond
     - Machine translation
     Efficient parsers and MT systems are complicated and painful to write.

  2. Warning
     This talk is only an advertisement! For more details, please see the paper,
     and see http://dyna.org (download + documentation; sign up for updates by email).

     This talk: a toolkit that's general enough for these cases (efficient context-free
     and beyond parsers, and machine translation). It stretches from finite-state to
     Turing machines.

     How you build a system ("big picture" slide):
     - cool model: PCFG
     - practical equations (the inside algorithm):
       $$\beta_x(i,k) = \sum_{y,z} \sum_{j:\, i<j<k} p(N_x \to N_y\, N_z \mid N_x)\, \beta_y(i,j)\, \beta_z(j,k), \qquad 0 \le i < j < k \le n$$
     - pseudocode (execution order):
           for width from 2 to n
             for i from 0 to n-width
               k = i+width
               for j from i+1 to k-1
                 …
     - tuned C++ implementation (data structures, etc.)

     Wait a minute … didn't I just implement something like this last month?
     - chart management / indexing
     - cache-conscious data structures
     - prioritize partial solutions (best-first, pruning)
     - parameter management
     - inside-outside formulas
     - different algorithms for training and decoding
     - conjugate gradient, annealing, ...
     - parallelization?

     We thought computers were supposed to automate drudgery.
     The Dyna language specifies these equations. Getting from the equations to pseudocode
     and tuned C++ is left to compilation strategies (we'll come back to this).
     Most programs just need to compute some values from other values. Any order is ok.
     Some programs also need to update the outputs if the inputs change:
     - spreadsheets, makefiles, email readers
     - dynamic graph algorithms
     - EM and other iterative optimization
     - leave-one-out training of smoothing params

  3. Writing equations in Dyna
     - int a.
       a = b * c.
       a will be kept up to date if b or c changes.
     - b += x.
       b += y.
       Equivalent to b = x+y: b is a sum of two variables. Also kept up to date.
     - c += z(1).
       c += z(2).
       c += z(3).
       c += z("four").
       c += z(foo(bar,5)).
       The single rule c += z(N). uses a "pattern": the capitalized N matches anything,
       so c is the sum of all nonzero z(…) values. At compile time, we don't know how many!

     More interesting use of patterns
     - a = b * c.
       Scalar multiplication.
     - a(I) = b(I) * c(I).
       Pointwise multiplication.
     - a += b(I) * c(I).
       Means $a = \sum_I b(I)\,c(I)$: a dot product, which could be sparse, e.g., the sparse
       dot product of a query and a document: ... + b("yetis")*c("yetis") + b("zebra")*c("zebra").
     - a(I,K) += b(I,J) * c(J,K).
       Means $a(I,K) = \sum_J b(I,J)\,c(J,K)$: matrix multiplication, which could be sparse.
       J is free on the right-hand side, so we sum over it.

     Dyna vs. Prolog
     By now you may see what we're up to!
     Prolog has Horn clauses:
         a(I,K) :- b(I,J), c(J,K).
     Dyna has "Horn equations":
         a(I,K) += b(I,J) * c(J,K).
     Each item has a value (e.g., a real number), defined from other values.
     Like Prolog: allows nested terms; syntactic sugar for lists, etc.; Turing-complete.
     Unlike Prolog: charts, not backtracking! Compiles to efficient C++ classes that
     integrate with your C++ code.

     The CKY inside algorithm in Dyna
         :- double item = 0.
         :- bool length = false.
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).

     Using the compiled program from C++: put in the axioms (values not defined by the
     above program), and the theorem pops out:
         using namespace cky;
         chart c;
         c[rewrite("s","np","vp")] = 0.7;
         c[word("Pierre",0,1)] = 1;
         c[length(30)] = true;  // 30-word sentence
         cin >> c;              // get more axioms from stdin
         cout << c[goal];       // print total weight of all parses

     Visual debugger: browse the proof forest (ambiguity, shared substructure).
     Related algorithms in Dyna?
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).
     - Viterbi parsing?
     - Logarithmic domain?
     - Lattice parsing?
     - Earley's algorithm?
     - Binarized CKY?
     - Incremental (left-to-right) parsing?
     - Log-linear parsing?
     - Lexicalized or synchronous parsing?

  4. Related algorithms in Dyna?
     Viterbi parsing: replace += with max=.
         constit(X,I,J) max= word(W,I,J) * rewrite(X,W).
         constit(X,I,J) max= constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal max= constit("s",0,N) if length(N).
     Logarithmic domain: replace * with +, and += with log+= (for Viterbi, max= stays max=).
         constit(X,I,J) log+= word(W,I,J) + rewrite(X,W).
         constit(X,I,J) log+= constit(Y,I,Mid) + constit(Z,Mid,J) + rewrite(X,Y,Z).
         goal log+= constit("s",0,N) if length(N).
     Lattice parsing: the positions in word(W,I,J) can be lattice states rather than integers.
     Instead of c[word("Pierre",0,1)] = 1, an axiom can describe a lattice arc such as
     Pierre/0.2 from state(5) to state(9).
     [Figure: word-lattice fragment with arcs labeled Pierre/0.2, air/0.3, P/0.5 between states such as state(5) and state(9).]

     Earley's algorithm in Dyna: the magic templates transformation (as noted by Minnen 1996)
     turns the CKY program into:
         need("s",0) = true.
         need(Nonterm,J) |= ?constit(_/[Nonterm|_],_,J).
         constit(Nonterm/Needed,I,I) += rewrite(Nonterm,Needed) if need(Nonterm,I).
         constit(Nonterm/Needed,I,K) += constit(Nonterm/[W|Needed],I,J) * word(W,J,K).
         constit(Nonterm/Needed,I,K) += constit(Nonterm/[X|Needed],I,J) * constit(X/[],J,K).
         goal += constit("s"/[],0,N) if length(N).

     Program transformations
     (big picture again: cool model → practical equations → pseudocode → tuned C++ implementation)
     There are lots of equivalent ways to write a system of equations! Transforming from one
     to another may improve efficiency. (Or, transform to related equations that compute
     gradients, upper bounds, etc.) Many parsing "tricks" can be generalized into automatic
     transformations that help other programs, too!
