
From Baby Steps to Leapfrog: How Less is More in Unsupervised Dependency Parsing

Valentin I. Spitkovsky, with Hiyan Alshawi (Google Inc.) and Daniel Jurafsky (Stanford University)
NAACL HLT (2010-06-04)


Motivation: Unsupervised (Dependency) Parsing

Insert your favorite reason(s) why you'd like to parse anything in the first place...
... adjusted for any data without reference treebanks — i.e., exotic languages and/or genres (e.g., legal).

Potential applications:
◮ machine translation — word alignment, phrase extraction, reordering;
◮ web search — retrieval, query refinement;
◮ question answering, speech recognition, etc.

State-of-the-Art: Directed Dependency Accuracy

42.2% on Section 23 (all sentences) of WSJ (Cohen and Smith, 2009)
31.7% for the (right-branching) baseline (Klein and Manning, 2004)

Scoring example — "Factory payrolls fell in September ." (tags: ♦ NN NNS VBD IN NN):
Directed score: 3/5 = 60% (baseline: 2/5 = 40%);
Undirected score: 4/5 = 80% (baseline: 4/5 = 80%).
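Directed accuracy credits a proposed arc only when both endpoints and the head direction match the reference; undirected accuracy forgives flipped arcs, which is why the right-branching baseline catches up at 4/5 in the example. Here is a minimal scoring sketch (mine, not the talk's); it assumes each parse is encoded as a list mapping 1-based token positions to head positions, with 0 standing for the root ♦:

```python
def attachment_scores(gold_heads, guess_heads):
    """Directed and undirected dependency accuracy for one sentence.

    Both arguments map token position i (stored at index i - 1) to the
    position of its head, with 0 standing for the root (assumed encoding).
    """
    assert len(gold_heads) == len(guess_heads)
    n = len(gold_heads)
    # directed: endpoints AND direction must match
    directed = sum(g == p for g, p in zip(gold_heads, guess_heads))
    # undirected: credit an arc that links the right pair of words,
    # even if the learner got the head/dependent direction backwards
    gold_arcs = {frozenset((i, h)) for i, h in enumerate(gold_heads, start=1)}
    undirected = sum(
        frozenset((i, h)) in gold_arcs
        for i, h in enumerate(guess_heads, start=1)
    )
    return directed / n, undirected / n
```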

State-of-the-Art: A Brief History

1992 — word classes (Carroll and Charniak)
1998 — greedy linkage via mutual information (Yuret)
2001 — iterative re-estimation with EM (Paskin)
2004 — right-branching baseline; valence (DMV) (Klein and Manning)
2004 — annealing techniques (Smith and Eisner)
2005 — contrastive estimation (Smith and Eisner)
2006 — structural biasing (Smith and Eisner)
2007 — common cover link representation (Seginer)
2008 — logistic normal priors (Cohen et al.)
2009 — lexicalization and smoothing (Headden et al.)
2009 — soft parameter tying (Cohen and Smith)

State-of-the-Art: Dependency Model with Valence

a head-outward model, with word classes and valence/adjacency (Klein and Manning, 2004)

[diagram: a head h spawns arguments a_1, a_2, ... outward on each side, then generates STOP]

With $n = |\mathrm{args}(h, dir)|$ the number of arguments h takes in direction dir, and "adj" marking the adjacency indicators:

$$P(t_h) = \prod_{dir \in \{L,R\}} P_{\text{STOP}}(c_h, dir, \overbrace{\mathbf{1}_{n=0}}^{\text{adj}}) \prod_{i=1}^{n} P(t_{a_i})\, P_{\text{ATTACH}}(c_h, dir, c_{a_i}) \left(1 - P_{\text{STOP}}(c_h, dir, \overbrace{\mathbf{1}_{i=1}}^{\text{adj}})\right)$$
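To make the factorization concrete, here is a small scoring sketch (my own, not the authors' code); the Node encoding and the p_stop / p_attach table keys are assumptions:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    cls: str                                   # word class (POS tag)
    left: list = field(default_factory=list)   # arguments, head-outward
    right: list = field(default_factory=list)  # arguments, head-outward

def log_prob_subtree(head, p_stop, p_attach):
    """DMV-style log-probability of the subtree rooted at `head`.

    Assumed tables: p_stop[(class, dir, adjacent)] is the probability of
    stopping; p_attach[(head_class, dir, arg_class)] of picking an argument.
    """
    lp = 0.0
    for direction, args in (("L", head.left), ("R", head.right)):
        n = len(args)
        # final stop decision; it is adjacent iff no argument was taken
        lp += math.log(p_stop[(head.cls, direction, n == 0)])
        for i, arg in enumerate(args, start=1):
            # don't stop (adjacent only for the first argument),
            # attach an argument of class arg.cls, and recurse
            lp += math.log(1 - p_stop[(head.cls, direction, i == 1)])
            lp += math.log(p_attach[(head.cls, direction, arg.cls)])
            lp += log_prob_subtree(arg, p_stop, p_attach)
    return lp
```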

State-of-the-Art: Unsupervised Learning Engine

EM, via inside-outside re-estimation (Baker, 1979; Manning and Schütze, 1999)

[diagram: the classic inside-outside picture: inside probability β of nonterminal N^j spanning w_p ... w_q; outside probability α of the surrounding words w_1 ... w_{p−1} and w_{q+1} ... w_m under the root N^1; the talk treats this engine as a BLACK BOX]
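Since the engine is used as a black box, the generic shape of EM re-estimation is all that matters here. A minimal sketch (mine, not the talk's), where expected_counts stands in for the inside-outside E-step and events are assumed to be (context, outcome) pairs:

```python
from collections import defaultdict

def normalize(counts):
    """M-step: relative frequencies within each conditioning context.
    Assumes each event is a (context, outcome) pair."""
    totals = defaultdict(float)
    for (context, _), c in counts.items():
        totals[context] += c
    return {(ctx, out): c / totals[ctx] for (ctx, out), c in counts.items()}

def em(corpus, theta, expected_counts, n_iterations=40):
    """Alternate E-steps (expected event counts under theta, e.g. computed
    by inside-outside dynamic programming) with M-steps (renormalization)."""
    for _ in range(n_iterations):
        counts = defaultdict(float)
        for sentence in corpus:
            for event, c in expected_counts(sentence, theta).items():
                counts[event] += c
        theta = normalize(counts)
    return theta
```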

State-of-the-Art: The Standard Corpus

Training: WSJ10 (Klein, 2005)
◮ the Wall Street Journal section of the Penn Treebank Project (Marcus et al., 1993)
◮ ... stripped of punctuation, etc.
◮ ... filtered down to sentences left with no more than 10 POS tags;
◮ ... and converted to reference dependencies using "head percolation rules" (Collins, 1999).

Evaluation: Section 23 of WSJ∞ (all sentences).
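The WSJk construction is just a length filter applied after punctuation is stripped. A sketch under the assumption that sentences arrive as lists of (word, POS-tag) pairs, with an assumed (not necessarily exhaustive) set of Treebank punctuation tags:

```python
# Penn Treebank punctuation tags (assumed list; the talk only says
# "stripped of punctuation, etc.").
PUNCTUATION_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-"}

def make_wsj_k(sentences, k=10):
    """WSJk: drop punctuation tokens, keep sentences with <= k POS tags."""
    corpus = []
    for sentence in sentences:  # sentence: list of (word, tag) pairs
        content = [(w, t) for w, t in sentence if t not in PUNCTUATION_TAGS]
        if 1 <= len(content) <= k:
            corpus.append(content)
    return corpus
```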

State-of-the-Art: The Standard Corpus

[chart: corpus size of WSJk for cutoffs k = 5 to 45: sentences (left axis, 5 to 45 thousand) and tokens (right axis, 100 to 900 thousand)]

Issue I: Why so little data?

extra unlabeled data helps semi-supervised parsing (Suzuki et al., 2009)
yet state-of-the-art unsupervised methods use even less than what's available for supervised training...
we will explore (three) judicious uses of data and simple, scalable machine-learning techniques

Issue II: Non-convex objective...

maximizing the probability of the data (sentences):

$$\hat{\theta}_{\text{UNS}} = \arg\max_{\theta} \sum_{s} \log \underbrace{\sum_{t \in T(s)} P_{\theta}(t)}_{P_{\theta}(s)}$$

the supervised objective would be convex (counting):

$$\hat{\theta}_{\text{SUP}} = \arg\max_{\theta} \sum_{s} \log P_{\theta}(t^{*}(s))$$

in general, $\hat{\theta}_{\text{SUP}} \neq \hat{\theta}_{\text{UNS}}$ and $\hat{\theta}_{\text{UNS}} \neq \tilde{\theta}_{\text{UNS}}$ ... (see CoNLL)

initialization matters!
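Writing both objectives as code makes the contrast plain: the unsupervised one takes a log of a sum over latent trees (non-convex in θ), while the supervised one scores a single reference tree per sentence. This sketch is illustrative only; trees (enumerating T(s)) and tree_prob are hypothetical helpers:

```python
import math

def unsupervised_objective(corpus, theta, trees, tree_prob):
    """Sum over sentences of log P_theta(s), marginalizing out the latent
    tree: a log of a sum, hence non-convex in theta."""
    return sum(
        math.log(sum(tree_prob(t, theta) for t in trees(s)))
        for s in corpus
    )

def supervised_objective(treebank, theta, tree_prob):
    """Sum of log P_theta(t*(s)) over (sentence, reference-tree) pairs:
    no latent sum, so maximization reduces to count-and-normalize."""
    return sum(math.log(tree_prob(t_star, theta)) for _, t_star in treebank)
```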

Issues: The Lay of the Land

[chart: directed dependency accuracy (%) on WSJk for cutoffs k = 5 to 45, comparing EM initializations: Uninformed, Oracle, and K&M (Ad-Hoc Harmonic Init)]

Idea I: Baby Steps ... as Non-convex Optimization

global non-convex optimization is hard...
meta-heuristic: take the guesswork out of local search
start with an easy (convex) case
slowly extend it to the fully complex target task
take tiny (cautious) steps in the problem space
... try not to stray far from relevant neighborhoods in the solution space
base case: sentences of length one (trivial — no init)
incremental step: smooth WSJk; re-init WSJ(k + 1), as sketched below
... this really is grammar induction!
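The whole schedule fits in a few lines. A minimal sketch (mine, not the authors' code), reusing the black-box EM learner from before, with smooth as a hypothetical smoothing step:

```python
def baby_steps(sentences, train_em, smooth, max_len=45):
    """Scaffolded training: fit sentences of length <= k, then use the
    (smoothed) result to initialize training at length k + 1."""
    theta = None  # base case k = 1 is trivial: no initialization needed
    for k in range(1, max_len + 1):
        wsj_k = [s for s in sentences if len(s) <= k]
        init = smooth(theta) if theta is not None else None
        theta = train_em(wsj_k, init=init)  # a tiny, cautious step
    return theta
```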

Idea I: Baby Steps ... as Graduated Learning

WSJ1 — "Atone." (verbs!)
WSJ2 — "Darkness fell." "It is." "Judge Not." (nouns!)
