Breaking Out of Local Optima with Count Transforms and Model Recombination
Valentin I. Spitkovsky, with Hiyan Alshawi (Google Inc.) and Daniel Jurafsky (Stanford University)


  2. Goal (Improving I): How to not get stuck and make progress?
     Challenge:
     ◮ given a (locally optimal) solution, find a better solution
       ⋆ e.g., turn a set of parse trees into a better set
     Desiderata:
     ◮ want an informed, medium-size step in parameter space
     ◮ not too big (e.g., random restarts undo all previous work)
     ◮ not too small (i.e., not overly self-similar, as in MCMC)
     Algorithm Template: Count Transforms
     ◮ selectively forget (or filter) some aspect of a solution,
     ◮ re-optimize from this new starting point,
     ◮ and take the better of the two.
     Spitkovsky et al. (Stanford & Google), Breaking out of Local Optima, EMNLP (2013-10-21)
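
The algorithm template above can be sketched on a toy problem. This is only an illustration under stated assumptions: the 1-D `LANDSCAPE`, the hill-climbing `optimize`, and the shift-by-three `transform` are all hypothetical stand-ins (the talk's transforms forget or filter aspects of a grammar-induction solution, and its optimizer is EM).

```python
# Toy 1-D landscape (hypothetical): index = "parameters", value = objective.
LANDSCAPE = [0, 1, 2, 1, 0, 1, 2, 3, 4, 3, 2]


def score(i):
    return LANDSCAPE[i]


def optimize(i):
    """Greedy hill climbing: a stand-in for a local optimizer such as EM."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best = max(neighbors, key=score)
        if score(best) <= score(i):
            return i  # no neighbor is strictly better: a local optimum
        i = best


def transform(i):
    """Hypothetical medium-size step away from the current solution."""
    return min(i + 3, len(LANDSCAPE) - 1)


def transform_step(solution):
    """Transform, re-optimize, and take the better of the two."""
    candidate = optimize(transform(solution))
    return candidate if score(candidate) > score(solution) else solution


stuck = optimize(0)               # gets stuck at the first small peak
improved = transform_step(stuck)  # one transform step reaches the higher peak
```

Because the step keeps the old solution whenever the candidate is not strictly better, repeated application can never make things worse.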

  7. Transforms: Symmetrizer (Forget Polarity)
     learn from the undirected arcs of skeletal structures
     [Figure: undirected skeletal structure over a tagged sentence]
       tags:  ♦ N N V P N
       words: Factory payrolls fell in September .
     ◮ once we kind of understand which words go together,
       take another whack at making heads or tails of syntax!
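
One way to picture "forgetting polarity" on arc counts is sketched below. It is an assumption, not the authors' implementation: the `(head, dependent)` count table and the even split between the two directions are hypothetical choices made only to show the idea of keeping which words go together while discarding who heads whom.

```python
from collections import defaultdict


def symmetrize(arc_counts):
    """Pool counts of head->dep and dep->head arcs, then split them evenly."""
    pooled = defaultdict(float)
    for (head, dep), count in arc_counts.items():
        # Sorting the pair erases direction: (V, N) and (N, V) share a key.
        pooled[tuple(sorted((head, dep)))] += count
    symmetric = {}
    for (a, b), total in pooled.items():
        if a == b:
            symmetric[(a, a)] = total
        else:
            symmetric[(a, b)] = total / 2
            symmetric[(b, a)] = total / 2
    return symmetric


# Hypothetical directed counts over part-of-speech tags.
directed = {("V", "N"): 3.0, ("N", "V"): 1.0, ("N", "N"): 2.0}
undirected = symmetrize(directed)
```

The total count mass is preserved, so the symmetrized table can serve as a new starting point for re-optimization.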

  13. Transforms: Filter (Forget Incomplete Fragments)
      start by splitting text on punctuation (Spitkovsky et al., 2012)
      Stage I, example text ("Linguistics", from (simple) Wikipedia):
        Linguistics (sometimes called philology) is the science that studies language.
        Scientists who study language are called linguists.
      ◮ once we’ve bootstrapped a rudimentary grammar,
        retry from just the clean, simple complete sentences!

  14. Transforms: Filter (Forget Incomplete Fragments)
      start by splitting text on punctuation (Spitkovsky et al., 2012)
      Stage II, example text:
        Scientists who study language are called linguists.
      ◮ once we’ve bootstrapped a rudimentary grammar,
        retry from just the clean, simple complete sentences!
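
The two stages can be sketched as follows. This is a rough illustration, assuming that "splitting on punctuation" means cutting at any run of non-word, non-space characters and that a "clean, simple complete sentence" is one with no internal punctuation; the helper names `fragments` and `is_simple` are hypothetical.

```python
import re


def fragments(sentence):
    """Split a sentence on runs of punctuation and drop empty pieces."""
    pieces = re.split(r"[^\w\s]+", sentence)
    return [p.strip() for p in pieces if p.strip()]


def is_simple(sentence):
    """True if the sentence has no internal punctuation (a single fragment)."""
    return len(fragments(sentence)) == 1


text = [
    "Linguistics (sometimes called philology) is the science that studies language.",
    "Scientists who study language are called linguists.",
]

# Stage I: train on every punctuation-delimited fragment.
stage_one = [f for s in text for f in fragments(s)]

# Stage II: retry from only the simple, complete sentences.
stage_two = [s for s in text if is_simple(s)]
```

With this crude splitter, apostrophes and hyphens also count as punctuation; a real tokenizer would need to treat them more carefully.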

  17. Transforms: Decoder (Forget Unlikely Parses)
      discard most interpretations (a step of Viterbi training)
      [Figure: three candidate parses of the tagged sentence "Factory payrolls fell in September ." (tags N N V P N, root ♦), weighted 0.4, 0.3, and 0.3]

  19. Transforms: Decoder (Forget Unlikely Parses)
      discard most interpretations (a step of Viterbi training)
      [Figure: a single parse of "Factory payrolls fell in September ." (tags N N V P N, root ♦) remains, with weight 1.0]
      ◮ many reasons why Viterbi steps are a good idea: e.g., M-step initialization
        (Klein and Manning, 2004) (Cohen and Smith, 2010) (Spitkovsky et al., 2010) (Allahverdyan and Galstyan, 2011)
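
The decoder transform amounts to replacing a distribution over parses with a point mass on the most likely one. A minimal sketch, using the weights 0.4, 0.3, and 0.3 from the slides; the parse labels `parse_a`, `parse_b`, and `parse_c` are hypothetical placeholders for actual trees.

```python
def viterbi_step(parse_probs):
    """Keep only the single most probable parse, with all the weight on it."""
    best = max(parse_probs, key=parse_probs.get)
    return {parse: 1.0 if parse == best else 0.0 for parse in parse_probs}


posterior = {"parse_a": 0.4, "parse_b": 0.3, "parse_c": 0.3}
decoded = viterbi_step(posterior)
```

Counts collected from `decoded` rather than `posterior` are what a hard-EM (Viterbi training) M-step would use.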

  26. Pop-up (Generic): This is not specific to grammar induction!
      proposed primitive transform operators (unary):
      ◮ model ablation (i.e., forget something you learned);
      ◮ data filtering (e.g., drop complex inputs);
      ◮ Viterbi stepping (i.e., decode your data).
      just need operators (binary or higher) to combine them:
      ◮ a robust way to merge alternatives of varying quality...
      could construct complex networks that fork/join inputs:
      ◮ useful for many (non-convex) optimization problems!

  37. Goal (Improving II): How to not get stuck and make progress?
      Challenge #2:
      ◮ given multiple (local) solutions, find a better one
      Algorithm #2: Model Combination
      ◮ compute a mixture model,
      ◮ re-optimize from this new starting point,
      ◮ and take the best of the three.
      Improved Algorithm #2: Model Recombination
      ◮ don’t have to stop there...
      ◮ if output is better than the worst input, replace and recurse!
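
Combination and recombination can be sketched on a toy problem. Everything concrete here is an assumption for illustration: solutions are indices into a hypothetical 1-D `LANDSCAPE`, the "mixture" is a midpoint, and `optimize` is greedy hill climbing standing in for EM.

```python
# Toy landscape (hypothetical) with local optima at indices 1, 6, and 9.
LANDSCAPE = [1, 2, 1, 0, 1, 3, 5, 3, 1, 2, 1]


def score(i):
    return LANDSCAPE[i]


def optimize(i):
    """Greedy hill climbing: a stand-in for a local optimizer such as EM."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
        best = max(neighbors, key=score)
        if score(best) <= score(i):
            return i
        i = best


def mix(a, b):
    """Stand-in for a mixture model: the midpoint of two solutions."""
    return (a + b) // 2


def combine(a, b):
    """Model combination: mix, re-optimize, keep the best of the three."""
    return max((a, b, optimize(mix(a, b))), key=score)


def recombine(a, b):
    """Model recombination: keep replacing the worst input while it helps."""
    pool = [a, b]
    while True:
        out = optimize(mix(*pool))
        worst = min(range(len(pool)), key=lambda k: score(pool[k]))
        if score(out) <= score(pool[worst]):
            return max(pool, key=score)
        pool[worst] = out  # output beats the worst input: replace and recurse


combined = combine(1, 9)
recombined = recombine(1, 9)
```

The loop in `recombine` terminates because every replacement strictly raises the score of one pool member, and the toy landscape is finite.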

  43. Theme (Story): Try, try again!!
      Story-telling time...
      Dr. Wiesner, you said “Keep on moving; keep on moving!”
      http://web.mit.edu/newsoffice/1995/vest-weisner-0621.html

  52. Theme (Lateen): Many many ways to “keep on moving!”
      Challenge #3:
      ◮ everything else has failed,
      ◮ all transformers and combiners are stuck...
      Algorithm #3: “lateen EM” (Spitkovsky et al., 2011)
      ◮ use multiple objectives (they are all wrong anyway)
      ◮ e.g., if soft EM is stuck, use hard EM to dig it out...
      many useful alternative ways to view data:
      ◮ sentence strings or parse trees (Spitkovsky et al., 2010; 2011)
      ◮ all data or just short sentences (Klein and Manning, 2004)
      ◮ words or categories (Paskin, 2001; vs. Carroll and Charniak, 1992)
      ◮ feature-rich or bare-bones models (Cohen and Smith, 2009; vs. Spitkovsky et al., 2012)
      never let convergence interfere with your (non-convex) optimization...
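
The idea of switching objectives when one of them stalls can be sketched on a toy clustering problem. This is not the lateen EM of Spitkovsky et al. (2011) itself; it is a simplified alternation under stated assumptions: two 1-D Gaussian means with unit variance and equal weights, soft EM run until it stops moving, then hard EM run until it stops moving, repeated until neither changes the parameters. All function names are hypothetical.

```python
import math


def soft_step(data, mu):
    """One soft-EM update of two means (unit variance, equal weights assumed)."""
    num, den = [0.0, 0.0], [0.0, 0.0]
    for x in data:
        w = [math.exp(-0.5 * (x - m) ** 2) for m in mu]
        z = sum(w)
        for k in range(2):
            num[k] += w[k] / z * x  # responsibility-weighted sum
            den[k] += w[k] / z
    return [num[k] / den[k] for k in range(2)]


def hard_step(data, mu):
    """One hard-EM (Viterbi) update: assign each point to its nearest mean."""
    groups = [[], []]
    for x in data:
        groups[min(range(2), key=lambda k: (x - mu[k]) ** 2)].append(x)
    return [sum(g) / len(g) if g else m for g, m in zip(groups, mu)]


def moved(a, b, tol=1e-9):
    return max(abs(x - y) for x, y in zip(a, b)) > tol


def run_until_stuck(step, data, mu, max_iters=1000):
    """Iterate one objective's update until the parameters stop changing."""
    for _ in range(max_iters):
        new = step(data, mu)
        if not moved(new, mu):
            return new
        mu = new
    return mu


def alternate_em(data, mu, max_rounds=20):
    """When soft EM is stuck, hand over to hard EM, and keep alternating."""
    for _ in range(max_rounds):
        new = run_until_stuck(hard_step, data, run_until_stuck(soft_step, data, mu))
        if not moved(new, mu):
            break
        mu = new
    return mu


data = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
means = alternate_em(data, [1.0, 2.0])
```

On this well-separated toy data both objectives agree, so the alternation settles quickly; the point of the sketch is the control flow, not a demonstration of escaping a hard local optimum.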

  63. Networks: Fork/Join (FJ)
      [Figure, built up over several steps: a fork/join network. Input counts fork into two branches (labels in extracted order: Simple Filter, Full Model, Optimizer; and Symmetrizer, Sparse Model, Optimizer), which join at a Combiner. Later builds add the labels "Decoders", "soft EM", "hard EM", and "lexicalized", and redraw the network compactly with the labels counts, F, S, full, and sparse.]
      a “grammar inductor” will represent FJ subnetworks:

  69. Networks: Iterated Fork/Join (IFJ)
      daisy-chain inductors, as in “baby steps” (Spitkovsky et al., 2009)
      [Figure: inputs feed a chain of inductors: up to length one, up to length two, ..., up to length l]
      ◮ start with inputs up to length one
        ⋆ they have unique parses: an easy (convex) case
      ◮ output initializes training with slightly longer inputs
        ⋆ gradually extend solutions to the fully complex target task:
          an instance of deterministic annealing (Allgower and Georg, 1990; Rose, 1998)
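
The daisy chain above is essentially a curriculum loop over a length cutoff. A minimal sketch, assuming a hypothetical `train(sentences, init)` function that returns a model; the `toy_train` stand-in below only records how many sentences each stage saw, so that the chaining is visible.

```python
def baby_steps(corpus, train, max_len):
    """Train on inputs up to length 1, 2, ..., max_len, chaining the models."""
    model = None  # the first stage starts from scratch
    for cutoff in range(1, max_len + 1):
        subset = [s for s in corpus if len(s) <= cutoff]
        # Each stage's output initializes training on slightly longer inputs.
        model = train(subset, init=model)
    return model


def toy_train(sentences, init):
    """Stand-in inductor: extends a log of stage sizes instead of learning."""
    history = [] if init is None else list(init)
    history.append(len(sentences))
    return history


# Hypothetical tokenized corpus.
corpus = [["a"], ["a", "b"], ["a", "b", "c"], ["d"]]
stages = baby_steps(corpus, toy_train, max_len=3)
```

In a real setting `train` would be an inductor (for example, an FJ subnetwork) and `model` would be counts or parameters rather than a log.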

  76. Networks: Grounded Iterated Fork/Join (GIFJ)
      combine purely iterative (IFJ) and static (FJ) networks:
      [Figure: network template for inputs up to length l; labels: counts-up-to-(l − 1), empty-set-of-counts, full, counts-up-to-l]
      ◮ full network obtained by unrolling the template (as a DBN)
        ⋆ can specify relatively “deep” learning architectures
        ⋆ without sacrificing (too much) clarity or simplicity
