

  1. CS11-747 Neural Networks for NLP Advanced Search Algorithms Graham Neubig https://phontron.com/class/nn4nlp2020/ (Some Slides by Daniel Clothiaux)

  2. The Generation Problem • We have a model of P(Y|X); how do we use it to generate a sentence? • Two methods: • Sampling: Try to generate a random sentence according to the probability distribution. • Argmax: Try to generate the sentence with the highest probability.

  3. Which to Use? • We want the best possible single output → Search • We want to observe multiple outputs according to the probability distribution → Sampling • We want to generate diverse outputs so that we are not boring → Sampling? Search?

  4. Sampling

  5. Ancestral Sampling • Randomly generate words one by one: while y_{j-1} != "</s>": y_j ~ P(y_j | X, y_1, …, y_{j-1}) • An exact method for sampling from P(Y|X); no further work needed. • Any other sampling method is not an appropriate way of visualizing/understanding the underlying distribution.
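A minimal sketch of ancestral sampling under this definition. The next_word_probs(src, prefix) interface is a hypothetical stand-in for whatever model computes P(y_j | X, y_1, …, y_{j-1}); it is not from the lecture.

    import random

    def ancestral_sample(src, next_word_probs, max_len=100):
        """Draw one output sequence exactly from P(Y|X), word by word."""
        prefix = []
        while len(prefix) < max_len:
            probs = next_word_probs(src, prefix)  # dict: word -> P(y_j | X, prefix)
            words = list(probs)
            # Sample the next word from the model's conditional distribution
            y = random.choices(words, weights=[probs[w] for w in words])[0]
            prefix.append(y)
            if y == "</s>":
                break
        return prefix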

  6. Search Basics

  7. Why do we Search? • We want to find the best output • What is "best"? • The most accurate output: $\hat{Y} = \text{argmin}_{\tilde{Y}} \text{error}(Y, \tilde{Y})$ → impossible! we don't know the reference Y • The most probable output according to the model: $\hat{Y} = \text{argmax}_{\tilde{Y}} P(\tilde{Y} \mid X)$ → simple, but not necessarily tied to accuracy • The output with the lowest Bayes risk: $\hat{Y} = \text{argmin}_{\tilde{Y}} \sum_{Y'} P(Y' \mid X)\, \text{error}(Y', \tilde{Y})$ → which output looks like it has the lowest error?
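A sketch of the two practical decision rules above, applied over a fixed candidate set. It assumes we are given each candidate's model probability and an error function (e.g. 1 - BLEU); all names here are illustrative, not from the lecture.

    def argmax_decode(candidates):
        # candidates: dict mapping each candidate output to P(Y'|X)
        # Rule 2: most probable output under the model
        return max(candidates, key=candidates.get)

    def mbr_decode(candidates, error):
        # Rule 3: output with the lowest Bayes risk, approximating the
        # sum over all Y' by the candidate set itself
        def risk(y_tilde):
            return sum(p * error(y_prime, y_tilde)
                       for y_prime, p in candidates.items())
        return min(candidates, key=risk)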

  8. Search Errors, Model Errors (example from Neubig (2015)) • Search error: the search algorithm fails to find an output that optimizes its search criterion • Model error: the output that optimizes the search criterion does not optimize accuracy

  9. Searching Probable Outputs

  10. Greedy Search • One by one, pick the single highest-probability word: while y_{j-1} != "</s>": y_j = argmax P(y_j | X, y_1, …, y_{j-1}) • Not exact, and it has real problems: • Will often generate the "easy" words first • Will prefer multiple common words over one rare word
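A minimal sketch of greedy search, reusing the hypothetical next_word_probs(src, prefix) interface from the sampling sketch above:

    def greedy_search(src, next_word_probs, max_len=100):
        prefix = []
        while len(prefix) < max_len:
            probs = next_word_probs(src, prefix)
            y = max(probs, key=probs.get)  # single highest-probability word
            prefix.append(y)
            if y == "</s>":
                break
        return prefix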

  11. Why will this Help?

      Next word   | P(next word)
      ------------|-------------
      Pittsburgh  | 0.4
      New York    | 0.3
      New Jersey  | 0.25
      Other       | 0.05
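Reading the table: greedy search commits to "Pittsburgh" (probability 0.4), but the continuations beginning with "New" jointly carry 0.3 + 0.25 = 0.55 of the probability mass, so a search that keeps multiple paths alive can recover the more probable "New …" outputs.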

  12. Beam Search • Instead of picking the highest probability/score, maintain multiple paths • At each time step: • Expand each path • Choose a subset of paths from the expanded set
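A minimal sketch of beam search with histogram pruning (keep the top k paths), again under the hypothetical next_word_probs interface; scores are log probabilities so path scores can be added:

    import math

    def beam_search(src, next_word_probs, beam_size=5, max_len=100):
        beams = [(0.0, [], False)]  # (log prob, words, finished?)
        for _ in range(max_len):
            expanded = []
            for score, words, done in beams:
                if done:  # finished paths pass through unchanged
                    expanded.append((score, words, True))
                    continue
                for w, p in next_word_probs(src, words).items():
                    expanded.append((score + math.log(p), words + [w], w == "</s>"))
            # Keep the k highest-scoring paths (histogram pruning)
            beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam_size]
            if all(done for _, _, done in beams):
                break
        return max(beams, key=lambda b: b[0])[1]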

  13. Basic Pruning Methods (Steinbiss et al. 1994) • How to select which paths to keep expanding? • Histogram Pruning: Keep exactly k hypotheses at every time step • Score Threshold Pruning: Keep all hypotheses n whose score is within a threshold α of the best score s_1, i.e. s_n + α > s_1 • Probability Mass Pruning: Keep hypotheses, best first, until their cumulative probability mass reaches α
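Sketches of the three pruning rules, given a list of (hypothesis, score) pairs sorted by descending score; for the probability-mass rule the scores are assumed to be probabilities:

    def histogram_prune(scored, k):
        # Keep exactly the k best hypotheses
        return scored[:k]

    def score_threshold_prune(scored, alpha):
        # Keep hypotheses whose score is within alpha of the best score s_1
        s1 = scored[0][1]
        return [(h, s) for h, s in scored if s + alpha > s1]

    def prob_mass_prune(scored, alpha):
        # Keep hypotheses, best first, until cumulative mass reaches alpha
        kept, mass = [], 0.0
        for h, p in scored:
            kept.append((h, p))
            mass += p
            if mass >= alpha:
                break
        return kept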

  14. What beam size should I use? • Larger beam sizes will be slower • May not give better results, due to model errors • Sometimes result in shorter sequences • May favor high-frequency words • Mostly chosen empirically → experiment! (k in the range of 5-100 for histogram pruning?)

  15. Problems w/ Disparate Search Difficulty • Sometimes the output needs to cover specific content, some of it easy and some hard. Translating "I saw the escarpment" into Japanese: "watashi" (I) and "mita" (saw) are easy, but "escarpment" is hard: dangai? zeppeki? kyushamen? iwa? • This can cause the search algorithm to select the easy thing first, then the hard thing later: "watashi wa dangai wo mita" (I saw the escarpment) vs. "watashi ga mita dangai" (the escarpment I saw)

  16. Future Cost • Also predict how hard it will be to process the as-of-yet-unprocessed words, and search for the maximum of the sum f(n) = g(n) + h(n) • g(n): cost to the current point • h(n): estimated cost to the goal • See Koehn (2010), Chapter 6, or Li et al. (2017) for a neural approximation
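A toy illustration of ranking partial hypotheses by f(n) = g(n) + h(n) rather than g(n) alone; the heuristic function estimating the remaining cost is an assumption for this sketch (in phrase-based MT it might be the cheapest translation cost of each still-uncovered source word):

    def pick_best(hypotheses, heuristic):
        # hypotheses: list of (partial_output, cost_so_far, uncovered_words)
        def f(hyp):
            _, g, uncovered = hyp
            return g + heuristic(uncovered)  # f(n) = g(n) + h(n)
        return min(hypotheses, key=f)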

  17. Search and Problems with Modeling

  18. Better Search can Hurt Results! (Koehn and Knowles 2017) • Better search (=better model score) can result in worse BLEU score! • Why? Model errors!

  19. How to Fix Model Errors? • Train the model to maximize accuracy/minimize risk (best! covered previously) • Change the decision rule to minimize risk (best!) • Heuristically modify the model score post-hoc (OK; e.g. the length-normalization sketch below) • Hobble the search algorithm so it makes more search errors, but the kind of errors you want (meh)
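One common instance of the "modify the score post-hoc" option is length normalization, e.g. the length penalty of Wu et al. (2016); a minimal sketch (the exact form here is theirs, not from this lecture):

    def length_normalized_score(log_prob, length, alpha=0.6):
        # Divide the model's log probability by a length penalty so that
        # longer outputs are not unfairly penalized; alpha = 0 recovers
        # the raw model score.
        lp = ((5 + length) ** alpha) / ((5 + 1) ** alpha)
        return log_prob / lp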

  20. Minimum Bayes Risk Decoding
