CS11-747 Neural Networks for NLP Advanced Search Algorithms Graham Neubig https://phontron.com/class/nn4nlp2020/ (Some Slides by Daniel Clothiaux)
The Generation Problem • We have a model of P(Y|X), how do we use it to generate a sentence? • Two methods: • Sampling: Try to generate a random sentence according to the probability distribution. • Argmax: Try to generate the sentence with the highest probability.
Which to Use? • We want the best possible single output → Search • We want to observe multiple outputs according to the probability distribution → Sampling • We want to generate diverse outputs so that we are not boring → Sampling? Search?
Sampling
Ancestral Sampling • Randomly generate words one-by-one: while y_{j-1} != “</s>”: y_j ~ P(y_j | X, y_1, …, y_{j-1}) • An exact method for sampling from P(Y|X), no further work needed • Other, non-exact sampling methods are not an appropriate way of visualizing/understanding the underlying distribution
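A minimal sketch of ancestral sampling from the conditional model P(Y|X); `next_word_probs` is a hypothetical interface (not from the slides) assumed to return the model's distribution over the next word as a dict.

```python
import numpy as np

def ancestral_sample(X, next_word_probs, max_len=100):
    """Sample an output word-by-word from P(Y|X) (exact ancestral sampling).

    next_word_probs(X, prefix) is assumed to return a dict mapping each
    candidate next word to its probability P(y_j | X, y_1, ..., y_{j-1}).
    """
    prefix = []
    while len(prefix) < max_len and (not prefix or prefix[-1] != "</s>"):
        probs = next_word_probs(X, prefix)
        words = list(probs.keys())
        # Draw the next word according to the model's distribution
        y_j = np.random.choice(words, p=[probs[w] for w in words])
        prefix.append(str(y_j))
    return prefix
```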
Search Basics
<latexit sha1_base64="KTzU+V/eNKvP2hqb752FtwYbJj8=">ACQ3icbVBNaxsxFNQ6bZK6beKkx15ETbENxeyGQpNDwLSXHh2oExuvMVrtsy0saRfpbYjZbP5bLv0BvfUX9JDWnItRP6gtHYHBMPMPN7TRKkUFn3/u1faevJ0e2f3Wfn5i5d7+5WDw3ObZIZDhycyMd2IWZBCQwcFSuimBpiKJFxE09z/+ISjBWJ/oKzFAaKjbUYCc7QScNKP5wzHsFPaVhpmOXBMxDFDIGpxaOwhXmzIyV0EVxQ0ObqWHeqxW0Xe/VrsNukyAMYkpnPTuz3BjWKn6TX8BukmCFamSFdrDyrcwTnimQCOXzNp+4Kc4cNtRcAlFOcwspIxP2Rj6jmqmwA7yRQkFfeuUmI4S45GulD/nsiZsnamIpdUDCd23ZuL/P6GY6OB7nQaYag+XLRKJMUEzpvlMbCAEc5c4RxI9ytlE+YRxdm2VXQrD+5U3SOWqeNIOz9XWx1Ubu+Q1eUPqJCAfSIt8Jm3SIZzckh/knvz0vnp3i/vYRkteauZV+QfeL8fARbSsto=</latexit> <latexit sha1_base64="KTzU+V/eNKvP2hqb752FtwYbJj8=">ACQ3icbVBNaxsxFNQ6bZK6beKkx15ETbENxeyGQpNDwLSXHh2oExuvMVrtsy0saRfpbYjZbP5bLv0BvfUX9JDWnItRP6gtHYHBMPMPN7TRKkUFn3/u1faevJ0e2f3Wfn5i5d7+5WDw3ObZIZDhycyMd2IWZBCQwcFSuimBpiKJFxE09z/+ISjBWJ/oKzFAaKjbUYCc7QScNKP5wzHsFPaVhpmOXBMxDFDIGpxaOwhXmzIyV0EVxQ0ObqWHeqxW0Xe/VrsNukyAMYkpnPTuz3BjWKn6TX8BukmCFamSFdrDyrcwTnimQCOXzNp+4Kc4cNtRcAlFOcwspIxP2Rj6jmqmwA7yRQkFfeuUmI4S45GulD/nsiZsnamIpdUDCd23ZuL/P6GY6OB7nQaYag+XLRKJMUEzpvlMbCAEc5c4RxI9ytlE+YRxdm2VXQrD+5U3SOWqeNIOz9XWx1Ubu+Q1eUPqJCAfSIt8Jm3SIZzckh/knvz0vnp3i/vYRkteauZV+QfeL8fARbSsto=</latexit> <latexit sha1_base64="KTzU+V/eNKvP2hqb752FtwYbJj8=">ACQ3icbVBNaxsxFNQ6bZK6beKkx15ETbENxeyGQpNDwLSXHh2oExuvMVrtsy0saRfpbYjZbP5bLv0BvfUX9JDWnItRP6gtHYHBMPMPN7TRKkUFn3/u1faevJ0e2f3Wfn5i5d7+5WDw3ObZIZDhycyMd2IWZBCQwcFSuimBpiKJFxE09z/+ISjBWJ/oKzFAaKjbUYCc7QScNKP5wzHsFPaVhpmOXBMxDFDIGpxaOwhXmzIyV0EVxQ0ObqWHeqxW0Xe/VrsNukyAMYkpnPTuz3BjWKn6TX8BukmCFamSFdrDyrcwTnimQCOXzNp+4Kc4cNtRcAlFOcwspIxP2Rj6jmqmwA7yRQkFfeuUmI4S45GulD/nsiZsnamIpdUDCd23ZuL/P6GY6OB7nQaYag+XLRKJMUEzpvlMbCAEc5c4RxI9ytlE+YRxdm2VXQrD+5U3SOWqeNIOz9XWx1Ubu+Q1eUPqJCAfSIt8Jm3SIZzckh/knvz0vnp3i/vYRkteauZV+QfeL8fARbSsto=</latexit> <latexit sha1_base64="daJTM+c0MGzbLqAwsNEmTltAuIU=">ACJnicbVDLSgMxFM34tr6qLt0Ei6CbMiOCulBENy4rWK10SslkbtgJjMkd6RlHL/Gjb/iRvCBuPNTGsRXwcCh3PO5eaeIJHCoOu+OSOjY+MTk1PThZnZufmF4uLSmYlTzaHKYxnrWsAMSKGgigIl1BINLAoknAeXR3/Aq0EbE6xV4CjYi1lWgJztBKzeK+32GYXeR0j/qpCm0SMPNRyBCsmlsKXcyYbkesm+c3lfUvj17T2kazWHL7gD0L/GpESGqDSLj34Y8zQChVwyY+qem2DLkDBJeQFPzWQMH7J2lC3VLEITCMb3JnTNauEtBVr+xTSgfp9ImORMb0osMmIYcf89vrif149xdZOIxMqSREU/1zUSiXFmPZLo6HQwFH2LGFcC/tXyjtM462sItwft98l9S3Szvlr2TrdLB4bCNKbJCVsk68cg2OSDHpEKqhJNbck+eyLNz5zw4L87rZ3TEGc4skx9w3j8AzM6nUg=</latexit> <latexit sha1_base64="daJTM+c0MGzbLqAwsNEmTltAuIU=">ACJnicbVDLSgMxFM34tr6qLt0Ei6CbMiOCulBENy4rWK10SslkbtgJjMkd6RlHL/Gjb/iRvCBuPNTGsRXwcCh3PO5eaeIJHCoOu+OSOjY+MTk1PThZnZufmF4uLSmYlTzaHKYxnrWsAMSKGgigIl1BINLAoknAeXR3/Aq0EbE6xV4CjYi1lWgJztBKzeK+32GYXeR0j/qpCm0SMPNRyBCsmlsKXcyYbkesm+c3lfUvj17T2kazWHL7gD0L/GpESGqDSLj34Y8zQChVwyY+qem2DLkDBJeQFPzWQMH7J2lC3VLEITCMb3JnTNauEtBVr+xTSgfp9ImORMb0osMmIYcf89vrif149xdZOIxMqSREU/1zUSiXFmPZLo6HQwFH2LGFcC/tXyjtM462sItwft98l9S3Szvlr2TrdLB4bCNKbJCVsk68cg2OSDHpEKqhJNbck+eyLNz5zw4L87rZ3TEGc4skx9w3j8AzM6nUg=</latexit> <latexit sha1_base64="daJTM+c0MGzbLqAwsNEmTltAuIU=">ACJnicbVDLSgMxFM34tr6qLt0Ei6CbMiOCulBENy4rWK10SslkbtgJjMkd6RlHL/Gjb/iRvCBuPNTGsRXwcCh3PO5eaeIJHCoOu+OSOjY+MTk1PThZnZufmF4uLSmYlTzaHKYxnrWsAMSKGgigIl1BINLAoknAeXR3/Aq0EbE6xV4CjYi1lWgJztBKzeK+32GYXeR0j/qpCm0SMPNRyBCsmlsKXcyYbkesm+c3lfUvj17T2kazWHL7gD0L/GpESGqDSLj34Y8zQChVwyY+qem2DLkDBJeQFPzWQMH7J2lC3VLEITCMb3JnTNauEtBVr+xTSgfp9ImORMb0osMmIYcf89vrif149xdZOIxMqSREU/1zUSiXFmPZLo6HQwFH2LGFcC/tXyjtM462sItwft98l9S3Szvlr2TrdLB4bCNKbJCVsk68cg2OSDHpEKqhJNbck+eyLNz5zw4L87rZ3TEGc4skx9w3j8AzM6nUg=</latexit> <latexit 
sha1_base64="pazaO1OUOgQ/R/MsnOhbEaj7I3Q=">ACMHicbVDLSgNBEJz1GeMr6tHLYBAiSNgVQT0IQS8eFYwPsiHMznaSwdnZaZXDMv6SV78E/HiQcWrX+HkgWi0YKC6upqeriCRwqDrvjgTk1PTM7OFueL8wuLScml9cLEqeZQ57GM9VXADEihoI4CJVwlGlgUSLgMbo7/ctb0EbE6hx7CTQj1lGiLThDK7VKJ36XYXad0Pqpyq0TsDMRyFDsGpuKdxhxnQnEirP74claB3rvHK9Tb+dW61S2a26A9C/xBuRMhnhtFV68sOYpxEo5JIZ0/DcBJt2FQouIS/6qYGE8RvWgYalikVgmtng4pxuWiWk7Vjbp5AO1J8TGYuM6UWBdUYMu2a81xf/6zVSbO83M6GSFEHx4aJ2KinGtB8fDYUGjrJnCeNa2L9S3mWacbTRFW0I3vjJf0l9p3pQ9c52y7WjURoFsk42SIV4ZI/UyAk5JXCyQN5Jq/kzXl0Xpx352NonXBGM2vkF5zPLylArDg=</latexit> <latexit sha1_base64="pazaO1OUOgQ/R/MsnOhbEaj7I3Q=">ACMHicbVDLSgNBEJz1GeMr6tHLYBAiSNgVQT0IQS8eFYwPsiHMznaSwdnZaZXDMv6SV78E/HiQcWrX+HkgWi0YKC6upqeriCRwqDrvjgTk1PTM7OFueL8wuLScml9cLEqeZQ57GM9VXADEihoI4CJVwlGlgUSLgMbo7/ctb0EbE6hx7CTQj1lGiLThDK7VKJ36XYXad0Pqpyq0TsDMRyFDsGpuKdxhxnQnEirP74claB3rvHK9Tb+dW61S2a26A9C/xBuRMhnhtFV68sOYpxEo5JIZ0/DcBJt2FQouIS/6qYGE8RvWgYalikVgmtng4pxuWiWk7Vjbp5AO1J8TGYuM6UWBdUYMu2a81xf/6zVSbO83M6GSFEHx4aJ2KinGtB8fDYUGjrJnCeNa2L9S3mWacbTRFW0I3vjJf0l9p3pQ9c52y7WjURoFsk42SIV4ZI/UyAk5JXCyQN5Jq/kzXl0Xpx352NonXBGM2vkF5zPLylArDg=</latexit> <latexit sha1_base64="pazaO1OUOgQ/R/MsnOhbEaj7I3Q=">ACMHicbVDLSgNBEJz1GeMr6tHLYBAiSNgVQT0IQS8eFYwPsiHMznaSwdnZaZXDMv6SV78E/HiQcWrX+HkgWi0YKC6upqeriCRwqDrvjgTk1PTM7OFueL8wuLScml9cLEqeZQ57GM9VXADEihoI4CJVwlGlgUSLgMbo7/ctb0EbE6hx7CTQj1lGiLThDK7VKJ36XYXad0Pqpyq0TsDMRyFDsGpuKdxhxnQnEirP74claB3rvHK9Tb+dW61S2a26A9C/xBuRMhnhtFV68sOYpxEo5JIZ0/DcBJt2FQouIS/6qYGE8RvWgYalikVgmtng4pxuWiWk7Vjbp5AO1J8TGYuM6UWBdUYMu2a81xf/6zVSbO83M6GSFEHx4aJ2KinGtB8fDYUGjrJnCeNa2L9S3mWacbTRFW0I3vjJf0l9p3pQ9c52y7WjURoFsk42SIV4ZI/UyAk5JXCyQN5Jq/kzXl0Xpx352NonXBGM2vkF5zPLylArDg=</latexit> Why do we Search? • We want to find the best output • What is "best"? • The most accurate output ˆ error( Y, ˜ Y = argmin Y ) ˜ Y → impossible! we don't know the reference • The most probable output according to the model ˆ P ( ˜ Y = argmax Y | X ) ˜ Y → simple, but not necessarily tied to accuracy • The output with the lowest Bayes risk ˆ P ( Y 0 | X )error( Y 0 , ˜ X Y = argmin Y ) ˜ Y Y 0 → which output looks like it has the lowest error?
Search Errors, Model Errors (example from Neubig (2015)) • Search error: the search algorithm fails to find an output that optimizes its search criterion • Model error: the output that optimizes the search criterion does not optimize accuracy
Searching Probable Outputs
Greedy Search • One by one, pick the single highest-probability word: while y_{j-1} != “</s>”: y_j = argmax P(y_j | X, y_1, …, y_{j-1}) • Not exact, and has real problems: • Will often generate the “easy” words first • Will prefer multiple common words to one rare word
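The same generation loop with the argmax decision, a minimal greedy-search sketch reusing the hypothetical `next_word_probs` interface from above:

```python
def greedy_search(X, next_word_probs, max_len=100):
    """Pick the single most probable next word at each step (not exact search)."""
    prefix = []
    while len(prefix) < max_len and (not prefix or prefix[-1] != "</s>"):
        probs = next_word_probs(X, prefix)     # P(y_j | X, y_1, ..., y_{j-1})
        y_j = max(probs, key=probs.get)        # argmax over the next word only
        prefix.append(y_j)
    return prefix
```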
Why will this Help?
Next word    P(next word)
Pittsburgh   0.4
New York     0.3
New Jersey   0.25
Other        0.05
Decoding word by word, greedy search picks “New” first (combined mass 0.55) even though “Pittsburgh” (0.4) is the single most probable continuation; keeping multiple paths in the search avoids this.
Beam Search • Instead of picking only the single highest-probability/score word, maintain multiple paths • At each time step: • Expand each path • Choose a subset of paths from the expanded set
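A minimal beam-search sketch with histogram pruning (keep the k best paths), again assuming the hypothetical `next_word_probs` interface; hypotheses are scored by summed log probability.

```python
import math

def beam_search(X, next_word_probs, beam_size=5, max_len=100):
    """Keep the `beam_size` highest-scoring partial hypotheses at each step."""
    beam = [(0.0, [])]              # (log probability, words generated so far)
    finished = []
    for _ in range(max_len):
        expanded = []
        for score, prefix in beam:
            if prefix and prefix[-1] == "</s>":   # hypothesis is complete
                finished.append((score, prefix))
                continue
            probs = next_word_probs(X, prefix)
            for word, p in probs.items():
                expanded.append((score + math.log(p), prefix + [word]))
        if not expanded:
            break
        # Histogram pruning: keep exactly the k best expanded hypotheses
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_size]
    finished.extend(beam)
    return max(finished, key=lambda h: h[0])[1]
```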
Basic Pruning Methods (Steinbiss et al. 1994) • How to select which paths to keep expanding? • Histogram Pruning: keep exactly k hypotheses at every time step • Score Threshold Pruning: keep all hypotheses whose score is within a threshold α of the best score s_1, i.e. keep hypothesis n if s_n + α > s_1 • Probability Mass Pruning: keep hypotheses, best first, until their cumulative probability mass reaches α
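A sketch of the three pruning criteria applied to a list of (log-probability, hypothesis) pairs; the function and argument names are illustrative, not from Steinbiss et al. (1994).

```python
import math

def prune(hypotheses, k=None, alpha_score=None, alpha_mass=None):
    """Prune a list of (log_prob, hypothesis) pairs, best-first.

    k           : histogram pruning -- keep exactly the top k
    alpha_score : score-threshold pruning -- keep n if s_n + alpha > s_1
    alpha_mass  : probability-mass pruning -- keep until cumulative mass >= alpha
    """
    hyps = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    if k is not None:
        hyps = hyps[:k]
    if alpha_score is not None:
        best = hyps[0][0]
        hyps = [h for h in hyps if h[0] + alpha_score > best]
    if alpha_mass is not None:
        kept, mass = [], 0.0
        for h in hyps:
            kept.append(h)
            mass += math.exp(h[0])
            if mass >= alpha_mass:
                break
        hyps = kept
    return hyps
```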
What beam size should I use? • Larger beam sizes will be slower • May not give better results due to model errors • Sometimes result in shorter sequences • May favor high-frequency words • Mostly decided empirically → experiment (for histogram pruning, somewhere in the range of 5-100?)
Problems w/ Disparate Search Difficulty • Sometimes the output needs to cover specific content, some of it easy and some of it hard: for “I saw the escarpment”, “watashi” (I) and “mita” (saw) are easy, but “escarpment” is hard (dangai? zeppeki? kyushamen? iwa?) • This can cause the search algorithm to select the easy thing first, then the hard thing later: “watashi wa dangai wo mita” (I saw the escarpment) vs. “watashi ga mita dangai” (the escarpment I saw)
Future Cost • Also predict how hard it will be to process the as-yet-unprocessed words, and search for the maximum of the sum f(n) = g(n) + h(n) • g(n): cost to the current point • h(n): estimated cost to the goal • See Koehn (2010, Chapter 6), or Li et al. (2017) for a neural approximation
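A sketch of ranking partial hypotheses by f(n) = g(n) + h(n); `estimate_future_cost` is a hypothetical stand-in for whatever heuristic or learned estimate of the remaining cost is available (e.g., in the spirit of the neural approximation of Li et al. 2017).

```python
def rank_with_future_cost(hypotheses, X, estimate_future_cost):
    """Rank partial hypotheses by f(n) = g(n) + h(n).

    g: log probability accumulated so far (cost to the current point)
    h: estimated log probability of completing the hypothesis
       (estimate_future_cost(X, prefix) is a hypothetical heuristic)
    """
    scored = []
    for g, prefix in hypotheses:
        h = estimate_future_cost(X, prefix)
        scored.append((g + h, g, prefix))
    # Best f-score first; use this ordering when pruning the beam
    return sorted(scored, key=lambda s: s[0], reverse=True)
```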
Search and Problems with Modeling
Better Search can Hurt Results! (Koehn and Knowles 2017) • Better search (=better model score) can result in worse BLEU score! • Why? Model errors!
How to Fix Model Errors? • Train the model to maximize accuracy/minimize risk (best! covered previously) • Change the decision rule to minimize risk (best!) • Heuristically modify the model score post-hoc (OK) • Hobble the search algorithm so it makes more search errors, but the kind of errors you want (meh)
Minimum Bayes Risk Decoding