

Language and Computers (Ling 384)
Topic 4: Writer’s Aids (Spelling and Grammar Correction)

Adriane Boyd ∗
Department of Linguistics, OSU
Autumn 2005

∗ The course was created by Markus Dickinson, Detmar Meurers and Chris Brew.

Outline

◮ Introduction
◮ Error causes
  ◮ Keyboard mistypings
  ◮ Phonetic errors
  ◮ Knowledge problems
◮ Difficult issues
  ◮ Tokenization
  ◮ Inflection
  ◮ Productivity
◮ Non-word error detection
  ◮ Dictionaries
  ◮ N-gram analysis
◮ Isolated-word error correction
  ◮ Rule-based methods
  ◮ Similarity key techniques
  ◮ Probabilistic methods
  ◮ Minimum edit distance
◮ Grammar correction
  ◮ Syntax
  ◮ Computing with Syntax
  ◮ Grammar correction rules
◮ Caveat emptor

Who cares about spelling?

Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

(See http://www.mrc-cbu.cam.ac.uk/personal/matt.davis/Cmabrigde/ for the story behind this supposed research report.)

A dootcr has aimttded the magltheuansr of a tageene ceacnr pintaet who deid aetfr a hatospil durg blendur.

Why people care about spelling

◮ People want to appear to be educated.
◮ Misspellings can cause misunderstandings and real-life problems:
  ◮ For example:
    ◮ Did you see her god yesterday? It’s a big golden retriever.
    ◮ This will be a fee [free] concert.
  ◮ The 1991 Bell Atlantic & Pacific Bell telephone network outages were partly caused by a typographical error: a 6 in a line of computer code was supposed to be a D. “That one error caused the equipment and software to fail under an avalanche of computer-generated messages.” (Wall Street Journal, Nov. 25, 1991)

Why people care about spelling (cont.)

◮ Standard spelling makes it easy to organize words and text:
  ◮ e.g., Without standard spelling, how would you look things up in a lexicon or thesaurus?
  ◮ e.g., Optical character recognition software can use knowledge about standard spelling to recognize scanned words even when the input is barely legible.
◮ Standard spelling makes it possible to provide a single text that is accessible to a wide range of readers (different backgrounds, speakers of different dialects, etc.).
◮ Using standard spelling is associated with being well-educated, i.e., it is used to make a good impression in social interaction.

How are spell checkers used?

◮ interactive spelling checkers = the spell checker detects errors as you type.
  ◮ It may or may not make suggestions for correction.
  ◮ It requires a “real-time” response (i.e., it must be fast).
  ◮ It is up to the human to decide whether the spell checker is right or wrong.
  ◮ If there is a list of choices, we may not require 100% accuracy in the corrected word.
◮ automatic spelling correctors = the spell checker runs on a whole document, finds errors, and corrects them.
  ◮ A much more difficult task.
  ◮ A human may or may not proofread the results later.

Detection vs. Correction

◮ There are two distinct tasks:
  ◮ error detection = simply find the misspelled words
  ◮ error correction = correct the misspelled words
◮ e.g., It might be easy to tell that ater is a misspelled word, but what is the correct word? water? later? after?
⇒ Which task we need depends on what we want to do with our results.
◮ Note, though, that detection is a prerequisite for correction.

What causes errors?

◮ Keyboard mistypings
◮ Phonetic errors
◮ Knowledge problems

Keyboard mistypings

Space bar issues
◮ run-on errors = two separate words become one
  ◮ e.g., the fuzz becomes thefuzz
◮ split errors = one word becomes two separate words
  ◮ e.g., equalization becomes equali zation
Note that the resulting items might still be words!
  ◮ e.g., a tollway becomes atoll way
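The space bar errors on the Keyboard mistypings slide can be illustrated in a few lines of Python. This is a minimal sketch, not the course's method. `DICTIONARY` is a hypothetical toy word list, and `is_non_word`, `fix_run_on` and `fix_split` are hypothetical helper names. A run-on token is repaired by trying every two-way split. A split error is repaired by joining adjacent tokens. The last line shows why atoll way slips past a word-by-word dictionary lookup.

```python
# Hypothetical toy dictionary for illustration; a real checker needs a full lexicon.
DICTIONARY = {"a", "atoll", "equalization", "fuzz", "the", "tollway", "way"}


def is_non_word(token, dictionary=DICTIONARY):
    """Detection: a token is flagged only if it is not in the dictionary."""
    return token not in dictionary


def fix_run_on(token, dictionary=DICTIONARY):
    """Run-on repair: every way to split token into two dictionary words."""
    return [
        (token[:i], token[i:])
        for i in range(1, len(token))
        if token[:i] in dictionary and token[i:] in dictionary
    ]


def fix_split(left, right, dictionary=DICTIONARY):
    """Split repair: join two adjacent tokens if the result is a dictionary word."""
    joined = left + right
    return joined if joined in dictionary else None


print(fix_run_on("thefuzz"))                          # [('the', 'fuzz')]
print(fix_split("equali", "zation"))                  # equalization
print([is_non_word(t) for t in "atoll way".split()])  # [False, False]
```

Both atoll and way are real words, so neither is flagged. Catching an error like that requires context beyond single-word lookup.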
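The difference between the two tasks on the Detection vs. Correction slide can be made concrete with a small sketch that assumes a hypothetical toy `DICTIONARY`. Detection is plain set membership. Correction generates every string one insertion, deletion, substitution or transposition away from the non-word, then keeps the strings found in the dictionary. This is one common way to generate candidates, chosen here for illustration, and is not necessarily the approach the course takes. The names `detect`, `edits1` and `correction_candidates` are hypothetical.

```python
import string

# Hypothetical toy dictionary, for illustration only.
DICTIONARY = {"after", "cold", "is", "later", "the", "water"}


def detect(tokens, dictionary=DICTIONARY):
    """Error detection: return the tokens that are not in the dictionary."""
    return [t for t in tokens if t.lower() not in dictionary]


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correction_candidates(word, dictionary=DICTIONARY):
    """Error correction: dictionary words one edit away, sorted alphabetically."""
    return sorted(edits1(word.lower()) & dictionary)


print(detect("the ater is cold".split()))  # ['ater']
print(correction_candidates("ater"))       # ['after', 'later', 'water']
```

Detecting ater is clear-cut. Choosing among after, later and water still requires knowing what the writer meant, and correction can only start once detection has flagged the word.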
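The contrast on the How are spell checkers used? slide between interactive checkers and automatic correctors can be sketched with a ranking based on minimum edit distance (Levenshtein). Everything here is an illustrative assumption: the toy `DICTIONARY`, the helper names `edit_distance`, `suggest` and `auto_correct`, and the cutoff `max_distance=2`. `suggest` returns a ranked list for a human to choose from. `auto_correct` has no human in the loop and simply takes the top suggestion.

```python
# Hypothetical toy dictionary, for illustration only.
DICTIONARY = {"a", "be", "concert", "fee", "free", "this", "will"}


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word, dictionary=DICTIONARY, max_distance=2):
    """Interactive use: ranked by (distance, word); the human picks or rejects."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


def auto_correct(text, dictionary=DICTIONARY):
    """Automatic use: replace each non-word with the top suggestion, if any."""
    out = []
    for token in text.split():
        if token in dictionary:
            out.append(token)
        else:
            candidates = suggest(token, dictionary)
            out.append(candidates[0] if candidates else token)
    return " ".join(out)


print(suggest("fre"))                             # ['fee', 'free', 'be']
print(auto_correct("this will be a fre concert"))  # this will be a fee concert
```

The suggestion list works for an interactive user, who can pick free. The automatic corrector cannot: fee and free are both one edit away from fre, and the alphabetical tie-break picks the wrong one. This reproduces the fee [free] concert problem and is one reason automatic correction is the harder task.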
