Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc.
Kenneth Church
Kenneth.Church@jhu.edu
Dec 9, 2009
Solitaire → Multiplayer Games: Auctions (Ads)
http://www.scienceoftheweb.org/15-396/lectures/lecture09.pdf
• Mainline Ad
• Right Rail: Avoid distortions from commercial interests
A Single Auction → A Stream of Continuous Auctions
• Standard Example of Second Price Auction
– Single Auction for a Single Apple
• Theoretical Result
– Second Price Auction → Truth Telling
– http://en.wikipedia.org/wiki/Vickrey_auction
– Optimal Strategy:
  • Bid what the apple is worth to you
  • Don't worry about what it is worth to others
– First Price Auction ↛ Truth Telling
• Does theory generalize to a continuous stream?
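The truth-telling result for a single second-price auction can be checked numerically. The sketch below is only an illustration: the function names and all bid values are hypothetical, and ties are assumed to go to the earliest bidder.

```python
def second_price_auction(bids):
    """Return (winner_index, price): highest bid wins, pays the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]


def utility(my_value, my_bid, other_bids):
    """Payoff to bidder 0: value minus price if bidder 0 wins, otherwise 0."""
    winner, price = second_price_auction([my_bid] + list(other_bids))
    return my_value - price if winner == 0 else 0.0


# Hypothetical numbers: the apple is worth 10 to me.
my_value = 10.0
for others in ([7.0, 4.0], [12.0, 4.0]):
    truthful = utility(my_value, my_value, others)
    for bid in [0.0, 5.0, 8.0, 12.0, 20.0]:
        # Shading the bid can lose a profitable apple; overbidding can win
        # an apple at a loss. Neither ever beats bidding the true value.
        print(others, bid, utility(my_value, bid, others), "truthful:", truthful)
```

The price paid never depends on the winner's own bid, which is why the bid only decides whether to win, not how much to pay.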
Pricing: Cost Per Click (CPC)
• Equilibrium
– Advertisers
  • Awareness
  • Sales
  • New Customers
  • ROI
– Users
  • Minimize pain
  • Obtain Value
– Market Maker
  • Maximize Revenue
• Truth Telling?
• B_i = your bid
• B_{i+1} = next bid
• CTR_i = your click through rate
• CTR_{i+1} = next click through rate
• CPC_i = your price
– (if we show your ad and user clicks)
• Improvement: CTR → Q (Prior)
• Single Auction:
– CPC_i = B_{i+1}
• Continuous Stream:
– CPC_i = B_{i+1} × CTR_{i+1} / CTR_i
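The continuous-stream pricing rule can be sketched in a few lines. Two things here are assumptions rather than statements from the slide: ads are ranked by bid × CTR, and the last ad, which has no ad below it, is priced at 0.0 in place of whatever reserve price a real system would use. The function name and the example ads are hypothetical.

```python
def price_ads(ads):
    """Rank ads by bid * CTR and price each slot from the ad ranked just below it.

    ads: list of (name, bid, ctr) tuples.
    Returns a list of (name, cpc) in ranked order.
    """
    ranked = sorted(ads, key=lambda a: a[1] * a[2], reverse=True)
    priced = []
    for i, (name, bid, ctr) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_ctr = ranked[i + 1]
            # CPC_i = B_{i+1} * CTR_{i+1} / CTR_i
            cpc = next_bid * next_ctr / ctr
        else:
            cpc = 0.0  # assumption: no reserve price
        priced.append((name, cpc))
    return priced


# Hypothetical ads: (name, bid, click through rate)
ads = [("A", 2.0, 0.10), ("B", 3.0, 0.05), ("C", 1.0, 0.08)]
for name, cpc in price_ads(ads):
    print(name, round(cpc, 4))
```

Under the ranking assumption, each advertiser pays no more than its own bid, because B_i × CTR_i ≥ B_{i+1} × CTR_{i+1}.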
Multi-Player Games → Many Technical Opportunities
• Economics
– http://www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics?currentPage=all
• Machine Learning
– Learning to Rank
– Estimate CTR (Q/Priors)
– Sparse Data:
  • What is the CTR for a new ad?
– Errors can be expensive
  • If CTR is too low for new ad → Penalize Growth
  • If too high → Reward Bad Guys to do Bad Things
• Truth Telling for Continuous Auctions?
– Probably not, especially if participants can estimate Q better than market maker
• Machine Learning: Solitaire → Multi-Player Games
– Can I estimate Q better than you can?
Man-eating tiger
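One common way to handle the sparse-data problem for a new ad is to shrink the observed click rate toward a prior. The sketch below uses a Beta-prior posterior mean; this is a generic technique chosen for illustration, not necessarily the estimator the lecture has in mind, and the prior CTR and prior strength are hypothetical values.

```python
def smoothed_ctr(clicks, impressions, prior_ctr, prior_strength):
    """Posterior-mean CTR, treating the prior as prior_strength pseudo-impressions."""
    return (clicks + prior_ctr * prior_strength) / (impressions + prior_strength)


PRIOR_CTR = 0.02        # hypothetical average CTR across ads
PRIOR_STRENGTH = 100.0  # hypothetical weight of the prior, in impressions

# A brand-new ad falls back to the prior instead of an undefined 0/0.
print(smoothed_ctr(0, 0, PRIOR_CTR, PRIOR_STRENGTH))
# One lucky click in two impressions is pulled far below the raw 0.5.
print(smoothed_ctr(1, 2, PRIOR_CTR, PRIOR_STRENGTH))
# A well-observed ad stays close to its raw rate of 0.05.
print(smoothed_ctr(500, 10000, PRIOR_CTR, PRIOR_STRENGTH))
```

The prior strength controls the trade-off named on the slide: too weak and a lucky new ad is over-rewarded, too strong and a genuinely good new ad is held back.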
Applications
• Recognition: Shannon's Noisy Channel Model
– Speech, Optical Character Recognition (OCR), Spelling
• Transduction
– Part of Speech (POS) Tagging
– Machine Translation (MT)
• Parsing: ???
• Ranking
– Information Retrieval (IR)
– Lexicography
• Discrimination:
– Sentiment, Text Classification, Author Identification, Word Sense Disambiguation (WSD)
• Segmentation
– Asian Morphology (Word Breaking), Text Tiling
• Alignment: Bilingual Corpora, Dotplots
• Compression
• Language Modeling: good for everything
Speech → Language
Shannon's: Noisy Channel Model
• I → Noisy Channel → O
• I′ ≈ ARGMAX_I Pr(I|O) = ARGMAX_I Pr(I) Pr(O|I)
– Pr(I): Language Model (Application Independent)
– Pr(O|I): Channel Model

Trigram Language Model
Word        Rank   More likely alternatives
We          9      The This One Two A Three Please In
need        7      are will the would also do
to          1
resolve     85     have know do…
all         9      The This One Two A Three Please In
of          2      The This One Two A Three Please In
the         1
important   657    document question first…
issues      14     thing point to

Channel Model
Application                           Input        Output
Speech Recognition                    writer       rider
OCR (Optical Character Recognition)   all          a1l
Spelling Correction                   government   goverment
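The ARGMAX in the noisy channel formula can be made concrete with a toy decoder that scores each candidate input by Pr(I) × Pr(O|I). This is a minimal sketch: the function name, the candidate list, and every probability below are hypothetical, chosen only to mirror the government/goverment example.

```python
import math


def decode(observed, language_model, channel_model):
    """Return the input I maximizing Pr(I) * Pr(O | I) for the observed O.

    language_model: dict mapping candidate input -> Pr(I)
    channel_model:  dict mapping (observed, candidate) -> Pr(O | I)
    Scores are added in log space; candidates with zero probability are skipped.
    """
    best, best_score = None, -math.inf
    for candidate, prior in language_model.items():
        likelihood = channel_model.get((observed, candidate), 0.0)
        if prior <= 0.0 or likelihood <= 0.0:
            continue
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Hypothetical probabilities, for illustration only.
language_model = {"government": 1e-4, "governments": 2e-5, "goverment": 1e-9}
channel_model = {
    ("goverment", "government"): 1e-3,   # one letter dropped
    ("goverment", "governments"): 1e-6,  # two edits
    ("goverment", "goverment"): 0.95,    # typed exactly as intended
}
print(decode("goverment", language_model, channel_model))
```

Only the channel model would change when moving from spelling to speech or OCR; the language model is shared across applications.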
Speech → Language
Using (Abusing) Shannon's Noisy Channel Model: Part of Speech Tagging and Machine Translation
• Speech
– Words → Noisy Channel → Acoustics
• OCR
– Words → Noisy Channel → Optics
• Spelling Correction
– Words → Noisy Channel → Typos
• Part of Speech Tagging (POS):
– POS → Noisy Channel → Words
• Machine Translation: "Made in America"
– English → Noisy Channel → French
Didn't have the guts to use this slide at Eurospeech (Geneva)
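Treating part of speech tagging as POS → Noisy Channel → Words means searching for the tag sequence that maximizes Pr(tags) × Pr(words|tags). With a bigram tag model this search is usually done with the Viterbi algorithm. The sketch below assumes such a bigram model; the tag set, the sentence, and all probabilities are hypothetical, and the emission tables list only the words used in the example.

```python
def viterbi(words, tags, start, trans, emit):
    """Most probable tag sequence under Pr(tags) * Pr(words | tags)."""
    # best[t][tag] = (probability of the best path ending in tag at position t,
    #                 previous tag on that path)
    best = [{tag: (start.get(tag, 0.0) * emit[tag].get(words[0], 0.0), None)
             for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            prob, prev = max(
                (best[-1][p][0] * trans[p].get(tag, 0.0), p) for p in tags
            )
            column[tag] = (prob * emit[tag].get(word, 0.0), prev)
        best.append(column)
    # Trace back from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


# Hypothetical model parameters, for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"can": 0.3, "rusts": 0.1},
    "VERB": {"can": 0.5, "rusts": 0.4},
}
print(viterbi(["the", "can", "rusts"], TAGS, START, TRANS, EMIT))
```

In this toy model "can" is more likely as a verb in isolation, but the tag-sequence model prefers a noun after a determiner. Raw probabilities are fine for three words; longer inputs would need log probabilities to avoid underflow.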
Spelling Correction
Evaluation
Performance
The Task is Hard without Context
Easier with Context
• actuall, actual, actually
– … in determining whether the defendant actually will die.
• constuming, consuming, costuming
• conviced, convicted, convinced
• confusin, confusing, confusion
• workern, worker, workers
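The actuall example can be used to illustrate how context changes the choice between candidate corrections. The sketch below multiplies a word prior and a channel probability by two context factors, one for the word to the left and one for the word to the right. All names and numbers are hypothetical, and the independence of the two context factors is a simplifying assumption.

```python
def best_candidate(typo, candidates, prior, channel, left=None, right=None,
                   left_given=None, right_given=None, floor=1e-7):
    """Pick the candidate c maximizing Pr(c) * Pr(typo|c) * Pr(left|c) * Pr(right|c).

    A context factor is skipped when that context word is not supplied.
    Unseen (context word, candidate) pairs fall back to a small floor value.
    """
    def score(c):
        s = prior[c] * channel[(typo, c)]
        if left is not None:
            s *= left_given.get((left, c), floor)
        if right is not None:
            s *= right_given.get((right, c), floor)
        return s
    return max(candidates, key=score)


# Hypothetical probabilities, for illustration only.
candidates = ["actual", "actually"]
prior = {"actual": 5e-5, "actually": 4e-5}
channel = {("actuall", "actual"): 1e-3, ("actuall", "actually"): 1e-3}
left_given = {("defendant", "actual"): 1e-6, ("defendant", "actually"): 1e-4}
right_given = {("will", "actual"): 1e-5, ("will", "actually"): 1e-3}

# Without context the two candidates are nearly tied.
print(best_candidate("actuall", candidates, prior, channel))
# With "defendant ___ will" around the typo, the choice is clear.
print(best_candidate("actuall", candidates, prior, channel,
                     left="defendant", right="will",
                     left_given=left_given, right_given=right_given))
```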
Easier with Context
Context Model
Future Improvements
• Add More Factors
– Trigrams
– Thesaurus Relations
– Morphology
– Syntactic Agreement
– Parts of Speech
• Improve Combination Rules
– Shrink (Meaty Methodology)
Conclusion (Spelling Correction)
• There has been a lot of interest in smoothing
– Good-Turing estimation
– Kneser-Ney
• Is it worth the trouble?
• Ans: Yes (at least for recognition applications)
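For readers who have not seen Good-Turing estimation, the basic idea is to replace each observed count r with r* = (r + 1) × N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times, and to reserve N_1 / N of the probability mass for unseen items. The sketch below is the unsmoothed textbook form with hypothetical counts; keeping the raw count when N_{r+1} is zero is a simplification made here, whereas practical implementations smooth the N_r values first.

```python
from collections import Counter


def good_turing(counts):
    """Return ({r: r*}, unseen_mass) using r* = (r + 1) * N_{r+1} / N_r.

    counts: dict mapping item -> observed count.
    When N_{r+1} is zero the raw count r is kept (a simplification).
    unseen_mass is N_1 / N, with N the total number of observations.
    """
    freq_of_freq = Counter(counts.values())
    total = sum(counts.values())
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        adjusted[r] = (r + 1) * n_next / n_r if n_next else float(r)
    unseen_mass = freq_of_freq.get(1, 0) / total
    return adjusted, unseen_mass


# Hypothetical word counts: six words seen once, two seen twice, one seen three times.
counts = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1,
          "g": 2, "h": 2, "i": 3}
adjusted, unseen_mass = good_turing(counts)
print(adjusted)
print(unseen_mass)
```

Counts of 1 are discounted below 1, which is where the mass for unseen words comes from.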
Aligning Words