  1. Foundations of Language Science and Technology: Statistical Language Models Dietrich Klakow

  2. Using Language Models 2

  3. How Speech Recognition works [Diagram: Speech Signal → Feature Extraction → stream of feature vectors A → Search → recognized word sequence \hat{W}] The acoustic model supplies P(A|W), the language model supplies P(W), and the search computes \hat{W} = \arg\max_W \, P(A|W)\,P(W). 3
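As a minimal illustration of the decision rule on this slide (the hypothesis list and scores below are hypothetical, not from the lecture), the search simply picks the word sequence that maximizes the sum of acoustic and language-model log-probabilities:

    # Hypothetical n-best list: (word sequence, log P(A|W) from acoustic model, log P(W) from language model)
    hypotheses = [
        ("recognize speech", -12.3, -4.1),
        ("wreck a nice beach", -11.8, -7.9),
    ]

    # Bayes decision rule: W_hat = argmax_W [ log P(A|W) + log P(W) ]
    best = max(hypotheses, key=lambda h: h[1] + h[2])
    print(best[0])  # -> "recognize speech"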

  4. Guess the next word What's in your hometown newspaper ??? 4

  5. Guess the next word What's in your hometown newspaper today 5

  6. Guess the next word It's raining cats and ??? 6

  7. Guess the next word It's raining cats and dogs 7

  8. Guess the next word President Bill ??? 8

  9. Guess the next word President Bill Gates 9

  10. Information Retrieval • Language models were introduced to information retrieval in 1998 by Ponte & Croft [Diagram: a query Q is scored against documents D_1 ... D_7 via P(Q|D_i)] • Ranking according to P(Q|D_i) 10
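A minimal sketch of this query-likelihood ranking (the two documents and the query are hypothetical; each document is modeled by a unigram distribution, linearly interpolated with the collection model so that no query word gets zero probability):

    import math
    from collections import Counter

    docs = {
        "D1": "the cat sat on the mat".split(),
        "D2": "dogs and cats are pets".split(),
    }
    collection = [w for d in docs.values() for w in d]
    coll_counts, coll_size = Counter(collection), len(collection)

    def log_p_query_given_doc(query, doc, lam=0.5):
        """log P(Q|D) under a smoothed unigram document model."""
        counts, size = Counter(doc), len(doc)
        return sum(
            math.log(lam * counts[w] / size + (1 - lam) * coll_counts[w] / coll_size)
            for w in query
        )

    query = "cats and dogs".split()
    ranking = sorted(docs, key=lambda d: log_p_query_given_doc(query, docs[d]), reverse=True)
    print(ranking)  # documents ordered by P(Q|D_i)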

  11. Measuring the Quality of Language Models 11

  12. Definition of Perplexity PP = P(w_1 \dots w_N)^{-1/N} = \exp\left( -\frac{1}{N} \sum_{w,h} N(w,h) \log P(w|h) \right) where P(w|h) is the language model, N(w,h) is the frequency of the sequence (h,w) in some test corpus, and N is the size of the test corpus. 12
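A minimal sketch of the count-based form of this definition (the model is represented here as a plain dict of conditional probabilities keyed by (history, word), which is a hypothetical interface; any n-gram model could be plugged in):

    import math
    from collections import Counter

    def perplexity(test_tokens, model, order=2):
        """Compute exp( -1/N * sum_{w,h} N(w,h) log P(w|h) ) over a test corpus."""
        # Count (history, word) pairs in the test corpus.
        ngrams = Counter(
            (tuple(test_tokens[i:i + order - 1]), test_tokens[i + order - 1])
            for i in range(len(test_tokens) - order + 1)
        )
        n = sum(ngrams.values())
        log_sum = sum(count * math.log(model[(h, w)]) for (h, w), count in ngrams.items())
        return math.exp(-log_sum / n)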

  13. Interpretation Calculate perplexity of uniform distribution (white board) 13
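A short version of the white-board calculation: for the uniform distribution over a vocabulary of size V we have P(w|h) = 1/V for every event, and since \sum_{w,h} N(w,h) = N,

\[
PP = \exp\left( -\frac{1}{N} \sum_{w,h} N(w,h) \log \frac{1}{V} \right) = \exp(\log V) = V ,
\]

so perplexity can be read as the average number of equally likely word choices at each position.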

  14. Perplexity and Word Error Rate Perplexity and word error rate are correlated within error bars 14

  15. Estimating the Parameters of a Language Model 15

  16. Goal • Minimize perplexity on training data PP_{\mathrm{Train}} = \exp\left( -\frac{1}{N_{\mathrm{Train}}} \sum_{w,h} N_{\mathrm{Train}}(w,h) \log P(w|h) \right) 16

  17. Define the likelihood L = -\log(PP): L = \frac{1}{N_{\mathrm{Train}}} \sum_{w,h} N_{\mathrm{Train}}(w,h) \log P(w|h) Minimizing perplexity ⇔ maximizing likelihood. How do we take the normalization constraint into account? 17

  18. Calculating the maximum likelihood estimate (white board) 18
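A compact version of the white-board derivation: maximize L subject to the normalization constraint \sum_w P(w|h) = 1 for each history h, using one Lagrange multiplier \mu_h per history:

\[
\frac{\partial}{\partial P(w|h)} \left[ \frac{1}{N_{\mathrm{Train}}} \sum_{w',h'} N_{\mathrm{Train}}(w',h') \log P(w'|h')
- \sum_{h'} \mu_{h'} \Big( \sum_{w'} P(w'|h') - 1 \Big) \right] = 0
\quad\Rightarrow\quad
\frac{N_{\mathrm{Train}}(w,h)}{N_{\mathrm{Train}}\, P(w|h)} = \mu_h .
\]

Summing over w to eliminate \mu_h gives the estimator on the next slide, P(w|h) = N_{\mathrm{Train}}(w,h) / N_{\mathrm{Train}}(h).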

  19. Maximum likelihood estimator P(w|h) = \frac{N_{\mathrm{Train}}(w,h)}{N_{\mathrm{Train}}(h)} What's the problem? 19
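A minimal sketch of the problem (the toy corpus is hypothetical): any n-gram that never occurred in the training data gets probability zero, so a single unseen event makes the test-set perplexity infinite.

    from collections import Counter

    train = "the cat sat on the mat".split()
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train)

    def p_ml(w, h):
        """Maximum likelihood bigram estimate P(w|h) = N(h, w) / N(h)."""
        return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

    print(p_ml("sat", "cat"))    # seen in training -> 1.0
    print(p_ml("slept", "cat"))  # unseen -> 0.0, i.e. -inf log-probability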

  20. Backing-off and Smoothing 20

  21. Absolute Discounting • See white board 21
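The white-board details are not in the slides; the following is a minimal sketch of an interpolated variant of absolute discounting for a bigram model (toy corpus and discount d are illustrative): a constant d is subtracted from every seen bigram count and the freed probability mass is given to a unigram backing-off distribution.

    from collections import Counter

    train = "the cat sat on the mat the cat slept".split()
    bigrams = Counter(zip(train, train[1:]))
    unigrams = Counter(train)
    total = len(train)

    def p_abs(w, h, d=0.5):
        """Absolute discounting with backing-off to the unigram distribution."""
        n_h = sum(c for (h_, _), c in bigrams.items() if h_ == h)
        if n_h == 0:
            return unigrams[w] / total                   # history never seen
        seen = sum(1 for (h_, _) in bigrams if h_ == h)  # distinct successors of h
        lam = d * seen / n_h                             # backing-off weight
        return max(bigrams[(h, w)] - d, 0) / n_h + lam * unigrams[w] / total

    print(p_abs("sat", "cat"), p_abs("the", "cat"))  # seen and unseen bigrams both get mass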

  22. Influence of Discounting Parameter 22

  23. Possible further Improvements 23

  24. Linear Smoothing P(w_0 \mid w_{-1}) = \lambda_1 \frac{N_{\mathrm{Train}}(w_{-1} w_0)}{N_{\mathrm{Train}}(w_{-1})} + \lambda_2 \frac{N_{\mathrm{Train}}(w_0)}{N_{\mathrm{Train}}} + (1 - \lambda_1 - \lambda_2) \frac{1}{V} where V is the size of the vocabulary. 24
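A direct transcription of this formula into code (the toy counts are hypothetical; in practice the weights \lambda_1, \lambda_2 are tuned on held-out data rather than fixed as here):

    from collections import Counter

    train = "the cat sat on the mat".split()
    bigrams, unigrams = Counter(zip(train, train[1:])), Counter(train)
    n_train, vocab_size = len(train), len(unigrams)

    def p_linear(w0, w_prev, lam1=0.6, lam2=0.3):
        """lam1 * bigram + lam2 * unigram + (1 - lam1 - lam2) * uniform."""
        p_bi = bigrams[(w_prev, w0)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
        p_uni = unigrams[w0] / n_train
        return lam1 * p_bi + lam2 * p_uni + (1 - lam1 - lam2) / vocab_size

    print(p_linear("sat", "cat"))  # seen bigram
    print(p_linear("dog", "cat"))  # unseen word still gets the uniform share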

  25. Marginal Backing-Off (Kneser-Ney-Smoothing) • Dedicated backing-off distributions • Usually about 10% to 20% reduction in perplexity 25
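The slide leaves the dedicated backing-off distribution unspecified; the usual Kneser-Ney choice (a standard formulation, not spelled out in the slides) replaces raw unigram frequencies with continuation counts, i.e. the number of distinct histories a word has been seen to follow:

\[
\beta(w) = \frac{\bigl|\{\, h : N_{\mathrm{Train}}(h, w) > 0 \,\}\bigr|}
                {\bigl|\{\, (h', w') : N_{\mathrm{Train}}(h', w') > 0 \,\}\bigr|} ,
\]

so that backing off preserves the marginal word distribution of the discounted model instead of over-weighting words that are frequent only in a few contexts.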

  26. Class Language Models • Automatically group words into classes • Map all words in the language model to classes • Dramatic reduction in the number of parameters to estimate • Usually used in linear interpolation with the word language model 26
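A sketch of the standard class bigram model (assuming, as is common, that each word w belongs to exactly one class c(w)):

\[
P(w_i \mid w_{i-1}) = P\bigl(w_i \mid c(w_i)\bigr)\, P\bigl(c(w_i) \mid c(w_{i-1})\bigr) .
\]

With |C| classes and vocabulary size V, only on the order of |C|^2 class transition probabilities plus V emission probabilities have to be estimated, instead of V^2 word bigram probabilities, which is the dramatic parameter reduction mentioned above.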

  27. Summary • How to build a state-of-the-art plain-vanilla language model: • Trigram • Absolute discounting • Marginal backing-off (Kneser-Ney smoothing) • Linear interpolation with a class model 27
