Foundations of Language Science and Technology: Statistical Language Models Dietrich Klakow
Using Language Models
How Speech Recognition Works
[Diagram: the speech signal passes through feature extraction to give a stream of feature vectors A; the acoustic model supplies P(A|W) and the language model supplies P(W); the search finds \hat{W} = \arg\max_W [P(A|W) P(W)]; the output is the recognized word sequence \hat{W}]
Guess the next word: What's in your hometown newspaper ???
Guess the next word: What's in your hometown newspaper today
Guess the next word: It's raining cats and ???
Guess the next word: It's raining cats and dogs
Guess the next word: President Bill ???
Guess the next word: President Bill Gates
Information Retrieval
• Language models were introduced to information retrieval in 1998 by Ponte & Croft
[Diagram: a query Q and a collection of documents D_1 ... D_7; each document is scored by P(Q|D_i), e.g. P(Q|D_2), and the documents are ranked according to P(Q|D_i)]
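In this setting each document D_i is scored by the probability its own language model assigns to the query. A minimal Python sketch of such query-likelihood ranking, assuming a unigram document model smoothed linearly with a collection model (the function and parameter names, e.g. score_query_likelihood and lam, are illustrative, not from the slides):

import math
from collections import Counter

def score_query_likelihood(query_terms, doc_terms, collection_counts,
                           collection_size, lam=0.5):
    # Unigram document model, linearly smoothed with the collection model.
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    log_prob = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts.get(term, 0) / collection_size
        p = lam * p_doc + (1.0 - lam) * p_coll
        log_prob += math.log(p) if p > 0 else float("-inf")
    return log_prob

Documents would then be sorted by this score in descending order to produce the ranking according to P(Q|D_i).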
Measuring the Quality of Language Models
Definition of Perplexity
PP = P(w_1 \dots w_N)^{-1/N} = \exp\left( -\frac{1}{N} \sum_{w,h} N(w,h) \log P(w|h) \right)
P(w|h): language model
N(w,h): frequency of the sequence w,h in some test corpus
N: size of the test corpus
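A minimal Python sketch of this definition, assuming the language model is available as a function prob(w, h) that returns P(w|h) > 0 for every event in the test corpus (names are illustrative):

import math
from collections import Counter

def perplexity(test_tokens, prob, order=2):
    """Compute PP = exp(-1/N * sum_{w,h} N(w,h) log P(w|h)) on a token list."""
    # Count how often each (history, word) pair occurs in the test corpus.
    counts = Counter()
    for i in range(order - 1, len(test_tokens)):
        h = tuple(test_tokens[i - order + 1:i])
        w = test_tokens[i]
        counts[(h, w)] += 1
    n = sum(counts.values())            # size of the test corpus, N
    log_sum = sum(c * math.log(prob(w, h)) for (h, w), c in counts.items())
    return math.exp(-log_sum / n)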
Interpretation
• Calculate the perplexity of the uniform distribution (white board)
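The whiteboard calculation is not reproduced on the slide; a short worked version for the uniform distribution over a vocabulary of size V:

P(w|h) = \frac{1}{V} \;\Rightarrow\; PP = \exp\left( -\frac{1}{N} \sum_{w,h} N(w,h) \log\frac{1}{V} \right) = \exp(\log V) = V

using \sum_{w,h} N(w,h) = N. Perplexity can thus be read as the effective number of choices the model faces at each word position.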
Perplexity and Word Error Rate
• Perplexity and word error rate are correlated within error bars
Estimating the Parameters of a Language Model
Goal
• Minimize perplexity on training data
PP = \exp\left( -\frac{1}{N_{\mathrm{Train}}} \sum_{w,h} N_{\mathrm{Train}}(w,h) \log P(w|h) \right)
• Define the likelihood L = -\log(PP):
L = \frac{1}{N_{\mathrm{Train}}} \sum_{w,h} N_{\mathrm{Train}}(w,h) \log P(w|h)
• Minimizing perplexity ⇔ maximizing likelihood
• How to take the normalization constraint into account?
Calculating the maximum likelihood estimate (white board)
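The whiteboard derivation is likewise not on the slide; a standard version maximizes L subject to the normalization constraint \sum_w P(w|h) = 1 for every history h, using Lagrange multipliers \mu_h:

\frac{\partial}{\partial P(w|h)} \left[ \sum_{w',h'} N_{\mathrm{Train}}(w',h') \log P(w'|h') - \sum_{h'} \mu_{h'} \left( \sum_{w'} P(w'|h') - 1 \right) \right] = \frac{N_{\mathrm{Train}}(w,h)}{P(w|h)} - \mu_h = 0

so P(w|h) \propto N_{\mathrm{Train}}(w,h); normalizing over w gives the estimator on the next slide.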
Maximum likelihood estimator
P(w|h) = \frac{N_{\mathrm{Train}}(w,h)}{N_{\mathrm{Train}}(h)}
• What's the problem?
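The problem the slide is asking about is that any event unseen in training receives probability zero, so the model assigns infinite perplexity to a test set containing it. A minimal bigram sketch (toy data, illustrative names):

# Maximum-likelihood bigram model: any bigram unseen in training gets
# probability 0, which makes the test-set perplexity infinite.
from collections import Counter

train = "the cat sat on the mat".split()
bigrams = Counter(zip(train[:-1], train[1:]))
unigrams = Counter(train[:-1])

def p_ml(w, h):
    return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

print(p_ml("sat", "cat"))   # 1.0 -> seen in training
print(p_ml("dog", "the"))   # 0.0 -> unseen bigram, zero probability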
Backing-off and Smoothing
Absolute Discounting
• See white board
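The whiteboard formulation is not on the slide; a standard interpolated form of absolute discounting subtracts a constant d (0 < d ≤ 1) from every seen count and gives the freed probability mass to a backing-off distribution β:

P_{\mathrm{abs}}(w|h) = \frac{\max\bigl( N_{\mathrm{Train}}(h,w) - d,\, 0 \bigr)}{N_{\mathrm{Train}}(h)} + \frac{d \, \bigl|\{ w' : N_{\mathrm{Train}}(h,w') > 0 \}\bigr|}{N_{\mathrm{Train}}(h)} \, \beta(w)

The discounting parameter d is the quantity whose influence the next slide examines.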
Influence of Discounting Parameter
Possible further Improvements
Linear Smoothing
P(w_0|w_{-1}) = \lambda_1 \frac{N_{\mathrm{Train}}(w_{-1} w_0)}{N_{\mathrm{Train}}(w_{-1})} + \lambda_2 \frac{N_{\mathrm{Train}}(w_0)}{N_{\mathrm{Train}}} + (1 - \lambda_1 - \lambda_2) \frac{1}{V}
V: size of the vocabulary
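A minimal Python sketch of exactly this interpolation for a bigram model; the weights lam1 and lam2 are placeholders and would in practice be tuned, e.g. on held-out data (names are illustrative):

from collections import Counter

def train_linear_smoothed(tokens, vocab_size, lam1=0.6, lam2=0.3):
    # Collect bigram and unigram counts from the training tokens.
    bigram = Counter(zip(tokens[:-1], tokens[1:]))
    unigram = Counter(tokens)
    n_train = len(tokens)

    def prob(w0, w_prev):
        # Linear interpolation of bigram, unigram, and uniform distributions.
        p_bi = bigram[(w_prev, w0)] / unigram[w_prev] if unigram[w_prev] else 0.0
        p_uni = unigram[w0] / n_train
        p_uniform = 1.0 / vocab_size
        return lam1 * p_bi + lam2 * p_uni + (1.0 - lam1 - lam2) * p_uniform

    return prob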
Marginal Backing-Off (Kneser-Ney Smoothing)
• Dedicated backing-off distributions
• Usually about 10% to 20% reduction in perplexity
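The slide does not spell out the dedicated backing-off distribution; the standard Kneser-Ney choice replaces raw unigram counts with continuation counts, i.e. the number of distinct histories a word follows:

\beta_{\mathrm{KN}}(w) = \frac{\bigl|\{ h : N_{\mathrm{Train}}(h,w) > 0 \}\bigr|}{\sum_{w'} \bigl|\{ h : N_{\mathrm{Train}}(h,w') > 0 \}\bigr|}

A word then scores high in the backing-off distribution because it appears after many different histories, not merely because it is frequent.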
Class Language Models
• Automatically group words into classes
• Map all words in the language model to classes
• Dramatic reduction in the number of parameters to estimate
• Usually used in linear interpolation with the word language model (a common form is sketched below)
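A common form of such a class model, assuming a hard class assignment c(·) (the exact variant used in the lecture is not given on the slide):

P_{\mathrm{class}}(w_0|w_{-1}) = P\bigl( w_0 \mid c(w_0) \bigr) \, P\bigl( c(w_0) \mid c(w_{-1}) \bigr)

With C classes and a vocabulary of size V, the bigram table shrinks from roughly V^2 to C^2 + V parameters; the class model is then linearly interpolated with the word-based model, as listed in the summary.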
Summary
• How to build a state-of-the-art plain vanilla language model:
• Trigram
• Absolute discounting
• Marginal backing-off (Kneser-Ney smoothing)
• Linear interpolation with a class model