Using Dependency Grammar Features in Whole Sentence Maximum Entropy Language Model for Speech Recognition
Teemu Ruokolainen, Tanel Alumäe, Marcus Dobrinkat
October 8th, 2010
Contents
◮ Whole sentence language modeling
◮ Dependency Grammar
◮ Whole Sentence Maximum Entropy Language Model
◮ Experiments
◮ Conclusions
Whole sentence language modeling
Statistical sentence modeling problem
◮ Given a finite set of observed sentences, learn a model that gives useful probability estimates for arbitrary new sentences
n-gram model: the standard approach
◮ Model language as a high-order Markov chain; the current word depends only on its n − 1 preceding words
◮ Sentence probability is obtained with the chain rule; the sentence probability is the product of the word probabilities (the trigram case is sketched below)
◮ Modeling is based only on local dependencies of the language; grammatical regularities learned by the model are captured implicitly within the short word windows
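For reference, the chain-rule factorisation and its trigram truncation mentioned above can be written as follows (a standard formulation, added here for clarity; it is not quoted from the slides):

    P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
                       \approx \prod_{i=1}^{m} P(w_i \mid w_{i-2}, w_{i-1})   (trigram, n = 3)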
Example: n-gram succeeds
◮ Stock markets fell yesterday.
◮ Log probability given by trigram LM = -19.39
◮ Stock markets fallen yesterday.
◮ Log probability = -21.26
Example: n-gram fails
◮ Stocks have by and large fallen.
◮ Log probability = -19.92
◮ Stocks have by and large fell.
◮ Log probability = -18.82
Our aim
◮ Explicit modeling of grammatical knowledge over the whole sentence
◮ Dependency Grammar features
◮ Whole Sentence Maximum Entropy Language Model (WSME LM)
◮ Experiments in a large vocabulary speech recognition task
Dependency Grammar
◮ Dependency parsing results in head-modifier relations between pairs of words, together with labels for the relations
◮ The labels describe the type of the relation, e.g. subject, object, negate
◮ These asymmetric bilexical relations define a complete dependency structure for the sentence
[Figure: dependency parse of "I will not buy Quebecers' votes." with relation labels SUBS, V-CH, NEG, OBJ, DAT]
Extracting Dependency Grammar Features
◮ Dependencies are converted into binary features
◮ A feature is or is not present in a sentence
◮ Dependency bigram features contain a relationship between a head and a modifier
◮ Dependency trigram features contain a modifier with its head and the head's head (a feature-extraction sketch follows below)
[Figure: example bigram and trigram features extracted from the parse above, involving the words "votes", "I", "will", "buy" and the labels OBJ, SUBS, V-CH]
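As an illustration of this feature extraction, here is a minimal sketch. It is not the authors' code; the dictionary-based parse representation and the exact attachments in the example parse are assumptions.

    # Minimal sketch of turning a dependency parse into binary bigram/trigram
    # features, as described on this slide. Not the authors' implementation;
    # the parse representation {modifier_index: (head_index, label)} is assumed.

    def dependency_features(words, parse):
        feats = set()
        for mod, (head, label) in parse.items():
            if head is None:                       # skip the root word
                continue
            # bigram feature: labelled head-modifier pair
            feats.add((label, words[head], words[mod]))
            # trigram feature: modifier, its head and the head's head
            grand = parse.get(head)
            if grand is not None and grand[0] is not None:
                ghead, glabel = grand
                feats.add((glabel, label, words[ghead], words[head], words[mod]))
        return feats

    # Example sentence from the previous slide; the attachments and the DAT
    # label placement are illustrative guesses, not the actual parser output.
    words = ["I", "will", "not", "buy", "Quebecers'", "votes", "."]
    parse = {0: (1, "SUBS"), 1: (None, None), 2: (3, "NEG"),
             3: (1, "V-CH"), 4: (3, "DAT"), 5: (3, "OBJ")}
    print(dependency_features(words, parse))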
Whole Sentence Maximum Entropy Language Model (WSME LM)
Principle of Maximum Entropy
◮ A model selection criterion
◮ From all the probability distributions satisfying the known constraints, choose the one with the highest entropy
Maximum Entropy Model
◮ Constraints: expected values of features
◮ Form of the model satisfying the constraints: exponential distribution (see the formula below)
◮ Within the exponential model family, the maximum likelihood solution is the maximum entropy solution
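In symbols, the standard Maximum Entropy model form over sentences s reads as follows (added here for reference; f_i are the binary features, λ_i their weights, and Z a normalising constant):

    p(s) = \frac{1}{Z} \exp\Big( \sum_i \lambda_i f_i(s) \Big),
    \quad \text{subject to} \quad E_p[f_i] = \tilde{E}[f_i] \ \text{for all } i,

where \tilde{E}[f_i] denotes the empirical expectation of feature f_i.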
WSME LM
◮ The WSME LM is the exponential probability distribution over sentences that is closest to the background n-gram model (in the Kullback-Leibler divergence sense) while satisfying the linear constraints specified by the empirical expectations of the features (see the formula below)
◮ With a uniform background model, this reduces to the plain Maximum Entropy solution
◮ At test time, the sentence probabilities given by the n-gram model are, effectively, scaled according to the features present in the sentence
Practical issues
◮ Training a WSME LM requires sentence samples from the exponential model
◮ These are obtained with Markov Chain Monte Carlo sampling methods
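The resulting model form (standard for whole-sentence exponential models with a background distribution; the notation is added here, with p_0 the background n-gram model):

    p(s) = \frac{1}{Z} \, p_0(s) \, \exp\Big( \sum_i \lambda_i f_i(s) \Big)

so at test time the n-gram probability p_0(s) is simply multiplied by a factor determined by the features active in the sentence.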
Experiments
Experiment setup
◮ Train a baseline n-gram LM and a WSME LM
◮ Obtain an N-best hypothesis list for each sentence from the speech recognizer using the baseline n-gram, and rescore the hypotheses with the WSME LM (rescoring is sketched below)
◮ Compare model performance with speech transcript perplexity and speech recognition word error rate (WER)
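A minimal rescoring sketch, assuming each hypothesis carries an acoustic log-score and that the language model score is combined with it through a tunable weight; the function names, the score combination and the weight value are illustrative assumptions, not the recognizer interface described on the slides.

    # Illustrative N-best rescoring: re-rank the hypotheses of one utterance
    # using the WSME LM log-probability in place of the baseline LM score.

    def rescore_nbest(nbest, wsme_logprob, lm_weight=15.0):
        """nbest: list of (words, acoustic_logprob) pairs for one utterance."""
        best_words, best_score = None, float("-inf")
        for words, acoustic_logprob in nbest:
            score = acoustic_logprob + lm_weight * wsme_logprob(words)
            if score > best_score:
                best_words, best_score = words, score
        return best_words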
Data
◮ Textual training corpus: Gigaword
◮ English newswire articles on typical daily news topics: sports, politics, finance, etc.
◮ 1M sentences (20M words)
◮ A small subset of Gigaword
◮ Speech test corpus: Wall Street Journal
◮ Dictated English financial newswire articles
◮ 329 sentences (11K words)
Baseline LM
◮ Trigram model trained with Kneser-Ney smoothing
◮ Vocabulary size: 60K words
Dependency parsing
◮ The textual data was parsed using the freely distributed Connexor Machinese Syntax parser
WSME LM training
◮ Sentence samples from the exponential model were obtained using importance sampling (sketched below)
◮ The L-BFGS algorithm was used for optimizing the parameters
◮ The parameters of the model were smoothed using Gaussian priors
Speech recognition system
◮ Large vocabulary speech recognizer developed at the Department of Information and Computer Science, Aalto University
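A minimal sketch of how model feature expectations can be estimated with self-normalised importance sampling, assuming the sentence samples are drawn from the background n-gram model p_0 (so the importance weight of a sentence reduces to the exponential of its weighted feature sum); the function and argument names are illustrative.

    import math

    # Self-normalised importance sampling of the expectations E_p[f_i] for a
    # model p(s) proportional to p0(s) * exp(sum_i lambda_i * f_i(s)), with
    # the samples drawn from the background model p0. Illustrative only.

    def feature_expectations(samples, lambdas, feature_fn):
        """samples: sentences drawn from p0; feature_fn: sentence -> set of active features."""
        totals, normalizer = {}, 0.0
        for sentence in samples:
            feats = feature_fn(sentence)
            weight = math.exp(sum(lambdas.get(f, 0.0) for f in feats))
            normalizer += weight
            for f in feats:
                totals[f] = totals.get(f, 0.0) + weight
        return {f: total / normalizer for f, total in totals.items()}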
Experiment results
◮ We observe a 19% relative reduction in perplexity (PPL) with the WSME LM compared to the baseline trigram
◮ Combining the WSME LM with the baseline trigram (last row of the table) reduces WER by 6.1% relative (1.8% absolute)
◮ Note: results are reported only for trigram Dependency Grammar features
◮ The performance gain is significant

Table: Perplexity (PPL) and word error rate (WER) when using different language models.

    Language model            PPL    WER (%)
    Word trigram              303    29.6
    WSME LM                   244    30.6
    Word trigram + WSME LM    255    27.9
Conclusions
◮ We described our experiments with a WSME LM using binary features extracted with a dependency grammar parser
◮ The dependency features take the form of labeled asymmetric bilexical relations
◮ Experiments were run with both bigram and trigram features
◮ The WSME LM was evaluated in a large vocabulary speech recognition task
Conclusions (continued)
◮ We obtained a significant improvement in performance using the WSME LM compared to the baseline word trigram
◮ WSME LMs provide an elegant way to combine statistical models with linguistic information
◮ The main shortcoming of the method is its extremely high memory consumption during model training