Information Extraction Using the Structured Language Model
Ciprian Chelba, Milind Mahajan
Microsoft Research Speech.Net

• Information Extraction from Text
• Structured Language Model (SLM)
• SLM for Information Extraction
• Experiments and Error Analysis
• Conclusions and Future Directions

Information Extraction from Text
• Data-driven approach with minimal annotation effort: clearly identifiable semantic slots and frames
• Information extraction is viewed as the recovery of a two-level semantic parse S for a given word sequence W
• Sentence independence assumption: the sentence W alone is sufficient for identifying the semantic parse S
[Figure: two-level semantic parse of "Schedule meeting with Megan Hokins about internal lecture at two thirty p.m."; frame level: Calendar Task; slot level: Subject, Person, Time]
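One way to make the two-level semantic parse concrete is to store S as a frame tag plus a list of tagged slot spans over W. The Python sketch below is a minimal illustration, not the authors' data format: the class names, the half-open word-index spans, and the assignment of Person, Subject, and Time to particular phrases of the example sentence are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slot:
    tag: str    # slot-level semantic tag, e.g. "Person"
    start: int  # index of the first word in the slot
    end: int    # index one past the last word (half-open span)


@dataclass
class SemanticParse:
    frame: str         # frame-level tag for the whole sentence
    slots: List[Slot]  # slot-level constituents

    def fill(self, words: List[str]) -> List[Tuple[str, str]]:
        """Return (slot tag, slot text) pairs read off the word sequence."""
        return [(s.tag, " ".join(words[s.start:s.end])) for s in self.slots]


# Hypothetical slot spans for the example sentence on the slide.
W = "Schedule meeting with Megan Hokins about internal lecture at two thirty p.m.".split()
S = SemanticParse(
    frame="Calendar Task",
    slots=[Slot("Person", 3, 5), Slot("Subject", 6, 8), Slot("Time", 9, 12)],
)
print(S.frame, S.fill(W))
```

Keeping slots as spans over W, rather than as copied strings, mirrors the constraint view used later, where each semantic constituent is identified by its left and right boundaries.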

Syntactic Parsing Using the Structured Language Model
• Generalize trigram modeling (local) by taking advantage of sentence structure (influence by the more distant past)
• Develop hidden syntactic structure $T_i$ for a given word prefix $W_i$, with headword assignment
• Assign a probability $P(W_i, T_i)$
[Figure: partial parse of "the contract ended with a loss of 7 cents after ...": tagged words the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS, with headword-annotated constituents contract_NP, loss_NP, cents_NP, of_PP, loss_NP, with_PP, ended_VP']

[Figure, animated over several slides: the SLM processing the same sentence. The PREDICTOR predicts the next word, the TAGGER tags it, and the PARSER applies adjoin_{left,right} moves until it emits null. The resulting move sequence is: ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP'; null; ...]
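The move sequence in the figure can be replayed with a small stack-based sketch: each word enters as a (headword, POS tag) item, and each adjoin move merges the top two items, with adjoin-left keeping the left child's headword and adjoin-right keeping the right child's. That is what turns 7_CD cents_NNS into cents_NP and of_IN cents_NP into of_PP. The starting stack, the class and function names, and the two moves hidden behind "..." on the slide (adjoin-left-NP, adjoin-left-PP) are a hypothetical reconstruction from the tree shown, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Item:
    head: str   # headword of the constituent
    label: str  # POS tag or non-terminal tag


def apply_moves(stack: List[Item], moves: List[Tuple[str, ...]]) -> List[Item]:
    """Replay predictor/tagger/parser moves on a stack of headword-annotated items."""
    for move in moves:
        kind = move[0]
        if kind == "word":  # predictor + tagger: push the new word with its POS tag
            _, word, tag = move
            stack.append(Item(word, tag))
        elif kind in ("adjoin-left", "adjoin-right"):
            _, nt_tag = move
            right = stack.pop()
            left = stack.pop()
            head = left.head if kind == "adjoin-left" else right.head  # headword percolation
            stack.append(Item(head, nt_tag))
        elif kind == "null":  # null ends the parser moves for this word
            break
        else:
            raise ValueError(f"unknown move {move!r}")
    return stack


# Partial parse just before "cents" is predicted (reconstructed from the figure).
stack = [Item("contract", "NP"), Item("ended", "VBD"), Item("with", "IN"),
         Item("loss", "NP"), Item("of", "IN"), Item("7", "CD")]
moves = [("word", "cents", "NNS"),
         ("adjoin-right", "NP"),  # 7 cents        -> cents_NP
         ("adjoin-left", "PP"),   # of cents_NP    -> of_PP
         ("adjoin-left", "NP"),   # loss_NP of_PP  -> loss_NP (elided on the slide)
         ("adjoin-left", "PP"),   # with loss_NP   -> with_PP (elided on the slide)
         ("adjoin-left", "VP'"),  # ended with_PP  -> ended_VP'
         ("null",)]
final = apply_moves(stack, moves)
print([f"{i.head}_{i.label}" for i in final])  # top two items act as h_-2, h_-1
```

In this sketch the two items left on the stack, contract_NP and ended_VP', play the role of the exposed heads $h_{-2}, h_{-1}$ on which the prediction of the next word, "after", is conditioned.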

Word and Structure Generation
$$P(W_{n+1}, T_{n+1}) = \prod_{i=1}^{n+1} \underbrace{P(w_i \mid h_{-2}, h_{-1})}_{\text{predictor}} \, \underbrace{P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag})}_{\text{tagger}} \, \underbrace{P(T_i \mid w_i, g_i, T_{i-1})}_{\text{parser}}$$
• The predictor generates the next word $w_i$ with probability $P(w_i = v \mid h_{-2}, h_{-1})$
• The tagger attaches tag $g_i$ to the most recently generated word $w_i$ with probability $P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag})$
• The parser builds the partial parse $T_i$ from $T_{i-1}$, $w_i$, and $g_i$ in a series of moves ending with null, where a parser move $a$ is made with probability $P(a \mid h_{-2}, h_{-1})$, with $a \in \{(\text{adjoin-left}, \text{NTtag}), (\text{adjoin-right}, \text{NTtag}), \text{null}\}$
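Because the joint probability is a product of predictor, tagger, and parser factors, it is usually accumulated in log space. In the sketch below, the parser factor for word $w_i$ is taken as the product of the probabilities of its moves, including the final null. The `Step` type and all probability values are made up for illustration.

```python
import math
from typing import List, NamedTuple


class Step(NamedTuple):
    p_word: float         # predictor: P(w_i | h_-2, h_-1)
    p_tag: float          # tagger: P(g_i | w_i, h_-1.tag, h_-2.tag)
    p_moves: List[float]  # parser: P(a | h_-2, h_-1) per move, the last one being null


def log_joint(steps: List[Step]) -> float:
    """log P(W, T): sum over words of predictor, tagger and parser log-probabilities."""
    total = 0.0
    for s in steps:
        total += math.log(s.p_word) + math.log(s.p_tag)
        total += sum(math.log(p) for p in s.p_moves)  # parser factor = product of move probs
    return total


# Made-up probabilities for a two-word prefix, for illustration only.
steps = [
    Step(0.10, 0.90, [0.80]),              # word 1: tagged, then null
    Step(0.05, 0.70, [0.60, 0.40, 0.90]),  # word 2: two adjoin moves, then null
]
print(math.exp(log_joint(steps)))
```

Summing logs avoids the numerical underflow that a direct product of many small probabilities would cause on long sentences.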

Model Parameter Reestimation
We need to re-estimate the model component probabilities
$$P(w_i = v \mid h_{-2}, h_{-1}), \quad P(g_i \mid w_i, h_{-1}.\mathrm{tag}, h_{-2}.\mathrm{tag}), \quad P(a \mid h_{-2}, h_{-1})$$
such that the model perplexity decreases.
N-best variant of the Expectation-Maximization (EM) algorithm:
• We seed the re-estimation process with parameter estimates gathered from manually or automatically parsed sentences
• We retain the N "best" parses $\{T^1, \ldots, T^N\}$ for the complete sentence $W$
• The hidden events in the EM algorithm are restricted to those occurring in the N "best" parses
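A single N-best EM iteration can be sketched as follows, assuming each retained parse $T^k$ comes with its joint probability $P(T^k, W)$ and the list of (context, outcome) events it contains. The E-step weights each event by its parse's share of the probability mass within that sentence's N-best list; the M-step re-estimates each conditional distribution by relative frequency. The function name and event encoding are hypothetical, and a practical implementation would likely also need smoothing rather than raw relative frequencies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]                  # (conditioning context, outcome)
NBest = List[Tuple[float, List[Event]]]  # [(P(T^k, W), events in T^k), ...] for one sentence


def nbest_em_iteration(corpus: List[NBest]) -> Dict[str, Dict[str, float]]:
    """One N-best EM iteration over a corpus of per-sentence N-best lists."""
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for nbest in corpus:
        z = sum(p for p, _ in nbest)  # probability mass of the retained parses
        for p, events in nbest:
            weight = p / z            # E-step: posterior of T^k within the N-best list
            for context, outcome in events:
                counts[context][outcome] += weight
    # M-step: relative-frequency estimate of P(outcome | context)
    return {
        ctx: {o: c / sum(outs.values()) for o, c in outs.items()}
        for ctx, outs in counts.items()
    }


# Toy sentence whose two retained parses disagree on one parser move.
corpus = [[(0.3, [("ctx", "adjoin-left-PP")]),
           (0.1, [("ctx", "adjoin-right-PP")])]]
print(nbest_em_iteration(corpus))
```

Restricting the E-step to the N retained parses is what keeps the hidden-event space tractable, at the cost of ignoring probability mass outside the N-best list.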

SLM for Information Extraction
☞ Training:
  1. initialization: initialize the SLM as a syntactic parser from a treebank
  2. syntactic parsing: train the SLM as a Match-constrained parser and parse the training data; the boundaries of the semantic constituents are matched
  3. augmentation: enrich the non-terminal and pre-terminal labels in the resulting treebank with semantic tags
  4. syntactic+semantic parsing: train the SLM as an L-Match-constrained parser; both the boundaries and the tags of the semantic constituents are matched
☞ Test:
  • syntactic+semantic parsing of test sentences; the semantic parse is retrieved as the semantic projection of the most likely parse:
  $$S = \mathrm{SEM}\left(\arg\max_{T_i} P(T_i, W)\right)$$
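The test-time rule $S = \mathrm{SEM}(\arg\max_{T_i} P(T_i, W))$ can be sketched on an explicit list of candidate parses. Here each node carries an optional semantic tag (a hypothetical stand-in for the augmented labels produced during training), the root's tag is the frame, and the highest tagged nodes below the root are the slots. The real SLM searches over parses rather than scoring a fixed list, so this only illustrates the projection step; all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                 # syntactic label (non-terminal or POS tag)
    sem: Optional[str] = None  # semantic tag (hypothetical encoding)
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set on leaves only


def leaf(word: str, tag: str) -> Node:
    return Node(tag, word=word)


def words(n: Node) -> List[str]:
    if n.word is not None:
        return [n.word]
    return [w for c in n.children for w in words(c)]


def sem_projection(root: Node) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Frame = root's semantic tag; slots = highest semantically tagged nodes below it."""
    slots: List[Tuple[str, str]] = []

    def visit(n: Node) -> None:
        for c in n.children:
            if c.sem is not None:
                slots.append((c.sem, " ".join(words(c))))
            else:
                visit(c)

    visit(root)
    return root.sem, slots


def extract(candidates: List[Tuple[float, Node]]):
    """S = SEM(argmax_T P(T, W)) over a fixed list of (P(T, W), parse) pairs."""
    _, best = max(candidates, key=lambda pair: pair[0])
    return sem_projection(best)


person = Node("NP", sem="Person", children=[leaf("Megan", "NNP"), leaf("Hokins", "NNP")])
t1 = Node("S", sem="Calendar Task", children=[
    leaf("Schedule", "VB"), leaf("meeting", "NN"),
    Node("PP", children=[leaf("with", "IN"), person])])
t2 = Node("S", sem="Calendar Task", children=[  # competing parse without the Person slot
    leaf("Schedule", "VB"), leaf("meeting", "NN"),
    Node("PP", children=[leaf("with", "IN"),
                         Node("NP", children=[leaf("Megan", "NNP"), leaf("Hokins", "NNP")])])])
print(extract([(0.02, t1), (0.01, t2)]))
```

Stopping the descent at the first tagged node keeps the projection to two levels (frame, then slots), matching the two-level semantic parse assumed throughout.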

Constrained Parsing Using the SLM
• A semantic parse S is equivalent to a set of constraints
• Each constraint $c$ is a 3-tuple $c = \langle l, r, Q \rangle$: $l$/$r$ is the left/right boundary of the semantic constituent to be matched and $Q$ is the set of allowable non-terminal tags for the constituent
☞ Match parsing (syntactic parsing stage):
  1. parses match the constraint boundaries $c.l, c.r$, $\forall c$ for a given sentence
☞ L-Match parsing (syntactic+semantic parsing stage):
  1. parses match the constraint boundaries and the set of labels $Q$: $c.l, c.r, c.Q$, $\forall c$
  2. the semantic projection of the parse trees must have exactly two levels
✔ Both Match and L-Match parsing can be implemented efficiently within the left-to-right, bottom-up, binary parsing strategy of the SLM
✔ On test sentences, the only constraint available is the identity of the semantic tag at the root node
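The two constraint checks can be written as predicates over a finished parse, represented as a set of (left, right, label) constituents. This is a simplification: the slide describes enforcing the constraints inside the SLM's left-to-right, bottom-up parser, where violating hypotheses can be pruned as they arise, and the sketch does not test the two-level condition of L-Match. Inclusive word-index boundaries and the label strings (for example, a constituent labelled simply "Person" after augmentation) are assumptions for the example.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Constituent = Tuple[int, int, str]  # (left boundary, right boundary, label); boundaries inclusive


class Constraint(NamedTuple):
    l: int             # left boundary of the semantic constituent
    r: int             # right boundary of the semantic constituent
    Q: FrozenSet[str]  # allowable non-terminal tags


def match(parse: Iterable[Constituent], constraints: Iterable[Constraint]) -> bool:
    """Match: every constraint's (l, r) span is a constituent of the parse."""
    spans = {(l, r) for l, r, _ in parse}
    return all((c.l, c.r) in spans for c in constraints)


def l_match(parse: Iterable[Constituent], constraints: Iterable[Constraint]) -> bool:
    """L-Match: some constituent has the constraint's span and a label from Q.
    (The two-level condition on the semantic projection is not checked here.)"""
    nodes = set(parse)
    return all(any((c.l, c.r, q) in nodes for q in c.Q) for c in constraints)


# "Schedule meeting with Megan Hokins": words 0..4, Person slot over words 3..4.
constraints = [Constraint(3, 4, frozenset({"Person"}))]
syntactic = [(0, 4, "S"), (2, 4, "PP"), (3, 4, "NP")]
augmented = [(0, 4, "S"), (2, 4, "PP"), (3, 4, "Person")]
print(match(syntactic, constraints), l_match(syntactic, constraints),
      l_match(augmented, constraints))
```

The purely syntactic parse passes Match but fails L-Match, which is the distinction between the syntactic parsing stage and the syntactic+semantic parsing stage.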

Experiments
MiPad data (personal information management):
• training set: 2,239 sentences (27,119 words) and 5,431 slots
• test set: 1,101 sentences (8,652 words) and 1,698 slots
• vocabulary: 1,035 words, closed over the test data

Training iteration        Error rate (%)
Stage 2          Stage 4  Train slot  Train frame  Test slot  Test frame
Baseline                  43.41       7.20         57.36      14.90
0, MiPad/NLPwin  0        9.78        1.65         37.87      21.62
1, UPenn Trbnk   0        8.44        2.10         36.93      16.08
1, UPenn Trbnk   1        7.82        1.70         36.98      16.80
1, UPenn Trbnk   2        7.69        1.50         36.98      16.80

• the baseline is a manually developed semantic grammar that makes no use of syntactic information
• the syntactic SLM is initialized from either the in-domain MiPad treebank (NLPwin) or the out-of-domain Wall Street Journal treebank (UPenn)
• 3 iterations of the N-best EM parameter reestimation algorithm
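To put the test-set numbers in relative terms, the SLM slot error rates from the experiments table can be compared against the baseline. The helper below and its name are ours; the percentages it prints are simple arithmetic on the reported figures, not results stated on the slide.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative error-rate reduction in percent: (baseline - system) / baseline * 100."""
    return (baseline - system) / baseline * 100.0


# Test-set slot error rates (%) from the experiments table.
baseline_test_slot = 57.36
slm_test_slot = {
    "0, MiPad/NLPwin; stage 4 iteration 0": 37.87,
    "1, UPenn Trbnk; stage 4 iteration 0": 36.93,
    "1, UPenn Trbnk; stage 4 iteration 2": 36.98,
}
for name, err in slm_test_slot.items():
    print(f"{name}: {relative_reduction(baseline_test_slot, err):.1f}% relative reduction")
```

Applying the same formula to test frame error gives negative values for every SLM row, since all of them are above the baseline's 14.90% (about -7.9% for the 16.08% row).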

Would More Data Help?
• The big difference in performance between training and test data suggests overtraining
• We studied the performance of the model with decreasing amounts of training data

Training   Training iteration        Error rate (%)
corpus     Stage 2         Stage 4   Train slot  Train frame  Test slot  Test frame
Baseline                             43.41       7.20         57.36      14.90
all        1, UPenn Trbnk  0         8.44        2.10         36.93      16.08
1/2 all    1, UPenn Trbnk  0         n/a         n/a          43.76      18.44
1/4 all    1, UPenn Trbnk  0         n/a         n/a          49.47      22.98

✔ Performance degradation with decreasing training data size is severe
✔ More training data, and a model parameterization that makes more effective use of the training data, are likely to help
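The severity of the degradation can be quantified as the relative increase in test slot error over the full-data result of 36.93%. This is plain arithmetic on the table; the function name is ours.

```python
def relative_increase(reference: float, value: float) -> float:
    """Relative increase of an error rate over a reference, in percent."""
    return (value / reference - 1.0) * 100.0


full_data_test_slot = 36.93  # "all" training data
reduced = {"1/2": 43.76, "1/4": 49.47}
for fraction, err in reduced.items():
    print(f"{fraction} of the data: +{relative_increase(full_data_test_slot, err):.1f}% test slot error")
```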

Error Analysis
• We investigated the correlation between semantic frame/slot accuracy and the number of semantic slots in a sentence

Slots/sent  Slot error (%)  Frame error (%)  No. sent
1           43.97           18.01            755
2           39.23           16.27            209
3           26.44           5.17             58
4           26.50           4.00             50
5+          21.19           6.90             29

✔ Error rates generally decrease as the number of slots per sentence grows: sentences containing more semantic slots are less ambiguous from an information extraction point of view
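The error analysis table can be cross-checked against the overall test results by converting each bucket's frame error rate back into a (rounded) count of sentences and pooling. The bucket sizes sum to the 1,101 test sentences, and the pooled rate comes out at about 16.08%, which matches the test frame error of the "1, UPenn Trbnk / 0" configuration in the experiments table; this suggests, although the slide does not say so, that the analysis was run on that configuration. Variable names are ours.

```python
# Error analysis table: slots per sentence -> (frame error rate %, number of sentences)
buckets = {"1": (18.01, 755), "2": (16.27, 209), "3": (5.17, 58),
           "4": (4.00, 50), "5+": (6.90, 29)}

# Recover the (rounded) number of sentences with a frame error in each bucket.
frame_errors = {k: round(rate * n / 100) for k, (rate, n) in buckets.items()}
total_sent = sum(n for _, n in buckets.values())
total_err = sum(frame_errors.values())
pooled = 100.0 * total_err / total_sent
print(frame_errors, total_sent, total_err, f"{pooled:.2f}%")
```

Pooling by sentence count is valid for frame error because each sentence has exactly one frame; pooling slot error the same way would need the number of slots per bucket, which the table gives only approximately for the 5+ bucket.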

Conclusions
✔ Presented a data-driven approach to information extraction that outperforms a manually written semantic grammar in slot error rate
✔ Coupling syntactic and semantic information improves information extraction accuracy, as shown previously by Miller et al., NAACL 2000
Future Work
✘ Use a statistical modeling technique that makes better use of limited amounts of training data and rich conditioning information, namely maximum entropy
✘ Aim at information extraction from speech: treat the word sequence as a hidden variable, thus finding the most likely semantic parse given a speech utterance
