Machine Learning in GATE Angus Roberts, Horacio Saggion, Genevieve - PowerPoint PPT Presentation

University of Sheffield NLP
Machine Learning in GATE
Angus Roberts, Horacio Saggion, Genevieve Gorrell

Recap
• The previous two days looked at knowledge-engineered IE
• This session looks at machine-learned IE
• Supervised learning
• Effort is shifted from language engineers to annotators

Outline
• Machine learning and IE
• Support vector machines
• GATE's learning API and PR
• Learning entities: hands-on
• Learning relations: demo
• (Classifying sentences and documents)

Machine learning for information extraction

University of Sheffield NLP Machine Learning  We have data items comprising labels and features  E.g. an instance of “cat” has features “whiskers=1”, “fur=1”. A “stone” has “whiskers=0” and “fur=0”  Machine learning algorithm learns a relationship between the features and the labels  E.g. “if whiskers=1 then cat”  This is used to label new data  We have a new instance with features “whiskers=1” and “fur=1”--is it a cat or not???

Types of ML
• Classification
  - Training instances are pre-labelled with classes
  - The ML algorithm learns to classify unseen data according to their attributes
• Clustering
  - Unlabelled training data
  - Clusters are determined automatically from the data
• Derive a representation using an ML algorithm
• Automate decision-making in the future
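For contrast with classification, the following sketch clusters unlabelled one-dimensional data with k-means, one common clustering algorithm chosen here for illustration (the slide does not name one). The data, initial centroids and class name are made up. Points are assigned to their nearest centroid, each centroid then moves to the mean of its members, and the loop stops once assignments no longer change. No labels are ever supplied.

```java
import java.util.Arrays;

/** Minimal 1-D k-means (hypothetical helper), illustrating clustering of unlabelled data. */
public class KMeans1D {
    /** Returns the cluster index assigned to each point after convergence or maxIter rounds. */
    public static int[] cluster(double[] points, double[] initialCentroids, int maxIter) {
        double[] centroids = initialCentroids.clone();
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(points[i] - centroids[c]) < Math.abs(points[i] - centroids[best])) {
                        best = c;
                    }
                }
                if (iter == 0 || assignment[i] != best) changed = true;
                assignment[i] = best;
            }
            // Update step: move each centroid to the mean of its members
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0;
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) { sum += points[i]; n++; }
                }
                if (n > 0) centroids[c] = sum / n;
            }
            if (!changed) break; // assignments are stable, so the clustering has converged
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.2, 0.8, 9.0, 9.5, 8.7};
        System.out.println(Arrays.toString(cluster(data, new double[]{0.0, 5.0}, 100)));
    }
}
```

With this sample data the three small values end up in cluster 0 and the three large values in cluster 1.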

University of Sheffield NLP ML in Information Extraction  We have annotations (classes)  We have features (words, context, word features etc.)  Can we learn how features match classes using ML?  Once obtained, the ML representation can do our annotation for us based on features in the text  Pre-annotation  Automated systems  Possibly good alternative to knowledge engineering approaches  No need to write the rules  However, need to prepare training data

University of Sheffield NLP ML in Information Extraction  Central to ML work is evaluation  Need to try different methods, different parameters, to obtain good result  Precision: How many of the annotations we identified are correct?  Recall: How many of the annotations we should have identified did we?  F-Score: F = 2(precision.recall)/(precision+recall)  Testing requires an unseen test set  Hold out a test set Simple approach but data may be scarce   Cross-validation split training data into e.g. 10 sections  Take turns to use each “fold” as a test set  Average score across the 10 

ML Algorithms
• Vector space models
  - Data have attributes (word features, context etc.)
  - Each attribute is a dimension
  - Data are positioned in this space
  - Methods involve splitting the space
  - Having learned the split, apply it to new data
  - Support vector machines, k-nearest neighbours etc.
• Finite state models, decision trees, Bayesian classification and more …
• We will focus on support vector machines today
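As an example of a vector space method, this sketch classifies a point by majority vote among its k nearest training points under Euclidean distance. Each array position is one attribute dimension; here the vectors are hypothetical (whiskers, fur) values echoing the cat example, and the data and names are illustrative only.

```java
import java.util.*;

/** Minimal k-nearest-neighbour classifier over feature vectors (illustrative names). */
public class Knn {
    /** A training point: one array position per attribute, plus its label. */
    public record Point(double[] x, String label) {}

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    /** Majority label among the k training points closest to the query. */
    public static String classify(List<Point> training, double[] query, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> distance(p.x(), query)));
        Map<String, Integer> votes = new HashMap<>();
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(p.label(), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    /** Hypothetical (whiskers, fur) vectors. */
    public static List<Point> exampleData() {
        return List.of(
                new Point(new double[]{1, 1}, "cat"),
                new Point(new double[]{1, 0.9}, "cat"),
                new Point(new double[]{0, 0}, "stone"),
                new Point(new double[]{0, 0.1}, "stone"));
    }

    public static void main(String[] args) {
        System.out.println(classify(exampleData(), new double[]{0.9, 1.0}, 3));
    }
}
```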

Support vector machines

Support Vector Machines
• Attempt to find a hyperplane that separates the data
• Goal: maximize the margin separating the two classes
• Wider margin = better generalisation

Support Vector Machines
• Points near the decision boundary are the support vectors (removing them would change the boundary)
• Points far from the boundary are not important for the decision
• What if the data don't split?
  - Soft boundary methods exist for imperfect solutions
  - However, a linear separator may be completely unsuitable
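The margin idea can be checked numerically. The sketch below does not train an SVM: it takes a hand-chosen hyperplane w·x + b = 0 with w = (0.5, 0.5) and b = -1, plus four toy points (both assumptions for illustration), classifies points by the sign of w·x + b, reports the margin width 2/||w||, and flags the points lying exactly on the margin, y(w·x + b) = 1, as support vectors. For these four points this hyperplane happens to be the maximum-margin one, because (0, 0) and (2, 2) are the closest pair of opposite-class points and the hyperplane is their perpendicular bisector; moving (3, 3) or (-1, -1) would not change it.

```java
import java.util.Arrays;

/** Evaluates a given linear separator w·x + b (hypothetical helper; it does not train an SVM). */
public class LinearSeparator {
    private final double[] w;
    private final double b;

    public LinearSeparator(double[] w, double b) {
        this.w = w;
        this.b = b;
    }

    /** Decision value w·x + b; its sign gives the predicted class. */
    public double decision(double[] x) {
        double s = b;
        for (int i = 0; i < w.length; i++) s += w[i] * x[i];
        return s;
    }

    public int classify(double[] x) {
        return decision(x) >= 0 ? 1 : -1;
    }

    /** Distance between the two supporting hyperplanes w·x + b = +1 and w·x + b = -1. */
    public double marginWidth() {
        double norm = 0;
        for (double wi : w) norm += wi * wi;
        return 2 / Math.sqrt(norm);
    }

    /** A correctly placed point lies on the margin (is a support vector) when y(w·x + b) = 1. */
    public boolean isSupportVector(double[] x, int y) {
        return Math.abs(y * decision(x) - 1) < 1e-9;
    }

    public static void main(String[] args) {
        LinearSeparator sep = new LinearSeparator(new double[]{0.5, 0.5}, -1.0);
        double[][] xs = {{2, 2}, {3, 3}, {0, 0}, {-1, -1}};
        int[] ys = {1, 1, -1, -1};
        for (int i = 0; i < xs.length; i++) {
            System.out.printf("%s y=%d decision=%.2f support=%b%n",
                    Arrays.toString(xs[i]), ys[i], sep.decision(xs[i]), sep.isSupportVector(xs[i], ys[i]));
        }
        System.out.printf("margin width = %.4f%n", sep.marginWidth());
    }
}
```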

Support Vector Machines
• What if there is no separating hyperplane?
  - See example (figure not reproduced)
  - Or the class may be a globule
• Then a linear separator does not work!

Kernel Trick
• Map the data into a different dimensionality
• Now the points are separable!
• E.g. features alone may not make the class linearly separable, but combining features may
• Generate many new features and let the algorithm decide which to use
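The kernel trick can be illustrated with the degree-2 polynomial kernel k(x, z) = (x·z)², chosen here as a standard textbook example rather than anything specific to GATE. For 2-D inputs it equals an ordinary dot product after the explicit mapping φ(x) = (x1², √2·x1·x2, x2²), so a learner can work in the 3-D mapped space while only ever computing k in the original space. The same mapping handles the "globule" case: points inside versus outside the unit circle cannot be split by a line in 2-D, but x1² + x2² is a linear function of φ(x), so a plane in the mapped space separates them. Class and method names are illustrative.

```java
import java.util.Arrays;

/** Degree-2 polynomial kernel versus its explicit feature map (illustrative helper). */
public class KernelTrick {
    /** Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2). */
    public static double[] phi(double[] x) {
        return new double[]{x[0] * x[0], Math.sqrt(2) * x[0] * x[1], x[1] * x[1]};
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Kernel computed in the original 2-D space, without building phi(x). */
    public static double kernel(double[] x, double[] z) {
        double d = dot(x, z);
        return d * d;
    }

    /** The same quantity computed explicitly in the 3-D mapped space. */
    public static double mappedDot(double[] x, double[] z) {
        return dot(phi(x), phi(z));
    }

    /** Linear test in phi-space: phi[0] + phi[2] = x1^2 + x2^2 < 1 means inside the unit circle. */
    public static boolean insideUnitCircle(double[] p) {
        double[] f = phi(p);
        return f[0] + f[2] < 1;
    }

    public static void main(String[] args) {
        double[] x = {1, 2}, z = {3, -1};
        System.out.println("kernel=" + kernel(x, z) + " mapped dot=" + mappedDot(x, z));
        double[][] pts = {{0.2, 0.1}, {-0.3, 0.4}, {1.5, 0.0}, {-1.0, -1.2}};
        for (double[] p : pts) {
            System.out.println(Arrays.toString(p) + (insideUnitCircle(p) ? " inside" : " outside"));
        }
    }
}
```

The two printed kernel values agree up to floating-point rounding.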

University of Sheffield NLP Support Vector Machines  SVMs combined with kernel trick provide a powerful technique  Multiclass methods simple extention to two class technique (one vs. another, one vs. others)  Widely used with great success across a range of linguistic tasks

GATE's learning API and PR

API and PRs
• User Guide 9.24: Machine Learning PR
• Chapter 11: Machine Learning API
  - Support for 3 types of learning
  - Produces features from annotations
  - Abstracts away from ML algorithms
• Batch Learning PR
  - A GATE language analyser

Instances, attributes, classes
Example sentence: "California Governor Arnold Schwarzenegger proposes deep cuts."
• Instances: any annotation
  - Tokens are often convenient (one Token per word in the example)
• Attributes: any annotation feature relative to the instances
  - Token.string
  - Token.category (POS)
  - Sentence.length
• Class: the thing we want to learn
  - A feature on an annotation, e.g. Entity.type
  - Here Entity.type=Location for "California" and Entity.type=Person for "Arnold Schwarzenegger"
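The mapping from annotations to instances can be sketched without GATE. The following Java code is a hypothetical model, not the GATE API: `Token` and `Entity` records stand in for annotations, each Token becomes one instance with the attributes Token.string and Token.category, and its class is the type of any Entity annotation covering it, or "null" otherwise (matching the null class used for multi-class learning). The character offsets match the example sentence; the POS tags are assumed Penn Treebank tags filled in by hand.

```java
import java.util.*;

/** Hypothetical, GATE-free model of the instance/attribute/class mapping. */
public class InstanceBuilder {
    public static final String TEXT = "California Governor Arnold Schwarzenegger proposes deep cuts.";

    public record Token(int start, int end, String string, String category) {}
    public record Entity(int start, int end, String type) {}
    public record Instance(Map<String, String> attributes, String cls) {}

    /** One instance per Token; its class is the type of an Entity covering it, else "null". */
    public static List<Instance> build(List<Token> tokens, List<Entity> entities) {
        List<Instance> out = new ArrayList<>();
        for (Token t : tokens) {
            String cls = "null";
            for (Entity e : entities) {
                if (e.start() <= t.start() && t.end() <= e.end()) cls = e.type();
            }
            Map<String, String> attrs = new LinkedHashMap<>();
            attrs.put("Token.string", t.string());
            attrs.put("Token.category", t.category());
            out.add(new Instance(attrs, cls));
        }
        return out;
    }

    /** Tokens of TEXT with character offsets; POS tags are assumed, hand-assigned values. */
    public static List<Token> exampleTokens() {
        return List.of(
                new Token(0, 10, "California", "NNP"),
                new Token(11, 19, "Governor", "NNP"),
                new Token(20, 26, "Arnold", "NNP"),
                new Token(27, 41, "Schwarzenegger", "NNP"),
                new Token(42, 50, "proposes", "VBZ"),
                new Token(51, 55, "deep", "JJ"),
                new Token(56, 60, "cuts", "NNS"),
                new Token(60, 61, ".", "."));
    }

    /** The two entities from the slide: a Location and a two-token Person. */
    public static List<Entity> exampleEntities() {
        return List.of(new Entity(0, 10, "Location"), new Entity(20, 41, "Person"));
    }

    public static void main(String[] args) {
        for (Instance i : build(exampleTokens(), exampleEntities())) {
            System.out.println(i.attributes() + " -> " + i.cls());
        }
    }
}
```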

Surround mode
Example: "California Governor Arnold Schwarzenegger proposes deep cuts.", where Entity.type=Person spans the two Tokens "Arnold" and "Schwarzenegger"
• This learned class covers more than one instance...
• Begin/end boundary learning
• Dealt with by the API: surround mode
• Transparent to the user
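One way to picture begin/end boundary learning is to turn each multi-token entity into two per-token labels, a start flag on its first token and an end flag on its last, and then to rebuild entities by pairing each start with the nearest following end of the same type. The sketch below is a hypothetical illustration of that idea only (class, record and label names are made up); in GATE, surround mode handles this internally and its actual post-processing is not reproduced here. Token indices refer to the example sentence, so the Person entity covers tokens 2 and 3.

```java
import java.util.*;

/** Hypothetical sketch of begin/end boundary encoding for multi-token entities. */
public class BoundaryEncoding {
    /** An entity over token indices first..last (inclusive). */
    public record Span(int first, int last, String type) {}

    /** Per-token label sets containing "<type>-start" and/or "<type>-end" flags. */
    public static List<Set<String>> encode(int nTokens, List<Span> spans) {
        List<Set<String>> labels = new ArrayList<>();
        for (int i = 0; i < nTokens; i++) labels.add(new TreeSet<>());
        for (Span s : spans) {
            labels.get(s.first()).add(s.type() + "-start");
            labels.get(s.last()).add(s.type() + "-end");
        }
        return labels;
    }

    /** Rebuilds spans by pairing each start with the nearest following end of the same type. */
    public static List<Span> decode(List<Set<String>> labels) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            for (String l : labels.get(i)) {
                if (!l.endsWith("-start")) continue;
                String type = l.substring(0, l.length() - "-start".length());
                for (int j = i; j < labels.size(); j++) {
                    if (labels.get(j).contains(type + "-end")) {
                        spans.add(new Span(i, j, type));
                        break;
                    }
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Token 0 = "California" (Location); tokens 2..3 = "Arnold Schwarzenegger" (Person)
        List<Span> spans = List.of(new Span(0, 0, "Location"), new Span(2, 3, "Person"));
        List<Set<String>> labels = encode(8, spans);
        System.out.println(labels);
        System.out.println(decode(labels));
    }
}
```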

Multi-class to binary
Example: "California Governor Arnold Schwarzenegger proposes deep cuts.", with Entity.type=Location and Entity.type=Person
• Three classes, including null
• Many algorithms are binary classifiers
• One against all (one against others)
  - LOC vs PERS+NULL / PERS vs LOC+NULL / NULL vs LOC+PERS
• One against one (one against another)
  - LOC vs PERS / LOC vs NULL / PERS vs NULL
• Dealt with by the API: multiClassification2Binary
• Transparent to the user
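The two decompositions on this slide can be enumerated mechanically. This sketch (hypothetical class name, not the GATE API) lists the binary problems each scheme creates for a given class list: n problems for one-against-others and n(n-1)/2 for one-against-one, which happens to be 3 in both cases for LOC, PERS and NULL. It only builds the task descriptions; training the binary classifiers and combining their outputs is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates the binary tasks produced by the two multi-class-to-binary schemes. */
public class MultiClassToBinary {
    /** One task per class: that class against all the others merged together. */
    public static List<String> oneVsOthers(List<String> classes) {
        List<String> tasks = new ArrayList<>();
        for (String c : classes) {
            List<String> rest = new ArrayList<>(classes);
            rest.remove(c);
            tasks.add(c + " vs " + String.join("+", rest));
        }
        return tasks;
    }

    /** One task per unordered pair of classes. */
    public static List<String> oneVsOne(List<String> classes) {
        List<String> tasks = new ArrayList<>();
        for (int i = 0; i < classes.size(); i++) {
            for (int j = i + 1; j < classes.size(); j++) {
                tasks.add(classes.get(i) + " vs " + classes.get(j));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("LOC", "PERS", "NULL");
        System.out.println(oneVsOthers(classes));
        System.out.println(oneVsOne(classes));
    }
}
```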

ML applications in GATE
• Batch Learning PR
  - Evaluation
  - Training
  - Application
• Runs after all other PRs: must be the last PR
• Configured via an XML file
• A single directory holds the generated features, models, and config file

The configuration file
<?xml version="1.0"?>
<ML-CONFIG>
  <VERBOSITY level="1"/>
  <SURROUND value="true"/>
  <FILTERING ratio="0.0" dis="near"/>
• Verbosity: 0, 1, 2
• Surround mode: set true for entities, false for relations
• Filtering: e.g. remove instances distant from the hyperplane

Thresholds
  <PARAMETER name="thresholdProbabilityEntity" value="0.3"/>
  <PARAMETER name="thresholdProbabilityBoundary" value="0.5"/>
  <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
• Control the selection of boundaries and classes in post-processing
• The defaults we give will work
• Experiment
• See the documentation

Multiclass and evaluation
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" runs="10"/>
• Multi-class
  - one-vs-others
  - one-vs-another
• Evaluation
  - kfold: runs gives the number of folds
  - holdout: ratio gives the training/test split

The learning engine
  <ENGINE nickname="SVM" implementationName="SVMLibSvmJava" options=" -c 0.7 -t 1 -d 3 -m 100 -tau 0.6"/>
  <ENGINE nickname="NB" implementationName="NaiveBayesWeka"/>
  <ENGINE nickname="C45" implementationName="C4.5Weka"/>
• Options are specific to the learning algorithm and implementation
• SVM: Java implementation of LibSVM
  - Uneven margins are set with -tau
  - The other options follow LibSVM conventions: -c cost, -t kernel type (1 = polynomial), -d polynomial degree, -m cache size in MB

The dataset
<DATASET>
• Defines
  - Instance annotation
  - Class
  - Annotation feature to instance attribute mapping
</DATASET>
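Putting the fragments from the last few slides together gives a single ML-CONFIG document. The Batch Learning PR reads this file itself; the sketch below is not GATE code and simply uses the JDK's DOM parser (`javax.xml.parsers`) to check that the assembled fragments form well-formed XML and to read a few settings back. Only the SVM ENGINE line is kept, and the DATASET contents, which the slides do not show, are left as a comment. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads settings back from a config assembled from the slide fragments (illustration only). */
public class ConfigReader {
    // Slide fragments combined into one document; only the SVM engine is kept
    public static final String CONFIG = """
            <?xml version="1.0"?>
            <ML-CONFIG>
              <VERBOSITY level="1"/>
              <SURROUND value="true"/>
              <FILTERING ratio="0.0" dis="near"/>
              <PARAMETER name="thresholdProbabilityEntity" value="0.3"/>
              <PARAMETER name="thresholdProbabilityBoundary" value="0.5"/>
              <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
              <multiClassification2Binary method="one-vs-others"/>
              <EVALUATION method="kfold" runs="10"/>
              <ENGINE nickname="SVM" implementationName="SVMLibSvmJava"
                      options=" -c 0.7 -t 1 -d 3 -m 100 -tau 0.6"/>
              <DATASET>
                <!-- instance, class and attribute definitions: not shown on the slides -->
              </DATASET>
            </ML-CONFIG>
            """;

    /** Parses an XML string into a DOM document, wrapping checked parser exceptions. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse config", e);
        }
    }

    /** Attribute value on the first element with the given tag name, or null if absent. */
    public static String attr(Document doc, String tag, String name) {
        Element e = (Element) doc.getElementsByTagName(tag).item(0);
        return e == null ? null : e.getAttribute(name);
    }

    /** Value of the PARAMETER element whose name attribute matches, or null. */
    public static String parameter(Document doc, String name) {
        NodeList params = doc.getElementsByTagName("PARAMETER");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if (p.getAttribute("name").equals(name)) return p.getAttribute("value");
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(CONFIG);
        System.out.println("surround = " + attr(doc, "SURROUND", "value"));
        System.out.println("evaluation = " + attr(doc, "EVALUATION", "method")
                + ", runs = " + attr(doc, "EVALUATION", "runs"));
        System.out.println("engine = " + attr(doc, "ENGINE", "implementationName"));
        System.out.println("boundary threshold = " + parameter(doc, "thresholdProbabilityBoundary"));
    }
}
```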

Learning entities: hands-on

The Problem
• Information extraction consists of the identification of pre-specified facts in running text
• One important component of any information extraction system is a named entity identification component
• Two main approaches exist for identifying entities in text:
  - Hand-crafted rules: you've seen the ANNIE system
  - Machine learning approaches: we will explore one possibility in this session, using a classification system
• Manually developed rules use different sources of information: the identity of tokens, parts of speech, the orthography of the tokens, dictionary information (e.g. the Lookup process), etc.
• ML components also rely on those sources of information, and features have to be carefully selected by the ML developer
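As an example of the kind of orthographic and dictionary information listed above, this sketch derives a small feature map for a token: the string, its lower-case form, an orthography class, and whether it appears in a word list. The orthography labels are modelled loosely on the `orth` feature of ANNIE's tokeniser, but the exact rules here and the one-entry gazetteer are assumptions for illustration, not ANNIE's behaviour. All names are hypothetical.

```java
import java.util.*;

/** Hypothetical token feature extractor: orthography plus a gazetteer lookup. */
public class Orthography {
    /** Orthography class; labels loosely modelled on ANNIE's orth values (an assumption). */
    public static String orth(String token) {
        if (token.isEmpty() || !token.chars().allMatch(Character::isLetter)) return "other";
        if (token.chars().allMatch(Character::isUpperCase)) return "allCaps";
        if (token.chars().allMatch(Character::isLowerCase)) return "lowercase";
        if (Character.isUpperCase(token.charAt(0))
                && token.substring(1).chars().allMatch(Character::isLowerCase)) return "upperInitial";
        return "mixedCaps";
    }

    /** Feature map for one token; the gazetteer stands in for a Lookup-style dictionary. */
    public static Map<String, String> features(String token, Set<String> gazetteer) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("string", token);
        f.put("lower", token.toLowerCase(Locale.ROOT));
        f.put("orth", orth(token));
        f.put("inGazetteer", Boolean.toString(gazetteer.contains(token)));
        return f;
    }

    public static void main(String[] args) {
        Set<String> gazetteer = Set.of("California"); // hypothetical one-entry location list
        for (String t : "California Governor Arnold Schwarzenegger proposes deep cuts .".split(" ")) {
            System.out.println(features(t, gazetteer));
        }
    }
}
```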

The Problem

Features for learning
