

  1. Introduction to CRFs Isabelle Tellier 02-08-2013

  2. Plan
     1. What is annotation for?
     2. Linear and tree-shaped CRFs
     3. State of the Art
     4. Conclusion

  3. 1. What is annotation for?
     What is annotation?
     – inputs can be texts, trees, or any structure built on finite vocabulary items
     – to annotate such a structure = associate with each of its items an output label belonging to another finite vocabulary
     – the structure is given and preserved

  4. 1. What is annotation for?
     Examples of text annotations
     – POS ("part of speech") labeling: item = "word", annotation = morphosyntactic label (Det, N, etc.)
     – named entities (NE), IE: item = "word", annotation = type (D for Date, E for Event, P for Place...) + position within the NE (B for Begin, I for In, O for Out), as in the example below (see the sketch after this slide):
       In/O 2016/DB the/O Olympic/EB Games/EI will/O take/O place/O in/O Rio/PB de/PI Janeiro/PI
     – segmentation of a text into "chunks", phrases, clauses...
     – segmentation of a document into sections (e.g. distinguishing Title, Menus, Adverts, etc. in a Web page)
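
     The B/I/O scheme is easiest to see in code. Below is a minimal sketch (not from the slides) that stores the example above as (token, label) pairs and recovers the typed entities from the labels; the list-of-pairs representation is a deliberately simple assumption.

         # The slide's NE example as (token, label) pairs, combining the
         # B/I/O position scheme with entity types D(ate), E(vent), P(lace).
         sentence = [
             ("In", "O"), ("2016", "DB"), ("the", "O"),
             ("Olympic", "EB"), ("Games", "EI"),
             ("will", "O"), ("take", "O"), ("place", "O"), ("in", "O"),
             ("Rio", "PB"), ("de", "PI"), ("Janeiro", "PI"),
         ]

         # Recover the typed entities from the annotation.
         entities, current = [], None
         for token, label in sentence:
             if label.endswith("B"):                # begin a new entity
                 current = (label[0], [token])
                 entities.append(current)
             elif label.endswith("I") and current:  # continue the current entity
                 current[1].append(token)
             else:                                  # outside any entity
                 current = None
         print([(t, " ".join(ws)) for t, ws in entities])
         # [('D', '2016'), ('E', 'Olympic Games'), ('P', 'Rio de Janeiro')]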

  5. 1. What is annotation for?
     Examples of text annotations
     – text alignment for automatic translation:
       J'(1) aime(2) le(3) chocolat(4)  /  I(1) like(2) chocolate(3)
       with alignment links 1-1, 2-2, 4-3 ("le" is unaligned)
     – correspondence matrices are projected into pairs of annotations (see the sketch below):
       French side: J' → 1, aime → 2, le → -, chocolat → 3
       English side: I → 1, like → 2, chocolate → 4
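
     A sketch of that projection, assuming the alignment links reconstructed above (1-based indices; the data structures are hypothetical, not the slides' own):

         # Tokens plus alignment links (fr_index, en_index); "le" has no link.
         fr = ["J'", "aime", "le", "chocolat"]
         en = ["I", "like", "chocolate"]
         links = [(1, 1), (2, 2), (4, 3)]

         # Project the correspondence matrix into one annotation per word:
         # each word gets the index of its aligned counterpart, or "-".
         fr_ann = {i: j for i, j in links}
         en_ann = {j: i for i, j in links}
         print([(w, fr_ann.get(i + 1, "-")) for i, w in enumerate(fr)])
         # [("J'", 1), ('aime', 2), ('le', '-'), ('chocolat', 3)]
         print([(w, en_ann.get(j + 1, "-")) for j, w in enumerate(en)])
         # [('I', 1), ('like', 2), ('chocolate', 4)]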

  6. 1. What is annotation for?
     Examples of tree annotations
     [Figure: syntax tree of "Sligos va prendre pied au Royaume-Uni" (SENT, NP, VN, VP, PP nodes), annotated with the functions SUJ, PRED, OBJ, MOD]
     – syntactic functions, SRL (Semantic Role Labeling: agent, patient...) of a syntactic tree
     – label = value of an attribute in an XML node

  7. 1. What is annotation for?
     Examples of tree annotations
     [Figure: an HTML tree (BODY, DIV, TABLE, TR, TD, A, SPAN, #text, @href nodes) and its labeling with editing operations]
     – on the left: an HTML tree
     – on the right: a labeling with editing operations
     – DelN, DelST: delete a node / delete a subtree
     – channel, item, title, link, description: rename a node

  8. 1. What is annotation for?
     Examples of tree annotations
     – execution of the editing operations (see the sketch below)
     [Figure: the HTML tree after the editing operations have been applied; only channel, item, title, link, description and text nodes remain]
     – implemented application: generation of RSS feeds from HTML pages
     – other possible application: extraction of portions of Web pages
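
     A sketch of executing such operations on a toy tree; the (label, children) encoding and the ops mapping are assumptions for illustration, not the actual implementation behind the slides.

         # Tree = (label, children). ops maps a node label to its operation:
         # "DelST" deletes the whole subtree, "DelN" deletes the node and
         # splices its children into its parent, any other value renames it.
         DELETED = object()  # sentinel for a deleted subtree

         def apply_ops(tree, ops):
             label, children = tree
             kept = []
             for child in children:
                 r = apply_ops(child, ops)
                 if r is DELETED:
                     continue
                 if isinstance(r, list):   # a DelN child: splice its children up
                     kept.extend(r)
                 else:
                     kept.append(r)
             op = ops.get(label, label)
             if op == "DelST":
                 return DELETED
             if op == "DelN":
                 return kept               # children move up to the parent
             return (op, kept)             # rename (or keep) the node

         html = ("HTML", [("BODY", [("DIV", [("#text", [])])])])
         ops = {"HTML": "channel", "BODY": "DelN", "DIV": "item"}
         print(apply_ops(html, ops))
         # ('channel', [('item', [('#text', [])])])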

  9. 1. What is annotation for?
     Summary
     – many tasks can be considered as annotation tasks
     – for this, you need to specify:
       – the nature of the input items
       – the relationships between items: order relations of the input structure (sequence, tree...)
       – the nature of the annotations and their meaning
       – the relationships between annotations
       – the relationships between the items and their corresponding annotations
     – pre-processing and post-processing are often necessary

  10. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  11. 2. Linear and Tree-shaped CRFs
      Basic notions
      – classical notations: x is the input, y its annotation (with the same structure)
      – x and y are decomposed into random variables: x = {X_1, X_2, ..., X_n} and y = {Y_1, Y_2, ..., Y_n}
      – a graphical model defines dependencies between the random variables in a graph
      – in a generative model (HMM, PCFG), there are oriented dependencies from the Y_i to the X_j
      – in contrast, in a discriminative model (CRF), it is possible to compute p(y|x) directly, without knowing p(x)
      – learning: find the best possible parameters for p(y|x) from annotated examples (x, y) by maximizing the likelihood
      – annotation: for a new x, compute ŷ = argmax_y p(y|x)

  12. 2. Linear and Tree-shaped CRFs
      Basic properties of CRFs
      – define a non-oriented graph on the variables Y_i (implicitly, every variable of X is connected)
      – CRFs are Markovian discriminative models: p(Y_i | X) only depends on X and on the Y_j (j ≠ i) such that Y_i and Y_j are connected
      – CRFs are defined by (Lafferty, McCallum & Pereira 2001):
        p(y|x) = (1/Z(x)) exp( Σ_{c∈C} Σ_k λ_k f_k(y_c, x, c) )
      – C is the set of cliques of the graph
      – y_c: the values of y on the clique c
      – Z(x) is a normalization factor
      – the f_k are user-provided features
      – the λ_k are the parameters of the model (weights of the f_k)
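
      To make the formula concrete: a minimal sketch (not from the slides) that evaluates p(y|x) for a toy linear-chain CRF by enumerating all label sequences; the two features and their weights are invented for illustration.

          import itertools
          import math

          LABELS = ["Det", "N"]

          # Two hypothetical binary features f_k(y_prev, y_cur, x, i).
          def f1(y_prev, y_cur, x, i):
              return 1.0 if x[i] == "the" and y_cur == "Det" else 0.0

          def f2(y_prev, y_cur, x, i):
              return 1.0 if y_prev == "Det" and y_cur == "N" else 0.0

          FEATURES = [(f1, 1.5), (f2, 0.8)]  # pairs (f_k, lambda_k)

          def score(y, x):
              # Sum lambda_k * f_k over the cliques (y_{i-1}, y_i) of the chain.
              return sum(lam * f(y[i - 1] if i else None, y[i], x, i)
                         for i in range(len(x)) for f, lam in FEATURES)

          def p(y, x):
              # p(y|x) = exp(score(y, x)) / Z(x); Z(x) by brute-force
              # enumeration, exponential in len(x), viable on toy inputs only.
              Z = sum(math.exp(score(yy, x))
                      for yy in itertools.product(LABELS, repeat=len(x)))
              return math.exp(score(y, x)) / Z

          print(p(("Det", "N"), ("the", "soup")))  # the 4 sequences sum to 1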

  13. 2. Linear and Tree-shaped CRFs
      The usual graph for linear CRFs: the chain Y_1 - ... - Y_{i-1} - Y_i - Y_{i+1} - ... - Y_N
      – the features can use any information in x, combined with any information in y_c
      – examples of features f_k(y_{i-1}, y_i, x, i) at position i:
        * f_k(y_{i-1}, y_i, x, i) = 1 if x_{i-1} ∈ {the, a} and y_{i-1} = Det and y_i = N; = 0 otherwise
        * f_k'(y_{i-1}, y_i, x, i) = 1 if {Mr, Mrs, Miss} ∩ {x_{i-3}, ..., x_{i-1}} ≠ ∅ and y_i = NE; = 0 otherwise
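
      The same two features written as Python predicates (a sketch; the handling of positions i < 3 at the sentence start is an assumption):

          def f_k(y_prev, y_cur, x, i):
              # 1 if x_{i-1} in {the, a} and y_{i-1} = Det and y_i = N
              return 1.0 if (i > 0 and x[i - 1] in {"the", "a"}
                             and y_prev == "Det" and y_cur == "N") else 0.0

          def f_k_prime(y_prev, y_cur, x, i):
              # 1 if {Mr, Mrs, Miss} meets {x_{i-3}, ..., x_{i-1}} and y_i = NE
              window = set(x[max(0, i - 3):i])  # clipped at the sentence start
              return 1.0 if window & {"Mr", "Mrs", "Miss"} and y_cur == "NE" else 0.0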

  14. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples
        x     | y
        La    | Det
        bonne | Adj
        soupe | N
        fume  | V
        .     | ponct
      Definition of features in software tools:
      – define a pattern (any shape on x, at most clique-width on y)
      – corresponding instance:
        f_1(y_{i-1}, y_i, x, i) = 1 if (x_i = La) AND (y_i = Det); = 0 otherwise

  15. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples (same example)
      Associated feature:
        f_2(y_{i-1}, y_i, x, i) = 1 if (x_i = bonne) AND (y_i = Adj); = 0 otherwise

  16. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples (same example)
      Associated feature (see the sketch below):
        f_4(y_{i-1}, y_i, x, i) = 1 if (x_{i-1} = La) AND (y_{i-1} = Det) AND (x_i = bonne) AND (y_i = Adj); = 0 otherwise
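
      A sketch of how such tools instantiate patterns over a labeled example to generate f_1, f_2, f_4 and their kin; the pattern set (one unigram, one bigram) is an assumption about what the slides' software does.

          x = ["La", "bonne", "soupe", "fume", "."]
          y = ["Det", "Adj", "N", "V", "ponct"]

          # Each pattern instance becomes one binary feature, identified by
          # the tests it performs; f_1 is (('xi', 'La'), ('yi', 'Det')).
          features = set()
          for i in range(len(x)):
              features.add((("xi", x[i]), ("yi", y[i])))    # unigram, as f_1, f_2
              if i > 0:                                     # bigram, as f_4
                  features.add((("xi-1", x[i - 1]), ("yi-1", y[i - 1]),
                                ("xi", x[i]), ("yi", y[i])))

          for feat in sorted(map(repr, features)):
              print(feat)  # 5 unigram + 4 bigram instances for this example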

  17. 2. Linear and Tree-shaped CRFs
      Transform an HMM into a linear CRF
      [Figure: an HMM with states Det, Adj, N, V_intr; emissions la: 2/3, une: 1/3 (Det); bonne: 1/2, grande: 1/2 (Adj); soupe: 2/3, bonne: 1/3 (N); fume: 4/5, soupe: 1/5 (V_intr); transitions Det → Adj 1/3, Det → N 2/3, Adj → N 1, N → V_intr 2/3]
      – f_1(y_i, x, i) = 1 if y_i = Det and x_i = la (= 0 otherwise), λ_1 = log(2/3)
      – f_2(y_{i-1}, y_i, x, i) = 1 if y_{i-1} = Det and y_i = Adj (= 0 otherwise), λ_2 = log(1/3) (for an empty transition, λ = −∞)
      – the computation of p(y|x) is the same in both cases
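
      A sketch of the construction: every HMM emission and transition probability becomes one CRF feature whose weight is its log. The probabilities below are read off the slide's figure; the transitions are only partially recoverable, so treat them as illustrative.

          import math

          emissions = {("Det", "la"): 2/3, ("Det", "une"): 1/3,
                       ("Adj", "bonne"): 1/2, ("Adj", "grande"): 1/2,
                       ("N", "soupe"): 2/3, ("N", "bonne"): 1/3,
                       ("Vintr", "fume"): 4/5, ("Vintr", "soupe"): 1/5}
          transitions = {("Det", "Adj"): 1/3, ("Det", "N"): 2/3,
                         ("Adj", "N"): 1.0, ("N", "Vintr"): 2/3}

          def weight(prob):
              # lambda_k = log of the HMM probability; empty transition: -inf.
              return math.log(prob) if prob > 0 else float("-inf")

          # One CRF feature per HMM parameter: emission features test
          # (y_i, x_i), transition features test (y_{i-1}, y_i).
          lambdas = {("emit", s, w): weight(p) for (s, w), p in emissions.items()}
          lambdas.update({("trans", s1, s2): weight(p)
                          for (s1, s2), p in transitions.items()})
          print(lambdas[("emit", "Det", "la")])  # log(2/3), the slide's lambda_1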

  18. 2. Linear and Tree-shaped CRFs
      Possible graphs for trees
      [Figure: two possible CRF graph structures over the nodes of the same annotated tree, with labels ⊥, SUJ, PRED, OBJ, MOD]

  19. 2. Linear and Tree-shaped CRFs
      Implementations
      – learning step: maximize the log-likelihood
        log Π_{(x,y)∈S} p(y|x) = Σ_{(x,y)∈S} log p(y|x) (+ penalty...)
        by gradient descent (L-BFGS)
      – annotation by Viterbi (linear chains, see the sketch below), inside-outside (trees), message passing (general graphs)...
      – computation in K * N * |Y|^c (c: the size of the largest clique)
      – available implementations: Mallet, GRMM, CRFSuite, CRF++, Wapiti, XCRF (for trees with cliques of width 3), Factorie
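
      A minimal generic Viterbi sketch (written for this note, not taken from any of the libraries above): it takes any per-position score function score(y_prev, y_cur, x, i), e.g. the weighted feature sums from slide 12, and returns the best label sequence in N * |Y|^2 steps instead of enumerating the |Y|^N sequences.

          def viterbi(x, labels, score):
              # best[y]: score of the best labeling of x[0..i] ending with y.
              best = {y: score(None, y, x, 0) for y in labels}
              back = []  # back[i-1][y]: best predecessor of label y at position i
              for i in range(1, len(x)):
                  ptr, new = {}, {}
                  for y in labels:
                      prev = max(labels,
                                 key=lambda yp: best[yp] + score(yp, y, x, i))
                      ptr[y], new[y] = prev, best[prev] + score(prev, y, x, i)
                  back.append(ptr)
                  best = new
              # Follow the backpointers from the best final label.
              path = [max(best, key=best.get)]
              for ptr in reversed(back):
                  path.append(ptr[path[-1]])
              return list(reversed(path))

      With the toy features of slide 12, viterbi(("the", "soup"), ["Det", "N"], score) returns the same annotation as the brute-force argmax, which is the point of the dynamic program.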

  20. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  21. 3. State of the Art
      Use of CRFs for labeling tasks
      – NE recognition (McCallum & Li, 2003)
      – IE from tables (Pinto et al., 2003)
      – POS labeling (Altun et al., 2003)
      – shallow parsing (Sha & Pereira, 2003)
      – SRL for trees (Cohn & Blunsom, 2005)
      – tree transformation (Gilleron et al., 2006)
      – non-linguistic uses: image labeling/segmentation, RNA alignment...

  22. 3. State of the Art
      Extensions of the graph
      – add dependencies in the graph: skip-chain CRFs, dynamic (multi-level) CRFs...
      – use CRFs for syntactic parsing (Finkel et al., 2008)
      – build the tree structure of a CRF (Bradley & Guestrin, 2010)
      – CRFs for general graphs (grid-shaped for images)
      How to build the features
      – nearly always binary
      – feature induction (McCallum, 2003)
      – features allow integrating external knowledge... (see below)
      – more general features may be more effective (Pu et al., 2010)

  23. 3. State of the Art
      About the learning step
      – unsupervised or semi-supervised CRFs (difficult, not very effective)
      – add an L1 penalty to the likelihood to select the best features (Lavergne & Yvon, 2010)
      – add constraints at various levels (features, likelihood, labels...): LREC 2012 tutorial (Druck et al., 2012)
      – MCMC inference methods

  24. 3. State of the Art
      Linguistic interest
      – sequential vs. direct complex labeling?
      – how to integrate linguistic knowledge?
        – as external constraints
        – as additional labeled input data
        – as features

  25. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  26. Conclusion
      Strengths
      – very effective for many tasks
      – allow the integration of many distinct sources of information
      – many easy-to-use libraries available
      Weaknesses
      – do not support unsupervised/semi-supervised learning well
      – not very incremental
      – learning complexity remains high with large cliques or a large label vocabulary
