

  1. Introduction to CRFs Isabelle Tellier 02-08-2013

  2. Plan
     1. What is annotation for?
     2. Linear and tree-shaped CRFs
     3. State of the Art
     4. Conclusion

  3. 1. What is annotation for?
     What is annotation?
     – inputs can be texts, trees, or any structure built on finite vocabulary items
     – to annotate such a structure = associate with each of its items an output label belonging to another finite vocabulary
     – the structure is given and preserved

  4. 1. What is annotation for?
     Examples of text annotations
     – POS ("part of speech") labeling: item = "word", annotation = morphosyntactic label (Det, N, etc.)
     – named entities (NE), IE: item = "word", annotation = type (D for Date, E for Event, P for Place...) + position within the NE (B for Begin, I for In, O for Out), as in the example below (see the sketch after this slide):
       In/O 2016/DB the/O Olympic/EB Games/EI will/O take/O place/O in/O Rio/PB de/PI Janeiro/PI
     – segmentation of a text into "chunks", phrases, clauses...
     – segmentation of a document into sections (e.g. distinguishing Title, Menus, Adverts, etc. in a Web page)
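
     The B/I/O scheme is easiest to see in code. Below is a minimal sketch (not from the slides) that stores the example above as (token, label) pairs and recovers the typed entities from the labels; the list-of-pairs representation is a deliberately simple assumption.

         # The slide's NE example as (token, label) pairs, combining the
         # B/I/O position scheme with entity types D(ate), E(vent), P(lace).
         sentence = [
             ("In", "O"), ("2016", "DB"), ("the", "O"),
             ("Olympic", "EB"), ("Games", "EI"),
             ("will", "O"), ("take", "O"), ("place", "O"), ("in", "O"),
             ("Rio", "PB"), ("de", "PI"), ("Janeiro", "PI"),
         ]

         # Recover the typed entities from the annotation.
         entities, current = [], None
         for token, label in sentence:
             if label.endswith("B"):                # begin a new entity
                 current = (label[0], [token])
                 entities.append(current)
             elif label.endswith("I") and current:  # continue the current entity
                 current[1].append(token)
             else:                                  # outside any entity
                 current = None
         print([(t, " ".join(ws)) for t, ws in entities])
         # [('D', '2016'), ('E', 'Olympic Games'), ('P', 'Rio de Janeiro')]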

  5. 1. What is annotation for?
     Examples of text annotations
     – text alignment for automatic translation:
       J'(1) aime(2) le(3) chocolat(4)  /  I(1) like(2) chocolate(3)
       with alignment links 1-1, 2-2, 4-3 ("le" is unaligned)
     – correspondence matrices are projected into pairs of annotations (see the sketch below):
       French side: J' → 1, aime → 2, le → -, chocolat → 3
       English side: I → 1, like → 2, chocolate → 4
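
     A sketch of that projection, assuming the alignment links reconstructed above (1-based indices; the data structures are hypothetical, not the slides' own):

         # Tokens plus alignment links (fr_index, en_index); "le" has no link.
         fr = ["J'", "aime", "le", "chocolat"]
         en = ["I", "like", "chocolate"]
         links = [(1, 1), (2, 2), (4, 3)]

         # Project the correspondence matrix into one annotation per word:
         # each word gets the index of its aligned counterpart, or "-".
         fr_ann = {i: j for i, j in links}
         en_ann = {j: i for i, j in links}
         print([(w, fr_ann.get(i + 1, "-")) for i, w in enumerate(fr)])
         # [("J'", 1), ('aime', 2), ('le', '-'), ('chocolat', 3)]
         print([(w, en_ann.get(j + 1, "-")) for j, w in enumerate(en)])
         # [('I', 1), ('like', 2), ('chocolate', 4)]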

  6. 1. What is annotation for?
     Examples of tree annotations
     [Figure: syntax tree of "Sligos va prendre pied au Royaume-Uni" (SENT, NP, VN, VP, PP nodes), annotated with the functions SUJ, PRED, OBJ, MOD]
     – syntactic functions, SRL (Semantic Role Labeling: agent, patient...) of a syntactic tree
     – label = value of an attribute in an XML node

  7. 1. What is annotation for?
     Examples of tree annotations
     [Figure: an HTML tree (BODY, DIV, TABLE, TR, TD, A, SPAN, #text, @href nodes) and its labeling with editing operations]
     – on the left: an HTML tree
     – on the right: a labeling with editing operations
     – DelN, DelST: delete a node / delete a subtree
     – channel, item, title, link, description: rename a node

  8. 1. What is annotation for?
     Examples of tree annotations
     – execution of the editing operations (see the sketch below)
     [Figure: the HTML tree after the editing operations have been applied; only channel, item, title, link, description and text nodes remain]
     – implemented application: generation of RSS feeds from HTML pages
     – other possible application: extraction of portions of Web pages
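
     A sketch of executing such operations on a toy tree; the (label, children) encoding and the ops mapping are assumptions for illustration, not the actual implementation behind the slides.

         # Tree = (label, children). ops maps a node label to its operation:
         # "DelST" deletes the whole subtree, "DelN" deletes the node and
         # splices its children into its parent, any other value renames it.
         DELETED = object()  # sentinel for a deleted subtree

         def apply_ops(tree, ops):
             label, children = tree
             kept = []
             for child in children:
                 r = apply_ops(child, ops)
                 if r is DELETED:
                     continue
                 if isinstance(r, list):   # a DelN child: splice its children up
                     kept.extend(r)
                 else:
                     kept.append(r)
             op = ops.get(label, label)
             if op == "DelST":
                 return DELETED
             if op == "DelN":
                 return kept               # children move up to the parent
             return (op, kept)             # rename (or keep) the node

         html = ("HTML", [("BODY", [("DIV", [("#text", [])])])])
         ops = {"HTML": "channel", "BODY": "DelN", "DIV": "item"}
         print(apply_ops(html, ops))
         # ('channel', [('item', [('#text', [])])])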

  9. 1. What is annotation for?
     Summary
     – many tasks can be considered as annotation tasks
     – for this, you need to specify:
       – the nature of the input items
       – the relationships between items: order relations of the input structure (sequence, tree...)
       – the nature of the annotations and their meaning
       – the relationships between annotations
       – the relationships between the items and their corresponding annotations
     – pre-processing and post-processing are often necessary

  10. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  11. 2. Linear and Tree-shaped CRFs
      Basic notions
      – classical notations: x is the input, y its annotation (with the same structure)
      – x and y are decomposed into random variables: x = {X_1, X_2, ..., X_n} and y = {Y_1, Y_2, ..., Y_n}
      – a graphical model defines dependencies between the random variables in a graph
      – in a generative model (HMM, PCFG), there are oriented dependencies from the Y_i to the X_j
      – in contrast, in a discriminative model (CRF), it is possible to compute p(y|x) directly, without knowing p(x)
      – learning: find the best possible parameters for p(y|x) from annotated examples (x, y) by maximizing the likelihood
      – annotation: for a new x, compute ŷ = argmax_y p(y|x)

  12. 2. Linear and Tree-shaped CRFs
      Basic properties of CRFs
      – define a non-oriented graph on the variables Y_i (implicitly, every variable of X is connected)
      – CRFs are Markovian discriminative models: p(Y_i | X) only depends on X and on the Y_j (j ≠ i) such that Y_i and Y_j are connected
      – CRFs are defined by (Lafferty, McCallum & Pereira 2001):
        p(y|x) = (1/Z(x)) exp( Σ_{c∈C} Σ_k λ_k f_k(y_c, x, c) )
      – C is the set of cliques of the graph
      – y_c: the values of y on the clique c
      – Z(x) is a normalization factor
      – the f_k are user-provided features
      – the λ_k are the parameters of the model (weights of the f_k)
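
      To make the formula concrete: a minimal sketch (not from the slides) that evaluates p(y|x) for a toy linear-chain CRF by enumerating all label sequences; the two features and their weights are invented for illustration.

          import itertools
          import math

          LABELS = ["Det", "N"]

          # Two hypothetical binary features f_k(y_prev, y_cur, x, i).
          def f1(y_prev, y_cur, x, i):
              return 1.0 if x[i] == "the" and y_cur == "Det" else 0.0

          def f2(y_prev, y_cur, x, i):
              return 1.0 if y_prev == "Det" and y_cur == "N" else 0.0

          FEATURES = [(f1, 1.5), (f2, 0.8)]  # pairs (f_k, lambda_k)

          def score(y, x):
              # Sum lambda_k * f_k over the cliques (y_{i-1}, y_i) of the chain.
              return sum(lam * f(y[i - 1] if i else None, y[i], x, i)
                         for i in range(len(x)) for f, lam in FEATURES)

          def p(y, x):
              # p(y|x) = exp(score(y, x)) / Z(x); Z(x) by brute-force
              # enumeration, exponential in len(x), viable on toy inputs only.
              Z = sum(math.exp(score(yy, x))
                      for yy in itertools.product(LABELS, repeat=len(x)))
              return math.exp(score(y, x)) / Z

          print(p(("Det", "N"), ("the", "soup")))  # the 4 sequences sum to 1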

  13. 2. Linear and Tree-shaped CRFs
      The usual graph for linear CRFs: the chain Y_1 - ... - Y_{i-1} - Y_i - Y_{i+1} - ... - Y_N
      – the features can use any information in x, combined with any information in y_c
      – examples of features f_k(y_{i-1}, y_i, x, i) at position i:
        * f_k(y_{i-1}, y_i, x, i) = 1 if x_{i-1} ∈ {the, a} and y_{i-1} = Det and y_i = N; = 0 otherwise
        * f_k'(y_{i-1}, y_i, x, i) = 1 if {Mr, Mrs, Miss} ∩ {x_{i-3}, ..., x_{i-1}} ≠ ∅ and y_i = NE; = 0 otherwise
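
      The same two features written as Python predicates (a sketch; the handling of positions i < 3 at the sentence start is an assumption):

          def f_k(y_prev, y_cur, x, i):
              # 1 if x_{i-1} in {the, a} and y_{i-1} = Det and y_i = N
              return 1.0 if (i > 0 and x[i - 1] in {"the", "a"}
                             and y_prev == "Det" and y_cur == "N") else 0.0

          def f_k_prime(y_prev, y_cur, x, i):
              # 1 if {Mr, Mrs, Miss} meets {x_{i-3}, ..., x_{i-1}} and y_i = NE
              window = set(x[max(0, i - 3):i])  # clipped at the sentence start
              return 1.0 if window & {"Mr", "Mrs", "Miss"} and y_cur == "NE" else 0.0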

  14. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples
        x     | y
        La    | Det
        bonne | Adj
        soupe | N
        fume  | V
        .     | ponct
      Definition of features in software tools:
      – define a pattern (any shape on x, at most clique-width on y)
      – corresponding instance:
        f_1(y_{i-1}, y_i, x, i) = 1 if (x_i = La) AND (y_i = Det); = 0 otherwise

  15. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples (same example)
      Associated feature:
        f_2(y_{i-1}, y_i, x, i) = 1 if (x_i = bonne) AND (y_i = Adj); = 0 otherwise

  16. 2. Linear and Tree-shaped CRFs
      Generate features from the labeled examples (same example)
      Associated feature (see the sketch below):
        f_4(y_{i-1}, y_i, x, i) = 1 if (x_{i-1} = La) AND (y_{i-1} = Det) AND (x_i = bonne) AND (y_i = Adj); = 0 otherwise
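
      A sketch of how such tools instantiate patterns over a labeled example to generate f_1, f_2, f_4 and their kin; the pattern set (one unigram, one bigram) is an assumption about what the slides' software does.

          x = ["La", "bonne", "soupe", "fume", "."]
          y = ["Det", "Adj", "N", "V", "ponct"]

          # Each pattern instance becomes one binary feature, identified by
          # the tests it performs; f_1 is (('xi', 'La'), ('yi', 'Det')).
          features = set()
          for i in range(len(x)):
              features.add((("xi", x[i]), ("yi", y[i])))    # unigram, as f_1, f_2
              if i > 0:                                     # bigram, as f_4
                  features.add((("xi-1", x[i - 1]), ("yi-1", y[i - 1]),
                                ("xi", x[i]), ("yi", y[i])))

          for feat in sorted(map(repr, features)):
              print(feat)  # 5 unigram + 4 bigram instances for this example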

  17. 2. Linear and Tree-shaped CRFs
      Transform an HMM into a linear CRF
      [Figure: an HMM with states Det, Adj, N, V_intr; emissions la: 2/3, une: 1/3 (Det); bonne: 1/2, grande: 1/2 (Adj); soupe: 2/3, bonne: 1/3 (N); fume: 4/5, soupe: 1/5 (V_intr); transitions Det → Adj 1/3, Det → N 2/3, Adj → N 1, N → V_intr 2/3]
      – f_1(y_i, x, i) = 1 if y_i = Det and x_i = la (= 0 otherwise), λ_1 = log(2/3)
      – f_2(y_{i-1}, y_i, x, i) = 1 if y_{i-1} = Det and y_i = Adj (= 0 otherwise), λ_2 = log(1/3) (for an empty transition, λ = −∞)
      – the computation of p(y|x) is the same in both cases
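
      A sketch of the construction: every HMM emission and transition probability becomes one CRF feature whose weight is its log. The probabilities below are read off the slide's figure; the transitions are only partially recoverable, so treat them as illustrative.

          import math

          emissions = {("Det", "la"): 2/3, ("Det", "une"): 1/3,
                       ("Adj", "bonne"): 1/2, ("Adj", "grande"): 1/2,
                       ("N", "soupe"): 2/3, ("N", "bonne"): 1/3,
                       ("Vintr", "fume"): 4/5, ("Vintr", "soupe"): 1/5}
          transitions = {("Det", "Adj"): 1/3, ("Det", "N"): 2/3,
                         ("Adj", "N"): 1.0, ("N", "Vintr"): 2/3}

          def weight(prob):
              # lambda_k = log of the HMM probability; empty transition: -inf.
              return math.log(prob) if prob > 0 else float("-inf")

          # One CRF feature per HMM parameter: emission features test
          # (y_i, x_i), transition features test (y_{i-1}, y_i).
          lambdas = {("emit", s, w): weight(p) for (s, w), p in emissions.items()}
          lambdas.update({("trans", s1, s2): weight(p)
                          for (s1, s2), p in transitions.items()})
          print(lambdas[("emit", "Det", "la")])  # log(2/3), the slide's lambda_1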

  18. 2. Linear and Tree-shaped CRFs
      Possible graphs for trees
      [Figure: two possible CRF graph structures over the nodes of the same annotated tree, with labels ⊥, SUJ, PRED, OBJ, MOD]

  19. 2. Linear and Tree-shaped CRFs
      Implementations
      – learning step: maximize the log-likelihood
        log Π_{(x,y)∈S} p(y|x) = Σ_{(x,y)∈S} log p(y|x) (+ penalty...)
        by gradient descent (L-BFGS)
      – annotation by Viterbi (linear chains, see the sketch below), inside-outside (trees), message passing (general graphs)...
      – computation in K * N * |Y|^c (c: the size of the largest clique)
      – available implementations: Mallet, GRMM, CRFSuite, CRF++, Wapiti, XCRF (for trees with cliques of width 3), Factorie
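
      A minimal generic Viterbi sketch (written for this note, not taken from any of the libraries above): it takes any per-position score function score(y_prev, y_cur, x, i), e.g. the weighted feature sums from slide 12, and returns the best label sequence in N * |Y|^2 steps instead of enumerating the |Y|^N sequences.

          def viterbi(x, labels, score):
              # best[y]: score of the best labeling of x[0..i] ending with y.
              best = {y: score(None, y, x, 0) for y in labels}
              back = []  # back[i-1][y]: best predecessor of label y at position i
              for i in range(1, len(x)):
                  ptr, new = {}, {}
                  for y in labels:
                      prev = max(labels,
                                 key=lambda yp: best[yp] + score(yp, y, x, i))
                      ptr[y], new[y] = prev, best[prev] + score(prev, y, x, i)
                  back.append(ptr)
                  best = new
              # Follow the backpointers from the best final label.
              path = [max(best, key=best.get)]
              for ptr in reversed(back):
                  path.append(ptr[path[-1]])
              return list(reversed(path))

      With the toy features of slide 12, viterbi(("the", "soup"), ["Det", "N"], score) returns the same annotation as the brute-force argmax, which is the point of the dynamic program.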

  20. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  21. 3. State of the Art
      Use of CRFs for labeling tasks
      – NE recognition (McCallum & Li, 2003)
      – IE from tables (Pinto et al., 2003)
      – POS labeling (Altun et al., 2003)
      – shallow parsing (Sha & Pereira, 2003)
      – SRL for trees (Cohn & Blunsom, 2005)
      – tree transformation (Gilleron et al., 2006)
      – non-linguistic uses: image labeling/segmentation, RNA alignment...

  22. 3. State of the Art
      Extensions of the graph
      – add dependencies in the graph: skip-chain CRFs, dynamic (multi-level) CRFs...
      – use CRFs for syntactic parsing (Finkel et al., 2008)
      – build the tree structure of a CRF (Bradley & Guestrin, 2010)
      – CRFs for general graphs (grid-shaped for images)
      How to build the features
      – nearly always binary
      – feature induction (McCallum, 2003)
      – features allow integrating external knowledge... (see below)
      – more general features may be more effective (Pu et al., 2010)

  23. 3. State of the Art
      About the learning step
      – unsupervised or semi-supervised CRFs (difficult, not very effective)
      – add an L1 penalty to the likelihood to select the best features (Lavergne & Yvon, 2010)
      – add constraints at various levels (features, likelihood, labels...): LREC 2012 tutorial (Druck et al., 2012)
      – MCMC inference methods

  24. 3. State of the Art
      Linguistic interest
      – sequential vs. direct complex labeling?
      – how to integrate linguistic knowledge?
        – as external constraints
        – as additional labeled input data
        – as features

  25. Plan
      1. What is annotation for?
      2. Linear and tree-shaped CRFs
      3. State of the Art
      4. Conclusion

  26. Conclusion
      Strengths
      – very effective for many tasks
      – allow the integration of many distinct sources of information
      – many easy-to-use libraries available
      Weaknesses
      – do not support unsupervised/semi-supervised learning well
      – not very incremental
      – learning complexity remains high with large cliques or a large label vocabulary
