
PRESENTATION ON: A SHORTEST PATH DEPENDENCY KERNEL FOR RELATION EXTRACTION



  1. DEPENDENCY PARSING | KERNEL METHODS | PAPER
     PRESENTATION ON: "A SHORTEST PATH DEPENDENCY KERNEL FOR RELATION EXTRACTION"
     • Hypothesis
     • Dependency Parsing: CFG vs CCG
     • Kernel
     • Evaluation
     **Taken from CS388 by Raymond J. Mooney, University of Texas at Austin

  2. DEPENDENCY PARSING
     From: Wikipedia
     *Taken from CS388 by Raymond J. Mooney, University of Texas at Austin

  3. For the sentence: "John liked the dog in the pen."
     [Figure: phrase-structure parse tree vs. typed dependency parse tree for the sentence]
     • Represent structure in sentences using "lexical terms" linked by binary relations called "dependencies"
     • Labelled case vs. unlabelled
     • Throw in a ROOT
     • Can be generated from parse trees; can be generated directly (Collins' NLP course has more)
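The typed dependency parse sketched on the slide above can be written down directly as (head, relation, dependent) triples; a minimal sketch for the example sentence, with relation labels in the Stanford typed-dependency style (the exact label set is an assumption):

```python
# Typed dependency parse of "John liked the dog in the pen."
# as (head, relation, dependent) triples; ROOT is an artificial root node.
deps = [
    ("ROOT", "root", "liked"),
    ("liked", "nsubj", "John"),   # John is the subject of liked
    ("liked", "dobj", "dog"),     # dog is the direct object
    ("dog", "det", "the"),
    ("dog", "in", "pen"),         # collapsed prepositional dependency
    ("pen", "det", "the"),
]

# The unlabelled case from the slide: just drop the relation labels.
unlabelled = [(head, dep) for head, _, dep in deps]
print(unlabelled)
```

Dropping the labels gives the unlabelled dependency graph the later slides run shortest paths over.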

  4. Parse trees using CFG, with heads and without
     • Created using PCFGs, parsed with the CKY algorithm
     • Weights for hand-written rules are obtained through training
     • Problem: the production used to expand a non-terminal is independent of context. Solution? Add heads.
     • Even with heads, it doesn't really care about semantics
     [Figure: example PCFG rules (no heads) with weights; lexicalized and dependency versions of the parse]
     • Can convert a phrase-structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.*
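The conversion rule in the last bullet can be sketched as a short recursion over a head-annotated phrase-structure tree. The tuple tree format and the head indices below are illustrative assumptions, not the paper's representation:

```python
# An internal node is (label, head_child_index, children);
# a word (leaf) is a plain string.

def head_word(node):
    """Return the lexical head of a subtree by following head children."""
    if isinstance(node, str):
        return node
    _, h, children = node
    return head_word(children[h])

def to_dependencies(node, deps=None):
    """Make the head of every non-head child depend on the head of the head child."""
    if deps is None:
        deps = []
    if isinstance(node, str):
        return deps
    _, h, children = node
    governor = head_word(children[h])
    for i, child in enumerate(children):
        if i != h:
            deps.append((governor, head_word(child)))
        to_dependencies(child, deps)
    return deps

# "John liked the dog" (PP omitted for brevity)
tree = ("S", 1, [
    ("NP", 0, ["John"]),
    ("VP", 0, [
        ("VBD", 0, ["liked"]),
        ("NP", 1, [("DT", 0, ["the"]), ("NN", 0, ["dog"])]),
    ]),
])
print(to_dependencies(tree))  # liked->John, liked->dog, dog->the
```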

  5. Combinatory Categorial Grammars
     • Phrase structure grammar, not dependency based; can be converted to a dependency parse
     • Unclear how to go from a phrasal parse to a dependency parse
     • Based on application of different types of combinators
     • Combinators act on an argument and have a functor; think λ-calculus
     • Special syntactic category for relative pronouns + using heads = long range (?)
     Gagan: Interesting. Rishab: Better CCG parser? Modern day parser?
     From Steedman and Wikipedia; Bunescu and Mooney, 2005

  6. KERNEL METHODS
     Adapted from: ACL'04 tutorial, Jean-Michel Renders, Xerox Research Centre Europe (France)

  7. [Figure: a mapping f taking 2D data to 3D under a polynomial kernel; an RBF kernel]
     (ACL-04 Tutorial)

  8. KERNEL METHODS: INTUITIVE IDEA
     • Find a mapping f such that, in the new space, problem solving is easier (e.g. linear); the mapping is similar to features
     • The kernel represents the similarity between two objects (documents, terms, …), defined as the dot-product in this new vector space
     • The mapping is left implicit: avoid the expensive transformation
     • Easy generalization of a lot of dot-product (or distance) based algorithms
     (ACL-04 Tutorial)
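The "mapping left implicit" point can be made concrete with the 2D-to-3D polynomial kernel from the earlier figure: the explicit feature map φ(x) = (x1², √2·x1·x2, x2²) gives the same dot product as simply squaring the original dot product. A minimal sketch:

```python
import math

def phi(x):
    """Explicit 2D -> 3D feature map for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without ever building phi(x)."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(explicit, poly_kernel(x, z))  # both equal (x . z)^2 = 25.0
```

For a degree-d kernel in n dimensions the explicit map grows combinatorially, while the kernel stays one dot product plus one power, which is exactly the "avoid the expensive transformation" point.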

  9. KERNEL: FORMAL
     • A kernel k(x, y) is a similarity measure, defined by an implicit mapping φ from the original space to a vector space (feature space), such that k(x, y) = φ(x) · φ(y)
     • This similarity measure and the mapping include:
       - Simpler structure (linear representation of the data)
       - Possibly infinite dimension (hypothesis space for learning)
       - … but still computational efficiency when computing k(x, y)
     • Valid kernel: any kernel that satisfies Mercer's theorem
     (ACL-04 Tutorial)

  10. KERNELS FOR TEXT
      • Seen as 'bag of words': dot product or polynomial kernels (multi-words)
      • Seen as set of concepts: GVSM kernels, Kernel LSI (or Kernel PCA), Kernel ICA, … possibly multilingual
      • Seen as string of characters: string kernels
      • Seen as string of terms/concepts: word sequence kernels
      • Seen as trees (dependency or parsing trees): tree kernels
      • Seen as the realization of a probability distribution (generative model)
      (ACL-04 Tutorial)
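The first item in the list, the bag-of-words kernel, is just the dot product of term-count vectors; a minimal sketch (the tokenization by whitespace is a simplifying assumption):

```python
from collections import Counter

def bow_kernel(doc1, doc2):
    """Bag-of-words kernel: dot product of the two term-count vectors."""
    c1 = Counter(doc1.lower().split())
    c2 = Counter(doc2.lower().split())
    return sum(c1[w] * c2[w] for w in c1)

print(bow_kernel("the dog in the pen", "the dog barked"))  # the:2*1 + dog:1*1 = 3
```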

  11. Tree Kernels
      • Special case of general kernels defined on discrete structures (graphs)
      • Consider the following example: k(T1, T2) = #common subtrees between T1 and T2
      • Feature space is the space of all subtrees (huge)
      • Kernel is computed in polynomial time using:
        k(T1, T2) = Σ_{n1 ∈ T1} Σ_{n2 ∈ T2} k_co-rooted(n1, n2)
        k_co-rooted(n1, n2) = 0, if n1 or n2 is a leaf or n1 ≠ n2
                            = ∏_{i ∈ children} (1 + k_co-rooted(ch(n1, i), ch(n2, i))), otherwise
      [Figure: example tree pair with #common subtrees = 7]
      Shantanu: not intuitive
      Similar ideas used in Culotta and Sorensen, 2004
      From ACL tutorial 2004, Wikipedia
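The recursion on the slide can be sketched directly. The tuple tree format is an assumption, and "n1 ≠ n2" is read here as "the productions at the two nodes differ" (the usual Collins-Duffy interpretation):

```python
def k_co_rooted(n1, n2):
    """Number of common subtrees rooted at both n1 and n2.
    A node is (label, [children]); a leaf has an empty child list."""
    l1, ch1 = n1
    l2, ch2 = n2
    # 0 if either node is a leaf or the productions differ
    if not ch1 or not ch2 or l1 != l2 or len(ch1) != len(ch2):
        return 0
    if [c[0] for c in ch1] != [c[0] for c in ch2]:
        return 0
    prod = 1
    for c1, c2 in zip(ch1, ch2):
        prod *= 1 + k_co_rooted(c1, c2)
    return prod

def tree_kernel(t1, t2):
    """k(T1, T2) = sum over all node pairs of k_co_rooted = #common subtrees."""
    def nodes(t):
        yield t
        for child in t[1]:
            yield from nodes(child)
    return sum(k_co_rooted(a, b) for a in nodes(t1) for b in nodes(t2))

vp = ("VP", [("V", [("liked", [])]),
             ("NP", [("DT", [("the", [])]), ("NN", [("dog", [])])])])
print(tree_kernel(vp, vp))
```

The sum has |T1| x |T2| terms and each co-rooted count is a product over children, which is where the polynomial-time claim on the slide comes from.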

  12. Interesting properties of kernel methods:
      • Remember training data, unlike other methods (regression, GDA)
      • Nice theoretical properties; dual space
      • Most popular kernel method is SVMs
      • Kernel trick can lift other linear methods into φ space: PCA, for example
      Multiclass SVM: given classes 1, 2, …, L:
      • One-vs-all: learn L functions g_i on the input space, and assign the label that gives the maximum g_i. O(L) classifiers.
      • One-vs-one: learn g_ij, one function for each pair of classes, and assign the label with the most "votes". O(L²) classifiers.
      Anshul: Multiclass SVM blows up for many classes. No finer relations. How does hierarchy help? (think S1 vs S2)
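The one-vs-one voting scheme from the slide can be sketched with toy pairwise classifiers (the 1-D thresholds below are purely illustrative):

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(i, j)](x) returns either i or j; the label with the
    most votes across all L*(L-1)/2 pairwise classifiers wins."""
    votes = Counter(pairwise[(i, j)](x) for i, j in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy 1-D problem with 3 classes separated by thresholds.
classes = [0, 1, 2]
pairwise = {
    (0, 1): lambda x: 0 if x < 1.0 else 1,
    (0, 2): lambda x: 0 if x < 2.0 else 2,
    (1, 2): lambda x: 1 if x < 2.0 else 2,
}
print(one_vs_one_predict(1.5, classes, pairwise))  # class 1 gets 2 of 3 votes
```

With L classes this needs L(L-1)/2 trained classifiers versus L for one-vs-all, which is the blow-up the audience comment refers to.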

  13. A SHORTEST PATH DEPENDENCY KERNEL FOR RELATION EXTRACTION
      HYPOTHESIS: if e1 and e2 are entities in a sentence related by R, then hypothesize that the contribution of the sentence's dependency graph to establishing R(e1, e2) is almost exclusively concentrated in the shortest path between e1 and e2 in the undirected dependency graph.
      Arindam: oversimplified. Nupur: didn't verify hypothesis. Barun: useful, but no statistical backing. Swarnadeep: more examples/backing for the hypothesis. Dhruvin: when does it fail? Happy: intuition.
      All figures and tables from Bunescu and Mooney, 2005
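Operationally, the hypothesis means extracting a breadth-first shortest path between the two entities after dropping edge directions. A minimal sketch; the edge list and words are illustrative, not taken from the paper's figures:

```python
from collections import deque

def shortest_path(edges, src, dst):
    """BFS shortest path between two words in the undirected dependency graph."""
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, []).append(dep)
        adj.setdefault(dep, []).append(head)  # ignore edge direction
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # entities not connected

# Illustrative dependency edges for "protesters seized several stations"
edges = [("seized", "protesters"), ("seized", "stations"), ("stations", "several")]
print(shortest_path(edges, "protesters", "stations"))
# ['protesters', 'seized', 'stations']
```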

  14. Information Extraction: NER, Coreference Resolution, Relation Extraction
      • The paper extracts relations between intra-sentential entities, which have already been tagged, using a kernel method
      • The kernel is based on the shortest path between the entities in the sentence
      Akshay: Limited ontology. Surag: no temporality.

  15. Assumptions:
      • All relations are intra-sentence
      • Sentences are independent of each other
      • Relations are known; entities are known
      Amount of syntactic knowledge:
      • Syntactic knowledge helps with IE; different levels of syntactic knowledge have been used: PoS, chunking, parse trees, dependency trees
      How do we use syntactic knowledge?
      • Shallow: Ray and Craven, 2001 (PoS and chunking)
      • Zelenko et al., 2003: kernel methods based on shallow parse trees
      • Culotta and Sorensen, 2004: dependency trees
      The paper states the hypothesis that most of the information useful for relation extraction is concentrated in the shortest path in the undirected dependency graph between the entities. No strong reasons given.
      Anshul: Mines implicit relations??? Himanshu: dependency is hard. Arindam: likes deep syntactic knowledge. Nupur: likes the idea. Barun: is classification even useful? Gagan: dislikes the sentence assumption.

  16. Mathematical! Before the kernel:
      • Original training data: articles, with entities and relations
      • Processed training data: {(x_i, y_i)}, i = 1, …, N, where x_i is the shortest path with types. (Will go through the SVM in a bit.)
      • y_i: 5+1 top-level relations; 24 fine relation types. Top level = {ROLE, PART, LOCATED, NEAR, SOCIAL}
      • Handles negation: attach a (-) suffix to words modified by negative determiners
      Nupur, Gagan: nots! Anshul: more nots? Yashoteja: generalize, add more features! Prachi: what happened to e3? Happy: verb semantics? Use Markov logic? Anshul: intra-sentence context only; no novel relations; x_i → φ(x_i)?
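The kernel itself, as described in the paper, compares two shortest paths position by position: it is zero for paths of different length, and otherwise the product over positions of the number of features (word, PoS tag, generalized class, entity type, arrow direction) the paths share at that position. A minimal sketch; the feature sets below are illustrative:

```python
def sp_kernel(path1, path2):
    """Shortest-path kernel in the Bunescu & Mooney style: 0 for paths of
    different length, else the product over positions of the number of
    features shared at that position."""
    if len(path1) != len(path2):
        return 0
    product = 1
    for f1, f2 in zip(path1, path2):
        product *= len(set(f1) & set(f2))
    return product

# Each position holds a feature set; "->" / "<-" mark edge directions.
x = [{"his", "PRP", "PERSON"}, {"->"}, {"actions", "NNS", "Noun"}, {"->"},
     {"in", "IN"}, {"<-"}, {"Brcko", "NNP", "LOCATION"}]
y = [{"their", "PRP", "PERSON"}, {"->"}, {"arrival", "NN", "Noun"}, {"->"},
     {"in", "IN"}, {"<-"}, {"Beijing", "NNP", "LOCATION"}]
print(sp_kernel(x, y))
```

A single position with no shared features zeroes the whole product, so only paths that match at every position contribute similarity, which is why the audience asks about more features and generalization.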
