
PhD course in Machine Learning: Kernel Engineering, Alessandro Moschitti (PowerPoint presentation)



  1. PhD course in Machine Learning: Kernel Engineering. Alessandro Moschitti, Department of Information and Communication Technology, University of Trento. Email: moschitti@dit.unitn.it

  2. Kernel Engineering approaches: Basic Combinations; Canonical Mappings, e.g. object transformations; Merging of Kernels

  3. Kernel Combinations, an example. K_p³ is a polynomial kernel (degree 3) over flat features; K_Tree is a tree kernel. Tree kernel combinations (γ is a weighting parameter):
K = γ × K_Tree + K_p³ (additive)
K = K_Tree × K_p³ (multiplicative)
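The two combination schemes above can be sketched in Python. This is a minimal illustration, not the slides' implementation: the toy tree kernel below just counts shared fragments from precomputed fragment multisets, and `gamma` is an arbitrary weight.

```python
import numpy as np

def poly_kernel(x, z, degree=3):
    # K_p^3: polynomial kernel of degree 3 over flat feature vectors
    return (np.dot(x, z) + 1) ** degree

def tree_kernel(t1, t2):
    # Stand-in tree kernel: counts shared subtree fragments. Trees are
    # represented here as dicts {fragment: count}; a real tree kernel
    # would enumerate fragments recursively from the parse trees.
    return sum(c * t2.get(frag, 0) for frag, c in t1.items())

def combined_sum(x1, t1, x2, t2, gamma=0.4):
    # K = gamma * K_Tree + K_p^3  (additive combination)
    return gamma * tree_kernel(t1, t2) + poly_kernel(x1, x2)

def combined_prod(x1, t1, x2, t2):
    # K = K_Tree * K_p^3  (multiplicative combination)
    return tree_kernel(t1, t2) * poly_kernel(x1, x2)
```

Both constructions preserve kernel validity, since sums and products of valid kernels are valid kernels.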

  4. Object Transformation [Moschitti et al., CLJ 2008]. K(O₁, O₂) = φ(O₁) · φ(O₂) = φ_E(φ_M(O₁)) · φ_E(φ_M(O₂)) = φ_E(S₁) · φ_E(S₂) = K_E(S₁, S₂). Canonical Mapping φ_M(): transforms an object, e.g. a syntactic parse tree, into a verb subcategorization frame tree. Feature Extraction φ_E(): maps the canonical structure into all its fragments in different fragment spaces, e.g. ST, SST and PT.
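The composition K(O₁, O₂) = K_E(φ_M(O₁), φ_M(O₂)) can be sketched as follows. Everything here is a hypothetical stand-in: `phi_M` reduces a toy parse tree to a subcategorization-frame-like signature, and `K_E` is an exact-match kernel standing in for the ST/SST/PT fragment kernels.

```python
def phi_M(parse_tree):
    # Canonical mapping (hypothetical): keep only the verb and the sorted
    # labels of its argument nodes, mimicking a subcategorization frame.
    verb, children = parse_tree
    return (verb, tuple(sorted(label for label, _ in children)))

def K_E(s1, s2):
    # Feature-extraction kernel over canonical structures; here a trivial
    # exact-match kernel stands in for a fragment-based tree kernel.
    return 1.0 if s1 == s2 else 0.0

def K(o1, o2):
    # K(O1, O2) = K_E(phi_M(O1), phi_M(O2))
    return K_E(phi_M(o1), phi_M(o2))
```

The point of the factorization is that two superficially different parse trees with the same canonical structure score as identical.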

  5. Predicate Argument Classification. In an event: target words describe a relation among different entities; the participants are often seen as the predicate's arguments. Example: Paul gives a talk in Rome

  6. Predicate Argument Classification. In an event: target words describe a relation among different entities; the participants are often seen as the predicate's arguments. Example: [Arg0 Paul] [predicate gives] [Arg1 a talk] [ArgM in Rome]

  7. Predicate-Argument Feature Representation. Given a sentence and a predicate p: 1. derive the sentence parse tree; 2. for each node pair <N_p, N_x>: a. extract a feature representation set F; b. if N_x exactly covers Arg-i, F is one of its positive examples; c. otherwise F is a negative example.
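The extraction procedure above can be sketched in Python. The tree encoding and the feature triple are illustrative assumptions (a node is `(label, children, span)` with `span` the set of covered word indices), not the slides' actual feature set.

```python
def nodes(tree):
    # Pre-order traversal over (label, children, span) tuples
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def extract_examples(root, pred_node, arg_spans):
    # For each pair <N_p, N_x>, emit (F, label): positive if N_x exactly
    # covers an argument span, negative otherwise.
    examples = []
    for nx in nodes(root):
        feats = (pred_node[0], nx[0], frozenset(nx[2]))  # toy feature set F
        label = +1 if nx[2] in arg_spans else -1
        examples.append((feats, label))
    return examples
```

Note that most node pairs yield negative examples; only the nodes exactly dominating an argument are positives.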

  8. Vector Representation for the linear kernel. Features: Phrase Type, Predicate Word, Head Word, Parse Tree Path, Position (Right), Voice (Active)
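These categorical features feed a linear kernel. A minimal sketch of one-hot encoding them follows; the `vocab` mapping and the sparse-set representation are illustrative assumptions, not the slides' implementation.

```python
def encode(features, vocab):
    # One-hot encode categorical SRL features (Phrase Type, Predicate Word,
    # Head Word, Path, Position, Voice) as a set of active dimension
    # indices; vocab maps (feature_name, value) -> dimension index.
    return {vocab[(name, value)] for name, value in features.items()}

def linear_kernel(u, v):
    # Dot product of two binary sparse vectors = size of their intersection
    return len(u & v)
```

With binary one-hot features, the linear kernel simply counts how many feature values two candidate arguments share.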

  9. Kernel Engineering: Tree Tailoring

  10. PAT Kernel [Moschitti, ACL 2004]. Given the sentence: [Arg0 Paul] [predicate delivers] [Arg1 a talk] [ArgM in formal style]. [Figure: three argument-specific subtrees F_{v,arg0}, F_{v,arg1}, F_{v,argM} extracted from the sentence parse tree for Arg0, Arg1 and ArgM.] These are semantic structures.

  11. In other words we consider… [Figure: the parse tree of "Paul delivers a talk in formal style" with the Arg. 1 node marked.]

  12. Sub-Categorization Frame Kernel (SCF) [Moschitti, ACL 2004]. [Figure: the parse tree of "Paul delivers a talk in formal style" with the predicate and the Arg. 0, Arg. 1 and Arg. M nodes marked.]

  13. Experiments on Gold Standard Trees. PropBank and Penn Treebank: about 53,700 sentences; sections 2–21 for training, section 23 for testing, sections 1 and 22 for development; arguments from Arg0 to Arg5, plus ArgA and ArgM, for a total of 122,774 and 7,359 instances (training and test, respectively). FrameNet and Collins' automatic trees: 24,558 sentences from the 40 frames of Senseval 3; 18 roles (identical names are mapped together); only verbs; 70% for training and 30% for testing.

  14. Argument Classification with Poly Kernel

  15. PropBank Results

  16. Argument Classification on PAT using different Tree Fragment Extractors. [Learning curves: accuracy (0.75–0.88) vs. % of training data (0–100) for the ST, SST, PT and Linear extractors.]

  17. FrameNet Results: PropBank arguments vs. Semantic Roles

  18. Kernel Engineering: Node marking

  19. Marking Boundary nodes

  20. Node Marking Effect

  21. Different tailoring and marking: MMST, CMST

  22. Experiments. PropBank and Penn Treebank: about 53,700 sentences; Charniak trees from CoNLL 2005. Boundary detection: section 2 for training, section 24 for testing. PAF and MPAF.

  23. Number of examples/nodes of Section 2

  24. Predicate Argument Feature (PAF) vs. Marked PAF (MPAF) [Moschitti et al, ACL-ws-2005]

  25. More general mappings: Semantic structures for re-ranking [Moschitti et al, CoNLL 2006]

  26. Other Shallow Semantic Structures [Moschitti and Quarteroni, NAACL 2008]. [ARG1 Antigens] were [AM-TMP originally] [rel defined] [ARG2 as non-self molecules]. [ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body].

  27. Shallow Semantic Trees for SST kernel [Moschitti et al, ACL 2007]

  28. Merging of Kernels [ECIR 2007] : Question/Answer Classification Syntactic/Semantic Tree Kernel Kernel Combinations Experiments

  29. Merging of Kernels [Bloehdorn & Moschitti, ECIR 2007 & CIKM 2007]

  30. Merging of Kernels. [Figure: two VP parse trees compared, "gives a good talk" vs. "gives a solid talk".]

  31. Delta Evaluation is very simple

  32. Question Classification.
Definition: What does HTML stand for?
Description: What's the final line in the Edgar Allan Poe poem "The Raven"?
Entity: What foods can cause allergic reaction in people?
Human: Who won the Nobel Peace Prize in 1992?
Location: Where is the Statue of Liberty?
Manner: How did Bob Marley die?
Numeric: When was Martin Luther King Jr. born?
Organization: What company makes Bentley cars?

  33. Question Classifier based on Tree Kernels. Question dataset (http://l2r.cs.uiuc.edu/~cogcomp/Data/QA/QC/) [Li and Roth, 2005], distributed over 6 categories: Abbreviations, Descriptions, Entity, Human, Location, and Numeric. Fixed split: 5,500 training and 500 test questions; also 10-fold cross-validation. The whole question parse trees are used (constituent parsing). Example: "What is an offer of direct stock purchase plan?"

  34. Kernels. BOW and POS are obtained with a simple flat tree, e.g. [Figure: a BOX-rooted flat tree over the words "What is an offer …"]. PT (parse tree); PAS (predicate argument structure).

  35. Question classification

  36. Similarity based on WordNet

  37. Question Classification with S/STK

  38. Multiple Kernel Combinations [Moschitti, CIKM 2008; Moschitti & Quarteroni, NAACL 2008; Moschitti et al., ACL 2007]

  39. TASK: Question/Answer Classification The classifier detects if a pair (question and answer) is correct or not A representation for the pair is needed The classifier can be used to re-rank the output of a basic QA system

  40. Dataset 2: TREC data. 138 TREC 2001 test questions labeled as "description"; 2,256 sentences extracted from the best-ranked paragraphs (using a basic QA system based on the Lucene search engine on the TREC dataset); 216 of them labeled as correct by one annotator.

  41. Dataset 2: TREC data (cont.). Note: a question is linked to many answers, so all the pairs derived from it cannot be shared between training and test sets.

  42. Bags of words (BOW) and POS-tags (POS). To save time, apply STK to these flat trees: [Figure: a BOX-rooted flat tree over the words "What is an offer …" and a BOX-rooted flat tree over the POS tags "WHNP VBZ DT NN IN …".]

  43. Word and POS Sequences. "What is an offer of…?" (word sequence, WSK) → What_is_offer, What_is. "WHNP VBZ DT NN IN…" (POS sequence, POSSK) → WHNP_VBZ_NN, WHNP_NN_IN.
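The WSK/POSSK feature spaces are the (possibly gappy) subsequences of the token sequence, as the examples What_is_offer and WHNP_NN_IN show. A minimal sketch follows; it omits the gap-decay weighting that sequence kernels normally apply, so it only counts shared subsequences.

```python
from itertools import combinations

def subseq_features(tokens, max_len=3):
    # Enumerate all (possibly non-contiguous) subsequences up to max_len:
    # the feature space of a word- or POS-sequence kernel. Gap penalties
    # are deliberately omitted in this sketch.
    feats = set()
    for n in range(1, max_len + 1):
        for idxs in combinations(range(len(tokens)), n):
            feats.add("_".join(tokens[i] for i in idxs))
    return feats

def seq_kernel(s1, s2, max_len=3):
    # Unnormalized kernel: number of shared subsequence features
    return len(subseq_features(s1, max_len) & subseq_features(s2, max_len))
```

The same code works for WSK (word tokens) and POSSK (POS-tag tokens); only the input alphabet changes.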

  44. Syntactic Parse Trees (PT)

  45. Predicate Argument Classification. In an event: target words describe a relation among different entities; the participants are often seen as the predicate's arguments. Example: Paul gives a lecture in Rome

  46. Predicate Argument Classification. In an event: target words describe a relation among different entities; the participants are often seen as the predicate's arguments. Example: [Arg0 Paul] [predicate gives] [Arg1 a lecture] [ArgM in Rome]

  47. Predicate Argument Structure for the Partial Tree Kernel (PAS-PTK). [ARG1 Antigens] were [AM-TMP originally] [rel defined] [ARG2 as non-self molecules]. [ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body].

  48. Kernels and Combinations. Exploiting the closure property k(x, z) = k₁(x, z) + k₂(x, z): BOW, POS, WSK, POSSK, PT, PAS-PTK ⇒ BOW+POS, BOW+PT, PT+POS, …
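The closure property says a sum of valid kernels is again a valid kernel, which on precomputed Gram matrices is just matrix addition. A small sketch checking this via positive semidefiniteness (the example matrices are made up for illustration):

```python
import numpy as np

def is_psd(K, tol=1e-9):
    # A valid kernel's Gram matrix is symmetric positive semidefinite
    return bool(np.all(np.linalg.eigvalsh((K + K.T) / 2) >= -tol))

# Gram matrices of two valid kernels over the same two examples
K1 = np.array([[2.0, 1.0], [1.0, 2.0]])
K2 = np.array([[1.0, 0.5], [0.5, 1.0]])

# k(x, z) = k1(x, z) + k2(x, z): the Gram matrices simply add,
# so combinations like BOW+POS or PT+POS need no new kernel code.
K_sum = K1 + K2
```

In practice this is why arbitrary combinations (BOW+PT, PT+POS, …) can be fed to an SVM that accepts precomputed kernels.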

  49.–54. Results on TREC Data (5-fold cross-validation). [Bar chart: F1-measure (range 20–40) by kernel type, built up incrementally across these slides.]

  55. Results on TREC Data (5-fold cross-validation). BOW ≈ 24 F1; POSSK+STK+PAS-PTK ≈ 39 F1 ⇒ 62% relative improvement. [Same bar chart with these two results highlighted.]

  56. SVM-light-TK Software. Encodes the ST, SST and combination kernels in SVM-light [Joachims, 1999]. Available at http://dit.unitn.it/~moschitt/. Handles tree forests and vector sets. New extensions: the PT kernel will be released as soon as possible.
