Advanced Natural Language Processing and Information Retrieval - PowerPoint PPT Presentation


  1. Advanced Natural Language Processing and Information Retrieval
     LAB3: Kernel Methods for Reranking
     Alessandro Moschitti
     Department of Computer Science and Information Engineering, University of Trento
     Email: moschitti@disi.unitn.it

  2. Preference Reranking
     Slides at: http://disi.unitn.it/moschitti/teaching.html

  3. Sec. 15.4.2 The Ranking SVM [Herbrich et al. 1999, 2000; Joachims et al. 2002]
     • The aim is to classify instance pairs as correctly ranked or incorrectly ranked
     • This turns an ordinal regression problem back into a binary classification problem
     • We want a ranking function $f$ such that $x_i > x_j$ iff $f(x_i) > f(x_j)$
     • … or at least one that tries to do this with minimal error
     • Suppose that $f$ is a linear function: $f(x_i) = w \cdot x_i$

  4. Sec. 15.4.2 The Ranking SVM
     • Ranking model: $f(x_i)$ (figure)

  5. Sec. 15.4.2 The Ranking SVM
     • Then (combining the two equations on the last slide):
       $x_i > x_j$ iff $w \cdot x_i - w \cdot x_j > 0$
       $x_i > x_j$ iff $w \cdot (x_i - x_j) > 0$
     • Let us then create a new instance space from such pairs:
       $z_k = x_i - x_j$, with $y_k = +1$ if $x_i \geq x_j$ and $y_k = -1$ if $x_i < x_j$
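
The pair construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pairwise_transform`, the toy vectors, and the choice to skip tied pairs are not from the slides (the slide assigns ties to the +1 class).

```python
from itertools import combinations


def pairwise_transform(X, ranks):
    """Turn ranked instances into difference vectors for a binary classifier.

    X     -- list of feature vectors (lists of numbers) for one query
    ranks -- one relevance value per vector; higher means "ranked above"
    Returns (Z, y) with Z[k] = X[i] - X[j] and y[k] in {+1, -1}.
    Tied pairs are skipped (an assumption made for this sketch).
    """
    Z, y = [], []
    for i, j in combinations(range(len(X)), 2):
        if ranks[i] == ranks[j]:
            continue
        Z.append([a - b for a, b in zip(X[i], X[j])])
        y.append(1 if ranks[i] > ranks[j] else -1)
    return Z, y


# Toy data (hypothetical): three candidates for one query.
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ranks = [3, 2, 1]
Z, y = pairwise_transform(X, ranks)

# A linear scorer w ranks the pair correctly when sign(w . z_k) == y_k.
w = [1.0, -1.0]
correct = [(sum(wi * zi for wi, zi in zip(w, z)) > 0) == (label > 0)
           for z, label in zip(Z, y)]
```

Any binary linear classifier trained on `(Z, y)` then yields a weight vector that can score single instances with `w . x`.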

  6. Support Vector Ranking
     $$\min \ \frac{1}{2}\|\vec{w}\|^2 + C \sum_{k=1}^{m^2} \xi_k^2$$
     subject to
     $$y_k \left( \vec{w} \cdot (\vec{x}_i - \vec{x}_j) + b \right) \geq 1 - \xi_k, \quad \forall i, j = 1, \dots, m$$
     $$\xi_k \geq 0, \quad k = 1, \dots, m^2$$
     where $y_k = 1$ if $\mathrm{rank}(\vec{x}_i) > \mathrm{rank}(\vec{x}_j)$ and $y_k = -1$ otherwise, and $k = i \times m + j - 1$
     • Given two examples, we build one example $(x_i, x_j)$

  7. Framework of Preference Reranking
     • The local model is a system providing the initial rank
     • Preference reranking is superior to ranking with an instance classifier since it compares pairs of hypotheses

  8. More formally
     • Build a set of hypotheses: Q and A pairs
     • These are used to build pairs of pairs, $\langle H_i, H_j \rangle$
       • positive instances if $H_i$ is correct and $H_j$ is not correct
     • A binary classifier decides if $H_i$ is more probable than $H_j$
     • Each candidate annotation $H_i$ is described by a structural representation
     • This way kernels can exploit all dependencies between features and labels
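
As a rough sketch of how such pairs of hypotheses could be enumerated for one question: the function name `build_preference_pairs` and the toy candidates are hypothetical. Emitting the swapped pair as a negative instance is an assumption of this sketch; the slide only defines the positive case.

```python
def build_preference_pairs(hypotheses):
    """Enumerate preference pairs for a single question.

    hypotheses -- list of (hypothesis, is_correct) tuples
    Returns a list of (label, H_i, H_j):
      +1 when H_i is correct and H_j is not (as on the slide),
      -1 for the same two hypotheses in swapped order (assumption).
    """
    pairs = []
    for h_i, correct_i in hypotheses:
        for h_j, correct_j in hypotheses:
            if correct_i and not correct_j:
                pairs.append((+1, h_i, h_j))
                pairs.append((-1, h_j, h_i))
    return pairs


# Hypothetical candidates: one correct answer and two incorrect ones.
candidates = [("a1", True), ("a2", False), ("a3", False)]
pairs = build_preference_pairs(candidates)
```

Pairs in which both hypotheses are correct, or both incorrect, carry no preference and are not generated.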

  9. Preference Reranking Kernel
     If $H_1 > H_2$ and $H_3 > H_4$, then consider the training vectors
     $$Z_1 = \phi(H_1) - \phi(H_2) \quad \text{and} \quad Z_2 = \phi(H_3) - \phi(H_4)$$
     ⇒ the dot product is:
     $$Z_1 \cdot Z_2 = \left( \phi(H_1) - \phi(H_2) \right) \cdot \left( \phi(H_3) - \phi(H_4) \right)$$
     $$= \phi(H_1) \cdot \phi(H_3) - \phi(H_1) \cdot \phi(H_4) - \phi(H_2) \cdot \phi(H_3) + \phi(H_2) \cdot \phi(H_4)$$
     $$= K(H_1, H_3) - K(H_1, H_4) - K(H_2, H_3) + K(H_2, H_4)$$
     Let $H_i = \langle q_i, a_i \rangle$ and $H_j = \langle q_j, a_j \rangle$; then
     $$K(H_i, H_j) = PTK(q_i, q_j) + PTK(a_i, a_j)$$
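
The expansion above can be checked numerically with a small sketch. The helper names (`preference_kernel`, `pair_kernel`, `overlap`) are made up for illustration. The word-overlap kernel is only a stand-in for PTK, which is implemented inside SVM-light-TK and is not reproduced here.

```python
def preference_kernel(K, pair_a, pair_b):
    """K_P(<H1,H2>, <H3,H4>) = K(H1,H3) - K(H1,H4) - K(H2,H3) + K(H2,H4)."""
    h1, h2 = pair_a
    h3, h4 = pair_b
    return K(h1, h3) - K(h1, h4) - K(h2, h3) + K(h2, h4)


def dot(u, v):
    """Linear kernel: plain dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def overlap(s, t):
    """Stand-in base kernel (NOT PTK): number of shared word types."""
    return len(set(s.split()) & set(t.split()))


def pair_kernel(base, h_i, h_j):
    """K(H_i, H_j) = base(q_i, q_j) + base(a_i, a_j) for H = (q, a)."""
    (q_i, a_i), (q_j, a_j) = h_i, h_j
    return base(q_i, q_j) + base(a_i, a_j)


# With a linear kernel, the preference kernel equals the dot product
# of the explicit difference vectors Z1 = H1 - H2 and Z2 = H3 - H4.
h1, h2, h3, h4 = [2, 0], [1, 1], [0, 3], [1, 1]
z1 = [a - b for a, b in zip(h1, h2)]
z2 = [a - b for a, b in zip(h3, h4)]
assert preference_kernel(dot, (h1, h2), (h3, h4)) == dot(z1, z2)

# Kernel between two hypothetical <question, answer> hypotheses.
k = pair_kernel(overlap,
                ("is popcorn vegan", "popcorn is vegan"),
                ("is butter vegan", "butter is dairy"))
```

The point of the kernel form is that the difference vectors never need to be built explicitly, which matters when $\phi$ is the implicit feature space of a tree kernel.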

  11. An example of Jeopardy! Question

  12. Adding Relational Links (figure panels: Question, Answer)

  14. Links can be encoded by marking tree nodes
      Methodology:
      1. Apply lemmatization or stemming to the leaves
      2. Mark (with the @ symbol) pre-terminal nodes and higher-level nodes if the subtrees are shared in Q and A
      3. Ignore stop words in the matching procedure
      (figure panels: Question, Answer)

  16. Representation Issues
      • Very large sentences
      • The Jeopardy! cues can consist of more than one sentence
      • The answer is typically composed of several sentences
      • Overly large structures cause inaccuracies in the kernel similarity, and the learning algorithm loses some of its power

  17. Running example from Answerbag
      Question: Is movie theater popcorn vegan?
      Answer:
      (01) Any movie theater popcorn that includes butter -- and therefore dairy products -- is not vegan.
      (02) However, the popcorn kernels alone can be considered vegan if popped using canola, coconut or other plant oils which some theaters offer as an alternative to standard popcorn.

  18. Shallow models for Reranking [Severyn & Moschitti, SIGIR 2012]
      Representations: bag of words, bag of POS tags, and their combination
      Question (root SQ)
        POS tags: VBZ NN NN JJ NN
        words: is movie theater popcorn vegan
        bag of words: (is) (movie) (theater) (popcorn) (vegan)
        bag of POS tags: (VBZ) (NN) (NN) (JJ) (NN)
      Answer (root S)
        POS tags: DT NN NN NN WDT VBZ NN CC RB JJ NNS VBZ RB NN
        words: any movie theater popcorn that includes butter and therefore dairy products is not vegan
        bag of words: (any) (movie) (theater) (popcorn) (that) (includes) (butter) (and) (therefore) (dairy) (products) (is) (not) (vegan)
        bag of POS tags: (DT) (NN) (NN) (NN) (WDT) (VBZ) (NN) (CC) (RB) (JJ) (NNS) (VBZ) (RB) (NN)

  19. Linking question with the answer 01
      Lexical matching is on word lemmas (using the WordNet lemmatizer)
      Question sentence (root SQ)
        POS tags: VBZ NN NN JJ NN
        words: is movie theater popcorn vegan
      Answer passage, sentence 01 (root S)
        POS tags: DT NN NN NN WDT VBZ NN CC RB JJ NNS VBZ RB NN
        words: any movie theater popcorn that includes butter and therefore dairy products is not vegan
      Sentence 02 (root S)
        POS tags: RB DT JJ NNS RB MD VB VBN NN IN VBN VBG NN NN CC JJ NN NNS WDT DT NNS VBP IN DT NN TO JJ NN
        words: however the popcorn kernels alone can be considered vegan if popped using canola coconut or other plant oils which some theaters offer as an alternative to standard popcorn

  20. Linking question with the answer 02
      Lexical matching is on word lemmas (using the WordNet lemmatizer)
      Question sentence (root SQ)
        POS tags: VBZ NN NN JJ NN
        words: is movie theater popcorn vegan
      Sentence 01 (root S)
        POS tags: DT NN NN NN WDT VBZ NN CC RB JJ NNS VBZ RB NN
        words: any movie theater popcorn that includes butter and therefore dairy products is not vegan
      Answer passage, sentence 02 (root S)
        POS tags: RB DT JJ NNS RB MD VB VBN NN IN VBN VBG NN NN CC JJ NN NNS WDT DT NNS VBP IN DT NN TO JJ NN
        words: however the popcorn kernels alone can be considered vegan if popped using canola coconut or other plant oils which some theaters offer as an alternative to standard popcorn

  21. Linking question and its answer passages using a relational tag
      Marking POS tags of the aligned words with a relational tag: "REL"
      Question (root SQ)
        POS tags: REL-VBZ REL-NN REL-NN REL-JJ REL-NN
        words: is movie theater popcorn vegan
      Answer (root S)
        POS tags: DT REL-NN REL-NN REL-NN WDT VBZ NN CC RB JJ NNS REL-VBZ RB REL-NN
        words: any movie theater popcorn that includes butter and therefore dairy products is not vegan
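
A minimal sketch of the REL marking shown on this slide, for (POS tag, word) sequences. It makes two assumptions that differ from the lab setup: lowercasing stands in for the WordNet lemmatizer, and no stop-word list is applied (the slide's example marks "is" as well). The function name `mark_relational` is hypothetical.

```python
def mark_relational(question, answer, lemma=str.lower):
    """Prefix POS tags with 'REL-' for words shared by question and answer.

    question, answer -- lists of (pos_tag, word) tuples
    lemma            -- normalisation function; lowercasing is a stand-in
                        for a real lemmatizer (assumption of this sketch)
    Returns the two sequences with updated tags.
    """
    shared = ({lemma(w) for _, w in question}
              & {lemma(w) for _, w in answer})

    def mark(tagged):
        return [("REL-" + tag if lemma(word) in shared else tag, word)
                for tag, word in tagged]

    return mark(question), mark(answer)


# The running example from the slides.
q_tags = "VBZ NN NN JJ NN".split()
q_words = "is movie theater popcorn vegan".split()
a_tags = "DT NN NN NN WDT VBZ NN CC RB JJ NNS VBZ RB NN".split()
a_words = ("any movie theater popcorn that includes butter and "
           "therefore dairy products is not vegan").split()

q_marked, a_marked = mark_relational(list(zip(q_tags, q_words)),
                                     list(zip(a_tags, a_words)))
```

On this example the marked tag sequences match the ones on the slide. Inflected matches such as "theaters" against "theater" would need a real lemmatizer or stemmer.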

  22. Let’s start the LAB3: Ranking with Tree Kernels

  23. SVM-light-TK and Ranking Data
      • SVM-light-TK encodes STK, PTK and combination kernels in SVM-light [Joachims, 1999]
      • http://disi.unitn.it/moschitti/teaching.html
      • Academic Year: 2015-2016
      • Download: LAB3.zip

  24. Compile the package
      • Go to the SVM directory:
        cd SVM-Light-1.5-rer/
      • Type make to build the code:
        make
      • Go back to the previous directory:
        cd ..

  25. Generating examples for reranking
      • questions.5k.txt contains a set of questions
        • each line contains a unique id and the question itself, separated by a tab, i.e., "\t"
      • answers.txt contains a set of answers
        • each line contains a unique id and the answer passage itself, separated by a tab, i.e., "\t"
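
A small sketch of reading these tab-separated files into a dictionary. The function name `read_id_text` is hypothetical, and the sample line reuses the question shown later in the slides. This is written for Python 3, whereas the lab scripts are run with python2.7.

```python
import io


def read_id_text(lines):
    """Map each id to its text for lines of the form '<id>\\t<text>'.

    lines -- any iterable of strings, e.g. an open file object
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, in case the text itself contains tabs.
        item_id, text = line.split("\t", 1)
        table[item_id] = text
    return table


# In-memory stand-in for questions.5k.txt (one sample line).
sample = io.StringIO(
    '2500744\tWhat kind of cell phone was used in the movie "Iron Man"?\n'
)
questions = read_id_text(sample)
```

With the real data, the same function could be called as `read_id_text(open("questions.5k.txt", encoding="utf-8"))`; the encoding is an assumption.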

  26. Training and testing files
      • results.*.15k: a rank list for 1,000 questions (contained in questions.5k.txt), giving
        (1) the id of the question,
        (2) the id of the passage, and
        (3) its score from the search engine
      • results.train.15k, results.test.15k
        • 1,000 questions
        • 15 retrieved passages for each question
      • (BOX (the) (cell) (phone) (used) (tony) (stark) (the) (movie) (iron) (man) (was) (vx9400) (slider) (phone) (which) (was) (just) (one) (the) (mobile) (phones) (used) (the) (movie.))

  27. Building the reranker files
      • Generate training examples for reranking:
        python generate_reranking_pairs.py questions.5k.txt answers.txt results.train.15k
      • Use python2.7
      • Generate testing examples for reranking:
        python generate_reranking_pairs.py -m test questions.5k.txt answers.txt results.test.15k

  28. Retrieval results for a question
      • 2500744 What kind of cell phone was used in the movie "Iron Man"?
      • 2500744 The cell phone used by Tony Stark in the movie "Iron Man" was a LG VX9400 slider phone, which was just one of the LG mobile phones used in the movie.
      • 2259459 The average person cannot trace a prepaid cell phone; however, the federal government and police force do have this capability. While they cannot determine a person's exact location, they can find what cell phone towers are being used and use this information to trace the phone.

  29. Generated learning files: svm.train
      +1 |BT| (BOX (the) (cell) (phone) (used) (tony) (stark) (the) (movie) (iron) (man) (was) (vx9400) (slider) (phone) (which) (was) (just) (one) (the) (mobile) (phones) (used) (the) (movie.)) |BT| (BOX (the) (average) (person) (cannot) (trace) (prepaid) (cell) (phone) (however) (the) (federal) (government) (and) (police) (force) (have) (this) (capability.) (while) (they) (cannot) (determine) (person) (exact) (location) (they) (can) (find) (what) (cell) (phone) (towers) (are) (being) (used) (and) (use) (this) (information) (trace) (the) (phone.)) |ET| 1:2.28489184 |BV| 1:0.65760440 |EV|
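
To make the line layout easier to read, here is a hedged sketch of a parser for it. The layout is inferred only from the examples on these slides (label, trees introduced by |BT|, tree section closed by |ET|, feature vectors separated by |BV| and closed by |EV|, EMPTY as a placeholder). It is not taken from the SVM-light-TK documentation, and `parse_example` is a hypothetical helper.

```python
def parse_example(line):
    """Split one example line into (label, trees, vectors).

    trees   -- list of tree strings (the placeholder 'EMPTY' is kept as-is)
    vectors -- list of {feature_index: value} dicts; 'EMPTY' gives {}
    """
    label, rest = line.strip().split(None, 1)
    tree_part, vec_part = rest.split("|ET|", 1)

    # Every tree is introduced by a |BT| marker.
    trees = [t.strip() for t in tree_part.split("|BT|") if t.strip()]

    # Vectors are separated by |BV|; the last one is closed by |EV|.
    vectors = []
    for chunk in vec_part.replace("|EV|", "").split("|BV|"):
        features = {}
        for token in chunk.split():
            if token == "EMPTY":
                continue
            index, value = token.split(":")
            features[int(index)] = float(value)
        vectors.append(features)

    return int(label), trees, vectors


# Shortened versions of the lines shown on the slides.
train_line = ("+1 |BT| (BOX (the) (cell) (phone)) "
              "|BT| (BOX (the) (average) (person)) "
              "|ET| 1:2.28489184 |BV| 1:0.65760440 |EV|")
test_line = ("+1 |BT| (BOX (the) (cell) (phone)) |BT| EMPTY "
             "|ET| 1:2.28489184 |BV| EMPTY |EV|")

label, trees, vectors = parse_example(train_line)
t_label, t_trees, t_vectors = parse_example(test_line)
```

In the training line both members of the preference pair are present, while the test line carries a single candidate with EMPTY in the second slot.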

  30. Generated test files: svm.test
      +1 |BT| (BOX (the) (cell) (phone) (used) (tony) (stark) (the) (movie) (iron) (man) (was) (vx9400) (slider) (phone) (which) (was) (just) (one) (the) (mobile) (phones) (used) (the) (movie.)) |BT| EMPTY |ET| 1:2.28489184 |BV| EMPTY |EV|
