Question Classification in English-Chinese Cross-Language Question - PowerPoint PPT Presentation

Question Classification in English-Chinese Cross-Language Question Answering: An Integrated Genetic Algorithm and Machine Learning Approach
Min-Yuh Day 1,2, Chorng-Shyong Ong 2, and Wen-Lian Hsu 1,*, Fellow, IEEE
1 Institute of Information Science, Academia Sinica, Taiwan
2 Department of Information Management, National Taiwan University, Taiwan
{myday, hsu}@iis.sinica.edu.tw; ongcs@im.ntu.edu.tw
IEEE IRI 2007, Las Vegas, Nevada, USA, Aug 13-15, 2007.

Outline
- Introduction
- Research Background
- Methods
- Hybrid GA-CRF-SVM Architecture
- Experimental Design
- Experimental Results and Discussion
- Conclusions
Min-Yuh Day (NTU; SINICA)

Introduction
- Question classification (QC) plays an important role in cross-language question answering (CLQA).
- QC: accurately classify a question into a question type and then map it to an expected answer type.
  - "What is the biggest city in the United States?"
  - Question type: "Q_LOCATION_CITY"
- Used to extract and filter answers in order to improve the overall accuracy of a cross-language question answering system.

Introduction (cont.)
- Question informers (QI) play a key role in enhancing question classification for factual question answering.
- QI: choosing a minimal, appropriate contiguous span of a question token, or tokens, as the informer span of a question, which is adequate for question classification.
  - "What is the biggest city in the United States?"
  - Question informer: "city"
  - "city" is the most important clue in the question for question classification.

Introduction (cont.)
- Feature selection in machine learning
  - An optimization problem that involves choosing an appropriate feature subset.
- A hybrid approach that integrates Genetic Algorithm (GA) and Conditional Random Fields (CRF) improves the accuracy of question informer prediction over traditional CRF models (Day et al., 2006).
- We propose an integrated Genetic Algorithm (GA) and Machine Learning (ML) approach for question classification in cross-language question answering.

Research Background
- Cross-Language Question Answering
  - International Question Answering (QA) contests
    - TREC QA: 1999~
      - Monolingual QA in English
    - QA@CLEF: 2003~
      - European languages in both non-English monolingual and cross-language
    - NTCIR CLQA: 2005~
      - Asian languages in both monolingual and cross-language
- Question Classification
  - Rule-based method
  - Machine learning based method

Research Background (cont.)
- Two strategies for question classification in English-Chinese cross-language question answering:
  1) Chinese QC (CQC) for both English and Chinese queries.
     - The English source language has to be translated into the Chinese target language in advance.
  2) English QC (EQC) for English queries and Chinese QC (CQC) for Chinese queries.
- We focus on question classification in English-Chinese cross-language question answering.
  - Bilingual QA system for English source language queries and Chinese target document collections.

Methods
- Hybrid GA-CRF-SVM Architecture
  - GA for CRF Feature Selection
  - GA-CRF Question Informer Prediction
  - SVM-based Question Classification using GA-CRF Question Informer

Hybrid GA-CRF-SVM Architecture
[Figure: three-stage pipeline from Question to Question Type]
1) GA for CRF Feature Selection (GA): produces a near-optimal feature subset of CRF.
2) GA-CRF Question Informer Prediction (CRF): the near-optimal CRF prediction model performs CRF-based question informer prediction and outputs the question informer.
3) SVM-based Question Classification (SVM): uses the question informer to output the question type.

GA-CRF Question Informer Prediction
[Figure: flowchart of GA-CRF learning]
1) Encode a feature subset of CRF with the structure of chromosomes.
2) Initialization: create the population.
3) Evaluate the fitness function F(x) of each feature subset x: train the CRF model on the training dataset with 10-fold cross validation.
4) Stopping criteria satisfied?
   - No: apply the GA operators (reproduction, crossover, mutation) and evaluate again.
   - Yes: output the near-optimal feature subset of CRF.
5) The near-optimal CRF prediction model performs CRF-based question informer prediction on the test dataset.
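The loop in the flowchart can be sketched as follows. This is a minimal illustration, not the authors' implementation: the chromosome is assumed to be a bit string (one bit per candidate feature), the stopping criterion is assumed to be a fixed number of generations, and `toy_fitness` with its `USEFUL` index set is a hypothetical stand-in for the 10-fold cross-validation score of a CRF model. All parameter names and default values are assumptions.

```python
import random

USEFUL = {0, 3, 7}  # hypothetical indices of informative features


def toy_fitness(chromosome):
    """Stand-in for F(x); the slides use 10-fold CV of a CRF model instead."""
    hits = sum(chromosome[i] for i in USEFUL)
    size = sum(chromosome)
    return (1 + hits) / (1 + 0.1 * size)  # always positive


def run_ga(n_features, fitness, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a near-optimal feature subset encoded as a bit string."""
    rng = random.Random(seed)
    # Initialization: random population of feature subsets
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    for _ in range(generations):  # assumed stopping criterion
        scores = [fitness(c) for c in population]
        # Reproduction: fitness-proportional selection (assumes scores >= 0)
        weights = scores if sum(scores) > 0 else None
        parents = rng.choices(population, weights=weights, k=pop_size)
        next_pop = []
        for i in range(0, pop_size, 2):
            a = parents[i][:]
            b = parents[(i + 1) % pop_size][:]
            # Crossover: swap the tails after a random cut point
            if rng.random() < crossover_rate:
                point = rng.randrange(1, n_features)
                a[point:], b[point:] = b[point:], a[point:]
            next_pop.extend([a, b])
        # Mutation: flip each bit with a small probability
        for chrom in next_pop:
            for j in range(n_features):
                if rng.random() < mutation_rate:
                    chrom[j] ^= 1
        population = next_pop[:pop_size]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best


best = run_ga(n_features=12, fitness=toy_fitness)
selected = [i for i, bit in enumerate(best) if bit]
print("selected feature indices:", selected)
```

In a real setting, each fitness call would train and cross-validate a CRF model on the features switched on in the chromosome, so caching fitness values per chromosome would likely be worthwhile.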

Experimental Design
- Data set for English Question Classification
  - Training dataset (5288E)
    - 4,204 questions from the UIUC QC dataset (E)
    - + 500 questions from the NTCIR-5 CLQA development set (E)
    - + 200 questions from the NTCIR-5 CLQA test set (E)
    - + 384 questions from TREC2002 questions (E)
  - Test dataset (CLQA2T150E)
    - 150 English questions from the NTCIR-6 CLQA formal run
- Data set for Chinese Question Classification
  - Training dataset (2322C)
    - 1,238 questions from IASL (C)
    - + 500 questions from the NTCIR-5 CLQA development set (C)
    - + 200 questions from the NTCIR-5 CLQA test set (C)
    - + 384 questions from TREC2002 questions (translated) (C)
  - Test dataset (CLQA2T150C)
    - 150 Chinese questions from the NTCIR-6 CLQA formal run

Experimental Design (cont.)
- Features for English Question Classification
  - Syntactic features
    - Word-based bi-grams of the question (WB)
    - First word of the question (F1)
    - First two words of the question (F2)
    - Wh-word of the question (WH)
      - i.e., 6W1H1O: who, what, when, where, which, why, how, and other
  - Semantic features
    - Question informers predicted by the GA-CRF model (QIF)
    - Question informer bi-grams predicted by the GA-CRF model (QIFB)
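The syntactic features listed above can be illustrated with a short sketch. It assumes whitespace tokenization, lowercasing, and taking the first wh-word found in the question; none of these details are stated on the slides, and the function name `eqc_syntactic_features` is hypothetical. The semantic features (QIF, QIFB) are left out because they depend on the GA-CRF informer predictions.

```python
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}


def eqc_syntactic_features(question):
    """Return the WB, F1, F2 and WH features for one English question."""
    # Assumed preprocessing: lowercase, drop trailing "?" or ".", split on spaces
    tokens = question.lower().strip().rstrip("?.").split()
    # 6W1H1O: the first wh-word in the question, or "other" if there is none
    wh = next((t for t in tokens if t in WH_WORDS), "other")
    return {
        "WB": [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])],
        "F1": tokens[0] if tokens else "",
        "F2": " ".join(tokens[:2]),
        "WH": wh,
    }


features = eqc_syntactic_features("What is the biggest city in the United States?")
print(features["F1"], "|", features["F2"], "|", features["WH"])
print(features["WB"])
```

Each feature value would then be turned into one dimension of a sparse vector before being passed to the SVM.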

Experimental Design (cont.)
- Features for Chinese Question Classification
  - Syntactic features
    - Bag-of-Words
      - Character-based bi-grams (CB)
      - Word-based bi-grams (WB)
    - Part-of-Speech (POS)
  - Semantic features
    - HowNet Senses
      - HowNet Main Definition (HNMD)
      - HowNet Definition (HND)
    - TongYiCi CiLin (TYC)
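As an illustration of the two bag-of-words features, the sketch below builds character-based and word-based bi-grams. It assumes the question has already been word-segmented by an external tool for the WB case; the function names and the example question are hypothetical and not taken from the slides.

```python
def char_bigrams(question):
    """Character-based bi-grams (CB): adjacent character pairs, whitespace ignored."""
    chars = [ch for ch in question if not ch.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


def word_bigrams(words):
    """Word-based bi-grams (WB) over an already segmented question."""
    return [a + "_" + b for a, b in zip(words, words[1:])]


cb = char_bigrams("美國最大的城市")
wb = word_bigrams(["美國", "最大", "的", "城市"])
print(cb)
print(wb)
```

Character bi-grams need no segmenter, which is one plausible reason to include them alongside word-based features for Chinese.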

Experimental Design (cont.)
- Performance Metrics
  - Accuracy:
    $\text{Accuracy} = \frac{\text{Number of correct question types}}{\text{Total number of questions}}$
  - MRR (mean reciprocal rank):
    $\text{MRR} = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{\text{rank}_i}$
    where $\text{rank}_i$ is the rank of the first correct question type of the $i$-th question, and $M$ is the total number of questions.
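Both metrics can be computed from a ranked list of predicted question types per question. The sketch below is a minimal illustration with made-up labels; the `top_k=5` cutoff and the choice to count a reciprocal rank of 0 when the correct type is not among the top candidates are assumptions, chosen to match the "Top 5 MRR" columns reported later.

```python
def top1_accuracy(gold, ranked_predictions):
    """Fraction of questions whose top-ranked predicted type is correct."""
    correct = sum(1 for g, preds in zip(gold, ranked_predictions)
                  if preds and preds[0] == g)
    return correct / len(gold)


def mean_reciprocal_rank(gold, ranked_predictions, top_k=5):
    """Mean of 1/rank_i over all M questions, looking only at the top_k candidates."""
    total = 0.0
    for g, preds in zip(gold, ranked_predictions):
        for rank, p in enumerate(preds[:top_k], start=1):
            if p == g:
                total += 1.0 / rank  # rank of the first correct question type
                break
    return total / len(gold)


# Hypothetical gold labels and ranked classifier outputs for M = 3 questions
gold = ["A", "B", "C"]
ranked = [["A", "B"], ["C", "B"], ["X", "Y"]]
print(top1_accuracy(gold, ranked))         # 1 of 3 correct at rank 1
print(mean_reciprocal_rank(gold, ranked))  # (1 + 1/2 + 0) / 3
```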

Experimental Results
- Question informer prediction
  - Using GA to optimize the selection of the feature subset in CRF-based question informer prediction improves the F-score from 88.9% to 93.87%, and reduces the number of features from 105 to 40.
    - Training dataset (UIUC Q5500)
    - Test dataset (UIUC Q500)
  - The accuracy of our proposed GA-CRF model for the UIUC dataset is 95.58%, compared to 87% for the traditional CRF model reported by Krishnan et al. (2005).
  - The proposed hybrid GA-CRF model for question informer prediction significantly outperforms the traditional CRF model.

Experimental Results
- English Question Classification (EQC) using SVM
[Figure: bar chart "English Question Classification". Y-axis: Accuracy (60% to 100%). Series: Top 1 Accuracy (Fine), Top 1 Accuracy (Coarse). X-axis: feature combinations built from WB, WH, F1, F2, QIF, and QIFB. Plotted accuracies range from 86.00% to 95.33%.]

Experimental Results of Chinese Question Classification (CQC) using SVM with different features

Feature Used   Top 1 Accuracy (Fine)   Top 1 Accuracy (Coarse)   Top 5 MRR (Fine)   Top 5 MRR (Coarse)
POS            53.33%                  65.33%                    0.5732             0.7533
POSB           60.00%                  74.00%                    0.6469             0.7970
HNMD           71.33%                  81.33%                    0.7480             0.8832
CB             74.00%                  84.67%                    0.7934             0.9130
HNMDB          74.00%                  86.00%                    0.7916             0.9117
C              74.67%                  84.67%                    0.7979             0.9152
TYCB           74.67%                  86.00%                    0.7880             0.9062
HND            74.67%                  86.67%                    0.7860             0.9102
W              76.00%                  88.00%                    0.7901             0.9208
HNDB           76.67%                  88.00%                    0.8000             0.9162
WB             77.33%                  88.00%                    0.8067             0.9162
TYC            77.33%                  88.67%                    0.8019             0.9240

Experimental Results (cont.)
- Chinese Question Classification (CQC) using SVM
[Figure: bar chart "Chinese Question Classification". Y-axis: Accuracy (60% to 95%). Series: Top 1 Accuracy (Fine), Top 1 Accuracy (Coarse). X-axis: CB, CB+HNMD, CB+HNMD+HND, CB+HNMD+HND+TYC. Plotted fine-grained accuracies: 74.00%, 76.67%, 77.33%, 78.00%; plotted coarse-grained accuracies: 84.67%, 88.67%, 89.33%, 90.67%.]

Conclusions
- We have proposed a hybrid genetic algorithm and machine learning approach for cross-language question classification.
- The major contribution of this paper is that the proposed approach enhances cross-language question classification by using the GA-CRF question informer feature with Support Vector Machines (SVM).
- The results of experiments on NTCIR-6 CLQA question sets demonstrate the efficacy of the approach in improving the accuracy of question classification in English-Chinese cross-language question answering.
