Improving Accuracy of SMS based FAQ Retrieval
  From: Delhi Technological University (DTU), formerly known as Delhi College of Engineering (DCE)
  Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn
  FIRE 2011
Outline
  Abstract
  Introduction
  Prior Work
  Our Contribution
  Proposed System
  Problem Formulation
  Tools Used
  Results
  Future Work
  References
FIRE 2011: Improving Accuracy of SMS based FAQ Retrieval, 12/3/2011
Abstract
  We implemented an automatic SMS-based question answering system for SMS users, as proposed by L. Venkata Subramaniam and team in their paper "SMS based Interface for FAQ Retrieval" (2009).
  We present three techniques that improve the accuracy of SMS based FAQ retrieval without requiring any training data or SMS normalization.
  The system handles syntactic and semantic variations in question formulation with higher accuracy.
Introduction
  The number of mobile users is growing rapidly.
  Mobile networks provide anytime, anywhere access.
  This has encouraged service providers to build information services based on SMS technology.
  Existing systems require either SMSes in a particular format or human intervention in producing the response.
  The proposed automatic system lets users write queries in any format and produces responses without human intervention.
Prior Work
  SMS questions pose significant challenges because of their inherent noise.
  The prior approach handles noise in an SMS query by formulating query similarity over FAQ questions as a combinatorial search problem.
  The search space consists of all combinations of the possible dictionary variations of the tokens in the noisy query.
  The scoring function is based on similarity.
Calculation of Similarity Score
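The similarity formula itself is not reproduced in this text. The following is a minimal sketch only, loosely following the token-weighting idea of the 2009 paper cited in the abstract: each SMS token is matched to its best FAQ token by a longest-common-subsequence ratio, weighted by inverse document frequency. The function names, the shared-first-character gate, and the omission of any edit-distance term are assumptions of this sketch, not the authors' exact measure.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def token_similarity(faq_token, sms_token):
    """LCS ratio, gated on a shared first character (assumed measure)."""
    if not faq_token or not sms_token or faq_token[0] != sms_token[0]:
        return 0.0
    return lcs_length(faq_token, sms_token) / len(faq_token)

def idf(token, faq_token_sets):
    """Inverse document frequency of a token over all FAQ questions."""
    df = sum(1 for tokens in faq_token_sets if token in tokens)
    return math.log(len(faq_token_sets) / df) if df else 0.0

def similarity_score(faq_tokens, sms_tokens, faq_token_sets):
    """Sum, over SMS tokens, of the best IDF-weighted match in the FAQ."""
    total = 0.0
    for s in sms_tokens:
        total += max((token_similarity(t, s) * idf(t, faq_token_sets)
                      for t in faq_tokens), default=0.0)
    return total
```

In this sketch a noisy token such as "captl" still scores well against "capital" because all of its letters appear in order in the dictionary word, while the IDF weight discounts tokens that occur in most FAQ questions.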
Our Contribution
  New scoring function
    Proximity Score, based on proximity search
    Length Score, based on the lengths of the FAQ and the SMS query
  FAQ answers are also considered when finding the question closest to the SMS query
Proposed System
  Step 1: Preprocessing
    Index the FAQ questions and answers
    Create domain and synonym dictionaries
    Remove stop words
    Remove punctuation symbols
    Convert numbers to words (e.g. 4get to fourget)
  Step 2: Calculate the Similarity Score
    Calculation of the similarity measure
    Calculation of inverse document frequency
  Step 3: Calculate the Proximity Score
  Step 4: Calculate the Length Score
  Step 5: Result: if a match is found, return it; otherwise look for a match in the FAQ answers
  Step 6: If there is still no match, apply the out-of-domain logic to the query
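The text-cleaning part of Step 1 can be sketched as follows. This is an illustration under stated assumptions: the stop-word list and the digit-to-word map are hypothetical stand-ins, since the slides do not list the ones actually used, and indexing and dictionary creation are left out.

```python
import string

# Hypothetical stop-word list; the slides do not give the actual one.
STOP_WORDS = {"the", "a", "an", "is", "of", "in", "to", "for"}

# Digit-to-word map used to spell out numbers inside SMS tokens.
DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three",
               "4": "four", "5": "five", "6": "six", "7": "seven",
               "8": "eight", "9": "nine"}

def preprocess(text):
    """Lowercase, strip punctuation, spell out digits, drop stop words."""
    text = text.lower()
    # Remove punctuation symbols.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace each digit with its word, e.g. "4get" -> "fourget".
    text = "".join(DIGIT_WORDS.get(ch, ch) for ch in text)
    # Split on whitespace and drop stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

For example, `preprocess("4get the password?")` yields the tokens `fourget` and `password`.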
Problem Formulation
  Score(Q) = W1 * Similarity_Score(Q, S) + W2 * Proximity_Score(Q, S) - W3 * Length_Score(Q, S)
  where Q is the FAQ question under consideration and S = {s1, s2, …, sn} is the SMS query.
  W1 + W2 = 1.0 (or 100%); W3 is assigned a comparatively small value.
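The weighted combination above can be written as a small function. The default weights below are placeholders chosen only to satisfy W1 + W2 = 1.0 with a small W3; the slides do not state the values actually used.

```python
def overall_score(similarity, proximity, length, w1=0.7, w2=0.3, w3=0.1):
    """Combine the three component scores for one FAQ question.

    w1, w2, w3 are illustrative defaults, not values from the slides.
    """
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("w1 + w2 must equal 1.0")
    # Similarity and proximity add to the score; length is a penalty.
    return w1 * similarity + w2 * proximity - w3 * length
```

Because the length term is subtracted, two FAQ questions with equal similarity and proximity are separated by how much unmatched material each carries.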
E.g. of Proximity Search
  SMS: “wt is captl f india?”
  FAQ 1: “What is the capital of UP? It is situated which part of India?”
  Answer: “Lucknow is the capital of UP and It is in the north part of India.”
  FAQ 2: “What is the capital of India?”
  Answer: “Delhi is the capital of India.”
Formulation of Proximity Score (formula not preserved in the extracted text)
Calculating Proximity Score (worked calculation not preserved in the extracted text)
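The proximity formula is not available in this text, so the following is a stand-in rather than the authors' definition. It illustrates the intuition behind the capital-of-India example: the matched words are spread over two sentences in FAQ 1 but sit close together in FAQ 2. The sketch assumes the SMS tokens have already been mapped to their dictionary forms, and scores a FAQ by the number of matched tokens divided by the span they cover.

```python
def proximity_score(faq_tokens, query_tokens):
    """Matched-token count divided by the span they cover in the FAQ.

    Illustrative measure only; uses the first occurrence of each token.
    """
    first_pos = {}
    for i, tok in enumerate(faq_tokens):
        first_pos.setdefault(tok, i)
    positions = sorted(first_pos[t] for t in set(query_tokens)
                       if t in first_pos)
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return len(positions) / span

query = "what is capital of india".split()
faq1 = "what is the capital of up it is situated which part of india".split()
faq2 = "what is the capital of india".split()
p1 = proximity_score(faq1, query)  # 5 matches over a span of 13 tokens
p2 = proximity_score(faq2, query)  # 5 matches over a span of 6 tokens
```

Under this measure FAQ 2 scores higher than FAQ 1, which is the ranking the example calls for.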
E.g. of Length Search
  SMS: “wt is captl f india?”
  FAQ 1: “What is the capital of UP? It is situated which part of India? What is the culture of UP?”
  Answer: “Lucknow is the capital of UP and It is in the north part of India. Peoples are very friendly in nature.”
  FAQ 2: “What is the capital of India?”
  Answer: “Delhi is the capital of India.”
Length Score (Negative) (formula not preserved in the extracted text)
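Since the length formula is not given in this text, the sketch below uses an assumed measure: the fraction of FAQ tokens that the query does not match. It is meant only to show why the longer FAQ 1 in the preceding example is penalized more than FAQ 2; the function name and the measure are hypothetical, and the SMS tokens are assumed to be already mapped to dictionary forms.

```python
def length_penalty(faq_tokens, query_tokens):
    """Fraction of FAQ tokens with no counterpart in the query (assumed)."""
    if not faq_tokens:
        return 0.0
    query = set(query_tokens)
    unmatched = sum(1 for tok in faq_tokens if tok not in query)
    return unmatched / len(faq_tokens)

query = "what is capital of india".split()
long_faq = ("what is the capital of up it is situated which part of india "
            "what is the culture of up").split()
short_faq = "what is the capital of india".split()
long_penalty = length_penalty(long_faq, query)    # 9 of 19 tokens unmatched
short_penalty = length_penalty(short_faq, query)  # 1 of 6 tokens unmatched
```

The penalty is subtracted in the overall score, so the shorter, more focused question is preferred when the other scores are equal.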
Limitations of Length Score
  A drawback of the Length Score is that a question with more tokens always receives a lower overall score, because it has more unmatched FAQ tokens.
Solution 1: Rewrite FAQ
  FAQ question: “DTU offers various M Tech courses. What are the Internship opportunities for M Tech students at DTU? There are many M Tech students in DTU. Do all M Tech students get the Internship offer?”
  Corresponding short question: “What are Internship opportunities for M Tech students at DTU?”
Solution 2: Length Score (Positive) (formula not preserved in the extracted text)
Matching with Answers
  FAQ answers are used only when:
    more than one FAQ question matches the SMS query equally closely, or
    no matching FAQ question is found.
  FAQ: “What are the different insurance schemes?”
  Answer: “LIC, LIC JivanSaral, LIC JivanTarang, LIC Plus, Bajaj Allianz, ICICI Lombard etc are different insurance schemes.”
  SMS: “wht r difrnt LIC scems?”
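The fallback rule can be sketched as below. The `overlap` function is a toy stand-in for the full scoring function, and the tuple layout of `faqs` is an assumption made for illustration; only the control flow (use answers on a tie or when no question matches) comes from the slide.

```python
def overlap(tokens, query_tokens):
    """Number of distinct query tokens that appear in tokens (toy score)."""
    return len(set(tokens) & set(query_tokens))

def best_faq(faqs, query_tokens):
    """Pick a FAQ index, consulting answers on a tie or on no match.

    faqs: list of (question_tokens, answer_tokens) pairs (assumed layout).
    Returns the index of the chosen FAQ, or None if nothing matches.
    """
    q_scores = [overlap(q, query_tokens) for q, _ in faqs]
    top = max(q_scores, default=0)
    candidates = [i for i, s in enumerate(q_scores) if s == top]
    if top > 0 and len(candidates) == 1:
        return candidates[0]  # a single best question: no fallback needed
    # Tie between questions, or no question matched: score the answers.
    pool = candidates if top > 0 else range(len(faqs))
    a_scores = {i: overlap(faqs[i][1], query_tokens) for i in pool}
    if not a_scores:
        return None
    best = max(a_scores, key=a_scores.get)
    if top == 0 and a_scores[best] == 0:
        return None  # neither questions nor answers match
    return best
```

In the insurance example, a second FAQ about other "schemes" would tie on the question text, and the token "lic" in the answer is what breaks the tie.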
Results
  Statistics for Hindi task:
    ***** FIRE 2011 SMS TASK EVALUATION REPORT *****
    No. of In-domain Queries: 200
    No. of Out of Domain Queries: 124
    In Domain correct: 198/200 (0.99)
    Out of Domain correct: 3/124 (0.024193548)
    Mean Reciprocal Rank (MRR): 0.99
  Statistics for FAQ database in English:
    ***** FIRE 2011 SMS TASK EVALUATION REPORT *****
    No. of In-domain Queries: 704
    No. of Out of Domain Queries: 2701
    In Domain correct: 539/704 (0.765625)
    Out of Domain correct: 871/2701 (0.32247317)
    Mean Reciprocal Rank (MRR): 0.8309513
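The ratios in the reports are plain fractions of correct queries, and MRR is conventionally the mean of the reciprocal rank of the correct FAQ over all queries. The sketch below shows both under that standard definition; it is not the FIRE evaluation script, and the function names are illustrative.

```python
def accuracy(correct, total):
    """Fraction of queries answered correctly."""
    return correct / total

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries.

    ranks: 1-based rank of the correct FAQ per query, or None when the
    correct FAQ was not returned (contributes 0 to the mean).
    """
    return sum(1.0 / r for r in ranks if r) / len(ranks)

hindi_in_domain = accuracy(198, 200)
english_in_domain = accuracy(539, 704)
english_out_of_domain = accuracy(871, 2701)
```

For instance, three queries whose correct FAQ is ranked first, ranked second, and missing give an MRR of (1 + 0.5 + 0) / 3 = 0.5.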
Tools Used
  Lucene [1]
  WordNet English [2]
  WordNet Hindi [3]
Future Work
  Stemming
  Automatic spelling checker
  Rewriting FAQ
  Improve proximity search
References
  Govind Kothari, Sumit Negi, Tanveer A. Faruquie, Venkatesan T. Chakaravarthy, L. Venkata Subramaniam. 2009. SMS based Interface for FAQ Retrieval. ACL and AFNLP, Suntec, Singapore.
  Danish Contractor, Govind Kothari, Tanveer A. Faruquie, L. Venkata Subramaniam, Sumit Negi. 2010. Handling Noisy Queries In Cross Language FAQ Retrieval. ACL, MIT, Massachusetts, USA.
  [1] http://lucene.apache.org/
  [2] http://wordnet.princeton.edu/
  [3] http://www.cfilt.iitb.ac.in/wordnet/webhwn/
Thanks!