Improving Accuracy of SMS Based FAQ Retrieval


From: Delhi Technological University (DTU), formerly known as Delhi College of Engineering (DCE). Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn. FIRE 2011.


  1. Improving Accuracy of SMS Based FAQ Retrieval. From: Delhi Technological University (DTU), formerly known as Delhi College of Engineering (DCE). Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn. FIRE 2011.

  2. Outline
     - Abstract
     - Introduction
     - Prior Work
     - Our Contribution
     - Proposed System
     - Problem Formulation
     - Tools Used
     - Results
     - Future Work
     - References
     FIRE 2011: Improving Accuracy of SMS based FAQ Retrieval, 12/3/2011

  3. Abstract
     - We implemented an automatic SMS-based question answering system for SMS users, as proposed by L. Venkata Subramaniam and team in their paper "SMS based Interface for FAQ Retrieval" (2009).
     - We present three techniques that improve the accuracy of SMS-based FAQ retrieval and do not require any training data or SMS normalization.
     - The system handles syntactic and semantic variations in question formulation more accurately.

  4. Introduction
     - The number of mobile users is growing rapidly.
     - Mobile networks provide anytime, anywhere access.
     - This has encouraged service providers to build information services based on SMS technology.
     - Existing systems require either SMSes in a particular format or human intervention in the query response.
     - The proposed automatic system lets the user write a query without any fixed format and produces the response without human intervention.

  5. Prior Work
     - SMS questions pose significant challenges because of their inherent noise.
     - The noise in an SMS query is handled by formulating query similarity over FAQ questions as a combinatorial search problem.
     - The search space consists of combinations of all possible dictionary variations of the tokens in the noisy query.
     - The scoring function is based on similarity.
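
The combinatorial search space described in the Prior Work slide can be illustrated with a small sketch. It is not the prior work's implementation: the `variants` and `search_space` helpers, the toy vocabulary, the 0.5 threshold, and the use of `difflib.SequenceMatcher` as the string-similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def variants(sms_token, vocabulary, threshold=0.5):
    """Return vocabulary terms that resemble a noisy SMS token, best first.

    The similarity measure and threshold are illustrative assumptions.
    """
    scored = []
    for term in vocabulary:
        ratio = SequenceMatcher(None, sms_token, term).ratio()
        if ratio >= threshold:
            scored.append((ratio, term))
    return [term for _, term in sorted(scored, reverse=True)]


def search_space(sms_tokens, vocabulary):
    """Enumerate every combination of candidate terms, one per SMS token."""
    # A token with no candidate is kept unchanged so the product is never empty.
    candidate_lists = [variants(tok, vocabulary) or [tok] for tok in sms_tokens]
    return list(product(*candidate_lists))


vocabulary = ["what", "is", "the", "capital", "of", "india", "culture", "part"]
space = search_space(["wt", "is", "captl", "f", "india"], vocabulary)
```

With this toy vocabulary each token has very few candidates. With a real FAQ vocabulary each token would typically have several, and the number of combinations grows multiplicatively, which is why a scoring function is needed to rank them.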

  6. Calculation of Similarity Score

  7. Our Contribution
     - New scoring function
     - Proximity Score based on proximity search
     - Length Score based on the lengths of the FAQ and the SMS query
     - FAQ answers are also considered when finding the question closest to the SMS query

  8. Proposed System
     - Step 1: Preprocessing
       - Index the FAQ questions and answers
       - Create the domain and synonym dictionaries
       - Remove stop words
       - Remove punctuation symbols
       - Convert numbers to words (e.g. "4get" to "fourget")
     - Step 2: Calculate the Similarity Score
       - Calculation of the similarity measure
       - Calculation of the inverse document frequency
     - Step 3: Calculate the Proximity Score
     - Step 4: Calculate the Length Score
     - Step 5: Result. If a match is found, return the result; otherwise look for a match in the FAQ answers
     - Step 6: If there is still no match, apply the out-of-domain logic to the query
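
The text-cleaning part of Step 1 could look roughly like the sketch below. The stop-word list and the `preprocess` helper are hypothetical; the slides do not say which stop words were used or in what order the operations ran. Indexing and dictionary creation are not covered here.

```python
import re

# Assumed, deliberately tiny stop-word list; the real list is not given.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "to"}

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def preprocess(text):
    """Lowercase, strip punctuation, spell out digits, and drop stop words."""
    text = text.lower()
    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # Spell out each ASCII digit in place, so "4get" becomes "fourget".
    text = re.sub(r"[0-9]", lambda m: DIGIT_WORDS[m.group()], text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

For example, `preprocess("Plz 4get the old rules!")` returns `["plz", "fourget", "old", "rules"]`.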

  9. Problem Formulation
     Score(Q) = W1 * Similarity_Score(Q, S) + W2 * Proximity_Score(Q, S) - W3 * Length_Score(Q, S)
     where Q is the FAQ question under consideration and S = {s1, s2, ..., sn} is the SMS query.
     W1 + W2 = 1.0 (or 100%). W3 is assigned a comparatively smaller value.
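
A minimal sketch of how the combined score might be computed and used for ranking. The default weights (0.6, 0.4, 0.1) are placeholders that merely satisfy W1 + W2 = 1.0 with a smaller W3; the slides do not state the values actually used, and `overall_score` and `rank_faqs` are hypothetical names.

```python
def overall_score(similarity, proximity, length, w1=0.6, w2=0.4, w3=0.1):
    """Score(Q) = W1*Similarity + W2*Proximity - W3*Length for one FAQ question."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("W1 + W2 must equal 1.0")
    return w1 * similarity + w2 * proximity - w3 * length


def rank_faqs(component_scores, **weights):
    """Order FAQ ids by overall score, best first.

    component_scores maps faq_id -> (similarity, proximity, length).
    """
    scored = {
        faq_id: overall_score(*parts, **weights)
        for faq_id, parts in component_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

With equal similarity scores, a FAQ with higher proximity and a lower length penalty ranks first: `rank_faqs({"faq1": (0.9, 0.5, 0.7), "faq2": (0.9, 1.0, 0.0)})` puts `"faq2"` ahead of `"faq1"`.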

  10. Example of Proximity Search
      SMS: “wt is captl f india?”
      FAQ 1: “What is the capital of UP? It is situated which part of India?”
      Answer: “Lucknow is the capital of UP and It is in the north part of India.”
      FAQ 2: “What is the capital of India?”
      Answer: “Delhi is the capital of India.”

  11. Formulation of Proximity Score

  12. Calculating Proximity Score
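
As an illustration only, and not the authors' formula, proximity can be scored as the fraction of adjacent query-term pairs that also appear adjacent, in the same order, in the FAQ question. The `proximity_score` helper and the assumption that both inputs are already normalized, stop-word-free token lists are hypothetical.

```python
def proximity_score(query_terms, faq_terms):
    """Fraction of adjacent query-term pairs that are also adjacent in the FAQ.

    An assumed stand-in for the proximity measure, in the range [0.0, 1.0].
    """
    if len(query_terms) < 2:
        return 0.0
    faq_bigrams = set(zip(faq_terms, faq_terms[1:]))
    query_bigrams = list(zip(query_terms, query_terms[1:]))
    hits = sum(1 for pair in query_bigrams if pair in faq_bigrams)
    return hits / len(query_bigrams)


# Slide 10 example after assumed normalization and stop-word removal.
query = ["what", "capital", "india"]
faq1 = ["what", "capital", "up", "it", "situated", "which", "part", "india"]
faq2 = ["what", "capital", "india"]
```

Both FAQ questions contain every query term, so a bag-of-words similarity cannot separate them. Under this sketch FAQ 2 scores 1.0 while FAQ 1 scores 0.5, because "capital" and "india" are far apart in FAQ 1.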

  13. Example of Length Search
      SMS: “wt is captl f india?”
      FAQ 1: “What is the capital of UP? It is situated which part of India? What is the culture of UP?”
      Answer: “Lucknow is the capital of UP and It is in the north part of India. Peoples are very friendly in nature.”
      FAQ 2: “What is the capital of India?”
      Answer: “Delhi is the capital of India.”

  14. Length Score (Negative)

  15. Limitations of Length Score
      The Length Score has a drawback: a FAQ question with more tokens always receives a lower overall score, because it has more unmatched FAQ tokens.
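
The limitation can be made concrete with a hedged sketch. It assumes the negative Length Score grows with the share of FAQ tokens that the SMS query does not match; the exact formula is not given in this transcript, so `length_penalty` is a hypothetical stand-in.

```python
def length_penalty(query_terms, faq_terms):
    """Share of FAQ tokens left unmatched by the query (assumed definition)."""
    if not faq_terms:
        return 0.0
    query = set(query_terms)
    unmatched = [term for term in faq_terms if term not in query]
    return len(unmatched) / len(faq_terms)


# Slide 13 example after assumed normalization and stop-word removal.
query = ["what", "capital", "india"]
faq1 = ["what", "capital", "up", "it", "situated", "which", "part", "india",
        "what", "culture", "up"]
faq2 = ["what", "capital", "india"]
```

Here FAQ 2 receives no penalty and the long FAQ 1 is penalized heavily, which is the intended effect. The same mechanism would also penalize a long FAQ question that is in fact the correct match, which is the drawback the slide describes.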

  16. Solution 1: Rewrite FAQ
      FAQ question: “DTU offers various M Tech courses. What are the Internship opportunities for M Tech students at DTU? There are many M Tech students in DTU. Do all M Tech students get the Internship offer?”
      Corresponding short question: “What are Internship opportunities for M Tech students at DTU?”

  17. Solution 2: Length Score (Positive)

  18. Matching with Answers
      Answers are matched only when:
      - more than one FAQ question is the closest match to the SMS query, or
      - no matching FAQ question is found.
      FAQ: “What are the different insurance schemes?”
      Answer: “LIC, LIC JivanSaral, LIC JivanTarang, LIC Plus, Bajaj Allianz, ICICI Lombard etc are different insurance schemes.”
      SMS: “wht r difrnt LIC scems?”
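
The two conditions can be sketched as a small decision function. The `pick_faq` name, the dictionary inputs, and the `min_score` cut-off are assumptions for illustration; how the system scores answers and detects out-of-domain queries is not specified in the slides.

```python
import math


def pick_faq(question_scores, answer_scores, min_score=0.0):
    """Choose a FAQ id, consulting answer scores only on a tie or no match.

    Both arguments map faq_id -> score. Returns None when nothing matches,
    which a caller could treat as an out-of-domain query.
    """
    best = max(question_scores.values(), default=min_score)
    if best > min_score:
        tied = [faq for faq, score in question_scores.items()
                if math.isclose(score, best)]
        if len(tied) == 1:
            return tied[0]
        # Several questions match equally well: let the answers break the tie.
        return max(tied, key=lambda faq: answer_scores.get(faq, 0.0))
    # No question matched: fall back to the answers alone.
    if not answer_scores:
        return None
    best_faq = max(answer_scores, key=answer_scores.get)
    return best_faq if answer_scores[best_faq] > min_score else None
```

In the insurance example, the token "LIC" occurs only in the answer, so the question text alone gives it no credit, while the answer text can.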

  19. Results
      Statistics for Hindi task:
      ***** FIRE 2011 SMS TASK EVALUATION REPORT *****
      No. of In-domain Queries: 200
      No. of Out of Domain Queries: 124
      In Domain correct: 198/200 (0.99)
      Out of Domain correct: 3/124 (0.024193548)
      Mean Reciprocal Rank (MRR): 0.99
      Statistics for FAQ database in English:
      ***** FIRE 2011 SMS TASK EVALUATION REPORT *****
      No. of In-domain Queries: 704
      No. of Out of Domain Queries: 2701
      In Domain correct: 539/704 (0.765625)
      Out of Domain correct: 871/2701 (0.32247317)
      Mean Reciprocal Rank (MRR): 0.8309513
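
For readers unfamiliar with the metric, Mean Reciprocal Rank averages 1/rank of the first correct FAQ over all queries. The sketch below uses the standard definition; the `mean_reciprocal_rank` helper and its convention of `None` for a query with no correct result are assumptions, and the official FIRE evaluation script may differ in details.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    ranks holds, per query, the 1-based rank of the first correct FAQ,
    or None when no correct FAQ was returned (contributing 0).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank) / len(ranks)


def accuracy(correct, total):
    """Fraction of queries answered correctly, as shown in parentheses above."""
    return correct / total
```

For instance, `mean_reciprocal_rank([1, 2, None, 1])` is 0.625, and `accuracy(539, 704)` reproduces the reported 0.765625 for the English in-domain queries.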

  20. Tools Used
      - Lucene [1]
      - Wordnet English [2]
      - Wordnet Hindi [3]

  21. Future Work
      - Stemming
      - Automatic spelling checker
      - Rewriting FAQ
      - Improve proximity search

  22. References
      - Govind Kothari, Sumit Negi, Tanveer A. Faruquie, Venkatesan T. Chakaravarthy, L. Venkata Subramaniam. 2009. SMS based Interface for FAQ Retrieval. ACL and AFNLP, Suntec, Singapore.
      - Danish Contractor, Govind Kothari, Tanveer A. Faruquie, L. Venkata Subramaniam, Sumit Negi. 2010. Handling Noisy Queries In Cross Language FAQ Retrieval. ACL, MIT, Massachusetts, USA.
      [1] http://lucene.apache.org/
      [2] http://wordnet.princeton.edu/
      [3] http://www.cfilt.iitb.ac.in/wordnet/webhwn/

  23. Thanks!
