FIRE@ISM-2013 Transliterated Search Task, Dinesh Kumar Prabhakar

FIRE@ISM-2013 Transliterated Search Task
Dinesh Kumar Prabhakar, Sukomal Pal
Department of Computer Science & Engineering, Indian School of Mines, Dhanbad, India

Contents
● Introduction
● FIRE Task
● Solution
● Approaches
● Result
● Analysis
● Conclusion
● References
04/12/13

Introduction
Transliteration: the process of writing a term, phrase, or sentence of one language (e.g. Hindi) in the script of another language (e.g. the Roman script used for English).
– Example: yaaron sab dua karo <---> यारों सब दुआ करो
Two categories:
● Forward: phonetic representation of terms in a non-native script (e.g. Hindi written in Roman script)
● Backward: conversion of terms from a non-native script back to their native script (e.g. converting a Hindi phrase written in Roman script back to Devanagari script)

FIRE Task
● Task 1: Query Word Labeling
– Input: palak paneer recipe
– Output: palak\H=पालक paneer\H=पनीर recipe\E
● Task 2: Multi-script Ad hoc Retrieval for Hindi Song Lyrics
– Input: Iss pyar ko main kya naam doon
– Output: list of songs
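The Task 1 output format can be written and read back with a few lines of code. The sketch below is only an illustration: the names `format_labeled_query` and `parse_labeled_query` are hypothetical, and it assumes tokens are separated by whitespace with no space around `=`.

```python
from typing import List, Optional, Tuple

# (term, language label, native-script transliteration or None)
Token = Tuple[str, str, Optional[str]]


def format_labeled_query(tokens: List[Token]) -> str:
    """Render tokens as term\\LABEL or term\\LABEL=native, space-separated."""
    parts = []
    for term, label, native in tokens:
        if native is not None:
            parts.append(f"{term}\\{label}={native}")
        else:
            parts.append(f"{term}\\{label}")
    return " ".join(parts)


def parse_labeled_query(line: str) -> List[Token]:
    """Inverse of format_labeled_query (assumes no spaces inside a token)."""
    tokens = []
    for chunk in line.split():
        term, _, rest = chunk.partition("\\")
        label, _, native = rest.partition("=")
        tokens.append((term, label, native or None))
    return tokens


example = [("palak", "H", "पालक"), ("paneer", "H", "पनीर"), ("recipe", "E", None)]
print(format_labeled_query(example))
```

Keeping each token as a (term, label, transliteration) triple makes it straightforward to compare system output against reference labels later.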

Solution: Query Word Labeling
● Phase 1: Classification
– Dictionary-based classification
– ML-based classifier (MaxEnt)
● Phase 2: Transliteration
– List-based

Approach-I
● Preprocessing
– Assumes the English wordlist contains sufficient data
– Created 26 different text files (e.g. a.txt, b.txt, ..., z.txt)
● Phase 1: Classification
– List-based
● Phase 2: Transliteration
– List-based
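The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the 26 files group English words by first letter, and the function names `bucket_by_first_letter` and `write_buckets` are hypothetical.

```python
import string
from pathlib import Path
from typing import Dict, Iterable, Set


def bucket_by_first_letter(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Group lower-cased words into 26 buckets keyed by 'a'..'z'."""
    buckets: Dict[str, Set[str]] = {c: set() for c in string.ascii_lowercase}
    for word in words:
        word = word.strip().lower()
        # Words that do not start with a-z are skipped (an assumption).
        if word and word[0] in buckets:
            buckets[word[0]].add(word)
    return buckets


def write_buckets(buckets: Dict[str, Set[str]], out_dir: Path) -> None:
    """Write one file per letter: a.txt, b.txt, ..., z.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for letter, words in buckets.items():
        text = "\n".join(sorted(words))
        (out_dir / f"{letter}.txt").write_text(text, encoding="utf-8")


buckets = bucket_by_first_letter(["Apple", "good", "guide", "recipe", "42"])
print(sorted(buckets["g"]))
```

Splitting by first letter keeps each lookup confined to a much smaller list than the full wordlist.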

Algorithm
1. Input a term from the test document
2. Check the first letter of the term {A-Z, a-z}
3. Match the term in the corresponding document
4. if match found
   4.1. Match the term in the E-H pair document
   4.2. if found
        4.2.1. Print term, \H, =, the word's native script from the E-H pair
   4.3. else
        4.3.1. Print term, \E
5. else
   5.1. Match the term in the E-H pair document
   5.2. if found
        5.2.1. Print term, \H, =, native script from the E-H pair
   5.3. else
        5.3.1. Print term, \H
6. end
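The steps above can be sketched in Python. This is a hedged illustration under stated assumptions: the per-letter English wordlists are held in memory as sets, the E-H pair document is a dictionary from Roman-script term to native script, and `label_term`, `label_query` and the toy data are hypothetical.

```python
from typing import Dict, Set


def label_term(term: str,
               english_buckets: Dict[str, Set[str]],
               eh_pairs: Dict[str, str]) -> str:
    """Label one term following the list-based procedure of Approach-I."""
    key = term.lower()
    bucket = english_buckets.get(key[:1], set())  # steps 2-3: pick the list
    native = eh_pairs.get(key)                    # E-H pair lookup
    if key in bucket:                             # step 4: in English list
        if native is not None:
            return f"{term}\\H={native}"          # 4.2.1
        return f"{term}\\E"                       # 4.3.1
    if native is not None:                        # step 5: not in English list
        return f"{term}\\H={native}"              # 5.2.1
    return f"{term}\\H"                           # 5.3.1: default to Hindi


def label_query(query: str,
                english_buckets: Dict[str, Set[str]],
                eh_pairs: Dict[str, str]) -> str:
    return " ".join(label_term(t, english_buckets, eh_pairs)
                    for t in query.split())


# Toy data for illustration only (not the task's actual resources).
english_buckets = {"r": {"recipe"}, "p": {"paneer"}}
eh_pairs = {"palak": "पालक", "paneer": "पनीर"}
print(label_query("palak paneer recipe peenekeliye", english_buckets, eh_pairs))
```

Because both branches consult the E-H pair document, a pair-list hit always wins: a term is labelled \E only when it is in the English list and absent from the pair list, and anything found in neither list falls through to \H.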

Results
● Exact query match fraction (EQMF) = #(Queries for which language labels and transliterations match exactly) / #(All queries)
● Transliteration precision (TP) = #(Correct transliterations) / #(Generated transliterations)
● Transliteration recall (TR) = #(Correct transliterations) / #(Reference transliterations)
● Transliteration F-score (TF) = 2 × TP × TR / (TP + TR)
● Labelling accuracy (LA) = #(Correct label pairs) / (#(Correct label pairs) + #(Incorrect label pairs))
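The metric definitions above can be computed from aligned reference and system output. The sketch below is an assumption-laden illustration, not the official scorer: it assumes one reference transliteration per token and token-aligned queries, and the `evaluate` function and the toy reference data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str, Optional[str]]  # (term, label, transliteration)
Query = List[Token]


def safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def evaluate(gold: List[Query], pred: List[Query]) -> Dict[str, float]:
    exact = correct_labels = total_labels = 0
    correct_tr = generated_tr = reference_tr = 0
    for g_query, p_query in zip(gold, pred):
        if g_query == p_query:          # labels and transliterations all match
            exact += 1
        for (_, g_lab, g_nat), (_, p_lab, p_nat) in zip(g_query, p_query):
            total_labels += 1
            correct_labels += g_lab == p_lab
            reference_tr += g_nat is not None
            generated_tr += p_nat is not None
            correct_tr += p_nat is not None and p_nat == g_nat
    tp = safe_div(correct_tr, generated_tr)
    tr = safe_div(correct_tr, reference_tr)
    return {
        "EQMF": safe_div(exact, len(gold)),
        "TP": tp,
        "TR": tr,
        "TF": safe_div(2 * tp * tr, tp + tr),
        "LA": safe_div(correct_labels, total_labels),
    }


# Toy reference and system output, for illustration only.
gold = [
    [("palak", "H", "पालक"), ("paneer", "H", "पनीर"), ("recipe", "E", None)],
    [("bibi", "H", "बीबी"), ("ka", "H", "का"), ("maqbara", "H", "मक़बरा"),
     ("paryatak", "H", "पर्यटक"), ("guide", "E", None)],
]
pred = [
    [("palak", "H", "पालक"), ("paneer", "H", "पनीर"), ("recipe", "E", None)],
    [("bibi", "H", "बीबी"), ("ka", "H", "का"), ("maqbara", "E", None),
     ("paryatak", "H", None), ("guide", "E", None)],
]
scores = evaluate(gold, pred)
print(scores)
```

On this toy input every generated transliteration is correct, so precision is perfect while recall is reduced by the two tokens that received no transliteration.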

Results
Language: Hindi
Stats: 10 runs; 5 teams; #(True \H) = 2444; #(True \E) = 777; #(\N) = 232 (N = names and ambiguities, excluded from analysis)

Metric                              ISMDhanbad   Maximum Score   Median Score
Exact query match fraction          0.0860       0.1980          0.0290
Exact transliteration pairs match   1584/2117    N.A.            N.A.
Transliteration precision           0.7253       0.8135          0.4486
Transliteration recall              0.6484       0.8125          0.4300
Transliteration F-score             0.6847       0.8130          0.4260
Labelling accuracy                  0.8780       0.9848          0.9540
Eng-precision                       0.6853       0.9667          0.9302
Eng-recall                          0.9138       0.9755          0.9640
Eng-Fscore                          0.7832       0.9685          0.9019
L-precision                         0.9693       0.9906          0.9883
L-recall                            0.8666       0.9894          0.9791
L-Fscore                            0.9151       0.9900          0.9700
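As a quick consistency check, the ISMDhanbad F-scores in the table can be recomputed from the reported precision and recall using F = 2PR / (P + R). The helper name `f_score` is hypothetical; the numbers are copied from the table.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for the ISMDhanbad column.
reported = {
    "Transliteration": (0.7253, 0.6484, 0.6847),
    "Eng": (0.6853, 0.9138, 0.7832),
    "L": (0.9693, 0.8666, 0.9151),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: recomputed {f_score(p, r):.4f}, reported {f:.4f}")
```

All three recomputed values agree with the reported F-scores to four decimal places.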

Analysis
● The English wordlist in the corpus is considerably large
● An out-of-dictionary word is treated as a Hindi word
– (e.g. peenekeliye\H)
● A named entity may be output with a transliteration if it appears in the E-H pair file
– (e.g. khusbu → khusbu\H=खुशबू)
– Why?
– Since the term is in the E-H pair document
● NER technique used: no
● Context considered: no

Approach-II
● Preprocessing
– Annotate English words with "E" and the Hindi terms of the E-H pair words with "H" (e.g. tera H, khushboo H and good E, apple E)
– Train the classifier with these annotations
● Phase 1: Classification
– Terms are classified using this classifier
● Phase 2: Transliteration
– List-based
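To make the training step concrete, here is a small sketch of a two-class maximum-entropy model, which is equivalent to logistic regression. It is not the classifier used in the system: the character-bigram features, the learning rate and epoch count, the function names, and the extra training words beyond the four slide examples are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def char_bigrams(term: str) -> List[str]:
    """Character bigrams with boundary markers (an assumed feature set)."""
    padded = f"^{term.lower()}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def sigmoid(z: float) -> float:
    # Numerically stable for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples: List[Tuple[str, str]],
          epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for term, label in examples:
            feats = char_bigrams(term) + ["<bias>"]
            p_hindi = sigmoid(sum(weights[f] for f in feats))
            error = (1.0 if label == "H" else 0.0) - p_hindi
            for f in feats:
                weights[f] += lr * error
    return weights


def classify(term: str, weights: Dict[str, float]) -> str:
    feats = char_bigrams(term) + ["<bias>"]
    score = sum(weights.get(f, 0.0) for f in feats)
    return "H" if score > 0 else "E"


# Toy annotations in the style of the slide (tera H, khushboo H, good E, apple E).
training = [("tera", "H"), ("khushboo", "H"), ("pyar", "H"), ("naam", "H"),
            ("good", "E"), ("apple", "E"), ("recipe", "E"), ("guide", "E")]
weights = train(training)
print([(t, classify(t, weights)) for t, _ in training])
```

A classifier trained on so few words generalises poorly; the sketch only shows how annotated E/H terms become a model that labels unseen terms.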

Algorithm
1. Input a term from the test document
2. Classify the term as E or H
3. if the term is of class E
   3.1. Print term, \, class
4. else
   4.1. Match the term in the E-H pair document
   4.2. if found
        4.2.1. Print term, \, class, =, the term's native script from the E-H pair
   4.3. else
        4.3.1. Print term, \, class
5. end
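The classifier-driven procedure above can be sketched as follows. The classifier is passed in as a plain function so that any model can be plugged in; `label_term_ml`, `stub_classifier` and the toy E-H pairs are hypothetical, and the stub deliberately labels maqbara as English to show how a classification error propagates to the output.

```python
from typing import Callable, Dict


def label_term_ml(term: str,
                  classify: Callable[[str], str],
                  eh_pairs: Dict[str, str]) -> str:
    """Label one term: classify first, then look up Hindi terms in the list."""
    label = classify(term)                      # step 2
    if label == "E":                            # step 3
        return f"{term}\\E"
    native = eh_pairs.get(term.lower())         # step 4.1
    if native is not None:
        return f"{term}\\{label}={native}"      # step 4.2.1
    return f"{term}\\{label}"                   # step 4.3.1


def stub_classifier(term: str) -> str:
    # Stand-in for a trained model; mislabels "maqbara" on purpose.
    return "E" if term.lower() in {"maqbara", "guide"} else "H"


eh_pairs = {"bibi": "बीबी", "ka": "का"}  # toy E-H pair list
query = "bibi ka maqbara paryatak guide"
print(" ".join(label_term_ml(t, stub_classifier, eh_pairs)
               for t in query.split()))
```

Unlike the list-based classification, a term classified as E never reaches the E-H pair lookup, so a wrong E label also suppresses the transliteration.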

Analysis
bibi\H=बीबी ka\H=का maqbara\E paryatak\H guide\E
● maqbara\E is wrongly classified
– Why?
– Too few Hindi terms in the training data (the E-H pair document)
● paryatak\H has no equivalent transliteration in the output
– Why?
– Out-of-dictionary (not in the E-H pair document)

Conclusion
● Backward transliteration technique
● Our system performed better than the median on some of the metrics
– (e.g. EQMF, TP, TR and TF)
– Why?
– Equivalent transliterations were available in the E-H pair document
● The system has some limitations
– (e.g. named entities may not be identifiable)
– Why?
– We have not used any NER technique
● The system may give unwanted transliterations for a few terms (e.g. koee → koee\H=के)
– Why?
– Since the term is in the E-H pair document

References
1. King, B., Abney, S.: Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods. In Proceedings of NAACL-HLT 2013, Atlanta, Georgia (2013) 1110-1119
2. Gupta, K., Choudhury, M., Bali, K.: Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC '12), Istanbul, Turkey (2012) 2459-2465
3. Sowmya, V.B., Choudhury, M., Bali, K., Dasgupta, T., Basu, A.: Resource Creation for Training and Testing of Transliteration Systems for Indian Languages. LREC (2010)
4. Karimi, S., Scholer, F., Turpin, A.: Machine Transliteration Survey. In ACM Computing Surveys (CSUR), Volume 43, Issue 3, New York, USA (2011) 17:1-46
5. Dale, R.: Language Technology. Slides of HCSNet Summer School Course. Sydney (2007)
6. Stanford Classifier v3.2.0 (2013-06-19), classification tool from Stanford University

THANK YOU
