Techniques to improve Dictionary Based CLIR Sai Madhurya Peyyeti - PowerPoint PPT Presentation

Techniques to improve Dictionary Based CLIR Sai Madhurya Peyyeti KX48810

Different Techniques in IR
Translation is the key problem in CLIR.
- Query Translation
  - Dictionary Based
  - Corpora Based
    Disadvantage: lack of resources
  - Machine Translation Based
    Disadvantage: MT systems are expensive to develop, and their application degrades retrieval efficiency
- Document Translation
  Disadvantage: more computational effort, and scaling issues when more than two languages are involved
- Dual Translation (both query and document)

Query Translation
Pros: Computational effort, i.e. time and space, is less than with other methods.
Cons:
(i) A query usually does not provide enough context to automatically determine the intended meaning of each term in the query.
(ii) Translation errors noticeably affect retrieval performance.
(iii) When searching a multilingual database, the query has to be translated into each of the languages of the database.

Comparison

Dictionary Based Translation
Only keywords are translated, using Machine Readable Dictionaries (MRD). Translating the query with a dictionary is much faster and simpler than translating the documents.
Cons:
1) Untranslatable words (such as new compound words, proper names, spelling variants, and special terms).
2) Processing of inflected words: inflected word forms are usually not found in dictionaries, e.g. the plural -s; the third-person singular -s; the past tense -d, -ed, or -t; Smart → Smartest, Him → Himself.
3) Lexical ambiguity in source and target languages: homonymous and polysemous words. E.g. "She will park the car so we can walk in the park."
   park (verb): the action of moving a vehicle to a place, usually a car park
   park (noun): a public area close to nature
Because of ambiguity in the search keys, retrieval of relevant documents may fail.
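
The word-by-word lookup and its first two weaknesses can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the dictionary entries are invented, and the suffix list is a crude stand-in for a real stemmer or lemmatizer.

```python
# Toy machine-readable dictionary (hypothetical English -> Spanish entries).
MRD = {
    "park": ["parque", "aparcar"],
    "car": ["coche"],
    "walk": ["caminar", "paseo"],
}

# Naive suffix rules standing in for a real stemmer or lemmatizer.
SUFFIXES = ("ed", "s", "d", "t")


def lookup(word):
    """Return (candidate translations, found flag) for one query keyword."""
    word = word.lower()
    if word in MRD:
        return MRD[word], True
    # Inflected form: strip a suffix and retry the dictionary lookup.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            if base in MRD:
                return MRD[base], True
    # Untranslatable word (proper name, new compound, ...): pass it through.
    return [word], False


def translate_query(keywords):
    """Word-by-word translation: all candidates of every keyword are kept."""
    return {kw: lookup(kw)[0] for kw in keywords}


result = translate_query(["parked", "cars", "Google"])
```

Note that "parked" comes back with both "parque" and "aparcar": the lookup alone does nothing about lexical ambiguity, which is the third weakness listed above.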

Improve Dictionary Based Translation
- The dictionary-based approach suffers from problems with phrase translation, ambiguity, coverage, and the processing of inflected and untranslatable words.
- In 2001, Jianfeng Gao et al. used a dictionary-based approach for English-Chinese CLIR and suggested the following techniques to improve it:
  - First, noun phrases were recognized and translated as a whole, using a statistical model and phrase translation patterns.
  - Second, the word with the highest degree of cohesion was selected as the best translation from the set of candidate translation words.
- This research found a significant improvement over the simple dictionary-based approach.
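
The idea of choosing the most cohesive translation can be sketched roughly as follows. This is not the model used by Gao et al.; it is a simplified stand-in that assumes cohesion can be approximated by raw co-occurrence counts in a target-language corpus. The corpus, the candidate lists, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Tiny hypothetical target-language corpus (each document is a list of words).
CORPUS = [
    ["banco", "dinero", "cuenta"],
    ["banco", "dinero", "prestamo"],
    ["orilla", "rio", "agua"],
    ["banco", "rio"],
]

# Count how often each unordered word pair appears in the same document.
cooc = Counter()
for doc in CORPUS:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1


def cohesion(word, others):
    """Sum of co-occurrence counts between word and the context words."""
    return sum(cooc[tuple(sorted((word, o)))] for o in others if o != word)


def select_translations(candidates):
    """candidates maps each source term to its list of candidate translations."""
    chosen = {}
    for term, options in candidates.items():
        # Context: the candidate translations of all the other query terms.
        context = [w for t, opts in candidates.items() if t != term for w in opts]
        chosen[term] = max(options, key=lambda w: cohesion(w, context))
    return chosen


best = select_translations({"bank": ["banco", "orilla"], "money": ["dinero"]})
```

In this toy corpus "banco" co-occurs with "dinero" and "orilla" does not, so "banco" is selected as the translation of "bank".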

Query Expansion (QE) in CLIR
- Common factors responsible for poor relevance in CLIR: lack of resources in the target language, short queries, wrong translations, and incorrect representation of the query.
- QE is the process of improving the quality of retrieved results by expanding the original query with additional words.
- QE can be performed in three different ways: manual, interactive, and automatic.
- In 1997, Ballesteros and Croft performed an experimental analysis of English–Spanish CLIR using query expansion in three different ways:
  (i) pre-translation query expansion
  (ii) post-translation query expansion
  (iii) combined pre- and post-translation query expansion
  The third is effective and improves the precision of retrieved results.

QE (continued)
- Ranking documents with Okapi BM25 helps identify better terms for QE.
- QE involves techniques such as:
  - Finding synonyms of words and searching for them as well
  - Finding semantically related words (e.g. antonyms, meronyms, hyponyms, hypernyms)
  - Finding all the morphological forms of words by stemming each word in the search query
  - Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
  - Re-weighting the terms in the original query
- Search engines invoke query expansion to increase the quality of user search results.
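
For reference, Okapi BM25 ranking can be sketched in a few lines of Python. The slides do not say which variant or parameters were used; this sketch assumes the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF form ln((N - n + 0.5) / (n + 0.5) + 1). The function name and the sample documents are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["query", "expansion", "query"],
    ["document", "translation"],
    ["query", "translation", "dictionary"],
]
scores = bm25_scores(["query"], docs)
```

The documents ranked highest by such a score are the ones from which expansion terms would then be drawn.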

Local Feedback
- Automatic word-by-word (WBW) translation of queries via MRD results in a 60% loss in effectiveness.
- Relevance feedback: the query is modified by adding terms found in documents known to be relevant to the query. Local feedback differs from classic relevance feedback in that it assumes the top retrieved documents are relevant.
- Pre-translation feedback expansion creates a stronger base for translation and improves precision.
- Local feedback after MRD translation introduces terms which de-emphasize irrelevant translations, reducing ambiguity and improving recall.
- Combining pre- and post-translation feedback is most effective and reduces translation error by up to 36%.
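
A minimal sketch of local feedback, assuming the documents have already been ranked for the query (for example by BM25) and that raw term frequency is an adequate way to pick expansion terms. The function name, the parameters, and the sample documents are hypothetical; real systems typically weight terms more carefully.

```python
from collections import Counter


def local_feedback(query, ranked_docs, top_docs=2, new_terms=2):
    """Expand the query with frequent terms from the top-ranked documents.

    The top_docs highest-ranked documents are simply assumed to be relevant.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in doc if t not in query)
    expansion = [term for term, _ in counts.most_common(new_terms)]
    return list(query) + expansion


ranked = [
    ["clir", "query", "translation", "dictionary"],
    ["query", "translation", "feedback"],
    ["banquet", "budget"],
]
expanded = local_feedback(["query"], ranked, top_docs=2, new_terms=1)
```

The same function could be applied to a source-language collection before translation and to the target-language collection after translation, which corresponds to the combined pre- and post-translation setting described above.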

Local Context Analysis (LCA)
- Local feedback can seriously degrade retrieval performance if only a few of the top-ranked documents retrieved for the original query are relevant.
- Although local context analysis is a local technique, it employs co-occurrence analysis, a primary tool of global techniques, for query expansion, and it has been shown to be more effective than simple local feedback.
- LCA ranks concepts according to their co-occurrence with the query terms within the top-ranked documents and uses the top-ranked concepts for query expansion.
- Combining pre- and post-translation LCA expansion is most effective and improves precision and recall.
- Query expansion techniques such as LCA and local feedback, together with improved phrasal translation, can reduce the error associated with dictionary translation by 45%.
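
The co-occurrence ranking behind LCA can be approximated with the sketch below. It is a deliberate simplification and not the published LCA formula: single tokens stand in for concepts (LCA proper uses noun phrases), and the IDF weighting and normalization are omitted. The smoothing constant delta, the function name, and the sample documents are assumptions.

```python
import math
from collections import Counter


def lca_expand(query, top_docs, n_concepts=2, delta=0.1):
    """Rank candidate concepts by co-occurrence with every query term."""
    tfs = [Counter(doc) for doc in top_docs]
    candidates = {t for doc in top_docs for t in doc} - set(query)
    scores = {}
    for concept in candidates:
        score = 1.0
        for q in query:
            # Co-occurrence of the concept and the query term over the top documents.
            co = sum(tf[concept] * tf[q] for tf in tfs)
            # The product favors concepts that co-occur with all query terms;
            # delta keeps one missing term from zeroing the score.
            score *= delta + math.log(1 + co)
        scores[concept] = score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:n_concepts]


top_docs = [
    ["cross", "language", "retrieval", "dictionary"],
    ["cross", "language", "dictionary", "query"],
    ["language", "budget"],
]
concepts = lca_expand(["cross", "language"], top_docs, n_concepts=4)
```

Here "budget" co-occurs with only one of the two query terms and is ranked last, which is the behavior that distinguishes this kind of ranking from plain frequency-based local feedback.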

References
- https://dl.acm.org/doi/10.1145/278459.258540
- https://www.sciencedirect.com/science/article/pii/S1319157817301295
- https://www.researchgate.net/publication/297752831_A_Survey_on_Cross_Language_Information_Retrieval
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1556&rep=rep1&type=pdf
- https://dl.acm.org/doi/pdf/10.1145/333135.333138
- http://terpconnect.umd.edu/~oard/pdf/delos97.pdf
- http://terpconnect.umd.edu/~oard/research.html#treclegal
- http://terpconnect.umd.edu/~oard/pdf/forum01.pdf


