Techniques to Improve Dictionary-Based CLIR
Sai Madhurya Peyyeti, KX48810

Different Techniques in IR
Translation is the key problem in CLIR.
- Query Translation
  - Dictionary based
  - Corpora based. Disadvantage: lack of resources.
  - Machine translation based. Disadvantage: MT systems are expensive to develop and their application degrades retrieval efficiency.
- Document Translation. Disadvantage: more computational effort, and scaling issues when more than two languages are involved.
- Dual Translation (both query and document)

Query Translation
Pros:
- Computational effort (time and space) is lower than with the other methods.
Cons:
- (i) A query usually does not provide enough context to automatically find the intended meaning of each query term.
- (ii) Translation errors noticeably affect retrieval performance.
- (iii) When searching a multilingual database, the query has to be translated into each of the languages in the database.

Comparison
Dictionary-Based Translation
- Only keywords are translated, using Machine Readable Dictionaries (MRD).
- Translating the query with a dictionary is much faster and simpler than translating the documents.
Cons:
- 1) Untranslatable words: new compound words, proper names, spelling variants, and special terms.
- 2) Processing of inflected words: inflected word forms are usually not found in dictionaries. Examples: the plural -s; the third-person singular -s; the past tense -d, -ed, or -t; Smart → Smartest; Him → Himself.
- 3) Lexical ambiguity in the source and target languages: homonymous and polysemous words.
  E.g. "She will park the car so we can walk in the park."
  Park: the action of moving a vehicle to a place, usually a car park.
  Park: a public area close to nature.
- Because of ambiguity in the search keys, retrieval of relevant documents may fail.

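The word-by-word lookup and the three problems listed above can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the dictionary entries, the stopword list, and the names `MRD`, `strip_suffix`, and `translate_query` are all assumptions made for the example.

```python
# Toy machine-readable dictionary (hypothetical English -> Spanish entries).
MRD = {
    "park": ["parque", "aparcar"],   # ambiguous: noun vs. verb sense
    "car": ["coche", "carro"],
    "walk": ["caminar", "paseo"],
}

# Hypothetical stopword list; only keywords are translated.
STOPWORDS = {"the", "in", "so", "we", "can", "she", "will"}


def strip_suffix(word):
    """Crude suffix stripping so that some inflected forms hit the dictionary."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def translate_query(query, mrd=MRD):
    """Translate keywords word by word, keeping every dictionary sense."""
    translated = []
    for token in query.lower().split():
        if token in STOPWORDS:
            continue
        # Try the surface form first, then the stripped form.
        entry = mrd.get(token) or mrd.get(strip_suffix(token))
        if entry is None:
            # Untranslatable word (e.g. a proper name): pass it through as-is.
            translated.append(token)
        else:
            # Ambiguity: all candidate translations are kept.
            translated.extend(entry)
    return translated


print(translate_query("She parked the cars in Boston"))
```

Every sense of "park" ends up in the translated query, which is exactly the ambiguity problem, and the proper name passes through untranslated.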
Improving Dictionary-Based Translation
- The dictionary-based approach suffers from problems with phrase translation, ambiguity, coverage, and the processing of inflected and untranslatable words.
- In 2001, Jianfeng Gao et al. used a dictionary-based approach for English-Chinese CLIR and suggested the following techniques to improve it.
- First, noun phrases were recognized and translated as a whole, using a statistical model and phrase translation patterns.
- Second, the word with the highest degree of cohesion was selected as the best translation from the set of candidate translations.
- This work found a significant improvement over the simple dictionary-based approach.

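The second technique, choosing the candidate that coheres best with the translations of the other query words, might look roughly like the sketch below. It is a simplified stand-in, not the model used by Gao et al.: cohesion is approximated here by summed co-occurrence counts, and the counts and the names `COOCCURRENCE`, `cooc`, and `select_translations` are hypothetical.

```python
# Hypothetical co-occurrence counts from a target-language corpus.
COOCCURRENCE = {
    frozenset(("aparcar", "coche")): 40,
    frozenset(("parque", "coche")): 5,
    frozenset(("aparcar", "carro")): 8,
    frozenset(("parque", "carro")): 2,
}


def cooc(a, b):
    """Symmetric co-occurrence lookup; unseen pairs count as 0."""
    return COOCCURRENCE.get(frozenset((a, b)), 0)


def select_translations(candidates):
    """Pick one translation per source term.

    `candidates` holds one list of candidate translations per source term.
    Each term keeps the candidate with the highest total co-occurrence
    with the candidates of all other terms.
    """
    chosen = []
    for i, options in enumerate(candidates):
        others = [c for j, opts in enumerate(candidates) if j != i for c in opts]
        best = max(options, key=lambda t: sum(cooc(t, o) for o in others))
        chosen.append(best)
    return chosen


# Candidates for the query "park car" from a bilingual dictionary.
print(select_translations([["parque", "aparcar"], ["coche", "carro"]]))
```

With these toy counts the verb sense of "park" wins because it co-occurs with the car translations far more often than the noun sense does.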
Query Expansion (QE) in CLIR
- Common factors responsible for poor relevance in CLIR: lack of resources in the target language, short queries, wrong translations, and incorrect representation of the query.
- QE is the process of improving the quality of retrieved results by expanding the original query with additional words.
- QE can be performed in three different ways: manual, interactive, and automatic.
- Ballesteros and Croft (1997) performed an experimental analysis of English-Spanish CLIR using query expansion in three different ways:
  (i) pre-translation expansion
  (ii) post-translation expansion
  (iii) both pre- and post-translation expansion
- The third approach is effective and improves the precision of the retrieved results.

QE (contd.)
- Ranking documents with Okapi BM25 helps in indexing better terms for QE.
- QE involves techniques such as:
  - Finding synonyms of words and searching for them as well
  - Finding semantically related words (e.g. antonyms, meronyms, hyponyms, hypernyms)
  - Finding all the morphological forms of words by stemming each word in the search query
  - Fixing spelling errors and automatically searching for the corrected form, or suggesting it in the results
  - Re-weighting the terms in the original query
- Search engines invoke query expansion to increase the quality of user search results.

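For reference, a compact sketch of Okapi BM25 ranking is given here. It assumes documents are already tokenized, uses the common defaults k1 = 1.2 and b = 0.75, and uses a smoothed IDF variant that keeps weights non-negative; the function name `bm25_scores` and the sample documents are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document in `docs`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF (assumed variant; always non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["parque", "ciudad"],
    ["coche", "aparcar", "coche"],
    ["ciudad", "museo"],
]
print(bm25_scores(["coche"], docs))
```

The top-scoring documents from such a ranking are the pool from which expansion terms are then drawn.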
Local Feedback
- Automatic word-by-word (WBW) translation of queries via MRD results in a 60% loss in effectiveness.
- Relevance feedback: the query is modified by adding terms found in documents known to be relevant to the query.
- Local feedback differs from classic relevance feedback in that it assumes the top retrieved documents are relevant.
- Pre-translation feedback expansion creates a stronger base for translation and improves precision.
- Local feedback after MRD translation introduces terms that de-emphasize irrelevant translations, which reduces ambiguity and improves recall.
- Combining pre- and post-translation feedback is most effective and reduces translation error by up to 36%.

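A minimal sketch of local feedback follows, assuming the documents have already been ranked and tokenized. The function name `local_feedback_expand` and its parameters are hypothetical, and plain term frequency stands in for whatever term-weighting scheme a real system would use.

```python
from collections import Counter


def local_feedback_expand(query_terms, ranked_docs, top_k=2, n_terms=2,
                          stopwords=frozenset()):
    """Expand a query with frequent terms from the top-ranked documents.

    The top `top_k` documents are assumed to be relevant, without any
    judgment from the user.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(
            t for t in doc if t not in query_terms and t not in stopwords
        )
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion


ranked_docs = [
    ["coche", "aparcar", "garaje"],
    ["coche", "garaje", "calle"],
    ["parque", "arbol"],          # ranked below top_k, so ignored
]
print(local_feedback_expand(["coche"], ranked_docs, top_k=2, n_terms=1))
```

Running the same function once on the source-language query before translation and once on the translated query gives the combined pre- and post-translation setup described above.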
Local Context Analysis (LCA)
- Local feedback can seriously degrade retrieval performance if only a few of the top-ranked documents retrieved for the original query are relevant.
- Although LCA is a local technique, it uses co-occurrence analysis, a primary tool of global techniques, for query expansion, and it has been shown to be more effective than simple local feedback.
- LCA ranks concepts according to their co-occurrence with the query terms within the top-ranked documents and uses the top-ranked concepts for query expansion.
- Combining pre- and post-translation LCA expansion is most effective and improves precision and recall.
- Query expansion techniques such as LCA and local feedback, together with improved phrasal translation, can reduce the error associated with dictionary translation by 45%.

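The co-occurrence ranking can be illustrated with a heavily simplified sketch. It is not the published LCA formula: it works on whole documents rather than passages, treats single terms as concepts, and omits IDF weighting. The name `lca_rank_concepts` and the smoothing constant `delta` are assumptions for the example.

```python
import math
from collections import Counter


def lca_rank_concepts(query_terms, top_docs, delta=0.1):
    """Rank candidate concepts by co-occurrence with every query term.

    A concept's score is a product over query terms, so a concept that
    co-occurs with only some of the query terms is penalized.
    """
    doc_tfs = [Counter(d) for d in top_docs]
    concepts = {t for d in top_docs for t in d} - set(query_terms)
    scores = {}
    for c in concepts:
        score = 1.0
        for w in query_terms:
            # Co-occurrence of concept c and query term w in the top documents.
            co = sum(tf[c] * tf[w] for tf in doc_tfs)
            score *= delta + math.log(1 + co)
        scores[c] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))


top_docs = [
    ["coche", "aparcar", "garaje"],
    ["coche", "garaje", "calle"],
    ["aparcar", "garaje"],
]
print(lca_rank_concepts(["coche", "aparcar"], top_docs))
```

Unlike plain local feedback, a term that is frequent in the top documents but never appears alongside one of the query terms ends up near the bottom of the ranking.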
References
- https://dl.acm.org/doi/10.1145/278459.258540
- https://www.sciencedirect.com/science/article/pii/S1319157817301295
- https://www.researchgate.net/publication/297752831_A_Survey_on_Cross_Language_Information_Retrieval
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1556&rep=rep1&type=pdf
- https://dl.acm.org/doi/pdf/10.1145/333135.333138
- http://terpconnect.umd.edu/~oard/pdf/delos97.pdf
- http://terpconnect.umd.edu/~oard/research.html#treclegal
- http://terpconnect.umd.edu/~oard/pdf/forum01.pdf