Identification of Transliterated Foreign Words in Hebrew Script
Yoav Goldberg, Michael Elhadad
CICLing 2008, Haifa, Israel
Outline: Introduction, Method, Evaluation and Results, Summary

Introduction (subsections: What we are trying to solve; How we are trying to solve it)

A Typical Hebrew Text
Taken from the YNET gossip section a few days ago
Words from the text, in letter-by-letter romanization: KAST VVLLNTYYN’Z DYY ST DABL SKS
- Foreign words are written in Hebrew script
- We cannot expect comprehensive dictionary coverage
- We would like to identify them automatically

Which words are we looking for?
Names:
- of people, if they are not Israeli / Hebrew / Russian / Amharic / Arabic
  YES: ג'ון /John, רוברט /Robert, ג'ייק /Jake
  NO: יואב /Yoav, גולדברג /Goldberg, אלחַדד /Elhadad
  YES: קרן /Karen
  NO: קרן /Keren
  YES: מייקל /Michael (pronounced maykel)
  NO: מיכאל /Michael (pronounced mi-cha-el)
- of places, if they are pronounced the same as in English
  YES: הוליווד /Hollywood, ניויורק /New-York
  NO: אנגליה /Anglia (England), איטליה /Italya (Italy)
- of companies/organizations, if they sound non-Hebrew (mostly easy to decide)
  YES: מייקרוסופט /Microsoft, נייק /Nike
  NO: אוסם /Osem, תנובה /Tnuva
- of months, if they sound the same as in English
  YES: אוגוסט /August, ספטמבר /September
  NO: יוני /Yuni (June), יולי /Yuli (July)

Which words are we looking for?
Cognates, transliterations and borrowings
It revolves around the pronunciation
- Some words are clearly foreign
  Example: קאסט /Cast, דאבל /Double, דיי /Day, טרנד /Trend
- Foreign origin, but Hebrew-sounding: NO
  Example: אנציקלופדיה /En-ci-klo-pe-di-ya
- Not inflected, and pronounced the same: YES
  Example: רדיו /Radio, סקס /Sex
- Inflected, pronounced the same: YES
  Example: טרנדי /Trendy
- Inflected, pronounced differently: MAYBE
  Example: אלכוהולי /Alcoholi (vs. Alcoholic)
- Can be read as Hebrew or foreign: DEPENDS on context
  Example: בד /Bad vs. cloth, branch; רן /Run vs. sang, proper name

The approach
We chose to tackle the problem as language identification at the word level.
- Language identification accuracy: > 99%. A solved problem?
- ... but it requires about 50 characters
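
To make "language identification at the word level" concrete, here is a minimal sketch, not the method presented in this talk. It assumes a character-bigram model with add-one smoothing for each class and picks the class whose model gives the word the higher score. The names `char_ngrams`, `CharNgramModel` and `classify` are hypothetical, and the two tiny word lists are just the YES/NO examples from the slides, far too small for real use.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram counts with add-one smoothing (an illustrative assumption)."""

    def __init__(self, words, n=2):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, word):
        # Sum of smoothed log relative frequencies of the word's n-grams.
        return sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in char_ngrams(word, self.n)
        )


def classify(word, native_model, foreign_model):
    """Label a word by whichever model scores it higher (ties go to 'native')."""
    if foreign_model.log_prob(word) > native_model.log_prob(word):
        return "foreign"
    return "native"


# Toy training lists taken from the YES/NO examples on the slides.
FOREIGN = ["ג'ון", "רוברט", "ג'ייק", "מייקל", "הוליווד", "מייקרוסופט", "נייק", "אוגוסט"]
NATIVE = ["יואב", "מיכאל", "אנגליה", "איטליה", "אוסם", "תנובה", "יוני", "יולי"]

native_model = CharNgramModel(NATIVE)
foreign_model = CharNgramModel(FOREIGN)

for w in ["טרנד", "ספטמבר"]:
    print(w, classify(w, native_model, foreign_model))
```

A single word yields only a handful of n-grams, which is one way to see why document-level identifiers that need about 50 characters do not carry over directly to this setting.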