UE Nikon _ Nga Tran Anh Hang, Hiroko Kobayashi, Yu Sawai, Paulo Quaresma

Outline
● Introduction
  ○ Task Motivation
● Methodologies
  ○ Rule-based Method (UE-ja-2)
  ○ Feature-engineering (UE-ja-1, UE-ja-3, UE-en-1)
  ○ Distributed Representations (UE-en-2, UE-en-3)
● Results and Discussion
● Conclusion

Introduction
NLP research has focused mostly on rather “clean” language data. In reality, there are many cases that are difficult to detect.
● 犬って鼻づまりとかするのかな？ (I wonder if dogs get things like stuffy noses?)
● うちのテレビ熱だしすぎで大丈夫かな、これほんと。 (My TV is giving off an awful lot of heat. Is it okay? Seriously.)

Table 1. Counts of symptom labels in the training data (1920 pseudo-tweets)

Example tweets by ID:
1930ja: 犬って鼻づまりとかするのかな？
1930en: I wonder if dogs get things like stuffy noses?
1955ja: 犬が鼻水垂らしている写真が大好きだ
1955en: I love photos of a dog with a runny nose.
1975ja: 最近携帯が熱持っちゃう。そろそろ買い替えの時期だ。
1975en: My cell phone is hot lately. Time to exchange it for a new one.
2029ja: インフルエンザって海老もなるの？
2029en: Do shrimp get the flu?
2107ja: うちのテレビ熱だしすぎで大丈夫かな、これほんと。
2107en: My TV is giving off an awful lot of heat. Is it okay? Seriously.
2156ja: 犬も鼻風邪ってひくのかな
2156en: I wonder if dogs get colds too
2215ja: 友達の着信の待ち受けが、犬が鼻ちょうちん作ってる写真でふいた！犬も鼻づまりとかなるんだね！
2215en: The picture from my friend is a photo of a dog making a snot bubble, lol! I guess dogs get stuffy noses too!
2225ja: 犬も鼻水たらすんだね。
2225en: I didn't know dogs get runny noses.
2231ja: 犬が鼻水垂らしてるしゃしん送られてきた。
2231en: I was sent a photo of a dog with a runny nose.
2261ja: 蜂が花粉症だったら商売にならないね
2261en: If a bee had allergies, it wouldn't make a living
2504ja: 犬が鼻水垂らしてるのが可愛くて思わず写真撮ってしまった。
2504en: The dog's runny nose is so cute. Before I knew it I took a picture.
2559ja: 最近のうちの犬の鳴き声が変なんだけど、鼻風邪ひいたのかな。
2559en: Our dog sounds strange lately, I wonder if he has a cold.

Task Motivation
● We want to know the strengths and weaknesses of popular methods on “real-world datasets”.
  1. Rule based
  2. Feature engineering
  3. Distributed representations
What we guessed... [Figure: methods 1, 2, and 3 placed on axes “Dataset-size required” and “Robustness”]

Methodology: Rule-based Approach (UE-ja-2)
[Pipeline: tweet -> Pre. -> rule1 (filtering) -> rule2 (filtering) -> rule3 (detection) -> labels, with dictionaries (dic) feeding the rules]
● Pre-processing
  ○ Extract nouns (MeCab, NEologd)
● Filtering
  ○ Use NEGATIVE (not symptoms) dictionary (e.g. “鳥インフルエンザ (bird flu)”)
  ○ Use a rule (exclude future phrases, e.g. “明日 (tomorrow)”)
● Detection of symptoms
  ○ Use symptoms dictionary
    Influenza: インフル、インフルエンザ
    Diarrhea: 下痢・・・・
    Cold: 風邪、鼻風邪
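
The filtering and detection steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the UE-ja-2 system itself: it uses plain substring matching instead of MeCab noun extraction, it assumes that a future phrase suppresses all labels for the tweet, and the names `NEGATIVE_TERMS`, `FUTURE_PHRASES`, `SYMPTOM_TERMS`, and `detect_symptoms` are hypothetical. The dictionary entries are only the ones shown on the slide.

```python
# Sketch of a dictionary-and-rule symptom detector (assumed design, not the actual UE-ja-2 code).

# NEGATIVE dictionary: expressions that contain a symptom word but are not symptoms.
NEGATIVE_TERMS = ["鳥インフルエンザ"]  # bird flu

# Rule: phrases that point to the future rather than to a current symptom.
FUTURE_PHRASES = ["明日"]  # tomorrow

# Symptom dictionary: label -> surface forms (entries taken from the slide).
SYMPTOM_TERMS = {
    "Influenza": ["インフルエンザ", "インフル"],
    "Diarrhea": ["下痢"],
    "Cold": ["鼻風邪", "風邪"],
}


def detect_symptoms(tweet: str) -> set[str]:
    """Return the set of symptom labels detected in a tweet."""
    # Filtering 1: mask negative-dictionary expressions so they cannot match later.
    for term in NEGATIVE_TERMS:
        tweet = tweet.replace(term, " ")
    # Filtering 2 (assumed behavior): a future phrase suppresses all labels.
    if any(phrase in tweet for phrase in FUTURE_PHRASES):
        return set()
    # Detection: a label fires if any of its surface forms occurs in the tweet.
    return {
        label
        for label, forms in SYMPTOM_TERMS.items()
        if any(form in tweet for form in forms)
    }
```

Masking the negative entries before detection keeps “鳥インフルエンザ” from triggering the Influenza label through the shorter form “インフル”.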

Methodology: Feature-engineering Approach (UE-ja-1, UE-ja-3, UE-en-1)
[Pipeline: tweet -> Pre. -> F.E. -> Post. -> labels]
1. Pre-processing to reduce sparseness and noise
  ● Normalization of characters, nouns
  ● For En., replace pronouns with special tokens.
2. Feature Extraction: surface features for robustness, semantic features for long-distance relations
  ● Surface 1 to 2-grams
  ● Named-entity (for Ja.)
  ● SRL based features (subj. verb. pairs, for Ja.)
3. Random Forests
4. Post-processing
  ● Co-occurrence rules, e.g. Influenza + Fever
  ● Combined with rule-based model
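
A rough sketch of the English pre-processing, surface n-gram extraction, and co-occurrence post-processing is given below. Everything concrete in it is an assumption for illustration: the pronoun list, the `<PRON>` token, the feature naming, and the direction of the Influenza + Fever rule are not given on the slide. The Random Forests classifier that sits between feature extraction and post-processing is omitted.

```python
import re

# Hypothetical pronoun list and placeholder token; the slide does not specify them.
PRONOUNS = {"i", "you", "he", "she", "we", "they",
            "my", "your", "his", "her", "our", "their"}
PRONOUN_TOKEN = "<PRON>"


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, and replace pronouns with a special token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [PRONOUN_TOKEN if tok in PRONOUNS else tok for tok in tokens]


def surface_ngrams(tokens: list[str]) -> list[str]:
    """Return surface 1-grams followed by 2-grams as feature strings."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


def apply_cooccurrence_rules(labels: set[str]) -> set[str]:
    """Post-processing (assumed direction): Influenza implies Fever."""
    result = set(labels)
    if "Influenza" in result:
        result.add("Fever")
    return result
```

Replacing pronouns with one token means that “my TV” and “our TV” yield the same bigram feature, which is one way to reduce sparseness in a small training set.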

Methodology: Distributed-representations Approach (UE-en-2, UE-en-3)
[Pipeline: tweet -> SGLM -> Word Vectors -> Context -> Classification by Similarity -> labels]
Skip-gram Language Model (w/wo sub-sampling)
  ● Trained using both dry-run and other tweet resources
Fixed-length Context Vectors
  ● Built from Word-vectors
Similarity-based Classification
  ● Symptom-clusters are pre-built using dry-run data
  ● Used cosine similarity
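
The similarity-based classification step might look like the sketch below. The slide does not say how the fixed-length context vector is built from word vectors or how a label is chosen, so averaging the word vectors and thresholding the cosine similarity against one centroid per symptom cluster are both assumptions, as are all function names and the toy 2-dimensional vectors.

```python
import math


def context_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens into one fixed-length vector (assumed pooling)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def classify(tokens, word_vectors, centroids, dim, threshold=0.5):
    """Return labels whose cluster centroid is close enough to the tweet's context vector."""
    ctx = context_vector(tokens, word_vectors, dim)
    return {label for label, c in centroids.items() if cosine(ctx, c) >= threshold}


# Toy vectors and centroids, invented for illustration only.
TOY_VECTORS = {"fever": [1.0, 0.0], "hot": [0.9, 0.1], "cough": [0.0, 1.0]}
TOY_CENTROIDS = {"Fever": [1.0, 0.0], "Cough": [0.0, 1.0]}
```

With these toy values, `classify(["my", "tv", "is", "hot"], TOY_VECTORS, TOY_CENTROIDS, 2)` returns the Fever label even though the tweet is about a television, which is the kind of non-human case the deck singles out as difficult.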

Results of Japanese Subtask: 4th/19

Results of English Subtask: 4th/12

Results and Discussion: Error Analysis
● More knowledge is needed, such as ontology
  ○ Non-human case: 「犬って鼻づまりとかするのかな？」 (I wonder if dogs get things like stuffy noses?)
● Discourse level knowledge is needed (Jp corpus)
  ○ 「インフルかと思って病院に行ったけど、検査したら違ったよ。」 (I thought I had the flu so I went to the doctor, but I got tested and I was wrong.)
● Other things to be mentioned
  ○ Dealing with dialects: 「あかん」 (Kansai dialect for “no good”)
  ○ Newly coined expressions (new words/phrases on the Internet)

Conclusions
● Simple methods can achieve good performance!
  ○ We focused on practical application
  ○ Applied Rule-based, Feature-engineering based, Distributed-representation based systems
● There are still many things to be improved
  ○ Handle explicit knowledge of symptoms.
  ○ Discourse, and causal structure
  ○ Neologisms, slang, dialects (for Japanese corpus)
  ○ Jokes, time and space detection
Thank you!

Appendix

Error Statistics (Ja. subtask)

Error Statistics (En. subtask)

Details of Pre-processing & Custom Dictionary (UE-Ja-1&3)
● Preprocessing
  ○ Applied normalization used in https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp
● Custom dictionary
  ○ Contains nouns which are not chunked properly by MeCab-IPADic-NEologd
  ○ Also used for normalizing by dictionary-form （原形） entries:
    e.g. {*鼻ずまり, 鼻づまり, 鼻詰まり -> 鼻づまり} (variant spellings of “stuffy nose”)
    A word or phrase with an *asterisk is marked as a spelling or grammatical error.
  ○ Some metaphorical usages found in dry-run data are also normalized:
    e.g. {頭痛の種, 頭痛のもto -> 面倒事}
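
Dictionary-form normalization of this kind can be sketched as a simple lookup. The mapping below contains only the two example entries from the slide; the names `NORMALIZATION` and `normalize` are hypothetical, and substring replacement stands in for whatever MeCab-based lookup the system actually used.

```python
# Variant -> dictionary form, using only the example entries from the slide.
NORMALIZATION = {
    "鼻ずまり": "鼻づまり",    # misspelling (marked with * on the slide)
    "鼻詰まり": "鼻づまり",    # kanji spelling variant
    "頭痛の種": "面倒事",      # metaphorical usage
    "頭痛のもと": "面倒事",    # metaphorical usage
}


def normalize(text: str) -> str:
    """Replace every listed variant with its dictionary form, longest variant first."""
    for variant in sorted(NORMALIZATION, key=len, reverse=True):
        text = text.replace(variant, NORMALIZATION[variant])
    return text
```

Normalizing the metaphorical entries removes the substring “頭痛” (headache) from the text, so a later dictionary lookup for symptoms no longer matches it.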

Methodology: Distributed-representations Approach
● Sub-sampling of frequent words
SOURCE TEXT -> TRAINING SAMPLE
I have a headache, so I’ve decided to go home. -> (I, have), (I, a)
I have a headache so I’ve decided to go home. -> (have, I), (have, a), (have, headache)
I have a headache so I’ve decided to go home. -> (a, I), (a, have), (a, headache), (a, so)
I have [a] headache so I’ve decided to go home. -> (so, I), (so, have), (so, headache), (so, I’ve)
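
The training samples for the first three rows correspond to a skip-gram window of two words on each side, which can be reproduced with the sketch below. The sub-sampling function uses the discard probability 1 - sqrt(t / f(w)) from the original word2vec paper; whether the UE systems used this exact rule or threshold is an assumption, and all function names are hypothetical.

```python
import math
import random


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def keep_probability(rel_freq, t=1e-5):
    """Probability of keeping a word with relative corpus frequency rel_freq."""
    return min(1.0, math.sqrt(t / rel_freq))


def subsample(tokens, rel_freqs, t=1e-5, rng=None):
    """Randomly drop frequent words before pairs are generated."""
    rng = rng or random.Random(0)
    return [tok for tok in tokens
            if rng.random() < keep_probability(rel_freqs[tok], t)]
```

Because very frequent words such as “a” have a low keep probability, they are often dropped before pair generation, which lets the window reach more informative neighbors.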
