Lexical Normalization for Neural Network Parsing
Rob van der Goot, Gertjan van Noord
University of Groningen
r.van.der.goot@rug.nl
26-01-2018

Last Year (CLIN27)

[Figure: last year's normalization pipeline. The original utterance "kheb da gzien" is tokenized; candidates are generated per token (lookup list, aspell, word2vec); a classifier with N-gram features ranks the candidates, yielding "ik heb dat gezien".]

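A minimal sketch of how such a pipeline could be wired together; `lookup`, `aspell`, `w2v` and `classifier` are hypothetical stand-ins for the lookup list, spell checker, word2vec model and ranking classifier in the diagram, and their methods are assumptions rather than the actual system's API:

```python
def generate_candidates(word, lookup, aspell, w2v):
    """Collect normalization candidates from all three sources."""
    candidates = {word}                                  # keeping the word is always an option
    candidates.update(lookup.get(word, []))              # known replacement pairs
    candidates.update(aspell.suggest(word))              # spelling suggestions
    candidates.update(w for w, _ in w2v.most_similar(word, topn=10))
    return candidates

def normalize(tokens, lookup, aspell, w2v, classifier):
    """Pick the best-scoring candidate for every token."""
    normalized = []
    for i, token in enumerate(tokens):
        cands = generate_candidates(token, lookup, aspell, w2v)
        # the classifier ranks candidates, using e.g. N-gram features
        # of each candidate in its sentence context
        scored = [(classifier.score(tokens, i, cand), cand) for cand in cands]
        normalized.append(max(scored)[1])
    return normalized
```
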
This Year

- Use normalization to adapt neural network dependency parsers
- Evaluate the effect of normalization versus externally trained word embeddings and character-level models
- See if we can exploit top-n candidates
- New treebank to evaluate domain adaptation

New Treebank

Why?
- Manually corrected train data
- Gold normalization available
- Data should be non-canonical
- UD format

New Treebank

- Pre-filtered to contain non-standard words
- Data from Li and Liu (2015): Owoputi and LexNorm
- 600 tweets / 10,000 words
- UD 2.1 format

New Treebank

[Figure: UD dependency tree for the tweet "I feel so bad .. Not so sure about wrk tomarra", with relations root, parataxis, punct, xcomp, advmod, obl, nsubj, case and nmod:tmod.]

New Treebank

Experimental setup:
- Train: English Web Treebank
- Dev: Owoputi
- Test: LexNorm

Neural Network parser

[Figure: the transition-based parser of Kiperwasser and Goldberg (2016). Each word is represented by a BiLSTM vector; the vectors of the configuration's stack items (s_0, s_1, s_2) and buffer items (b_0, b_1, ...) are concatenated and fed to an MLP that outputs (Score_LeftArc, Score_RightArc, Score_Shift). Taken from Kiperwasser and Goldberg (2016).]

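A rough sketch of the scoring step, assuming PyTorch rather than the DyNet toolkit that UUparser builds on; the dimensions, the number of feature positions and all names below are illustrative, not taken from the actual code:

```python
import torch
import torch.nn as nn

class TransitionScorer(nn.Module):
    """Scores (LeftArc, RightArc, Shift) from concatenated BiLSTM vectors."""

    def __init__(self, bilstm_dim=250, hidden_dim=100, n_features=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_features * bilstm_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 3),  # one score per transition
        )

    def forward(self, feature_vectors):
        # feature_vectors: BiLSTM outputs of a few stack/buffer positions
        return self.mlp(torch.cat(feature_vectors, dim=-1))

# toy usage: score a configuration described by four positions
scorer = TransitionScorer()
scores = scorer([torch.randn(250) for _ in range(4)])
```
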
Neural Network parser

- UUparser (de Lhoneux et al., 2017)
- Performs well
- Relatively easy to adapt
- No POS tags
- Characters + external embeddings

Neural Network parser

[Figure: word representations. For each word, an embedding t_i, a character-level representation c_i and an external embedding e_i are concatenated and fed to a forward and a backward LSTM.]

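A minimal sketch of this word representation, again assuming PyTorch; the dimensions are made up, and using the same word id to index the external embeddings assumes the two vocabularies are aligned:

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """x_i = [t_i; c_i; e_i]: word embedding, character-BiLSTM summary
    and a fixed external embedding, concatenated."""

    def __init__(self, n_words, n_chars, ext_vectors,
                 word_dim=100, char_dim=24, char_hidden=50):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.ext_emb = nn.Embedding.from_pretrained(ext_vectors, freeze=True)

    def forward(self, word_id, char_ids):
        t = self.word_emb(word_id)                          # t_i
        _, (h, _) = self.char_lstm(self.char_emb(char_ids).unsqueeze(0))
        c = torch.cat([h[0, 0], h[1, 0]], dim=-1)           # c_i: fwd+bwd final states
        e = self.ext_emb(word_id)                           # e_i (kept frozen)
        return torch.cat([t, c, e], dim=-1)                 # x_i

# toy usage for one word with three characters
reps = WordRepresentation(1000, 50, ext_vectors=torch.randn(1000, 100))
x = reps(torch.tensor(5), torch.tensor([3, 7, 2]))
```
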
Use Normalization as Pre-processing

[Figure: dependency parse of the raw sentence "new pix comming tomoroe", with relations root, amod, compound, compound.]

Use Normalization as Pre-processing

[Figure: after normalization to "new pix coming tomorrow" the parse becomes root, amod, obj, obl, contrasted with the raw parse of "new pix comming tomoroe".]

Use Normalization as Pre-processing

[Figure, built up over several slides: attachment scores (roughly 48-64) of the base parser and its +char, +ext and +char+ext variants, comparing raw input against automatically normalized input; the final build step adds gold normalization.]

Integrate Normalization

new pix comming tomoroe

Integrate Normalization

Top-5 normalization candidates for "new pix comming tomoroe":

  new              pix               comming             tomoroe
  new      0.9466  pix       0.7944  coming      0.5684  tomorrow    0.5451
  news     0.0315  selfies   0.0882  comming     0.4314  tomoroe     0.3946
  knew     0.0111  pictures  0.0559  combing     0.0002  tomorrow's  0.0191
  now      0.0063  photos    0.0449  comping   < 0.0001  Tagore      0.0174
  newt     0.0045  pic       0.0165  common    < 0.0001  tomorrows   0.0173

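The table can be read as per-token lists of (candidate, probability) pairs; a hypothetical Python rendering of the same data:

```python
# Top-n normalization candidates with probabilities, copied from the
# table above; the data structure itself is just one possible encoding.
candidates = {
    "new":     [("new", 0.9466), ("news", 0.0315), ("knew", 0.0111),
                ("now", 0.0063), ("newt", 0.0045)],
    "pix":     [("pix", 0.7944), ("selfies", 0.0882), ("pictures", 0.0559),
                ("photos", 0.0449), ("pic", 0.0165)],
    "comming": [("coming", 0.5684), ("comming", 0.4314),
                ("combing", 0.0002)],   # candidates below 0.0001 omitted
    "tomoroe": [("tomorrow", 0.5451), ("tomoroe", 0.3946),
                ("tomorrow's", 0.0191), ("Tagore", 0.0174),
                ("tomorrows", 0.0173)],
}
```
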
Integrate Normalization

[Figure: the same word-representation diagram as before, with t_i, c_i and e_i concatenated per word and fed to a BiLSTM; this is where the normalization candidates are integrated.]

Integrate Normalization

\[ \vec{w}_i = \sum_{j=0}^{n} P_{ij} \cdot \vec{n}_{ij} \]

Integrate Normalization

\[ \vec{w}_1 = (\vec{\text{new}} \cdot 0.9466) + (\vec{\text{news}} \cdot 0.0315) + (\vec{\text{knew}} \cdot 0.0111) + (\vec{\text{now}} \cdot 0.0063) + (\vec{\text{newt}} \cdot 0.0045) \]

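A minimal numpy sketch of this weighted sum; `emb` below is a toy embedding table, and renormalizing the probabilities over the top-n candidates is an assumption (the slide does not say whether the weights are rescaled to sum to one):

```python
import numpy as np

def integrated_embedding(cands, emb, dim=100):
    """w_i = sum_j P_ij * n_ij over the top-n candidates."""
    total = sum(p for _, p in cands)                 # renormalize over top-n
    w = np.zeros(dim)
    for word, p in cands:
        w += (p / total) * emb.get(word, np.zeros(dim))  # unknown word -> zero vector
    return w

# toy usage with random vectors for the candidates of "new"
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(100)
       for w in ["new", "news", "knew", "now", "newt"]}
w_1 = integrated_embedding([("new", 0.9466), ("news", 0.0315),
                            ("knew", 0.0111), ("now", 0.0063),
                            ("newt", 0.0045)], emb)
```

This way an uncertain token contributes a mixture of candidate vectors to the parser instead of forcing a hard normalization decision up front.
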
Integrate Normalization

[Figure: the same attachment-score comparison (roughly 48-64) for the base, +char, +ext and +char+ext models, now with four conditions: raw, direct normalization, integrated normalization, and gold normalization.]

Integrate Normalization

But what about in-domain performance?

Integrate Normalization

[Figure: in-domain scores (roughly 80-90) for the base parser and the +norm model; the two bars are almost indistinguishable at this scale.]

Integrate Normalization

[Figure: the same comparison zoomed in (y-axis offset by 84.2, range 0.00-0.10): base and +norm differ by less than 0.1 in-domain.]

Integrate Normalization

Test data:

  Model                        UAS     LAS
  raw                          70.47   60.16
  normalization (direct)       71.03*  61.83*
  normalization (integrated)   71.15   62.30*
  gold                         71.45   63.16*

Table: * indicates statistical significance compared to the previous entry.

Integrate Normalization

Conclusions:
- Normalization is still helpful on top of character and external embeddings
- Integrating normalization leads to a small but consistent, significant improvement
- Even with gold normalization, performance stays at around 60% LAS
- The new dataset will be made available; it provides a nice benchmark for domain adaptation

Next CLIN

- Effect of different categories of normalization replacements
- Get closer to gold normalization

Bibliography

Miryam de Lhoneux, Yan Shao, Ali Basirat, Eliyahu Kiperwasser, Sara Stymne, Yoav Goldberg, and Joakim Nivre. From raw text to universal dependencies - look, no tags! In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 207-217, Vancouver, Canada, August 2017. Association for Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. Simple and accurate dependency parsing using bidirectional LSTM feature representations. TACL, 4:313-327, 2016.

Chen Li and Yang Liu. Joint POS tagging and text normalization for informal text. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1263-1269, 2015.

Integrate Normalization

- Foster: not noisy, constituency
- Denoised Web Treebank: no train
- Tweebank: no train
- Foreebank: not noisy