Incorporating Knowledge into DNN for Financial Numeral Classification


  1. ASNLU at NTCIR-14 FinNum Task: Incorporating Knowledge into DNN for Financial Numeral Classification
     ChaoChun Liang, Institute of Information Science, Academia Sinica, Taipei
     June 12, 2019

  2. Outline
     • Proposed Approaches
     • Experimental Results
     • Discussion
     • Conclusion

  3. Task Overview
     • Purpose: to understand the fine-grained numeral information in financial tweets.
     • Example (T1): “8 breakouts: $CHMT (stop: $17.99), $FLO (200-day MA), $OMX (gap), $SIRO (gap). One sub-$1 stock. Modest selection on attempted swing low.”
       o “8” is a numeral about quantity.
       o “17.99” is a stop-loss price.
       o “200” belongs to a technical indicator (the 200-day MA).

  4. Proposed Approach 1/5
     • Model numeral classification as a sequence labeling process
       o Input word sequence: W1, W2, … Wn
       o Output label sequence: T1, T2, … Tn
       o M: main category class set; S: sub-category class set
       o O: not a target word to be classified
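The sequence labeling formulation above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_label_sequence`, the tokenization, and the choice of "Quantity" and "Monetary" as main-category labels for this tweet fragment are assumptions made for the example.

```python
def build_label_sequence(tokens, targets):
    """Return one label per token.

    `targets` maps the index of each target numeral to its category label;
    every other token receives the 'O' (not a target) label.
    """
    return [targets.get(i, "O") for i in range(len(tokens))]


# Hypothetical tokenization of the start of the example tweet (T1).
tokens = ["8", "breakouts", ":", "$CHMT", "(", "stop", ":", "$", "17.99", ")"]
# Assumed main-category labels for the two target numerals.
targets = {0: "Quantity", 8: "Monetary"}

labels = build_label_sequence(tokens, targets)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Each word Wi gets exactly one label Ti, so a standard tagger can be trained on the pairs without any task-specific decoding.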

  5. Proposed Approach 2/5
     • Propose a token representation with external knowledge to represent word meaning in tweet sentences
     • Implement three vanilla neural network models

  6. Proposed Approach 3/5
     • Token Representation
       o W: pre-trained word embedding
       o P: part-of-speech; N: named entity type
       o C: category-pattern feature (#=6)
         • Company (‘$NTNX’)
         • Money (‘$20’ or ‘13$’)
         • Product number (‘PS4’)
         • Date (‘11/09/17’ or ‘11-09-17’)
         • Time (‘6:45’ or ‘3:25 p.m.’)
         • Number (‘68’)
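The six category patterns listed above could be approximated with regular expressions. The slides do not show the authors' handcrafted rules, so every expression below, the one-hot encoding, and the names `CATEGORY_PATTERNS`, `category_of`, and `category_pattern_feature` are assumptions chosen only to match the examples on the slide.

```python
import re

# Illustrative patterns only; checked in order, first match wins.
CATEGORY_PATTERNS = [
    ("Company", re.compile(r"\$[A-Za-z]+")),
    ("Money", re.compile(r"\$\d+(\.\d+)?|\d+(\.\d+)?\$")),
    ("Date", re.compile(r"\d{1,2}([/-])\d{1,2}\1\d{2,4}")),
    ("Time", re.compile(r"\d{1,2}:\d{2}(\s?[ap]\.?m\.?)?", re.IGNORECASE)),
    ("Number", re.compile(r"\d+(\.\d+)?")),
    ("Product number", re.compile(r"[A-Za-z]+\d+")),
]


def category_of(token):
    """Return the name of the first pattern matching the whole token, or None."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


def category_pattern_feature(token):
    """Return a 6-dimensional one-hot list (all zeros if nothing matches)."""
    matched = category_of(token)
    return [1 if name == matched else 0 for name, _ in CATEGORY_PATTERNS]


for example in ["$NTNX", "$20", "13$", "PS4", "11/09/17", "6:45", "3:25 p.m.", "68"]:
    print(example, category_of(example), category_pattern_feature(example))
```

In a token representation like the one on the slide, this vector would be concatenated with the word embedding and the POS and named-entity features.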

  7. Proposed Approach 4/5
     • CNN (detects local patterns, e.g., ‘85%’)
     • RNN (captures context information)
     • RNN+CNN (captures local information in the RNN)

  8. Proposed Approach 5/5
     • Rescoring at prediction time:
       o Exclude the Out-of-Category (‘O’) label from the candidate set for each target numeral to avoid inconsistency.
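The rescoring step can be sketched as an argmax over the labels that remain after ‘O’ is removed. The function name `rescore` and the score values below are hypothetical; the slides do not specify how the model's scores are represented.

```python
def rescore(label_scores, excluded="O"):
    """Pick the best-scoring label for a target numeral, ignoring `excluded`.

    `label_scores` maps each candidate label to the model's score for it.
    """
    candidates = {label: score for label, score in label_scores.items()
                  if label != excluded}
    return max(candidates, key=candidates.get)


# Made-up scores for one target numeral: 'O' has the highest raw score,
# but a target numeral must always receive a real category.
scores = {"O": 0.48, "Monetary": 0.30, "Quantity": 0.22}
print(rescore(scores))
```

Without this step the tagger could label a known target numeral as ‘O’, which is the inconsistency the slide refers to.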

  9. Experiment Setting
     • Pre-trained embedding
       o GloVe 840B.300d
     • CNN
       o Kernel sizes of 2, 3, 4, and 5
       o 32 filters for each kernel
     • RNN
       o Bi-GRUs with 128 hidden nodes
     • Dropout 0.5

  10. Overall Performance

      Task-1 Test Set Performance
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 81.83   69.54   84.22   73.36   82.71   69.63
      +POS&NE              88.21   79.14   88.45   78.63   89.72   80.93
      +POS&NE+Pattern      87.73   78.47   88.76   83.55   89.24   81.50

      Task-2 Test Set Performance
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 69.88   58.66   75.22   71.72   73.94   65.54
      +POS&NE              75.14   65.77   78.49   72.37   78.17   70.16
      +POS&NE+Pattern      76.41   68.5    79.36   70.5    79.12   72.51

      “None” denotes the NN models without incorporating any knowledge.
      “POS&NE” denotes the NN models with both POS and NE information.
      “Pattern” denotes the NN models that incorporate category patterns specified by handcrafted rules.
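The slides report Micro and Macro columns without defining them. Assuming they are micro- and macro-averaged F1 scores over the category labels, the difference between the two can be sketched as below; the function name `micro_macro_f1` and the toy gold and predicted labels are made up for illustration.

```python
from collections import Counter


def micro_macro_f1(gold, pred):
    """Return (micro-F1, macro-F1) for two equal-length label lists."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p wrongly
            fn[g] += 1  # missed gold label g

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy data: the rare "Option" label is always missed.
gold = ["Monetary", "Monetary", "Monetary", "Monetary", "Option"]
pred = ["Monetary", "Monetary", "Monetary", "Monetary", "Monetary"]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

Under this assumption, a model that handles frequent categories well but fails on rare ones scores much lower on Macro than on Micro, which is one plausible reading of the gap between the two columns in the tables.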

  11. Experimental Results 1/3

      Task-1 Test Set Performance
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 81.83   69.54   84.22   73.36   82.71   69.63

      • Division of classification results between CNN and RNN models

  12. Experimental Results 2/3

      Task-1 Test Set Performance
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 81.83   69.54   84.22   73.36   82.71   69.63
      +POS&NE              88.21   79.14   88.45   78.63   89.72   80.93

      • OOVs provide no useful information
        o OOVs: 30+% on the development and test sets
      • Linguistic information (POS&NE) attached to OOVs improved performance significantly (4% to 10%).

  13. Experimental Results 3/3

      Task-1 Test Set Performance
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 81.83   69.54   84.22   73.36   82.71   69.63
      +POS&NE              88.21   79.14   88.45   78.63   89.72   80.93
      +POS&NE+Pattern      87.73   78.47   88.76   83.55   89.24   81.50

      • Category-pattern features offer small improvements or even degrade performance.
      • The manually encoded rules do not cover enough patterns.

  14. Discussion 1/2
      • Issue 1: High OOV rate
      • Issue 2: Diverse patterns in tweets (not enough coverage with handcrafted patterns)
      • Solution: Numeral splitting
        o Most OOVs are concatenations of a numeral and other characters.
        o Split each token that contains numbers into individual sub-tokens.
        o e.g., “FY22” -> “FY” and “22”
        o e.g., “12/3/2017” -> “12”, “/”, “3”, “/”, “2017”

      OOV Rate     Dev     Test
      Before       36%     39%
      After        22%     23%
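Numeral splitting and its effect on the OOV rate can be sketched as follows. The regular expression, the decision to keep decimals such as “17.99” as one sub-token, the function names `split_numerals` and `oov_rate`, and the toy vocabulary are all assumptions; the slides give only the two examples above.

```python
import re

# A run of digits (optionally with a decimal part) or a run of non-digits.
_SUBTOKEN = re.compile(r"\d+(?:\.\d+)?|\D+")


def split_numerals(token):
    """Split a token containing digits into numeral and non-numeral parts."""
    if not any(ch.isdigit() for ch in token):
        return [token]
    return _SUBTOKEN.findall(token)


def oov_rate(tokens, vocab):
    """Fraction of tokens that are missing from the embedding vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


# Toy stand-in for the pre-trained embedding vocabulary.
vocab = {"FY", "22", "12", "/", "3", "2017", "earnings"}
tokens = ["FY22", "earnings", "12/3/2017"]
split_tokens = [sub for t in tokens for sub in split_numerals(t)]

print(split_tokens)
print(oov_rate(tokens, vocab), oov_rate(split_tokens, vocab))
```

On this toy input two of the three original tokens are out of vocabulary, while every sub-token after splitting is covered. That is the same mechanism behind the drop in OOV rate reported in the table, although the real numbers depend on the actual embedding vocabulary.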

  15. Discussion 2/2
      • Performance improves significantly, e.g., by 9% (micro) and 18% (macro) for RNN+CNN (“None”).
      • Numeral splitting outperforms the handcrafted patterns.

      Task-1 Test Set Performance (before Numeral Splitting)
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 81.83   69.54   84.22   73.36   82.71   69.63
      +POS&NE+Pattern      87.73   78.47   88.76   83.55   89.24   81.50

      Task-1 Test Set Performance (after Numeral Splitting)
                           CNN             RNN             RNN+CNN
                           Micro   Macro   Micro   Macro   Micro   Macro
      None                 89.56   83.17   92.27   86.60   92.11   88.18
      +POS&NE              90.68   83.60   91.95   88.36   92.99   88.25

  16. Conclusion
      • The proposed token representation (with linguistic knowledge) improves performance significantly.
      • Suitable pre-processing (splitting numerals) to reduce OOV rates is essential.
      • Jointly adopting both approaches could offer additional benefits.

  17. Q & A
      Thanks

  18. Appendix – P10 1/2 (supplement to Experimental Results 1/3)
      • Errors made by the RNN were due to the model missing local patterns
        o E.g., “num/num” (Temporal) in “10/24”; “num%” (Percentage) in “7.8%”
      • Errors made by the CNN were due to the model missing context information
        o E.g., “You sold ESPR at 11 and CLVS at 29 but thanks for this tip.”

  19. Appendix – P10 2/2 (supplement to Experimental Results 1/3)
      • Errors made by both the RNN and the CNN occurred when the numeral cannot be categorized explicitly (i.e., more information is needed).
        o E.g., “$NGAS Buy on dips on $UGAZ $UNG. Dip to 3.075, NG is on wave 3 move to 3.27 on 8HR chart.”

  20. Appendix – P12 (supplement to Experimental Results 3/3)
      • Category F-score of the RNN with POS&NE and category patterns

      Category           +POS&NE    +POS&NE+Pattern
      Monetary           0.9107     0.9085
      Quantity           0.7727     0.7857
      Percentage         0.9882     0.9882
      Temporal           0.8978     0.8903
      Product Number     0.3182     0.6818
      Option             0.7727     0.7727
      Indicator          0.7778     0.7037
