BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction technique for numeral classification


  1. BRNIR at the NTCIR-14 FinNum task: Scalable feature extraction technique for numeral classification
     Alan Spark, Team Lead at AUTO1 GROUP

  2. Agenda
     • Motivation
     • Feature types
     • Extraction pipeline
     • Experiment design
     • Results

  3. Motivation
     • Focus on unsupervised feature extraction
     • Experiment with different feature concatenations
     • Suggest a feature extraction pipeline

  4. Feature types
     • Topic distribution: a vector with the topic distribution of a tweet
     • Tickers: multi-label encoding of the tickers present in a tweet
     • Tags: multi-label encoding of the tags present in a tweet
     • Number properties: a vector encoding number properties such as value, position, type, and others
     • Token context: "bag-of-words"-like encoding of the tokens neighboring a number
     • Character context: "bag-of-words"-like encoding of the characters neighboring a number
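As a rough illustration of the sparse encodings listed on the feature types slide, the sketch below builds a multi-label (multi-hot) vector for tickers and tags, plus bag-of-words style counts for the tokens and characters around a number. The regular expressions, vocabularies, window sizes, helper names, and the sample tweet are all assumptions made for illustration; the slides do not say how these features were actually implemented.

```python
import re
from collections import Counter

# Hypothetical patterns; the deck does not specify the exact rules used.
TICKER_RE = re.compile(r"\$([A-Za-z]{1,6})\b")
TAG_RE = re.compile(r"#(\w+)")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def multi_hot(items, vocabulary):
    """Multi-label encoding: 1 where a vocabulary entry occurs in items."""
    present = set(items)
    return [1 if entry in present else 0 for entry in vocabulary]


def token_context(tokens, index, window=2):
    """Bag-of-words counts of the tokens within `window` of tokens[index]."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return Counter(token.lower() for token in left + right)


def char_context(text, start, end, window=3):
    """Bag-of-characters counts around the span text[start:end]."""
    return Counter(text[max(0, start - window):start] + text[end:end + window])


# Made-up tweet and toy vocabularies, for illustration only.
tweet = "$FNKO target 12 by Friday #ipo"
tickers = [t.upper() for t in TICKER_RE.findall(tweet)]
tags = [t.lower() for t in TAG_RE.findall(tweet)]

ticker_vec = multi_hot(tickers, ["AAPL", "FNKO", "TSLA"])
tag_vec = multi_hot(tags, ["earnings", "ipo"])

tokens = tweet.split()
tok_ctx = token_context(tokens, tokens.index("12"))

match = NUMBER_RE.search(tweet)
chr_ctx = char_context(tweet, match.start(), match.end())
```

Each helper works on one tweet and shares no state, so in principle the same functions could be mapped over a corpus in parallel, which fits the scalability goal stated in the deck.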

  5. Extraction pipeline

  6. Experiment design

  7. Results

  8. Summary and Future work
     • unsupervised approaches for feature extraction in application to the FinNum task
     • methods are parallelizable and meant to be run at scale
     • utilize data discovered at the preprocessing step
     • address natural imbalance
     • embedding for all "sparse" features
     • experiment with classification models

  9. Thank you

  10. Q&A

  11. AUTO1 Group GmbH, c/o Alan Spark, Bergmannstraße 72, 10961 Berlin; alan.spark@auto1.com, mail@alanspark.net

  12. Additional plots

  13. Preprocessing highlight
      Tweet: "$ FNKO $ 10 is a no-brainer. Should trade back to IPO price $ 12. Remember, initial range on IPO was $ 16 on high end. Quiet period expiry soon."
      target num: ["10", "12."]
      discovered numbers: [10, 12, 16]
      The approach detects extra numbers in more than 32% of tweets in the given corpus.
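The preprocessing highlight can be sketched as a simple scan for numerals that compares the result against the annotated targets. This is a minimal sketch under assumptions: the regular expression, the `discover_numbers` helper, and the normalization of the trailing period in "12." are hypothetical choices, not the team's documented method. Only the tweet text and the target list come from the slide.

```python
import re

# Assumed pattern: integers and simple decimals. The real preprocessing
# rules are not described on the slide.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def discover_numbers(text):
    """Return every numeral found in the text, in order of appearance."""
    return [m.group() for m in NUMBER_RE.finditer(text)]


tweet = (
    "$ FNKO $ 10 is a no-brainer. Should trade back to IPO price $ 12. "
    "Remember, initial range on IPO was $ 16 on high end. "
    "Quiet period expiry soon."
)
targets = ["10", "12."]  # annotated target numerals from the slide

discovered = discover_numbers(tweet)

# Strip the sentence-final period so "12." compares equal to "12".
normalized_targets = {t.rstrip(".") for t in targets}
extra = [n for n in discovered if n not in normalized_targets]

print(discovered)  # ['10', '12', '16']
print(extra)       # ['16']
```

Here "16" is the extra numeral: it appears in the tweet but is not among the annotated targets.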

  14. Number of "target numbers" per tweet (left); number of unique categories/subcategories per tweet (right)
