Classifying News Stories to Estimate the Direction of a Stock Market - PowerPoint PPT Presentation


  1. Classifying News Stories to Estimate the Direction of a Stock Market Index (Brett Drury, Luis Torgo and J. J. Almeida)[1] Hao Fu, Jiatong Ruan

  2. Introduction

  3. Background
     Timely information from news -> Prediction of the prospects of economic actors
     ● News information: the past or the future vs. numeric data: the past
     ● Some published methods exist:
       ○ manually created rules
       ○ models learnt from manually selected data and manually constructed dictionaries
     ● Disadvantage: rely on human annotators

  4. Related Work
     Manually organize news stories
     ● 19 categories with different levels. [2]
     ● Using machine readable news to automatically classify stories. [3]
     ● Increase to 39 categories. [4]
     ● Dictionary contains 423 features. [5]
     Alignment of news stories to market movement [6]
     ● Limited to single companies and to cases where the company names are in the headlines. [6]

  5. News Story Classification
     ● Manually constructed rules with automatically constructed dictionaries
     ● Alignment of stories with sharp market movement
     ● Self-training to construct a model to classify news stories
     Fig 1: Proposed Classification

  6. Data
     ● Amount: News stories (>300,000)
     ● News Source: Really Simple Syndication (RSS) feeds
     ● Time Period: Oct. 2008 - Jun. 2010; the crawler ran at the same time each day
     ● Database: RDBMS storing headline, description, published date and story text
     ● Stock Data: Yahoo Finance

  7. Data
     Data pre-processing:
     ● Remove duplicate stories and non-finance stories.
     ● Remove sentences that did not contain the named entities: companies, organizations, market indexes and company employees.
     ● The sentence set was parsed with the ANNIE Part of Speech Tagger [8].
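The two removal steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `preprocess`, the exact-match duplicate test, the naive split on full stops and the substring entity match are all assumptions, and the finance filter and ANNIE tagging are not reproduced.

```python
def preprocess(stories, entities):
    """Drop duplicate stories, then keep only sentences that mention a known entity.

    `stories` is a list of story texts and `entities` a list of entity names;
    both are hypothetical inputs for illustration.
    """
    seen = set()
    kept = []
    for story in stories:
        key = story.strip().lower()
        if key in seen:
            continue  # exact duplicate (ignoring case and outer whitespace)
        seen.add(key)
        # Naive sentence split on '.', a stand-in for a real sentence splitter.
        sentences = [s.strip() for s in story.split(".") if s.strip()]
        relevant = [
            s for s in sentences
            if any(e.lower() in s.lower() for e in entities)
        ]
        if relevant:
            kept.append(relevant)  # stories with no entity sentence are dropped
    return kept


stories = [
    "Acme profits rise. Weather is nice.",
    "Acme profits rise. Weather is nice.",
    "Cats are cute.",
]
print(preprocess(stories, ["Acme"]))
```

A production version would use a named entity recogniser rather than substring matching, which is what the slide's list of entity types suggests.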

  8. Model from Rule Selected Data [7]
     Fig 2: Rule Classifier Model
     ● Components: Economic Actor (company, organization, market, etc.); Verb/Adj. event or sentiment phrases; Object (profits, unemployment, etc.)
     ● Outcomes: Classified as positive or negative, or Unclassified
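A toy version of such a rule classifier might look like the sketch below. The word lists and the function `rule_classify` are hypothetical and exist only to show the actor + phrase + object pattern; the real dictionaries are constructed automatically [7], and this sketch ignores cases where polarity depends on the object (for example, rising unemployment).

```python
# Hypothetical mini-dictionaries, for illustration only.
ACTORS = {"acme", "ftse"}
OBJECTS = {"profits", "unemployment"}
POSITIVE = {"rise", "rises", "beat"}
NEGATIVE = {"fall", "falls", "cut"}


def rule_classify(sentence):
    """Return 'positive', 'negative', or None (unclassified)."""
    tokens = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    # The rule only fires when both an economic actor and an object are present.
    if not (tokens & ACTORS and tokens & OBJECTS):
        return None
    pos = bool(tokens & POSITIVE)
    neg = bool(tokens & NEGATIVE)
    if pos == neg:
        return None  # no sentiment phrase, or conflicting phrases
    return "positive" if pos else "negative"


print(rule_classify("Acme profits rise."))
print(rule_classify("Profits rise."))
```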

  9. Alignment of Market Data
     ● Assumption: If the market moves sharply, then this movement will be reflected in the published news stories.
     ● This strategy selects data by labelling news stories by their co-occurrence with a single market movement.
     ● A positive day is assumed to be when the market rises by more than 1.7% and a negative day when the market loses more than 2.11%.
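The thresholds above translate into a simple labelling function. The sketch assumes the daily move is measured close to close, which the slides do not state; the function names are illustrative.

```python
POSITIVE_THRESHOLD = 1.7    # percent gain, from the slide
NEGATIVE_THRESHOLD = -2.11  # percent loss, from the slide


def daily_change_pct(prev_close, close):
    """Percentage change between two closing prices (close-to-close is an assumption)."""
    return (close - prev_close) / prev_close * 100.0


def align_label(change_pct):
    """Label a day's stories by the market move; None means no sharp movement."""
    if change_pct > POSITIVE_THRESHOLD:
        return "positive"
    if change_pct < NEGATIVE_THRESHOLD:
        return "negative"
    return None


print(align_label(daily_change_pct(100.0, 102.0)))
print(align_label(daily_change_pct(100.0, 97.0)))
print(align_label(1.0))
```

Stories published on days that return `None` are simply not selected by this strategy.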

  10. Hybrid of Rules and Alignment
      ● This strategy attempts to mitigate the flaws of a rule classifier and alignment with a simple voting strategy.
      Fig 3: Hybrid Strategy for equal labels (news story -> rule classifier and alignment -> equal labels -> training set)

  11. Hybrid of Rules and Alignment
      Fig 4: Hybrid Strategy for contradictory labels (news story -> rule classifier and alignment -> contradictory labels -> not added to the training set)
      ● The strategy ensured that stories which were contrary to the market trend were not included in the training set.
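The voting step in Figs 3 and 4 can be sketched as below. The slides only cover equal and contradictory labels; treating a story that either strategy leaves unlabelled as rejected is an assumption of this sketch, and the function names are illustrative.

```python
def hybrid_label(rule_label, alignment_label):
    """Keep a label only when the rule classifier and the alignment agree."""
    if rule_label is not None and rule_label == alignment_label:
        return rule_label
    return None  # contradictory, or (by assumption) missing a vote


def build_training_set(stories):
    """`stories` is an iterable of (text, rule_label, alignment_label) tuples."""
    return [
        (text, rule)
        for text, rule, aligned in stories
        if hybrid_label(rule, aligned) is not None
    ]


candidates = [
    ("Acme profits rise", "positive", "positive"),   # equal labels: kept
    ("Acme profits rise", "positive", "negative"),   # contradictory: dropped
    ("Acme issues statement", None, "negative"),     # no rule vote: dropped
]
print(build_training_set(candidates))
```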

  12. Proposed Algorithm Fig 5: Flow Diagram for Proposed Algorithm
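The flow diagram itself is not reproduced in this transcript. As a generic illustration of self-training, the technique the proposed method adds, the sketch below repeatedly trains on the labelled set and moves confidently classified unlabelled stories into it. The toy word-count model, the 0.7 confidence threshold and the round limit are all assumptions, not details from the paper.

```python
from collections import Counter


def train(labelled):
    """Toy model: per-label word counts (a stand-in for the real classifier)."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Return (label, confidence) from add-one-smoothed word votes."""
    words = text.lower().split()
    pos = sum(model["positive"][w] for w in words) + 1
    neg = sum(model["negative"][w] for w in words) + 1
    label = "positive" if pos >= neg else "negative"
    return label, max(pos, neg) / (pos + neg)


def self_train(labelled, unlabelled, threshold=0.7, max_rounds=5):
    """Grow the labelled set with confident predictions, then retrain."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for text in pool:
            label, conf = predict(model, text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to add, so stop early
        labelled.extend(confident)
        pool = remaining
    return train(labelled), labelled


seed = [("profits rise sharply", "positive"), ("shares fall sharply", "negative")]
pool = ["profits rise again", "shares fall again", "weather today"]
model, grown = self_train(seed, pool)
print(len(grown))
```

In this sketch the seed set would come from the hybrid strategy, and stories the model is unsure about stay unlabelled.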

  13. Evaluation
      ● The evaluation methodology is based on estimated F-Measure.
      ● The F-Measure is estimated for models generated from headline, description and story text information.

      Strategy    Headline  Text  Description
      Rules       0.77      0.60  0.65
      Alignment   0.57      0.57  0.57
      Hybrid      0.66      0.57  0.58
      Proposed    0.84      0.71  0.77

      Fig 6: Estimated F-Measure for competing strategies
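For reference, the F-Measure combines precision and recall. The slides do not say which variant was used or how it was estimated, so the sketch below assumes the balanced form (beta = 1) computed from raw counts; the counts in the example are made up.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct positives, 2 false alarms, 2 misses.
print(f_measure(8, 2, 2))
```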

  14. Conclusion
      ● This paper presents a proposed method for categorizing news stories into positive or negative categories.
      ● By combining a rule classifier and alignment with market movement, the chance of identifying events which may influence the market is increased. The proposed method adds further documents with a self-training method.
      ● The proposed method has a clear advantage over the competing methods by F-Measure.

  15. Contribution
      ● Designed a hybrid strategy that can mitigate the flaws of the rule classifier and the alignment of market data.
      ● Proposed a new algorithm that introduces self-training to utilize unlabelled data for training a more robust model.

  16. Limitations
      ● How models are induced from headline, description and story text, which is really important for us to evaluate, is not clearly presented in the paper.
      ● Market movement depends on many factors, some of which might be contradictory, so it is probably not a good idea to ignore data that is contrary to the market trend.

  17. Future Work
      ● Evaluate techniques with news published when the market is closed.
      ● Assign a relevance measure to each news story.
      ● Utilize news volume.

  18. Q & A

  19. Reference
      [1] Drury, Brett, Torgo, Luis and Almeida, J. J., Classifying news stories to estimate the direction of a stock market index. Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on. IEEE, 2011.
      [2] Taleb, Nassim Nicholas and Lane, Allen, The Black Swan (The impact of the highly improbable). Random House, 2008.
      [3] Thomas, James D., News and Trading Rules. s.l.: CiteSeer, 2003.
      [4] Mittermayer, M. A. and Knolmayer, G. F., Text Mining Systems for Market Response to News: A Survey. University of Bern, 2006.

  20. Reference
      [5] Wuthrich, B., et al., Daily prediction of major stock indices from textual www data. International Conference on Knowledge Discovery and Data Mining, 1998.
      [6] Lavrenko, Victor, et al., Language Models for Financial News Recommendation. ACM Press, 2000.
      [7] Drury, Brett and Almeida, J. J., Identification of Fine Grained Feature Based Event and Sentiment Phrases from Business News Stories. ACM, 2011.
      [8] Cunningham, H., Maynard, D., Bontcheva, K. and Tablan, V., GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
