Mining Frequent Episodes for Relating Financial Events and Stock Trends

Anny Ng and Ada Wai-chee Fu
Department of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, Hong Kong
Email: {ang, adafu}@cse.cuhk.edu.hk

Abstract. It is expected that stock prices can be affected by local and overseas political and economic events. We extract events from the financial news of local Chinese newspapers that are available on the web. The news is matched against stock price databases, and a new method is proposed for mining frequent temporal patterns.

1 Introduction

In the stock market, share prices can be influenced by many factors, ranging from company news releases and local politics to news about the economies of the superpowers. We call these occurrences events. We assume that each event is of a certain event type and has a time of occurrence, typically given by the date on which the event occurs or is reported. Each event therefore corresponds to a time point. We expect that events like "the Hong Kong government announcing a deficit" and "Washington deciding to increase the interest rate" may lead to fluctuations in Hong Kong stock prices within a short period of time.

When a number of events occur within a short period of time, we assume that they may have some relationship. Such a period of time can be determined by application experts. It is called a window and is usually limited to a few days. Roughly speaking, a set of events that occur within a window is called an episode instance, and the set of event types in the instance is called an episode. For example, we may have the following statement in a financial report: "Telecommunications stocks pushed the Hang Seng Index [?] higher following the Star TV/HK Telecom and Orange/Mannesmann deals." This can be an example of an episode, in which all four events, "telecommunications stocks rise", "Hang Seng Index surges", and the two deals "Star TV/HK Telecom" and "Orange/Mannesmann", happened within a period of [?] days. If there are many instances of the same episode, it is called a frequent episode.

We are interested in finding frequent episodes related to stock movements. The stock movement need not be the last event in the episode instance, because stocks may move on investors' expectation that something will happen in the following days. For example, a news report may say "Hong Kong shares slid yesterday in a market burdened by the fear of possible United States interest rates rises tomorrow." Therefore we do not assume an ordering of the events in an episode.
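The informal notions above (events tied to a day, a window of a few days, and an unordered episode) can be illustrated with a small sketch. It assumes, purely for illustration, that events extracted from the news arrive as (day, event type) pairs. The helpers `group_by_day` and `occurs_in_window`, the event-type labels, and the day numbers are all hypothetical, not taken from the paper. The sketch groups events into per-day sets and checks whether every event type of an episode falls inside one window. Because the check works on sets, the order of events inside the window is ignored, so a stock movement may come before the news it is associated with.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def group_by_day(events: Iterable[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Collect hypothetical (day, event_type) pairs into one set of event types per day."""
    days: Dict[int, Set[str]] = defaultdict(set)
    for day, event_type in events:
        days[day].add(event_type)
    return dict(days)


def occurs_in_window(days: Dict[int, Set[str]], episode: FrozenSet[str],
                     start: int, width: int) -> bool:
    """True if every event type of the episode appears on some day in
    start, start + 1, ..., start + width - 1. Order within the window is ignored."""
    seen: Set[str] = set()
    for day in range(start, start + width):
        seen |= days.get(day, set())
    return episode <= seen


if __name__ == "__main__":
    # Invented labels and days for the events in the report quoted in the text.
    days = group_by_day([
        (1, "telecom_stocks_rise"), (1, "HSI_surges"),
        (2, "deal_StarTV_HKTelecom"), (3, "deal_Orange_Mannesmann"),
    ])
    episode = frozenset({"telecom_stocks_rise", "HSI_surges",
                         "deal_StarTV_HKTelecom", "deal_Orange_Mannesmann"})
    print(occurs_in_window(days, episode, start=1, width=3))  # True
    print(occurs_in_window(days, episode, start=1, width=2))  # False: day 3 is outside
```

Representing an episode as a `frozenset` encodes the "no ordering" assumption directly: the stock event and the news events are matched only by membership in the window.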
From the frequent episodes, we may discover the factors behind the fluctuation of stock prices. We are interested in a special type of episode that we call a stock-episode. It can be written as $\langle e_1, e_2, \ldots, e_n, t \text{ days} \rangle$, where $e_1, e_2, \ldots, e_n$ are event types and at least one of $e_1, \ldots, e_n$ should be a stock-fluctuation event. An instance of this stock-episode is an occurrence of events of the event types $e_1, \ldots, e_n$ within a window of $t$ days. Since we are only concerned with stock-episodes, we shall simply refer to stock-episodes as episodes.

1.1 Definitions

Let $E = \{E_1, E_2, \ldots, E_m\}$ be a set of event types. Assume that we have a database that records events for days 1 to $n$. We call this an event database and represent it as $DB = (D_1, D_2, \ldots, D_n)$, where $D_i$ is for day $i$ and $D_i = \{e_{i1}, e_{i2}, \ldots, e_{ik}\}$, with $e_{ij} \in E$ for $1 \le j \le k$. This means that the events that happen on day $i$ have event types $e_{i1}, e_{i2}, \ldots, e_{ik}$. Each $D_i$ is called a day record. The day records in the database are consecutive and arranged in chronological order: $D_i$ is one day before $D_{i+1}$ for all $1 \le i \le n-1$.

$P = \{e_{p1}, e_{p2}, \ldots, e_{pb}\}$, where $e_{pi} \in E$ for $1 \le i \le b$, is an episode if $P$ has at least two elements and at least one $e_{pj}$ is a stock event type. We assume that a window size of $x$ days is given; it indicates a consecutive sequence of $x$ days. We are interested in events that occur within a short period, as defined by a window. If the database consists of $n$ days and the window size is $x$ days, there are $n$ windows in the database. The first window contains exactly $x$ days. The $i$-th window contains $D_i, D_{i+1}, \ldots, D_{\min(i+x-1,\,n)}$, that is, up to $x$ days. The second last window contains $D_{n-1}, D_n$, and the last window contains only $D_n$.

In some previous work, such as [?], the frequency of an episode is defined as the number of windows that contain the events in the episode. For our application, this definition has a problem. Suppose we have a window size of $x$. If an episode occurs on a single day $i$, then every window starting from day $i - x + 1$ to day $i$ contains the episode, so the frequency of the episode will be $x$. However, the episode has actually occurred only once. Therefore we propose a different definition for the frequency of an episode.

Definition 1. Given a window size of $x$ days for $DB$ and an episode $P$, an episode instance of $P$ is an occurrence of all the event types in $P$ within a window $W$ where the record of the first day of $W$ contains at least one of the event types in $P$. Each window can be counted at most once as an episode instance for a given episode. The frequency of an event is the number of occurrences of the event in the database. The support, or frequency, of an episode is the number of instances of the episode. Therefore, the frequency of an episode $P$ is the number of windows $W$ such that $W$ contains all the event types in $P$ and the first day of $W$ contains at least one of the event types in $P$. An episode is a frequent episode if its frequency is at least a given minimum support threshold.
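The window enumeration and Definition 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `DB` is a list of day records, each a frozenset of event-type names (`db[0]` holds $D_1$). The function names, the event-type labels, and the `min_support` parameter are hypothetical. `naive_window_count` is a simplified version of the window-counting definition, restricted to the $n$ windows defined above and included only to show the overcounting. `episode_frequency` adds the first-day condition of Definition 1.

```python
from typing import FrozenSet, List, Sequence, Set

# Assumed representation (not from the paper): DB = (D_1, ..., D_n) as a list of
# frozensets of event-type names; db[0] holds D_1.
DayRecord = FrozenSet[str]


def is_episode(p: Set[str], stock_event_types: Set[str]) -> bool:
    """P is an episode if it has at least two event types, at least one of them a stock event type."""
    return len(p) >= 2 and bool(p & stock_event_types)


def windows(db: Sequence[DayRecord], x: int) -> List[Sequence[DayRecord]]:
    """The n windows of DB: window i holds D_i, ..., D_min(i+x-1, n), so the
    second last window holds D_{n-1}, D_n and the last holds only D_n."""
    if x < 1:
        raise ValueError("window size x must be at least one day")
    return [db[i:i + x] for i in range(len(db))]


def naive_window_count(db: Sequence[DayRecord], p: Set[str], x: int) -> int:
    """Window-based count: every window that contains all event types of P."""
    return sum(1 for w in windows(db, x) if p <= set().union(*w))


def episode_frequency(db: Sequence[DayRecord], p: Set[str], x: int) -> int:
    """Definition 1: windows W containing all event types of P whose first day
    record contains at least one of them. Each window is counted at most once."""
    return sum(1 for w in windows(db, x) if w[0] & p and p <= set().union(*w))


def is_frequent(db: Sequence[DayRecord], p: Set[str], x: int, min_support: int) -> bool:
    """P is frequent if its Definition 1 frequency reaches the minimum support threshold."""
    return episode_frequency(db, p, x) >= min_support


if __name__ == "__main__":
    # Hypothetical data: both event types occur only on day 4 (db[3]).
    empty: DayRecord = frozenset()
    db = [empty, empty, empty, frozenset({"rate_rise", "HSI_down"}), empty, empty]
    p = {"rate_rise", "HSI_down"}
    print(is_episode(p, stock_event_types={"HSI_down"}))  # True
    print(naive_window_count(db, p, x=3))  # 3: windows starting on days 2, 3, 4
    print(episode_frequency(db, p, x=3))   # 1: only the window starting on day 4
```

With a 3-day window and both event types on day 4 only, the window-based count is 3 (the windows starting on days 2, 3, and 4), which equals $x$ as argued in the text, whereas Definition 1 counts a single instance. Requiring the first day of $W$ to contain an event of $P$ anchors each instance to the window in which the episode starts, so one occurrence is not counted $x$ times.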