Regression Albert Bifet May 2012 COMP423A/COMP523A Data Stream - PowerPoint PPT Presentation



  1. Regression Albert Bifet May 2012

  2. COMP423A/COMP523A Data Stream Mining
     Outline
     1. Introduction
     2. Stream Algorithmics
     3. Concept drift
     4. Evaluation
     5. Classification
     6. Ensemble Methods
     7. Regression
     8. Clustering
     9. Frequent Pattern Mining
     10. Distributed Streaming

  3. Data Streams Big Data & Real Time

  4. Regression
     Definition: Given a numeric class attribute, a regression algorithm builds a model that accurately predicts a numeric value for every unlabelled instance $I$: $y = f(x)$
     Example: Stock-market price prediction
     Example: Airplane delays

  5. Evaluation
     Evaluation Framework
     1. Error estimation: Hold-out or Prequential
     2. Evaluation performance measures: MSE or MAE
     3. Statistical significance validation: Nemenyi test
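Prequential error estimation interleaves testing and training: each instance is first used to test the model and only then to update it. The sketch below illustrates that loop. `RunningMeanRegressor` is a hypothetical stand-in model invented for this example, not something from the slides.

```python
class RunningMeanRegressor:
    """Hypothetical baseline: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def predict(self, x):
        # Before any training instance arrives, fall back to 0.0.
        return self.total / self.n if self.n else 0.0

    def learn(self, x, y):
        self.n += 1
        self.total += y


def prequential_mae(model, stream):
    """Test-then-train: score each instance before the model learns from it."""
    abs_error = 0.0
    count = 0
    for x, y in stream:
        abs_error += abs(model.predict(x) - y)  # test first
        model.learn(x, y)                       # then train
        count += 1
    return abs_error / count


stream = [((0.0,), 2.0), ((1.0,), 4.0), ((2.0,), 6.0)]
mae = prequential_mae(RunningMeanRegressor(), stream)
```

Any model exposing `predict` and `learn` could be dropped into the same loop.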

  6. 2. Performance Measures
     Regression mean measures
     ◮ Mean square error: $\mathrm{MSE} = \sum_i (f(x_i) - y_i)^2 / N$
     ◮ Root mean square error: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\sum_i (f(x_i) - y_i)^2 / N}$
     Forgetting mechanism for estimating measures: sliding window of size $w$ with the most recent observations
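A minimal sketch of the sliding-window forgetting mechanism applied to MSE and RMSE. The class name `WindowedSquaredError` and its methods are illustrative choices, not part of the slides.

```python
from collections import deque
import math


class WindowedSquaredError:
    """Track MSE and RMSE over the w most recent (prediction, target) pairs."""

    def __init__(self, w):
        # A bounded deque drops the oldest squared error automatically.
        self.errors = deque(maxlen=w)

    def update(self, prediction, target):
        self.errors.append((prediction - target) ** 2)

    def mse(self):
        return sum(self.errors) / len(self.errors)

    def rmse(self):
        return math.sqrt(self.mse())


tracker = WindowedSquaredError(w=3)
for prediction, target in [(1.0, 2.0), (2.0, 2.0), (3.0, 5.0), (4.0, 4.0)]:
    tracker.update(prediction, target)
```

With `w=3`, the first pair has already been forgotten by the time the fourth one arrives, so only the three latest squared errors contribute to the estimate.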

  7. 2. Performance Measures
     Regression relative measures
     ◮ Relative square error: $\mathrm{RSE} = \sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y} - y_i)^2$
     ◮ Root relative square error: $\mathrm{RRSE} = \sqrt{\mathrm{RSE}} = \sqrt{\sum_i (f(x_i) - y_i)^2 / \sum_i (\bar{y} - y_i)^2}$
     Here $\bar{y}$ is the mean of the target values.
     Forgetting mechanism for estimating measures: sliding window of size $w$ with the most recent observations

  8. 2. Performance Measures
     Regression absolute measures
     ◮ Mean absolute error: $\mathrm{MAE} = \sum_i |f(x_i) - y_i| / N$
     ◮ Relative absolute error: $\mathrm{RAE} = \sum_i |f(x_i) - y_i| / \sum_i |\bar{y} - y_i|$
     Here $\bar{y}$ is the mean of the target values.
     Forgetting mechanism for estimating measures: sliding window of size $w$ with the most recent observations
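The relative and absolute measures can be computed together, as sketched below. This assumes the baseline in both denominators is the mean of the target values (the usual textbook definition). The function name `regression_measures` is illustrative.

```python
import math


def regression_measures(predictions, targets):
    """Return MAE, RSE, RRSE and RAE for paired predictions and targets."""
    n = len(targets)
    mean_target = sum(targets) / n  # assumed baseline predictor
    sq_err = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    abs_err = sum(abs(p - y) for p, y in zip(predictions, targets))
    sq_base = sum((mean_target - y) ** 2 for y in targets)
    abs_base = sum(abs(mean_target - y) for y in targets)
    rse = sq_err / sq_base
    return {
        "MAE": abs_err / n,
        "RSE": rse,
        "RRSE": math.sqrt(rse),
        "RAE": abs_err / abs_base,
    }


m = regression_measures([2.0, 4.0, 6.0], [1.0, 4.0, 7.0])
```

A relative measure below 1 means the model does better than always predicting the mean. The sketch does not guard against constant targets, which would make the denominators zero.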

  9. Linear Methods for Regression
     Linear Least Squares fitting
     ◮ Linear Regression Model: $f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j = X\beta$
     ◮ Minimize residual sum of squares: $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2 = (y - X\beta)'(y - X\beta)$
     ◮ Solution: $\hat{\beta} = (X'X)^{-1} X'y$
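The closed-form solution can be sketched in plain Python by solving the normal equations $(X'X)\beta = X'y$ instead of forming the inverse explicitly. The helper names are illustrative, and the Gaussian elimination here is a simple textbook version that assumes $X'X$ is non-singular.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Return beta = [beta_0, beta_1, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the intercept
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Noise-free data generated from y = 1 + 2x.
beta = least_squares([(0.0,), (1.0,), (2.0,), (3.0,)], [1.0, 3.0, 5.0, 7.0])
```

Because the example data is exactly linear, `beta` recovers the intercept 1 and slope 2 up to floating-point error.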

  10. Perceptron
     [Figure: perceptron with inputs Attribute 1 to Attribute 5, weights $w_1$ to $w_5$, and output $h_{\vec{w}}(\vec{x}_i)$]
     ◮ Data stream: $\langle \vec{x}_i, y_i \rangle$
     ◮ Classical perceptron: $h_{\vec{w}}(\vec{x}_i) = \vec{w}^T \vec{x}_i$
     ◮ Minimize mean-square error: $J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2$

  11. Perceptron
     ◮ Minimize mean-square error: $J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2$
     ◮ Stochastic Gradient Descent: $\vec{w} = \vec{w} - \eta \nabla J\, \vec{x}_i$
     ◮ Gradient of the error function: $\nabla J = -\sum_i (y_i - h_{\vec{w}}(\vec{x}_i))$
     ◮ Weight update rule: $\vec{w} = \vec{w} + \eta \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))\, \vec{x}_i$
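A minimal sketch of the weight update rule applied one instance at a time, which is the stochastic case where the sum reduces to the current example. The class name, the constant bias input, the learning rate, and the number of passes are all assumptions made for this illustration.

```python
class PerceptronRegressor:
    """Linear unit trained by per-instance stochastic gradient descent."""

    def __init__(self, n_attributes, eta=0.1):
        # One extra weight for a constant bias input (an assumption here).
        self.w = [0.0] * (n_attributes + 1)
        self.eta = eta

    def predict(self, x):
        xs = list(x) + [1.0]
        return sum(wj * xj for wj, xj in zip(self.w, xs))

    def learn(self, x, y):
        xs = list(x) + [1.0]
        error = y - self.predict(x)
        # w = w + eta * (y - h_w(x)) * x
        self.w = [wj + self.eta * error * xj for wj, xj in zip(self.w, xs)]


model = PerceptronRegressor(n_attributes=1)
# Noise-free stream drawn from y = 2x + 1, replayed for several passes.
for _ in range(500):
    for i in range(11):
        x = (i / 10.0,)
        model.learn(x, 2.0 * x[0] + 1.0)
```

On this noise-free stream the weights approach slope 2 and bias 1. On real streams the learning rate usually needs tuning and the attributes normalizing.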

  12. Fast Incremental Model Tree with Drift Detection (FIMT-DD)
     FIMT-DD differences with HT (Hoeffding Tree):
     1. Splitting Criterion
     2. Numeric attribute handling using BINTREE
     3. Linear model at the leaves
     4. Concept Drift Handling: Page-Hinckley
     5. Alternate Tree adaptation strategy

  13. Splitting Criterion
     Standard Deviation Reduction Measure
     ◮ Classification
       Information Gain = Entropy(before split) − Entropy(after split)
       $\mathrm{Entropy} = -\sum_{i=1}^{c} p_i \log p_i$
       $\mathrm{Gini\ Index} = \sum_{i=1}^{c} p_i (1 - p_i) = 1 - \sum_{i=1}^{c} p_i^2$
     ◮ Regression
       Gain = SD(before split) − SD(after split)
       $\mathrm{SD} = \sqrt{\sum_i (\bar{y} - y_i)^2 / N}$
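A small sketch of the regression gain for one candidate binary split. The slide does not say how SD(after split) combines the two branches, so this example assumes the size-weighted average of the branch standard deviations. The function names are illustrative.

```python
import math


def standard_deviation(values):
    """Population standard deviation: sqrt(sum((mean - y)^2) / N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((mean - v) ** 2 for v in values) / len(values))


def sd_reduction(left, right):
    """SD(before split) minus the weighted SD of the two branches."""
    parent = left + right
    n = len(parent)
    # Assumption: branches are weighted by their share of the instances.
    after = (len(left) / n) * standard_deviation(left) \
        + (len(right) / n) * standard_deviation(right)
    return standard_deviation(parent) - after


good_split = sd_reduction([1.0, 1.0, 1.0], [5.0, 5.0, 5.0])
poor_split = sd_reduction([1.0, 5.0, 1.0], [5.0, 1.0, 5.0])
```

The first split separates the target values perfectly and so yields the larger reduction; the second leaves both branches nearly as spread out as the parent.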

  14. Numeric Handling Methods
     Exhaustive Binary Tree (BINTREE, Gama et al., 2003)
     ◮ Closest implementation of a batch method
     ◮ Incrementally update a binary tree as data is observed
     ◮ Issues: high memory cost, high cost of split search, data order

  15. Page Hinckley Test
     ◮ The CUSUM test
       $g_0 = 0, \quad g_t = \max(0,\; g_{t-1} + \epsilon_t - \upsilon)$
       if $g_t > h$ then alarm and $g_t = 0$
     ◮ The Page Hinckley Test
       $g_0 = 0, \quad g_t = g_{t-1} + (\epsilon_t - \upsilon)$
       $G_t = \min(g_t)$
       if $g_t - G_t > h$ then alarm and $g_t = 0$
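The Page Hinckley recurrence can be sketched as a small stateful detector. The class name and parameter values are illustrative. Two assumptions go beyond the slide: the running minimum starts at $g_0 = 0$, and it is reset together with $g_t$ after an alarm.

```python
class PageHinckley:
    """Raise an alarm when g_t - min(g_t) exceeds the threshold h."""

    def __init__(self, upsilon, h):
        self.upsilon = upsilon  # tolerated magnitude of change
        self.h = h              # alarm threshold
        self.g = 0.0            # cumulative sum g_t
        self.g_min = 0.0        # running minimum G_t

    def update(self, epsilon):
        """Feed one observation; return True if an alarm is raised."""
        self.g += epsilon - self.upsilon
        self.g_min = min(self.g_min, self.g)
        if self.g - self.g_min > self.h:
            self.g = 0.0
            self.g_min = 0.0  # assumption: restart the minimum as well
            return True
        return False


detector = PageHinckley(upsilon=0.1, h=2.0)
# The observed value jumps from 0.0 to 1.0 at index 20.
stream = [0.0] * 20 + [1.0] * 20
alarms = [t for t, eps in enumerate(stream) if detector.update(eps)]
```

In FIMT-DD the monitored signal would be derived from the prediction error, but the input here is simply a synthetic sequence with one abrupt change.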

  16. Lazy Methods
     kNN Nearest Neighbours:
     1. Mean value of the $k$ nearest neighbours: $\hat{f}(x_q) = \sum_{i=1}^{k} f(x_i) / k$
     2. Depends on distance function
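A minimal sketch of kNN regression: the prediction for a query point is the mean target of its $k$ nearest stored examples. Euclidean distance via `math.dist` (Python 3.8+) is an assumed default; the slide only notes that the result depends on the distance function.

```python
import math


def knn_predict(examples, query, k, distance=math.dist):
    """Mean target value of the k stored examples closest to the query."""
    neighbours = sorted(examples, key=lambda ex: distance(ex[0], query))[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


# Stored (attributes, target) pairs.
examples = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]
prediction = knn_predict(examples, (1.2,), k=2)
```

Swapping in a different `distance` callable changes which neighbours are selected, which is the dependence the second list item points out.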
