http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Time Series Mining and Forecasting Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Parikshit Ram (GT PhD alum; SkyTree), Alex Gray
Outline • Motivation • Similarity search – distance functions • Linear Forecasting • Non-linear forecasting • Conclusions
Problem definition • Given: one or more sequences x_1, x_2, …, x_t, … (y_1, y_2, …, y_t, …) (…) • Find – similar sequences; forecasts – patterns; clusters; outliers
Motivation - Applications • Financial, sales, economic series • Medical – ECG, blood pressure, etc. monitoring – reactions to new drugs – elderly care
Motivation - Applications (cont’d) • ‘Smart house’ – sensors monitor temperature, humidity, air quality • video surveillance
Motivation - Applications (cont’d) • Weather, environment/anti-pollution – volcano monitoring – air/water pollutant monitoring
Motivation - Applications (cont’d) • Computer systems – ‘Active Disks’ (buffering, prefetching) – web servers (ditto) – network traffic monitoring – ...
Stream Data: Disk accesses [Figure: #bytes vs. time]
Problem #1: Goal: given a signal (e.g., #packets over time) Find: patterns, periodicities, and/or compress [Figure: count of lynx caught per year vs. year; similar series: packets per day, temperature per day]
Problem#2: Forecast Given x_t, x_{t-1}, …, forecast x_{t+1} [Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
Problem#2’: Similarity search E.g., find a 3-tick pattern similar to the last one [Figure: number of packets sent vs. time tick, with the query pattern marked ‘??’]
Problem #3: • Given: A set of correlated time sequences • Forecast ‘Sent(t)’ [Figure: number of packets (‘sent’, ‘lost’, ‘repeated’) vs. time tick]
Important observations Patterns, rules, forecasting and similarity indexing are closely related: • To do forecasting, we need – to find patterns/rules – to find similar settings in the past • To find outliers, we need to have forecasts – (outlier = too far away from our forecast)
Outline • Motivation • Similarity search and distance functions – Euclidean – Time-warping • ...
Importance of distance functions Subtle, but absolutely necessary: • A ‘must’ for similarity indexing (→ forecasting) • A ‘must’ for clustering Two major families – Euclidean and Lp norms – Time warping and variations
Euclidean and Lp For two length-n sequences x(t) and y(t):

$$D(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

$$L_p(\vec{x}, \vec{y}) = \sum_{i=1}^{n} |x_i - y_i|^p$$

L_1: city-block = Manhattan; L_2 = Euclidean; L_∞
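A minimal numpy sketch of these distances, following the slide’s sum-of-powers definition of L_p (the example sequences below are made up for illustration):

```python
import numpy as np

def lp_distance(x, y, p=2):
    """L_p distance as defined on the slide: sum_i |x_i - y_i|^p.
    (Taking the p-th root would give the usual Minkowski norm.)"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(np.abs(x - y) ** p)

def euclidean(x, y):
    """L_2 (Euclidean) distance: square root of the sum of squares."""
    return np.sqrt(lp_distance(x, y, p=2))

x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.5, 2.0, 4.0]
print(euclidean(x, y))                     # L_2
print(lp_distance(x, y, p=1))              # L_1: city-block / Manhattan
print(np.max(np.abs(np.subtract(x, y))))   # L_infinity (limit as p -> inf)
```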
Observation #1 Time sequence → n-d vector: a sequence of n values is a point in n-d space, one axis per tick (Day-1, Day-2, …, Day-n)
Observation #2 Euclidean distance is closely related to – cosine similarity – dot product
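To see why, expand the squared distance (a one-line derivation, not on the original slide):

$$D^2(\vec{x}, \vec{y}) = \sum_{i=1}^{n} (x_i - y_i)^2 = \|\vec{x}\|^2 + \|\vec{y}\|^2 - 2\,\vec{x}\cdot\vec{y}$$

So for sequences normalized to unit length, D² = 2 − 2 cos θ: the nearest neighbors under Euclidean distance are exactly the sequences with the highest cosine similarity (largest dot product).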
Time Warping • allow accelerations - decelerations – (with or without penalty) • THEN compute the (Euclidean) distance (+ penalty) • related to the string-editing distance
Time Warping ‘stutters’: [Figure: one sequence’s point matched against several consecutive points of the other]
Time warping Q: how to compute it? A: dynamic programming D( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y
Time warping http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWalgorithm.htm
Time warping Thus, with no penalty for stutter, for sequences x_1, x_2, …, x_i; y_1, y_2, …, y_j:

$$D(i, j) = \|x[i] - y[j]\| + \min \begin{cases} D(i-1, j-1) & \text{no stutter} \\ D(i, j-1) & \text{x-stutter} \\ D(i-1, j) & \text{y-stutter} \end{cases}$$
Time warping VERY SIMILAR to the string-editing distance:

$$D(i, j) = \|x[i] - y[j]\| + \min \begin{cases} D(i-1, j-1) & \text{no stutter} \\ D(i, j-1) & \text{x-stutter} \\ D(i-1, j) & \text{y-stutter} \end{cases}$$
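A direct Python implementation of this recurrence, as a minimal sketch (the base cases and toy sequences are my additions; production DTW code typically adds windows and stutter penalties):

```python
import numpy as np

def dtw(x, y):
    """Dynamic-time-warping distance via the slide's recurrence
    (no penalty for stutters). O(M*N) time and space."""
    m, n = len(x), len(y)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0                               # empty prefixes match at cost 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])     # ||x[i] - y[j]||
            D[i, j] = cost + min(D[i - 1, j - 1],   # no stutter
                                 D[i, j - 1],       # x-stutter
                                 D[i - 1, j])       # y-stutter
    return D[m, n]

# A sequence and a 'decelerated' copy: DTW distance is 0,
# even though the two sequences have different lengths.
print(dtw([1, 2, 3, 4], [1, 2, 2, 3, 3, 4]))   # 0.0
```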
Time warping • Complexity: O(M*N) - quadratic in the length of the sequences • Many variations (penalty for stutters; limit on the number/percentage of stutters; …) • popular in voice processing [Rabiner + Juang]
Other Distance functions • piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97] • ‘cepstrum’ (for voice [Rabiner+Juang]) – do DFT; take log of amplitude; do DFT again! • Allow for small gaps [Agrawal+95] See tutorial by [Gunopulos + Das, SIGMOD01]
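A sketch of that cepstrum recipe in numpy. It follows the slide literally (forward DFT in the last step; many textbook definitions use the inverse DFT there), and the test signal is invented:

```python
import numpy as np

def cepstrum(x):
    """The slide's recipe: DFT -> log of amplitude -> DFT again."""
    spectrum = np.fft.fft(x)
    log_amplitude = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.abs(np.fft.fft(log_amplitude))

t = np.arange(256)
signal = np.sin(2 * np.pi * t / 16) + 0.5 * np.sin(2 * np.pi * t / 5)
print(cepstrum(signal)[:8])
```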
Other Distance functions • In [Keogh+, KDD’04]: parameter-free, MDL based
Conclusions Prevailing distances: – Euclidean and – time-warping
Outline • Motivation • Similarity search and distance functions • Linear Forecasting • Non-linear forecasting • Conclusions
Linear Forecasting
Outline • Motivation • ... • Linear Forecasting – Auto-regression: Least Squares; RLS – Co-evolving time sequences – Examples – Conclusions
Problem#2: Forecast • Example: given x_{t-1}, x_{t-2}, …, forecast x_t [Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
Forecasting: Preprocessing MANUALLY: remove trends; spot periodicities [Figures: a series with an upward trend vs. time; a periodic series (period = 7 days) vs. time]
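A sketch of doing both steps programmatically on a synthetic series (the trend slope, period, and noise level below are invented for illustration):

```python
import numpy as np

# Synthetic series: linear trend + period-7 cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(70, dtype=float)
x = 0.5 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)

# Remove the trend: fit a line and subtract it.
slope, intercept = np.polyfit(t, x, 1)
detrended = x - (slope * t + intercept)

# Spot periodicities: find the dominant spike in the amplitude spectrum.
spectrum = np.abs(np.fft.rfft(detrended))
freqs = np.fft.rfftfreq(t.size, d=1.0)
k = np.argmax(spectrum[1:]) + 1      # skip the DC (zero-frequency) term
print("dominant period ~", 1.0 / freqs[k], "days")   # ~7
```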
Problem#2: Forecast • Solution: try to express x_t as a linear function of the past: x_{t-1}, x_{t-2}, …, (up to a window of w) Formally: x_t ≈ a_1 x_{t-1} + a_2 x_{t-2} + … + a_w x_{t-w} [Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
(Problem: Back-cast; interpolate) • Solution - interpolate: try to express x_t as a linear function of the past AND the future: x_{t+1}, x_{t+2}, …, x_{t+w_future}; x_{t-1}, …, x_{t-w_past} (up to windows of w_past, w_future) • EXACTLY the same algorithms! [Figure: number of packets sent vs. time tick, with the missing value marked ‘??’]
Refresher: Linear Regression

patient | weight | height
1 | 27 | 43
2 | 43 | 54
3 | 54 | 72
… | … | …
N | 25 | ??

[Figure: body height vs. body weight scatter plot with a fitted line]

Express what we don’t know (= “dependent variable”) as a linear function of what we know (= “independent variable(s)”).
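A least-squares fit of this toy table, as a minimal numpy sketch (predicting patient N’s unknown height from weight 25):

```python
import numpy as np

# The slide's toy data: (weight, height) for patients 1..3.
weight = np.array([27.0, 43.0, 54.0])
height = np.array([43.0, 54.0, 72.0])

# Fit height ~ a * weight + b by ordinary least squares.
X = np.column_stack([weight, np.ones_like(weight)])   # design matrix [x | 1]
(a, b), *_ = np.linalg.lstsq(X, height, rcond=None)

# Predict the unknown height '??' of patient N (weight 25).
print(a * 25 + b)
```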
Linear Auto Regression

Time | Packets Sent(t-1) | Packets Sent(t)
1 | - | 43
2 | 43 | 54
3 | 54 | 72
… | … | …
N | 25 | ??
Linear Auto Regression ‘lag-plot’: #packets sent at time t vs. #packets sent at time t-1 [Figure: scatter plot of Sent(t) vs. Sent(t-1)] Lag w = 1 Dependent variable = # of packets sent (S[t]) Independent variable = # of packets sent (S[t-1])
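A sketch of fitting that lag-plot line in numpy (the packet counts beyond the slide’s first three table values are made up):

```python
import numpy as np

# Packets sent per time tick (first three values from the slide's table).
s = np.array([43.0, 54.0, 72.0, 65.0, 58.0, 71.0, 80.0, 76.0])

# Lag-1 pairs: independent variable S[t-1], dependent variable S[t].
x_prev, x_curr = s[:-1], s[1:]

# Fit the line through the lag-plot: S[t] ~ a * S[t-1] + b.
a, b = np.polyfit(x_prev, x_curr, 1)

# One-step-ahead forecast from the last observed value.
print(a * s[-1] + b)
```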
More details: • Q1: Can it work with window w > 1? • A1: YES! (we’ll fit a hyper-plane, then!) [Figure: points in the 3-d lag space with axes x_t, x_{t-1}, x_{t-2}]
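A minimal sketch of the window-w case: stack the lagged windows into a design matrix and fit the hyper-plane by least squares (the series and the choice w = 3 below are invented for illustration):

```python
import numpy as np

def fit_ar(s, w):
    """Fit x_t ~ a_1*x_{t-1} + ... + a_w*x_{t-w} by least squares.
    Each design-matrix row holds one window of w past values."""
    X = np.array([s[t - w:t][::-1] for t in range(w, len(s))])  # newest lag first
    y = np.array(s[w:])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast_next(s, coeffs):
    """One-step-ahead forecast from the w most recent values."""
    w = len(coeffs)
    recent = np.array(s[-w:][::-1])   # same newest-first ordering as fit_ar
    return float(coeffs @ recent)

s = [43.0, 54.0, 72.0, 65.0, 58.0, 71.0, 80.0, 76.0, 69.0, 62.0]
a = fit_ar(s, w=3)                    # fit a hyper-plane over 3 lags
print(forecast_next(s, a))
```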