Fast Algorithms for Coevolving Time Series Mining
Lei Li
Computer Science Department, Carnegie Mellon University
Advisor: Christos Faloutsos
ICDE 2010 PhD Workshop, 3/21/2010

Thanks
• Organizers:
  – Nikos Mamoulis
  – Yannis Papakonstantinou
  – Timos Sellis
• Travel fellowship from NSF
  – NSF grant IIS-0956600

Coevolving Time Series (TS)
• Examples: temperature in a datacenter, chlorine level in water, BGP updates in a network, marker positions in mocap
• Need fast algorithms for time series mining

Outline
• Motivation
  – Mining tasks, goals, and problems
• Completed Work
  – P1: Mining w/ Missing Value [Li+ 2009]
  – P2: Parallel Learning [Li+ 2008b]
  – P3: Natural Motion Stitching [Li+ 2008a]
• Conclusion

M1: Natural Motion Generation
• How to generate new realistic motions from a mocap database?
• e.g. “karate kick” → “boxing”
• Applications:
  – Games ($57 billion, 2009)
  – Movie animation
  – Quality of life (assistive devices)

M2: Data Summarization
• How to compress & manage large time series?
  – A datacenter with 5000 servers: 1 TB of data per day, 55 million streams ([Reeves+ 2009])
• Goal: save energy in data centers
  – $4.5 billion in power for US data centers (2006)
[Figure: temperatures over time, CMU DCO]

M3: Anomaly Detection
• How to detect anomalies?
• Applications:
  – Intrusion detection in computer network traffic (e.g. # of packets)
  – Detecting leakage or attacks in a drinking water system by monitoring chlorine levels
  – Spam/robot detection in web clicks

Time Series Mining Tasks
• Pattern Discovery (e.g. cross-correlation, lag-correlation)
  – T1: Forecasting
  – T2: Summarization
  – T3: Segmentation (detecting change points)
  – T4: Anomaly detection
• Feature Extraction (e.g. wavelet coefficients)
  – T5: Clustering
  – T6: Indexing TS database
  – T7: Visualization

Goals for Mining Algorithms
• G1: Effective:
  – achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
  – high precision/recall, classification accuracy
• G2: Scalable:
  – to the size (e.g. length) of sequences
  – on modern hardware (Cut-And-Stitch [Li+2008b])
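
As a small illustration of the G1 metric, the sketch below computes the mean square error between an original sequence and its reconstruction. The function name and the sample numbers are made up for illustration and do not come from the slides.

```python
def mean_squared_error(truth, estimate):
    """Average squared difference between two equal-length sequences."""
    if len(truth) != len(estimate):
        raise ValueError("sequences must have the same length")
    if not truth:
        raise ValueError("sequences must be non-empty")
    return sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)


# Hypothetical values: an original signal and a reconstruction of it.
original = [1.0, 2.0, 3.0, 4.0]
reconstructed = [1.0, 2.5, 2.5, 4.0]
error = mean_squared_error(original, reconstructed)  # 0.125
```

A lower value means the reconstruction is closer to the original; evaluating only on the positions that were hidden from the algorithm is a common variant.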

Outline
• Motivation
• Completed Work
  – P1: DynaMMo: Mining w/ Missing Value [Li+09]
    • Problem Definition (recovering, compression, segmentation)
    • Intuition of Proposed Method
    • Results
  – P2: Cut-And-Stitch: Parallel Learning [Li+08b]
  – P3: Natural Motion Stitching [Li+08a]
• Conclusion

Missing Values in Time Series
• Motion Capture:
  – Markers on human actors
  – Cameras used to track the 3D positions
  – Duration: 100-500
  – 93-dimensional body-local coordinates after preprocessing (31 bones)
• Sensor data missing due to:
  – Low battery
  – RF error
Data from mocap.cs.cmu.edu. Joint work w/ C. Faloutsos, J. McCann, N. Pollard [Li et al, KDD 2009].

Problem Definition [Li+2009]
• Given: time series from sensor 1, sensor 2, …, sensor m, with blackout (missing) periods
• Find algorithms for:
  – Recovering missing values
  – Compression/summarization (T2)
  – Segmentation (T3)

Problem Definition (cont’d)
• Given: time series from sensor 1, sensor 2, …, sensor m, with blackout (missing) periods
• Want the algorithms to be:
  – G1: Effective
  – G2: Scalable: to duration of sequences

Proposed Method: Intuition
• Recover using correlation among multiple sequences
[Figure: position of left hand marker; position of right hand marker (missing)]

Proposed Method: DynaMMo Intuition
• Recover using dynamics (temporal moving pattern)
[Figure: position of left hand marker; position of right hand marker (missing)]
More results in [Li et al 2009].

(details) Underlying Model
Use Linear Dynamical Systems (LDS) to model the whole sequence.
• Hidden states $z_1, z_2, z_3, z_4, \dots$; observations $x_1, x_2, x_3, x_4, \dots$ (observed or partially observed)
• $z_1 = z_0 + \omega_0$, i.e. $z_1 \sim \mathcal{N}(z_0, \Gamma)$
• $z_{n+1} = F z_n + \omega_n$, i.e. $z_{n+1} \sim \mathcal{N}(F z_n, \Lambda)$
• $x_n = G z_n + \epsilon_n$, i.e. $x_n \sim \mathcal{N}(G z_n, \Sigma)$
• Model parameters: $\theta = \{z_0, \Gamma, F, \Lambda, G, \Sigma\}$
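
To make the generative model concrete, here is a minimal sketch that samples a sequence from an LDS following z_1 = z_0 + ω_0, z_{n+1} = F∙z_n + ω_n, x_n = G∙z_n + ε_n. As a simplifying assumption (not stated on the slide), each noise covariance is taken to be isotropic, so Γ, Λ, Σ are each given by a single standard deviation. The function names and the constant-velocity example are hypothetical.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def add_noise(vector, std, rng):
    """Add zero-mean Gaussian noise; assumes covariance std**2 * I."""
    return [v + rng.gauss(0.0, std) for v in vector]


def simulate_lds(z0, F, G, gamma_std, lambda_std, sigma_std, steps, seed=0):
    """Sample hidden states z_1..z_steps and observations x_1..x_steps."""
    rng = random.Random(seed)
    z = add_noise(z0, gamma_std, rng)  # z_1 = z_0 + w_0
    states, observations = [], []
    for _ in range(steps):
        states.append(z)
        # x_n = G z_n + eps_n
        observations.append(add_noise(mat_vec(G, z), sigma_std, rng))
        # z_{n+1} = F z_n + w_n
        z = add_noise(mat_vec(F, z), lambda_std, rng)
    return states, observations


# Hypothetical example: hidden state = (position, velocity), observe position only.
F = [[1.0, 1.0],
     [0.0, 1.0]]
G = [[1.0, 0.0]]
states, observations = simulate_lds(
    z0=[0.0, 1.0], F=F, G=G,
    gamma_std=0.1, lambda_std=0.05, sigma_std=0.2, steps=5)
```

Setting all three standard deviations to zero gives the noise-free trajectory, which is a convenient sanity check on F and G.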

Results – Better Missing Value Recovery
[Figure: reconstruction error vs. average missing length; methods compared: Spline, MSVD [Srebro’03], Linear Interpolation, proposed DynaMMo; “Ideal” marked on the plot]
• Dataset: CMU Mocap #16 (mocap.cs.cmu.edu)
• More results in [Li et al 2009]
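
For reference, the linear interpolation baseline named in the comparison can be sketched as follows. This is a generic version for a single sequence, not code from [Li et al 2009]; filling gaps at the very start or end with the nearest observed value is an assumption made here so that every position gets a value.

```python
def linear_interpolate(values):
    """Fill None entries by interpolating between the nearest observed neighbors.

    Leading and trailing gaps are filled with the nearest observed value
    (an assumption of this sketch).
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i in range(observed[0]):  # leading gap
        filled[i] = values[observed[0]]
    for i in range(observed[-1] + 1, len(values)):  # trailing gap
        filled[i] = values[observed[-1]]
    for left, right in zip(observed, observed[1:]):  # interior gaps
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


recovered = linear_interpolate([1.0, None, 3.0, None])  # [1.0, 2.0, 3.0, 3.0]
```

Because each sequence is filled on its own, this baseline uses neither the correlation among sequences nor the temporal dynamics that the DynaMMo intuition slides point to.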

Results – Better Compression
[Figure: error vs. compression ratio; DynaMMo w/ optimal compression; “Ideal” marked on the plot]
• Dataset: Chlorine levels
• More results in [Li et al 2009]

Results: segment synthetic data
• Segment by threshold on reconstruction error
[Figure: original data; reconstruction error]
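
One simple way to read "segment by threshold on reconstruction error" is sketched below: given a per-time-step reconstruction error, report the runs where the error exceeds a threshold as candidate transition regions. The function name, the threshold, and the error values are hypothetical; how the error is computed and how the threshold is chosen are not specified on the slide.

```python
def segment_by_threshold(errors, threshold):
    """Return (start, end) index pairs, end exclusive, where error > threshold."""
    segments = []
    start = None
    for i, e in enumerate(errors):
        if e > threshold and start is None:
            start = i  # a high-error run begins
        elif e <= threshold and start is not None:
            segments.append((start, i))  # the run ended at i - 1
            start = None
    if start is not None:  # run extends to the end of the sequence
        segments.append((start, len(errors)))
    return segments


# Hypothetical per-time-step reconstruction errors.
errors = [0.1, 0.2, 0.9, 1.1, 0.2, 0.1, 0.8]
change_regions = segment_by_threshold(errors, threshold=0.5)  # [(2, 4), (6, 7)]
```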

Results – Segmentation
• Find the transition from “running” to “stop”.
[Figure: left hip and left femur signals with segments labeled run, slow down, stop; reconstruction error]

Outline
• Motivation
• Completed Work
  – P1: DynaMMo: Mining w/ Missing Value [Li+09]
    • Contribution: the most accurate mining algorithms for TS with missing values so far.
  – P2: Cut-And-Stitch: Parallel Learning [Li+08b]
  – P3: Natural Motion Stitching [Li+08a]
• Conclusion

Outline
• Motivation
• Completed Work
  – P1: DynaMMo: Mining w/ Missing Value [Li+09]
  – P2: Cut-And-Stitch: Parallel Learning [Li+08b]
    • Problem Definition
    • Basic Intuition
    • Results
Goals for Mining Algorithms
• G1: Effective:
  – achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
  – high precision/recall, classification accuracy
• G2: Scalable:
  – to the size (e.g. length) of sequences
  – on modern hardware (Cut-And-Stitch [Li+2008b])

(details) Recap Model for DynaMMo
Use Linear Dynamical Systems (LDS) to model the whole sequence.
• Hidden states $z_1, z_2, z_3, z_4, \dots$; observations $x_1, x_2, x_3, x_4, \dots$ (observed or partially observed)
• $z_1 = z_0 + \omega_0$, i.e. $z_1 \sim \mathcal{N}(z_0, \Gamma)$
• $z_{n+1} = F z_n + \omega_n$, i.e. $z_{n+1} \sim \mathcal{N}(F z_n, \Lambda)$
• $x_n = G z_n + \epsilon_n$, i.e. $x_n \sim \mathcal{N}(G z_n, \Sigma)$
• Model parameters: $\theta = \{z_0, \Gamma, F, \Lambda, G, \Sigma\}$

Challenge of Learning LDS: Expectation-Maximization Alg.
• Not easy to parallelize on multi-processors due to non-trivial data dependency (details in writeup)
• Q: How to parallelize the learning to achieve scalability?
[Figure: LDS graphical model with hidden states $z_1, \dots, z_4, \dots$ and observations $x_1, \dots, x_4$]
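
The data dependency can be seen in a forward (filtering) pass. The sketch below is a textbook scalar Kalman filter for the LDS above, not the DynaMMo or Cut-And-Stitch implementation; the scalar setting and all names are assumptions for illustration. Each iteration needs the posterior from the previous time step, so the loop cannot simply be split across processors. Missing observations (given as None) skip the update step.

```python
def forward_filter(observations, z0, gamma, f, lam, g, sigma):
    """Scalar LDS forward pass.

    Model: z_1 ~ N(z0, gamma), z_{n+1} ~ N(f*z_n, lam), x_n ~ N(g*z_n, sigma),
    where gamma, lam, sigma are variances.
    Returns a list of (mean, variance) for z_n given x_1..x_n.
    """
    posteriors = []
    mean, var = z0, gamma  # prior on z_1
    for n, x in enumerate(observations):
        if n > 0:
            # Predict from the previous posterior: this is the data dependency.
            mean, var = f * mean, f * f * var + lam
        if x is not None:
            # Update with the observation; skipped when the value is missing.
            gain = var * g / (g * g * var + sigma)
            mean = mean + gain * (x - g * mean)
            var = (1.0 - gain * g) * var
        posteriors.append((mean, var))
    return posteriors


# Hypothetical run with one missing value in the middle.
posteriors = forward_filter([2.0, None, 1.0],
                            z0=0.0, gamma=1.0, f=1.0, lam=0.0, g=1.0, sigma=1.0)
```

The backward (smoothing) pass has the same chain structure in the opposite direction, which is why the E-step as a whole runs sequentially in this form.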

Challenge illustration: Expectation-Maximization Alg.
• EM can only use a single CPU, due to data dependency
[Figure: timeline for the E-step (forward-backward) in learning LDS over time ticks 1–5; Steps 1–8 run one after another]

Problem Definition
• Problem:
  – Given a sequence of numbers, design a parallel learning algorithm to find the best model parameters for Linear Dynamical Systems
• Goal:
  – Achieve ~linear speedup on multi-core
• Assumption:
  – Shared memory architecture (e.g. multi-core)

Proposed Method: Cut-And-Stitch
[Figure: intuition, timeline over time ticks 1–5; goal with 2 CPUs: Steps 1–4]
Details in [Li et al 2008b]. Joint work w/ Wenjie Fu, Fan Guo, Todd C. Mowry, Christos Faloutsos.

Near Linear Speedup
[Figure: speedup vs. # of processors; proposed Cut-And-Stitch compared with the ideal speedup]
• Dataset: 58 motion sequences, CMU Mocap #16 (mocap.cs.cmu.edu)
• Tested on NCSA supercomputer, EM algorithm
• More results in [Li et al 2008b]

No loss of accuracy
[Figure: normalized reconstruction error (axis 0.0% to 2.5%) for the EM alg vs. Cut-And-Stitch on sequences #16.22, #16.01, #16.45; results are ~identical]
More results in [Li et al 2008b].

Outline
• Motivation
• Completed Work
  – P1: DynaMMo: Mining w/ Missing Value [Li+09]
  – P2: Cut-And-Stitch: Parallel Learning [Li+08b]
    • Contribution: the 1st parallel algorithm for learning LDS
Goals for Mining Algorithms
• G1: Effective:
  – achieve low reconstruction error (mean square error) (DynaMMo, [Li+2009])
  – high precision/recall, classification accuracy
• G2: Scalable:
  – to the size (e.g. length) of sequences
  – on modern hardware (Cut-And-Stitch [Li+2008b])

Outline
• Motivation
• Completed Work
  – P1: DynaMMo: Mining w/ Missing Value [Li+09]
  – P2: Cut-And-Stitch: Parallel Learning [Li+08b]
  – P3: Natural Motion Stitching [Li+08a]
    • Problem Definition
    • Proposed Method
    • Results
• Conclusion

Motion Stitching: A Database Approach
• Select the best stitchable segments from a set of basic motion pieces and generate new natural motions

Problem Definition
• Given two motion-capture sequences that are to be stitched together, how can we assess the goodness of the stitching?
[Figure: three candidate stitchings (1, 2, 3). Which stitching looks best?]
Joint work w/ Jim McCann, Nancy Pollard, Christos Faloutsos [Li et al, Eurographics 2008].

Competitor: Euclidean distance fails
• “Straight moving” and “U-turn” transitions are equally “good” under Euclidean distance
[Figure: straight moving vs. U-turn]
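
A toy sketch of the failure mode: Euclidean distance compares only the poses being matched, so two very different continuations can receive the same score. The 2-D coordinates below are invented for illustration and are not taken from the slides or from the mocap data; real poses would be much higher-dimensional.

```python
import math


def pose_distance(pose_a, pose_b):
    """Euclidean distance between two poses given as equal-length coordinate lists."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose_a, pose_b)))


# Hypothetical last pose of the first clip, and two candidate next poses.
end_pose = [0.0, 0.0]
straight_candidate = [1.0, 0.0]   # keeps moving in the same direction
u_turn_candidate = [-1.0, 0.0]    # reverses direction

d_straight = pose_distance(end_pose, straight_candidate)  # 1.0
d_u_turn = pose_distance(end_pose, u_turn_candidate)      # 1.0
```

Both candidates score 1.0 here, so this measure alone gives no reason to prefer the straight continuation over the U-turn.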

Result – Synthetic Transition
• Laziness score prefers straightforward moving
[Figure: straight moving vs. U-turn]
More results in [Li 2008a].

Conclusion
• Pattern discovery w/ missing values (DynaMMo)
  – Recovering missing values
  – Compression
  – Segmentation
• Scale up learning on multicore
  – Parallel learning algorithm for LDS (Cut-And-Stitch)
• Natural human motion stitching
  – An intuitive distance function (Laziness score)
