Time series data mining Outline Basic Knowledge Multi variate - PowerPoint PPT Presentation

Time series data mining

Outline
• Basic knowledge
• Multivariate association
• State association

What is time series data?
Formally, a time series is defined as a sequence of pairs T = [(p1, t1), (p2, t2), ..., (pi, ti), ..., (pn, tn)] with t1 < t2 < · · · < ti < · · · < tn, where each pi is a data point in a d-dimensional data space and each ti is the time stamp at which the corresponding pi occurs.
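The definition can be mirrored directly in code. The sketch below is a minimal Python illustration, not part of the original slides; the class name `TimeSeries` and its `dim` property are hypothetical. It enforces the two constraints in the definition: all points pi share one dimension d, and the time stamps strictly increase.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class TimeSeries:
    """Hypothetical container for T = [(p1, t1), ..., (pn, tn)]."""

    pairs: List[Tuple[Point, float]]

    def __post_init__(self) -> None:
        if not self.pairs:
            raise ValueError("a time series needs at least one (p, t) pair")
        d = len(self.pairs[0][0])
        # Every data point must live in the same d-dimensional space.
        if any(len(p) != d for p, _ in self.pairs):
            raise ValueError("all data points must have the same dimension d")
        # Time stamps must satisfy t1 < t2 < ... < tn.
        times = [t for _, t in self.pairs]
        if any(a >= b for a, b in zip(times, times[1:])):
            raise ValueError("time stamps must be strictly increasing")

    @property
    def dim(self) -> int:
        """Dimension d of the data space."""
        return len(self.pairs[0][0])


# Usage: a 2-dimensional series, e.g. (temperature, humidity) readings.
ts = TimeSeries([((21.5, 0.40), 0.0), ((22.0, 0.42), 1.0), ((22.4, 0.45), 2.0)])
print(ts.dim)  # 2
```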

Time series data characteristics
1. High dimensionality
2. Hierarchical nature: a time series can be analyzed along its underlying time hierarchy, such as hourly, weekly, monthly, and yearly.
3. Multivariate: time series data analysis often studies one variable, but sometimes deals with time series consisting of multiple related variables. For example, weather data consists of well-known measurements such as temperature, dew point, and humidity.
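To illustrate the hierarchical nature, the sketch below averages a univariate series at a chosen level of the time hierarchy. It is a hypothetical example: the `aggregate` function, the level names, and the choice of the mean as the aggregate are assumptions, not something the slides specify.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical grouping keys, one per level of the time hierarchy.
LEVELS: Dict[str, Callable[[datetime], Hashable]] = {
    "hour": lambda ts: (ts.date(), ts.hour),
    "day": lambda ts: ts.date(),
    "week": lambda ts: tuple(ts.isocalendar()[:2]),  # (ISO year, ISO week)
    "month": lambda ts: (ts.year, ts.month),
    "year": lambda ts: ts.year,
}


def aggregate(series: List[Tuple[datetime, float]], level: str) -> Dict[Hashable, float]:
    """Average the values of a univariate series within each period of `level`."""
    key = LEVELS[level]
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for ts, value in series:
        groups[key(ts)].append(value)
    # One mean per period, in chronological order.
    return {k: mean(v) for k, v in sorted(groups.items())}


# Usage: two days of hourly readings rolled up to daily means (11.5 and 1.0).
hourly = [(datetime(2024, 1, 1, h), float(h)) for h in range(24)]
hourly += [(datetime(2024, 1, 2, h), 1.0) for h in range(24)]
print(aggregate(hourly, "day"))
```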

Our work
• Multivariate association: A and B are highly correlated
• State association: A = 2 → B = 3

Multivariate association
Extract features
↓
Cluster the features
↓
Analyze the clustering result

Why extract features?
Time series are essentially high-dimensional data, and working directly with such data in its raw format is very expensive in terms of both processing and storage cost. It is thus highly desirable to develop representation techniques that reduce the dimensionality of a time series while still preserving its fundamental characteristics.

How to extract features
Principle: reduce the dimensionality while preserving the fundamental characteristics.
Split the data into fixed-size windows
↓
Extract the features of each window: [relative time, standard deviation]
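A minimal sketch of the two steps above, assuming non-overlapping windows. The slide does not define "relative time"; here it is assumed to be the window's start index divided by the series length, and a trailing partial window is dropped. Both choices, and the name `window_features`, are illustrative.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple


def window_features(values: Sequence[float], window: int) -> List[Tuple[float, float]]:
    """Return one (relative time, standard deviation) pair per fixed-size window.

    Assumptions (not stated on the slide): windows do not overlap, 'relative
    time' is the window's start index divided by the series length, and a
    trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(values)
    features = []
    for start in range(0, n - window + 1, window):
        segment = values[start:start + window]
        # Population standard deviation of the values inside this window.
        features.append((start / n, pstdev(segment)))
    return features


print(window_features([1, 1, 1, 1, 0, 2, 0, 2], window=4))  # [(0.0, 0.0), (0.5, 1.0)]
```

Each window of length `window` is thus reduced to two numbers, which is where the dimensionality reduction comes from.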

Clustering
The objective is to find the most homogeneous clusters that are as distinct as possible from other clusters. More formally, the grouping should maximize intercluster variance while minimizing intracluster variance.
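The two quantities in this objective can be computed for any given clustering. This sketch uses one common pair of definitions, assumed here since the slide gives none: intracluster variance as the mean squared distance from each point to its cluster center, and intercluster variance as the size-weighted mean squared distance from each cluster center to the overall mean. With these definitions the two sum to the total variance of the data, so for fixed data lowering one raises the other.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def _sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_split(clusters: List[List[Point]]) -> Tuple[float, float]:
    """Return (intracluster, intercluster) variance of a clustering."""
    points = [p for c in clusters for p in c]
    n = len(points)
    overall = _mean(points)
    intra = sum(_sqdist(p, _mean(c)) for c in clusters for p in c) / n
    inter = sum(len(c) * _sqdist(_mean(c), overall) for c in clusters) / n
    return intra, inter


# Same four points, grouped well and badly.
good = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
bad = [[(0.0,), (10.0,)], [(2.0,), (12.0,)]]
print(variance_split(good))  # (1.0, 25.0)
print(variance_split(bad))   # (25.0, 1.0)
```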

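Clustering (flowchart)
1. Take the first record as a cluster of its own; its cluster center is the record itself.
2. Take the next record and compare it with all cluster centers. If its similarity to every center is below the similarity threshold, the record forms a new cluster on its own; if at least one similarity exceeds the threshold, put the record into the cluster it is most similar to. Repeat until all records have been processed.
3. Go through all records again, comparing each with all cluster centers. If every similarity is below the threshold, the record forms a new cluster; otherwise, put it into the most similar cluster and update that cluster's center. When all records have been processed, update the cluster centers.
4. If the cluster centers changed, repeat step 3.
5. Select the two most similar clusters. If their similarity is greater than the given threshold, merge them and repeat step 5; otherwise, exit.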
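The flowchart can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the similarity measure 1 / (1 + Euclidean distance), the thresholds, the `max_iter` cap, and all function names are hypothetical, and step 3 is simplified to a batch pass that recomputes all centers after each sweep instead of updating a center after every record.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity: 1 / (1 + Euclidean distance), a value in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def most_similar(p: Point, centers: List[Point], sim_threshold: float) -> Optional[int]:
    """Index of the most similar center, or None if every similarity is below the threshold."""
    best = max(range(len(centers)), key=lambda k: similarity(p, centers[k]))
    return best if similarity(p, centers[best]) >= sim_threshold else None


def assign_all(data: List[Point], centers: List[Point], sim_threshold: float):
    """One pass with centers held fixed: each record joins its most similar cluster,
    or starts a new cluster centered on itself if no center is similar enough."""
    centers = list(centers)
    members: List[List[Point]] = [[] for _ in centers]
    for p in data:
        k = most_similar(p, centers, sim_threshold)
        if k is None:
            centers.append(p)
            members.append([p])
        else:
            members[k].append(p)
    return centers, members


def threshold_cluster(data, sim_threshold: float, merge_threshold: float,
                      max_iter: int = 100) -> List[List[Point]]:
    points = [tuple(map(float, p)) for p in data]
    # Steps 1-2: the first record seeds the first cluster; one pass assigns the rest.
    centers, members = assign_all(points, [points[0]], sim_threshold)
    # Steps 3-4: recompute centers and reassign until the centers stop changing.
    for _ in range(max_iter):
        new_centers = [centroid(m) for m in members if m]
        if new_centers == centers:
            break
        centers, members = assign_all(points, new_centers, sim_threshold)
    clusters = [m for m in members if m]
    # Step 5: merge the two most similar clusters while their similarity exceeds merge_threshold.
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        pairs = [(a, b) for a in range(len(cents)) for b in range(a + 1, len(cents))]
        i, j = max(pairs, key=lambda ab: similarity(cents[ab[0]], cents[ab[1]]))
        if similarity(cents[i], cents[j]) <= merge_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (10.0,)]
clusters = threshold_cluster(data, sim_threshold=0.5, merge_threshold=0.5)
print(sorted(len(c) for c in clusters))  # [1, 2, 3]
```

Lowering `merge_threshold` lets step 5 merge more aggressively; with 0.0 every pair qualifies and everything ends up in one cluster.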

Analysis of the clustering result
[Figure: clustering result; isolated points are labeled as noise]

$$\mathrm{support} = \frac{\sum_{i=1}^{L} n_i}{\sum_{j=1}^{k} \sum_{i=1}^{L} n_{ij}}$$
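The symbols in the support formula are not defined on the slide. One plausible reading, assumed in the sketch below, is that n_ij counts the windows of variable i (i = 1, …, L) that fall into cluster j (j = 1, …, k), and that the numerator sums n_i for the cluster under consideration. Under that reading a cluster's support is its share of all windows, and clusters with very low support could be treated as noise; the `min_support` cut-off and the function names are hypothetical.

```python
from typing import List


def cluster_support(counts: List[List[int]], c: int) -> float:
    """support(c) = sum_i n_ic / sum_j sum_i n_ij.

    Hypothetical reading: counts[j][i] is the number of windows of variable i
    that fall into cluster j.
    """
    total = sum(sum(row) for row in counts)
    return sum(counts[c]) / total


def noise_clusters(counts: List[List[int]], min_support: float) -> List[int]:
    """Indices of clusters whose support is below the (hypothetical) min_support."""
    return [c for c in range(len(counts)) if cluster_support(counts, c) < min_support]


counts = [[3, 5], [1, 0], [4, 7]]  # hypothetical: 3 clusters x 2 variables
print(noise_clusters(counts, min_support=0.1))  # [1]
```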

Experiment result

State association
Our goal: not only to find that variables A, B, C, and D are related, but also to discover the associations among their values, for example:
A = 2, D = 4 → B = 3, C = 7

State association
• Transform the data into symbols
• Apply the Apriori algorithm

Data preprocessing
Split the data into fixed-length windows
↓
Extract the features of each window
↓
Cluster the windows
↓
Symbolize each cluster

Feature extraction
Multiple views: frequency domain, time domain, statistics, …
Techniques: DFT, DWT, APCA, PAA, and others
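Two of the listed techniques are simple enough to sketch in a few lines. The code below is an illustrative Python version of PAA (the mean of each equal-length frame) and of the first few DFT coefficients, computed naively rather than with an FFT; the function names are hypothetical, and `paa` assumes the series length is a multiple of the number of segments.

```python
import cmath
from typing import List, Sequence


def paa(values: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments`
    equal-length frames. Assumes len(values) is a multiple of `segments`."""
    n = len(values)
    if segments <= 0 or n % segments:
        raise ValueError("len(values) must be a positive multiple of segments")
    size = n // segments
    return [sum(values[s * size:(s + 1) * size]) / size for s in range(segments)]


def dft_coefficients(values: Sequence[float], keep: int) -> List[complex]:
    """First `keep` DFT coefficients X_k = sum_t x_t * exp(-2*pi*i*k*t/N),
    computed naively in O(N * keep) time."""
    n = len(values)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(values))
        for k in range(keep)
    ]


print(paa([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
```

Keeping only the first few coefficients or segments is what reduces the dimensionality: PAA summarizes the time domain, the DFT coefficients the frequency domain.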

Clustering and symbolization
Cluster the features extracted from each window, then represent each cluster by a character, so that all windows within a cluster are marked with the corresponding character.
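Symbolization itself is a lookup from cluster label to character. A minimal sketch, assuming clusters are numbered 0, 1, 2, … and mapped to `a`, `b`, `c`, … in order (the mapping and the function name `symbolize` are assumptions).

```python
import string
from typing import Sequence


def symbolize(labels: Sequence[int], alphabet: str = string.ascii_lowercase) -> str:
    """Map each window's cluster label to one character: 0 -> 'a', 1 -> 'b', ..."""
    for label in labels:
        if not 0 <= label < len(alphabet):
            raise ValueError(f"no symbol available for cluster {label}")
    return "".join(alphabet[label] for label in labels)


# Usage: cluster labels of consecutive windows of one variable.
print(symbolize([0, 2, 0, 1]))  # acab
```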

Raw data → symbol table:

time  A1  A2  A3  A4  A5  A6
t1    a   d   a   b   c   a
t2    1   a   a   a   d   a
t3    2   b   b   b   c   b
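Before Apriori can run, the symbol table has to become a list of transactions. The sketch below assumes each time step (row) is one transaction and encodes items as `variable=symbol`, so that a rule reads like A1=a → A2=d; the encoding and the `to_transactions` name are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence


def to_transactions(table: Dict[str, Sequence[str]]) -> List[FrozenSet[str]]:
    """Turn {variable: symbol per time step} into one transaction per time step,
    whose items have the form 'variable=symbol'."""
    lengths = {len(col) for col in table.values()}
    if len(lengths) != 1:
        raise ValueError("every variable needs one symbol per time step")
    (steps,) = lengths
    return [frozenset(f"{var}={col[t]}" for var, col in table.items()) for t in range(steps)]


print(to_transactions({"A1": "ab", "A2": "cd"}))
```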

Association mining
The Apriori algorithm is an influential algorithm for mining frequent itemsets and association rules. Association rule generation is usually split into two separate steps:
1. First, a minimum support threshold is applied to find all frequent itemsets in a database.
2. Second, these frequent itemsets and a minimum confidence constraint are used to form rules.
While the second step is straightforward, the first step needs more attention.
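Both steps can be sketched compactly. The code below is a textbook-style Apriori in plain Python, not the implementation used in these experiments: level-wise candidate generation with join and prune for step 1, then splitting each frequent itemset into rules X → Y with confidence support(X ∪ Y) / support(X) for step 2. The function names and the item encoding in the example are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Step 1: every itemset whose support (fraction of transactions that
    contain it) is at least min_support, mapped to that support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= t for t in transactions) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singletons if support(s) >= min_support}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c: support(c) for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent: Dict[Itemset, float],
                      min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: rules X -> Y with support(X u Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


transactions = [frozenset(t) for t in ({"A=2", "B=3"}, {"A=2", "B=3"}, {"A=2"}, {"B=3", "C=7"})]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

The prune step relies on the downward-closure property (every subset of a frequent itemset is frequent), which is also why `frequent[lhs]` is always defined in step 2.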

Association mining

Experiment result
A1 ∈ {0, 1, 2}

$$A_4 = \begin{cases} \sin(t) + \delta, & A_1 = 0 \\ 1, & A_1 = 1 \\ -\sin(t) + \delta, & A_1 = 2 \end{cases}
\qquad
A_3 = \begin{cases} 1000 \times (t - t_0)/T, & A_1 = 0 \\ 1, & A_1 = 1 \\ -1000 \times (t - t_0)/T, & A_1 = 2 \end{cases}$$

Experiment result
A2 ∈ {6, 7, 8, 9}

$$A_5 = \begin{cases} 2000 \times (t - t_0)/T, & A_2 = 6 \\ 1500 \times (t - t_0)/T, & A_2 = 7 \\ 1, & A_2 = 8 \\ -1000 \times (t - t_0)/T, & A_2 = 9 \end{cases}
\qquad
A_6 = \begin{cases} \sin(t) + \delta, & A_2 = 6 \\ 1, & A_2 = 7 \\ -\sin(t) + \delta, & A_2 = 8 \\ \cos(t) + \delta, & A_2 = 9 \end{cases}$$
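The piecewise definitions above can be turned into a small generator for synthetic data. The slides do not say how the states, T, t0, or δ were chosen, so this sketch assumes that each pair of states (A1, A2) is held for T time steps, that t0 is the start of that segment, and that δ is Gaussian noise with a hypothetical standard deviation `sigma`.

```python
import math
import random
from typing import Dict, List, Sequence


def a3(a1: int, t: float, t0: float, T: float) -> float:
    """A3 as a function of the state A1 in {0, 1, 2}."""
    if a1 == 0:
        return 1000 * (t - t0) / T
    if a1 == 1:
        return 1.0
    if a1 == 2:
        return -1000 * (t - t0) / T
    raise ValueError("A1 must be in {0, 1, 2}")


def a4(a1: int, t: float, delta: float) -> float:
    """A4 as a function of the state A1; delta is the noise term."""
    if a1 == 0:
        return math.sin(t) + delta
    if a1 == 1:
        return 1.0
    if a1 == 2:
        return -math.sin(t) + delta
    raise ValueError("A1 must be in {0, 1, 2}")


def a5(a2: int, t: float, t0: float, T: float) -> float:
    """A5 as a function of the state A2 in {6, 7, 8, 9}."""
    slopes = {6: 2000, 7: 1500, 9: -1000}
    if a2 == 8:
        return 1.0
    if a2 in slopes:
        return slopes[a2] * (t - t0) / T
    raise ValueError("A2 must be in {6, 7, 8, 9}")


def a6(a2: int, t: float, delta: float) -> float:
    """A6 as a function of the state A2; delta is the noise term."""
    if a2 == 6:
        return math.sin(t) + delta
    if a2 == 7:
        return 1.0
    if a2 == 8:
        return -math.sin(t) + delta
    if a2 == 9:
        return math.cos(t) + delta
    raise ValueError("A2 must be in {6, 7, 8, 9}")


def generate(a1_states: Sequence[int], a2_states: Sequence[int], T: int,
             sigma: float = 0.1, seed: int = 0) -> List[Dict[str, float]]:
    """Hold each (A1, A2) state pair for T steps; t0 is the start of the segment."""
    rng = random.Random(seed)
    rows = []
    for seg, (s1, s2) in enumerate(zip(a1_states, a2_states)):
        t0 = seg * T
        for t in range(t0, t0 + T):
            rows.append({
                "A1": s1, "A2": s2,
                "A3": a3(s1, t, t0, T), "A4": a4(s1, t, rng.gauss(0.0, sigma)),
                "A5": a5(s2, t, t0, T), "A6": a6(s2, t, rng.gauss(0.0, sigma)),
            })
    return rows


rows = generate([0, 1, 2], [6, 8, 9], T=5)
print(len(rows))  # 15
```

Data generated this way has known state associations (for example A1 = 1 → A3 = 1, A4 = 1), which is what makes it usable as a ground truth for the mining pipeline.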

Experiment result

Thanks
