Chapter 7-2: Discrete Sequential Data
Jilles Vreeken, IRDM ’15/16, 26 Nov 2015

IRDM Chapter 7, overview
Time Series
  1. Basic Ideas
  2. Prediction
  3. Motif Discovery
Discrete Sequences
  4. Basic Ideas
  5. Pattern Discovery
  6. Hidden Markov Models
You’ll find this covered in Aggarwal Ch. 3.4, 14, 15.

IRDM Chapter 7, today
Time Series
  1. Basic Ideas
  2. Prediction
  3. Motif Discovery
Discrete Sequences
  4. Basic Ideas
  5. Pattern Discovery
  6. Hidden Markov Models
You’ll find this covered in Aggarwal Ch. 3.4, 14, 15.

Chapter 7.3, ctd: Motif Discovery
Aggarwal Ch. 14.4, 3.4

Dynamic Time Warping
DTW stretches the time axis of one series to enable better matches. (Aggarwal Ch. 3.4)

DTW, formally
Let $\mathrm{DTW}(i, j)$ be the optimal distance between the first $i$ elements of time series $X$ (of length $n$) and the first $j$ elements of time series $Y$ (of length $m$). Then

$$\mathrm{DTW}(i, j) = \mathrm{distance}(X_i, Y_j) + \min \begin{cases} \mathrm{DTW}(i, j-1) & \text{repeat } X_i \\ \mathrm{DTW}(i-1, j) & \text{repeat } Y_j \\ \mathrm{DTW}(i-1, j-1) & \text{repeat neither} \end{cases}$$

We initialise as follows:
• $\mathrm{DTW}(0, 0) = 0$
• $\mathrm{DTW}(0, j) = \infty$ for all $j \in \{1, \dots, m\}$
• $\mathrm{DTW}(i, 0) = \infty$ for all $i \in \{1, \dots, n\}$
We can then simply iterate over increasing $i$ and $j$. (Aggarwal Ch. 3.4)

Computing DTW (1)
From the initialised values, we can fill in the table using the recurrence for $\mathrm{DTW}(i, j)$ by iterating over increasing $i$ and $j$:

    for i = 1 to n
        for j = 1 to m
            compute DTW(i, j)

We can also compute it recursively (top-down dynamic programming with memoisation). Both naïve strategies cost $O(nm)$, however. (Aggarwal Ch. 3.4)
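A minimal Python sketch of the naïve bottom-up computation, assuming absolute difference as the pointwise `distance` (the slides leave this choice open). The function name `dtw` and the list-of-lists table are illustrative, not a reference implementation; row 0 and column 0 of the table hold the initial values, so `D[i][j]` corresponds to $\mathrm{DTW}(i, j)$.

```python
import math
from typing import Callable, List, Sequence


def dtw(
    x: Sequence[float],
    y: Sequence[float],
    distance: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> float:
    """Naive O(nm) DTW: fill the table by increasing i and j."""
    n, m = len(x), len(y)
    # D[i][j] holds DTW(i, j); row 0 and column 0 are the initial values.
    D: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = distance(x[i - 1], y[j - 1]) + min(
                D[i][j - 1],      # repeat X_i
                D[i - 1][j],      # repeat Y_j
                D[i - 1][j - 1],  # repeat neither
            )
    return D[n][m]


print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the extra 2 in Y is matched by repeating X_2
```

This sketch keeps the whole $(n+1) \times (m+1)$ table; if only the distance (not the alignment path) is needed, keeping two rows would reduce memory to $O(m)$.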

Computing DTW (2)
We can speed up computation by imposing constraints.
• e.g. a window constraint: compute $\mathrm{DTW}(i, j)$ only when $|i - j| \le w$
• for each $i$, the inner loop then only runs over $j$ from $\max\{1, i - w\}$ to $\min\{m, i + w\}$, at most $2w + 1$ iterations instead of $m$, so the cost drops from $O(nm)$ to $O(nw)$
(Aggarwal Ch. 3.4)
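A sketch of the window constraint (a Sakoe-Chiba style band), again assuming absolute difference as the pointwise distance; the name `dtw_window` is illustrative. Cells outside the band are simply left at $\infty$, which means the result is only finite when $w \ge |n - m|$.

```python
import math
from typing import Callable, List, Sequence


def dtw_window(
    x: Sequence[float],
    y: Sequence[float],
    w: int,
    distance: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> float:
    """DTW restricted to cells with |i - j| <= w; all other cells stay infinite."""
    n, m = len(x), len(y)
    D: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # The inner loop only visits j in [max(1, i - w), min(m, i + w)].
        for j in range(max(1, i - w), min(m, i + w) + 1):
            D[i][j] = distance(x[i - 1], y[j - 1]) + min(
                D[i][j - 1], D[i - 1][j], D[i - 1][j - 1]
            )
    return D[n][m]


print(dtw_window([1, 2, 3], [1, 2, 2, 3], w=1))  # 0.0
print(dtw_window([1, 2, 3], [1, 2, 2, 3], w=0))  # inf, since |n - m| = 1 > w
```

For clarity the sketch still allocates the full table; only the time, not the memory, benefits from the band here.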

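Lower bounds on DTW
Even smarter is to speed up DTW using a lower bound:

$$\mathrm{LB\_Keogh}(X, Y) = \sum_{i=1}^{n} \begin{cases} (X_i - U_i)^2 & \text{if } X_i > U_i \\ (X_i - L_i)^2 & \text{if } X_i < L_i \\ 0 & \text{otherwise} \end{cases}$$

where $U_i = \max\{Y_{i-r}, \dots, Y_{i+r}\}$ and $L_i = \min\{Y_{i-r}, \dots, Y_{i+r}\}$ form an envelope around $Y$, and $r$ is the reach, the allowed range of warping. The bound is much cheaper to compute than DTW, so any candidate whose lower bound already exceeds the best DTW distance found so far can be discarded without computing DTW at all.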
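A minimal sketch of the formula above, assuming equal-length series and an envelope that is clipped at the series boundaries (the slide does not say how the ends are handled). LB_Keogh is known to lower-bound DTW when DTW uses the squared pointwise distance and the same reach $r$ as its window; the function name `lb_keogh` is illustrative.

```python
from typing import Sequence


def lb_keogh(x: Sequence[float], y: Sequence[float], r: int) -> float:
    """Squared LB_Keogh of X against the envelope (U, L) of Y with reach r."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length series")
    n = len(y)
    total = 0.0
    for i, xi in enumerate(x):
        # Envelope over Y_{i-r} .. Y_{i+r}, clipped at the series boundaries.
        window = y[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)  # U_i, L_i
        if xi > upper:
            total += (xi - upper) ** 2
        elif xi < lower:
            total += (xi - lower) ** 2
        # otherwise X_i lies inside the envelope and contributes 0
    return total


print(lb_keogh([3, 0], [1, 2], r=1))  # 2.0
```

For comparison, DTW with squared pointwise distance and window 1 on the same pair equals 8, so the bound holds in this small case; with $r = 0$ the envelope collapses onto $Y$ and the bound becomes the plain squared Euclidean distance (here also 8).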

Discrete Sequences

Chapter 7.4: Basic Ideas
Aggarwal Ch. 14.1-14.2

Trouble in Time Series Paradise
Continuous real-valued time series have their downsides
• mining results rely on either a distance function or assumptions
• indexing, pattern mining, summarisation, clustering, classification, and outlier detection results hence rely on arbitrary choices
Discrete sequences are often easier to deal with
• mining results rely mostly on counting
How do we transform a time series into an event sequence?
• discretisation

Approximating a Time Series (Lin et al. 2002, 2007)

SAX
Symbolic Aggregate Approximation (SAX)
• the best-known approach to discretising a time series
• builds on piecewise aggregate approximation (PAA)
How to do SAX
• divide the data into $w$ frames
• compute the mean per frame
• perform equal-height binning over the means, to obtain an alphabet of $a$ characters
(Lin et al. 2002, 2007)
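A minimal sketch that follows the three steps on this slide literally: `paa` computes the frame means and `sax` bins them into equal-height (equal-frequency) bins estimated from the means themselves. Note that the original SAX of Lin et al. first z-normalises the series and uses fixed breakpoints that cut the standard normal distribution into equiprobable regions; using empirical quantiles of the means is a simplifying assumption here. The function names, the lowercase alphabet, and the 26-letter limit are illustrative.

```python
import string
from bisect import bisect_right
from typing import List, Sequence


def paa(series: Sequence[float], w: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each of w frames."""
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("need 1 <= w <= len(series)")
    means = []
    for k in range(w):
        frame = series[k * n // w:(k + 1) * n // w]  # frame lengths differ by at most 1
        means.append(sum(frame) / len(frame))
    return means


def sax(series: Sequence[float], w: int, a: int) -> str:
    """Discretise into w symbols over an alphabet of a letters via equal-height bins."""
    if not 1 <= a <= 26:
        raise ValueError("need 1 <= a <= 26")
    means = paa(series, w)
    ordered = sorted(means)
    # a - 1 breakpoints taken at the empirical quantiles of the frame means
    breakpoints = [ordered[q * w // a] for q in range(1, a)]
    # symbol index = number of breakpoints <= mean
    return "".join(string.ascii_lowercase[bisect_right(breakpoints, mu)] for mu in means)


print(sax([1, 1, 2, 2, 8, 8, 9, 9], w=4, a=2))  # aabb
```

With $a = 4$ the same series becomes `abcd`: every frame mean gets its own bin, as equal-height binning over four means requires.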

Definitions
A discrete sequence $X_1 \dots X_n$ of length $n$ and dimensionality $d$ contains $d$ discrete feature values at each of $n$ different timestamps $t_1 \dots t_n$. Each of the $n$ components $X_i$ contains $d$ discrete behavioral attributes $(x_i^1 \dots x_i^d)$ collected at the $i$-th timestamp. The actual timestamps are usually ignored: they only induce an order on the components, or events.

Types of discrete sequences
In many applications, the dimensionality is 1
• e.g. strings, such as text or genomes
• for AATCGTAC over the alphabet $\Sigma = \{A, C, G, T\}$, each $X_i \in \Sigma$
In some applications, each $X_i$ is not a vector but a set
• e.g. a supermarket transaction, $X_i \subseteq \Sigma$
• there is no order within $X_i$
We will consider the set setting, as it is the most general.
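One way to make the set setting concrete, as a sketch: represent a sequence as a Python list of `frozenset`s, so that a string is just the special case in which every $X_i$ is a singleton. The type alias `SetSequence`, the helper `from_string`, and the basket contents are hypothetical illustrations.

```python
from typing import FrozenSet, List

# A discrete sequence in the set setting: one unordered itemset per timestamp.
SetSequence = List[FrozenSet[str]]


def from_string(s: str) -> SetSequence:
    """Dimensionality-1 case: each character becomes a singleton set."""
    return [frozenset({ch}) for ch in s]


genome = from_string("AATCGTAC")
sigma = frozenset("ACGT")
assert all(x_i <= sigma for x_i in genome)  # each X_i is a subset of the alphabet

# Hypothetical supermarket data: the order within a basket carries no meaning.
baskets: SetSequence = [frozenset({"bread", "milk"}), frozenset({"beer"})]
print(frozenset({"milk", "bread"}) == baskets[0])  # True
```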

Chapter 7.5: Frequent Patterns
Aggarwal Ch. 15.2

Sequential patterns
A sequential pattern is a sequence.
• to occur in the data, it has to be a subsequence of the data
$\mathcal{X}$ = a b a a b b a b d c a d b a a b c a
$\mathcal{Z}$ = a b
Definition: Let $\mathcal{X} = X_1 \dots X_n$ and $\mathcal{Z} = Z_1 \dots Z_k$ be two sequences whose elements $X_i$ and $Z_j$ are all sets. Then $\mathcal{Z}$ is a subsequence of $\mathcal{X}$ if $k$ elements $X_{i_1} \dots X_{i_k}$ can be found in $\mathcal{X}$ such that $i_1 < i_2 < \dots < i_k$ and $Z_j \subseteq X_{i_j}$ for each $j \in \{1, \dots, k\}$.
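The definition can be checked in a single left-to-right scan: matching each $Z_j$ at the earliest possible position never rules out a later match, so a greedy scan suffices. A sketch under these assumptions, with hypothetical helpers `parse` (whitespace-separated tokens, where a token such as `ab` denotes the itemset $\{a, b\}$) and `is_subsequence`:

```python
from typing import FrozenSet, List

SetSequence = List[FrozenSet[str]]


def parse(text: str) -> SetSequence:
    """Whitespace-separated tokens; a token like 'ab' becomes the itemset {a, b}."""
    return [frozenset(token) for token in text.split()]


def is_subsequence(z: SetSequence, x: SetSequence) -> bool:
    """True iff indices i_1 < ... < i_k exist with Z_j a subset of X_{i_j}."""
    j = 0  # number of pattern elements matched so far
    for itemset in x:
        if j == len(z):
            break
        if z[j] <= itemset:  # greedily match Z_j at the earliest possible position
            j += 1
    return j == len(z)


X = parse("a b a a b b a b d c a d b a a b c a")
print(is_subsequence(parse("a b"), X))  # True
print(is_subsequence(parse("ab"), X))   # False: no single X_i contains both a and b
```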

Support
Depending on whether we have a database $\mathbf{D}$ of sequences or a single long sequence, we have to define the support of a sequential pattern differently.
Standard, or ‘per sequence’, support counting
• given a database $\mathbf{D} = \{\mathcal{X}_1, \dots, \mathcal{X}_N\}$, the support of a subsequence $\mathcal{Z}$ is the number of sequences in $\mathbf{D}$ that contain $\mathcal{Z}$
Window-based support counting
• given a single sequence $\mathcal{X}$, the support of a subsequence $\mathcal{Z}$ is the number of windows over $\mathcal{X}$ that contain $\mathcal{Z}$
(We can define frequency analogously, as relative support.)
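A sketch of per-sequence support and its relative version, reusing the greedy subsequence test; `parse`, `support`, `frequency`, and the toy database are hypothetical. Each sequence counts at most once, however often it contains the pattern.

```python
from typing import FrozenSet, List

SetSequence = List[FrozenSet[str]]


def parse(text: str) -> SetSequence:
    """Whitespace-separated tokens; a token like 'ab' becomes the itemset {a, b}."""
    return [frozenset(token) for token in text.split()]


def is_subsequence(z: SetSequence, x: SetSequence) -> bool:
    """Greedy test of the subsequence definition (Z_j subset of X_{i_j}, increasing i_j)."""
    j = 0
    for itemset in x:
        if j < len(z) and z[j] <= itemset:
            j += 1
    return j == len(z)


def support(z: SetSequence, database: List[SetSequence]) -> int:
    """Per-sequence support: the number of sequences in D that contain z."""
    return sum(1 for x in database if is_subsequence(z, x))


def frequency(z: SetSequence, database: List[SetSequence]) -> float:
    """Relative support: the fraction of sequences in D that contain z."""
    return support(z, database) / len(database)


D = [parse(s) for s in ["a b c", "a c", "b a", "ab"]]
print(support(parse("a b"), D))  # 1: only "a b c" has a followed later by b
print(frequency(parse("a"), D))  # 1.0
```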

Windows
A window $\mathcal{X}[s; t]$ is a contiguous subsequence of sequence $\mathcal{X}$:
$\mathcal{X}[s; t] = \{X_i \in \mathcal{X} \mid s \le i \le t\}$
$\mathcal{X}$ = a b d c a d b a a b c a d a b a b c
$\mathcal{Z}$ = a b
Window-based support counting
• we can choose a window length $w$ (so $t - s + 1 = w$) and sweep over the data, counting the windows that contain $\mathcal{Z}$
• support now depends on $w$: what happens with longer $w$?
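A sketch of the sweep, assuming that only windows lying entirely inside $\mathcal{X}$ are counted (some formulations also count windows that overhang the ends of the sequence). Indices in the code are 0-based; `parse`, `is_subsequence`, and `window_support` are hypothetical helpers.

```python
from typing import FrozenSet, List

SetSequence = List[FrozenSet[str]]


def parse(text: str) -> SetSequence:
    """Whitespace-separated tokens; a token like 'ab' becomes the itemset {a, b}."""
    return [frozenset(token) for token in text.split()]


def is_subsequence(z: SetSequence, x: SetSequence) -> bool:
    """Greedy test of the subsequence definition (Z_j subset of X_{i_j}, increasing i_j)."""
    j = 0
    for itemset in x:
        if j < len(z) and z[j] <= itemset:
            j += 1
    return j == len(z)


def window_support(z: SetSequence, x: SetSequence, w: int) -> int:
    """Number of length-w windows x[s:s+w] lying fully inside x that contain z."""
    if w < 1:
        raise ValueError("window length must be positive")
    return sum(1 for s in range(len(x) - w + 1) if is_subsequence(z, x[s:s + w]))


X = parse("a b d c a d b a a b c a d a b a b c")
for w in (2, 3):
    print(w, window_support(parse("a b"), X, w))  # prints "2 4", then "3 8"
```

On the slide's sequence, $w = 2$ gives 4 windows containing ab and $w = 3$ gives 8, a small illustration of how strongly the support depends on the chosen window length.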
