Benchmarking (State-of-the-Art) Univariate Time Series Classifiers
Patrick Schäfer and Ulf Leser
Humboldt-Universität zu Berlin, Wissensmanagement in der Bioinformatik
patrick.schaefer@hu-berlin.de
BTW 2017, 08.03.2017
✤ Time series (TS) result from recording data over time.
✤ Increasingly popular due to the growing number of automatic sensors, which produce a flood of large, high-resolution TS.
✤ Application areas: motion sensors, personalized medicine (ECG/EEG signals), machine surveillance, spectrograms, astronomy (starlight curves), and image outlines/contours of objects.
✤ The UCR time series archive contains 85 benchmark datasets used in TS research.
✤ Datasets come from a whole range of applications, grouped into: synthetic, motion sensors, sensor readings, and image outlines.
✤ Overall, there are 50,000 train and 100,000 test TS, or 55 million values.
✤ A single dataset contains at most thousands of TS with thousands of measured values each.
✤ At the same time, real-time systems emerge, producing billions of measurements for thousands of sensors:
✤ Long-term human intracranial EEG recordings: the total file size is >50 GB with 240000×16×6000 measurements (6000 samples, 16 electrodes).
✤ Smart Plugs: „4055 Millions of measurements for 2125 plugs distributed across 40 houses.“
✤ Real-Time Location System: „The total filesize is 2.6 GB and it contains a total of 49,576,080 position events.“
✤ Time series classification (TSC) aims at assigning a class label to an unlabeled query TS based on a model trained from labeled samples.
✤ Most basic: 1-nearest-neighbor (1-NN) classifiers.
✤ We look into four groups of TS classifiers: whole series, shapelets, bag-of-patterns, and ensembles.
[Figure: a model trained from labeled samples finds the label of a query TS.]
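As a concrete (if naive) starting point, the most basic TSC mentioned above — a 1-NN classifier — can be sketched in a few lines of Python. The toy series, labels, and function name are illustrative only, not from the paper:

```python
import numpy as np

def nn1_predict(query, train_ts, train_labels):
    """Label a query TS with the class of its nearest training TS
    (here using Euclidean distance, the simplest whole-series measure)."""
    dists = [np.linalg.norm(query - ts) for ts in train_ts]
    return train_labels[int(np.argmin(dists))]

# toy example: two classes of length-4 series
train = [np.array([0., 0., 1., 1.]), np.array([1., 1., 0., 0.])]
labels = ["rise", "fall"]
print(nn1_predict(np.array([0.1, 0., 0.9, 1.]), train, labels))  # -> rise
```

The classifier needs no training phase at all; the entire training set is the model, which is why prediction time (the focus of this benchmark) grows with training set size.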
Whole Series
✤ Based on a distance measure defined on the whole TS and 1-NN classification.
✤ Elastic distance measures compensate for small distortions, such as warping in the time axis, which the Euclidean distance penalizes.
✤ Baseline, simple model; cannot skip irrelevant subsections; linear to quadratic complexity in TS length.
✤ Representatives: 1-NN Dynamic Time Warping (DTW) and 1-NN Euclidean distance (ED).
[Figure: point-wise Euclidean distance vs. elastic DTW alignment of two TS.]
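The quadratic complexity of DTW comes from its dynamic-programming formulation, which this minimal sketch makes explicit (the toy series are illustrative; real implementations add a warping window to prune the matrix):

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) dynamic-time-warping distance between 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])

# b is a time-shifted copy of a: DTW warps it to a perfect match,
# while the Euclidean distance stays large
a = np.array([0., 0., 1., 2., 1., 0.])
b = np.array([0., 1., 2., 1., 0., 0.])
print(dtw(a, b))                  # -> 0.0
print(np.linalg.norm(a - b))      # -> 2.0
```

This illustrates the slide's point: elastic measures absorb warping in the time axis that a rigid point-wise comparison cannot.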
Shapelets
✤ Shapelets are TS subsequences that are maximally representative of a class label.
✤ A TS is labeled based on its similarity to a shapelet.
✤ Interpretable, but high computational complexity (cubic to bi-quadratic in TS length).
✤ Representatives: Shapelet Transform (ST), Learning Shapelets (LS), Fast Shapelets (FS).
[Figure: shapelets discriminating the classes caffeine and chlorogenic acid.]
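The core primitive shared by all shapelet methods is the distance of a shapelet to its best-matching subsequence of a TS; classification then reduces to a threshold on that distance. A minimal sketch, with a hypothetical "spike" shapelet and threshold standing in for the ones a real method would learn:

```python
import numpy as np

def shapelet_distance(ts, shapelet):
    """Distance of a shapelet to its best-matching subsequence of ts."""
    m = len(shapelet)
    return min(np.linalg.norm(ts[i:i + m] - shapelet)
               for i in range(len(ts) - m + 1))

def classify(ts, shapelet, threshold, pos, neg):
    """Assign pos if the shapelet occurs (closely enough) anywhere in ts."""
    return pos if shapelet_distance(ts, shapelet) <= threshold else neg

spike = np.array([0., 1., 0.])   # hypothetical learned shapelet
ts = np.array([0., 0., 0., 1., 0., 0., 0.])
print(classify(ts, spike, 0.1, "class A", "class B"))  # -> class A
```

The sliding minimum over all candidate shapelets and all training TS is what drives the cubic to bi-quadratic training cost quoted on the slide.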
Bag-of-Patterns / Bag-of-Features
✤ TS are distinguished by the frequency of occurrence of features generated over substructures of the TS.
✤ A bag-of-patterns (histogram of feature counts) is used as input to classification.
✤ Fast (linear complexity) and noise-reducing, but the order of substructures is lost.
✤ Representatives: Bag-of-SFA-Symbols (BOSS), Bag-of-Patterns (BoP), Time Series Bag of Features (TSBF).
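A simplified sketch of the idea: slide a window over the TS, turn each window into a discrete word, and count word frequencies. The above/below-mean symbolization here is a deliberately crude stand-in for the SAX/SFA transforms the real methods use:

```python
import numpy as np
from collections import Counter

def bag_of_patterns(ts, window=4, symbols="ab"):
    """Count discretized window 'words' over ts; the order in which
    windows occur is discarded, only their frequencies remain."""
    bag = Counter()
    for i in range(len(ts) - window + 1):
        w = ts[i:i + window]
        word = "".join(symbols[int(v > w.mean())] for v in w)
        bag[word] += 1
    return bag

ts = np.array([0., 0., 1., 1., 0., 0., 1., 1.])
print(bag_of_patterns(ts))  # e.g. the word 'aabb' occurs twice
```

Comparing two TS then amounts to comparing two histograms, which is how the linear complexity and the noise robustness (small distortions rarely change a word) come about.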
Ensembles
✤ Ensembles combine different core classifiers (e.g., shapelets, bag-of-patterns, whole series) into a single classifier using bagging or majority voting.
✤ High accuracy by combining different representations, but high computational complexity (quadratic to bi-quadratic in TS length).
✤ Representatives: Elastic Ensemble (EE PROP), Collective of Transformation Ensembles (COTE).
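The majority-voting scheme is simple to sketch; the three stand-in classifiers below are hypothetical placeholders for the heterogeneous members (whole series, shapelets, bag-of-patterns) an ensemble like EE or COTE would actually train:

```python
from collections import Counter

def ensemble_predict(query, classifiers):
    """Majority vote over core classifiers, each a function mapping a
    query TS to a label.  (Real ensembles such as COTE additionally
    weight each vote by the member's training accuracy.)"""
    votes = Counter(clf(query) for clf in classifiers)
    return votes.most_common(1)[0][0]

# three hypothetical stand-in members
clfs = [lambda q: "A", lambda q: "A", lambda q: "B"]
print(ensemble_predict(None, clfs))  # -> A
```

The cost structure on the slide follows directly: an ensemble must run every member on every query, so its runtime is dominated by its slowest constituents.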
UCR datasets: Accuracy vs. Single Query Prediction Time
[Scatter plot: average accuracy (60–90%) vs. single-query prediction time in milliseconds (log scale, 1 to 10,000 ms) for ST, BOSS VS, LS, COTE, BOSS, DTW, EE (PROP), TSBF, BOP, DTW CV, SAX VSM, and FS; regions labeled "accurate and fast", "accurate but slower", and "less accurate and slower".]
✤ The slowest (fastest) classifier took 4 s (2 ms) for a single prediction.
✤ Methods are either scalable but offer only inferior accuracy, or they achieve state-of-the-art accuracy but do not scale to larger dataset sizes.
✤ Prediction times of the state of the art.
✤ Using the StarLightCurves dataset with 1000 train and 8236 test TS of length 1024.
✤ Video runs at 10× playback speed.
✤ The slowest classifier took 100 hours; the fastest took 20 ms.
[Video frame: classifier accuracies on StarLightCurves range from 87.5% to 97.9% (87.5%, 90.4%, 92.6%, 94.7%, 97.8%, 97.9%, 97.9%).]
Average Ranks on the 85 UCR datasets
[Critical difference (CD) diagram of average ranks: COTE 3.09, ST 4.34, BOSS 4.78, EE (PROP) 5.52, LS 5.66, BOSS VS 6.14, TSBF 6.15, 1-NN DTW CV 7.62, SAXVSM 8.05, BoP 8.39, 1-NN DTW 8.65, FastShapelets 9.62.]
✤ The most accurate TSCs are ensembles, shapelets, and bag-of-patterns: COTE, ST, BOSS, and EE.
Conclusion
✤ Methods are either scalable but offer only inferior accuracy, or they achieve state-of-the-art accuracy but do not scale to larger dataset sizes.
✤ Bag-of-patterns approaches are faster than shapelets, ensembles, or whole-series measures.
✤ Overall, COTE, ST, and BOSS show the highest classification accuracy at the cost of increased runtimes.
✤ FS, SAX VSM, BOP, and BOSS VS show the lowest runtimes at the cost of limited accuracy.