
Symbolic Aggregate ApproXimation (SAX) under Interval Uncertainty



  1. Symbolic Aggregate ApproXimation (SAX) under Interval Uncertainty
     Chrysostomos D. Stylios (1) and Vladik Kreinovich (2)
     (1) Laboratory of Knowledge and Intelligent Computing, Department of Computer Engineering, Technological Educational Institute of Epirus, 47100 Kostakioi, Arta, Greece, stylios@teiep.gr
     (2) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, Texas 79968, USA, vladik@utep.edu

  2. 1. Formulation of the Problem
     • Need for diagnostics: often, we are monitoring a certain process for possible problems; e.g.:
       – we check whether the observed vibrations of a mechanical system indicate an abnormality;
       – we check the vital signs of a patient to see if an urgent medical intervention is needed.
     • Sometimes, we have an algorithm that, based on the observations, decides whether intervention is needed.
     • However, in most practical applications, especially in medicine, no such algorithm is readily available.
     • What we have instead is numerous past data series corresponding both:
       – to cases when the situation turned out to be normal,
       – and to cases with abnormality.

  3. 2. Formulation of the Problem (cont-d)
     • We have numerous past data series corresponding both:
       – to cases when the situation turned out to be normal,
       – and to cases with abnormality.
     • We thus need to extract such an algorithm from all these examples, i.e., use machine learning.
     • Most machine learning algorithms work well if we have up to dozens of inputs.
     • However, as a result of monitoring, we get values $x(t)$ corresponding to hundreds of moments of time $t$.
     • So, to efficiently apply machine learning algorithms, we first need to compress the input data.

  4. 3. Symbolic Aggregate approXimation (SAX): Main Idea
     • The main objective of monitoring is to catch deviations from the normal regimes as early as possible.
     • As a result, monitoring is performed at a high rate, to catch a deviation while this deviation is small.
     • Thus, when the monitoring is arranged properly, values change very little from one moment to the next.
     • So, we can safely replace the original function $x(t)$ with a piece-wise constant approximation.
     • On each interval, we store only its endpoints and the value of the function on this interval.
     • This representation indeed leads to a drastic reduction in data size.
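
The piece-wise constant step can be sketched as follows. This is only an illustration: the slides do not say how the intervals or the constant values are chosen, so the sketch assumes equal-length segments and uses the mean of each segment. The name `piecewise_constant` and the sample data are hypothetical.

```python
from statistics import fmean

def piecewise_constant(values, segment_length):
    """Compress a series into (start, end, level) triples.

    Assumption (not from the slides): segments have equal length and
    each level is the mean of the samples in that segment.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    segments = []
    for start in range(0, len(values), segment_length):
        chunk = values[start:start + segment_length]
        # The end index is exclusive; the last segment may be shorter.
        segments.append((start, start + len(chunk), fmean(chunk)))
    return segments

# A slowly varying signal sampled at a high rate: 8 samples become 2 triples.
signal = [1.0, 1.0, 1.5, 1.5, 3.0, 3.0, 3.5, 3.5]
compressed = piecewise_constant(signal, 4)
```

Each triple stores the endpoints of an interval and the single value kept for it, which is where the reduction in data size comes from.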

  5. 4. Symbolic Aggregate approXimation (cont-d)
     • A further compression is possible since:
       – a computer-represented real number requires dozens of bits to store, corresponding to ten decimal digits,
       – but measurement accuracy is usually 1–10%, so two decimal digits are enough.
     • Symbolic Aggregate approXimation (SAX) is a technique for such a reduction.
     • In the interval $[\underline{x}, \overline{x}]$ of possible values of $x(t)$, we select thresholds $x_0 = \underline{x}, x_1, x_2, \ldots, x_m$.
     • Then, for each moment of time $t$, instead of storing $x(t)$, we store the index $i$ for which $x(t) \in [x_i, x_{i+1}]$.
     • At present, SAX is the most efficient data compression technique.
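
A minimal sketch of the index-storing step, assuming the thresholds are given as a sorted list that includes both endpoints of the range. The function name `to_symbol`, the clamping of out-of-range values, and the example thresholds are assumptions made for illustration, not part of the slides.

```python
from bisect import bisect_right

def to_symbol(x, thresholds):
    """Return an index i such that thresholds[i] <= x <= thresholds[i + 1].

    `thresholds` is the sorted list x_0 < x_1 < ... including both endpoints.
    Values outside the range are clamped to the first or last interval
    (an assumption made here). A value exactly on an interior threshold
    is assigned to the interval above it.
    """
    i = bisect_right(thresholds, x) - 1
    return min(max(i, 0), len(thresholds) - 2)

thresholds = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical, equally spaced
series = [0.2, 1.7, 3.9, 2.0, 4.0]
symbols = [to_symbol(x, thresholds) for x in series]
```

With four intervals, each stored index needs only two bits instead of a full floating-point number.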

  6. 5. SAX: Details and Successes
     • To maximize the amount of information after compression, SAX takes into account that:
       – the maximum amount of Shannon's information $-\sum_{i=0}^{m} p_i \cdot \log_2(p_i)$, where $p_i = \mathrm{Prob}(x(t) \in [x_i, x_{i+1}])$,
       – is attained when all the probabilities $p_i$ are equal to each other, and thus each equal to $p_i = \frac{1}{m+1}$.
     • Thus, SAX selects the thresholds $x_i$ for which $p_i = \mathrm{Prob}(x(t) \in [x_i, x_{i+1}]) = \frac{1}{m+1}$.
     • SAX techniques led to many practical applications ranging from engineering to medicine.
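
The equal-probability rule can be illustrated with a short sketch. It assumes the values follow a Gaussian distribution with known mean and standard deviation, so that the thresholds are Gaussian quantiles; with real data one would more likely use empirical quantiles. The function names and parameters are hypothetical.

```python
from math import log2
from statistics import NormalDist

def equiprobable_thresholds(num_intervals, mu=0.0, sigma=1.0):
    """Interior thresholds splitting a Gaussian into equally likely intervals."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(k / num_intervals) for k in range(1, num_intervals)]

def shannon_information(probabilities):
    """Shannon's information -sum p_i * log2(p_i), skipping empty intervals."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

num_intervals = 4   # this is m + 1 in the slides' notation
cuts = equiprobable_thresholds(num_intervals)
equal = shannon_information([1 / num_intervals] * num_intervals)
skewed = shannon_information([0.7, 0.1, 0.1, 0.1])
```

Here `equal` is $\log_2 4 = 2$ bits, the largest value possible for four intervals, while the unequal split in `skewed` gives less.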

  7. 6. SAX: Problem
     • Measurement errors were a motivation for SAX techniques.
     • However, SAX does not take measurement errors into account.
     • So, we often get thresholds $x_i$ and $x_{i+1}$ which are much closer to each other than the measurement accuracy.
     • Sometimes, $x_i$ and $x_{i+1}$ differ by 5% while the measurement accuracy is 10%.
     • In this case, we cannot tell whether the actual value $x(t)$ was in the $i$-th interval or in the next interval.
     • It is therefore desirable to explicitly take measurement uncertainty into account in SAX techniques.
     • This is what we do in this paper.
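
To see how this problem arises, the sketch below lists neighboring thresholds that are closer together than the measurement accuracy. It is an illustration under assumptions that are not in the slides: a standard Gaussian signal, 20 equally likely intervals, and an absolute accuracy of 0.2 (the slides speak of relative accuracy in percent). The name `ambiguous_gaps` is hypothetical.

```python
from statistics import NormalDist

def ambiguous_gaps(thresholds, accuracy):
    """Return pairs (i, gap) of neighboring thresholds closer than `accuracy`."""
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return [(i, gap) for i, gap in enumerate(gaps) if gap < accuracy]

# Hypothetical setup: standard Gaussian signal, 20 equally likely intervals,
# and an absolute measurement accuracy of 0.2 (in the same units as x).
dist = NormalDist(0.0, 1.0)
cuts = [dist.inv_cdf(k / 20) for k in range(1, 20)]
too_close = ambiguous_gaps(cuts, 0.2)
```

Near the mean, where the density is highest, equal-probability thresholds crowd together, so most of the central intervals end up narrower than the assumed accuracy.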

  8. 7. Case When Measurement Inaccuracy Can Be Ignored (Reminder)
     • Based on the observed values $x(t)$, we can find the probabilities with which different values of $x$ occur.
     • These probabilities can be naturally described by a probability density function $\rho(x)$, with $\int \rho(x)\,dx = 1$.
     • In many practical situations, the observed signal is a joint effect of many different independent processes.
     • In such situations, the Central Limit Theorem implies that the resulting distribution is close to Gaussian.
     • We want to select the thresholds $x_1, x_2, \ldots$
     • We can describe, for every value $x$, the number $\rho_t(x)$ of thresholds per unit length; the total is $\int \rho_t(x)\,dx = m$.

  9. 8. Case of No Measurement Inaccuracy (cont-d)
     • After the data compression, the only information that we have about each value $x(t)$ is the index $i$.
     • So, to reconstruct the value $x(t)$ based on this information, we select the midpoint $\widetilde{x}(t)$ of the $i$-th subinterval.
     • This reconstruction is approximate: there is an approximation error $\varepsilon(t) \stackrel{\text{def}}{=} \widetilde{x}(t) - x(t) \ne 0$.
     • Ideally, we would like to have all these errors to be as close to 0 as possible.
     • The vector $\varepsilon = (\varepsilon(t_1), \varepsilon(t_2), \ldots)$ of these errors should be close to the zero vector $\vec{0} = (0, 0, \ldots)$: $d(\varepsilon, \vec{0}) = \sqrt{\sum_k (\varepsilon(t_k))^2} \to \min$.
     • In the continuous approximation, this is equivalent to minimizing $\int (\varepsilon(t))^2\,dt$.
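
The midpoint reconstruction and the resulting distance $d(\varepsilon, \vec{0})$ can be sketched as follows. The thresholds and series are made-up values, the threshold list is assumed to include both endpoints, and the function names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt

def reconstruct(x, thresholds):
    """Replace x by the midpoint of the threshold interval that contains it."""
    i = min(max(bisect_right(thresholds, x) - 1, 0), len(thresholds) - 2)
    return (thresholds[i] + thresholds[i + 1]) / 2

def error_distance(series, thresholds):
    """Euclidean distance between the error vector and the zero vector."""
    errors = [reconstruct(x, thresholds) - x for x in series]
    return sqrt(sum(e * e for e in errors))

thresholds = [0.0, 1.0, 2.0, 4.0]        # hypothetical, includes both endpoints
series = [0.5, 1.25, 3.5]
distance = error_distance(series, thresholds)
```

In this example the errors are 0, 0.25, and -0.5: the wide interval [2, 4] contributes the largest error, which is why the placement of thresholds matters for this criterion.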

  10. 9. Alternative Ideas
     • The least-squares approach is vulnerable to outliers.
     • The second idea is to avoid this sensitivity by using $\ell^p$-estimates: $\int |\varepsilon(t)|^p\,dt \to \min$.
     • The third idea is to explicitly minimize the number of bits needed to describe all the thresholds.
     • If $x_{i+1} - x_i \approx 2^{-b}$, then it is sufficient to describe the first $b$ binary digits of the corresponding interval.
     • Thus, the number of bits needed to store each threshold is approximately equal to $b \approx -\log_2(x_{i+1} - x_i)$.
     • So, we minimize the average number of bits, i.e., the sum $-\sum_i \log_2(x_{i+1} - x_i)$ or the corresponding integral.
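
Both alternative criteria are easy to evaluate on discrete data. The sketch below uses a sum over samples in place of the integral; the error values and thresholds are invented for illustration, and the function names are hypothetical.

```python
from math import log2

def lp_error(errors, p):
    """Discrete analogue of the integral of |eps(t)|^p."""
    return sum(abs(e) ** p for e in errors)

def threshold_bits(thresholds):
    """Sum of -log2(x_{i+1} - x_i) over neighboring thresholds."""
    return -sum(log2(b - a) for a, b in zip(thresholds, thresholds[1:]))

errors = [0.1, -0.1, 0.1, 2.0]              # hypothetical errors; the last is an outlier
share_p2 = 2.0 ** 2 / lp_error(errors, 2)   # outlier's share of the p = 2 objective
share_p1 = 2.0 / lp_error(errors, 1)        # outlier's share of the p = 1 objective

bits = threshold_bits([0.0, 0.25, 0.5, 1.0])  # gaps 2^-2, 2^-2, 2^-1
```

The outlier dominates the $p = 2$ objective more strongly than the $p = 1$ objective, which is the sensitivity the second idea aims to reduce. For the third idea, gaps of $2^{-2}$, $2^{-2}$, and $2^{-1}$ give $2 + 2 + 1 = 5$ bits in total.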
