Modeling Extreme Events in Time Series Prediction (2019)
Daizong Ding¹, Mi Zhang¹, Xudong Pan¹, Min Yang¹, Xiangnan He²
1. School of Computer Science, Fudan University
2. School of Data Science, University of Science and Technology of China
Time Series Prediction

Training (length $T$):
• Inputs: $X_{1:T} = \{x_1, \dots, x_T\}$
• Labels: $Y_{1:T} = \{y_1, \dots, y_T\}$
• Outputs: $O_{1:T} = \{o_1, \dots, o_T\}$
• Goal: $\min \sum_{t=1}^{T} (o_t - y_t)^2$

Testing (additional length $K$):
• Inputs: $X_{1:T+K} = \{x_1, \dots, x_T, x_{T+1}, \dots, x_{T+K}\}$
• Outputs: $O_{1:T+K} = \{o_1, \dots, o_T, o_{T+1}, \dots, o_{T+K}\}$
Recurrent Neural Network

Training — for $t = 1, \dots, T$:
• $h_t = \mathrm{GRU}(x_1, \dots, x_t)$ (GRU layer)
• $o_t = W_o^{\top} h_t + b_o$ (fully connected layer)
• Objective: $\min \sum_{t=1}^{T} (o_t - y_t)^2$

Testing — for $t = 1, \dots, T+K$:
• $h_t = \mathrm{GRU}(x_1, \dots, x_t)$
• $o_t = W_o^{\top} h_t + b_o$
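A minimal sketch of the one-step-ahead GRU predictor on this slide, assuming PyTorch; the layer sizes and names (`hidden_size`, `fc`) are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """One-step-ahead predictor: h_t = GRU(x_1..x_t), o_t = W_o^T h_t + b_o."""
    def __init__(self, input_size=1, hidden_size=32):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # plays the role of W_o, b_o

    def forward(self, x):
        # x: (batch, T, input_size); h: (batch, T, hidden_size)
        h, _ = self.gru(x)
        return self.fc(h).squeeze(-1)  # o_1..o_T, shape (batch, T)

# Training minimizes the squared loss sum_t (o_t - y_t)^2:
# loss = ((model(x) - y) ** 2).sum()
```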
Underfitting Phenomenon
(figure)

Overfitting Phenomenon
(figure)
Extreme Events in Time Series Data

Characteristics:
• Extremely small or large values
• Irregular and rarely occurring
• Light-tailed distributions (Gaussian, Poisson, etc.) cannot model them well

Problems:
• Why do deep neural networks suffer from the extreme event problem in time series prediction?
• How can we improve performance on predicting extreme events?
Estimated Distribution of Labels $y_t$

• Optimizing a deep neural network, viewed probabilistically: minimizing the squared loss is equivalent (via the Bregman divergence correspondence) to maximizing a Gaussian likelihood:
$$\min_\theta \sum_{t=1}^{T} (o_t - y_t)^2 \;\Longleftrightarrow\; \max_\theta \prod_{t=1}^{T} \mathcal{N}(y_t \mid o_t, \hat{\tau}^2) \;=\; \max_\theta \prod_{t=1}^{T} P(y_t \mid x_t, \theta)$$
• With Bayes' theorem, the estimated distribution of labels (the posterior) is
$$P(Y \mid X, \theta) = \frac{P(X \mid Y, \theta)\, P(Y)}{P(X \mid \theta)},$$
where $P(X \mid Y, \theta)$ is the likelihood and the prior over labels is the empirical kernel estimate $P(Y) = \frac{1}{T} \sum_{t=1}^{T} \mathcal{N}(y \mid y_t, \hat{\tau}^2)$.
• The DNN therefore internally estimates the distribution of $y_t$ from the sampled data.
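To see the stated equivalence explicitly (a standard maximum-likelihood step, not specific to this paper), take the log of the Gaussian likelihood:

```latex
\log \prod_{t=1}^{T} \mathcal{N}(y_t \mid o_t, \hat{\tau}^2)
  = -\frac{1}{2\hat{\tau}^2} \sum_{t=1}^{T} (o_t - y_t)^2 + \text{const},
```

so maximizing the likelihood is exactly minimizing the squared loss.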
Extreme Event Problem in DNN — Underfitting Phenomenon

• For normal points, e.g. $y_1$, the empirical prior overestimates their density, so
$$P(y_1 \mid X, \theta) = \frac{P(X \mid y_1, \theta)\, P(y_1)}{P(X \mid \theta)} \;\ge\; \frac{P(X \mid y_1, \theta)\, P_{\mathrm{true}}(y_1)}{P(X \mid \theta)} = P_{\mathrm{true}}(y_1 \mid X, \theta)$$
• For rarely occurring extreme events, e.g. $y_2$, the empirical prior underestimates their density, so
$$P(y_2 \mid X, \theta) = \frac{P(X \mid y_2, \theta)\, P(y_2)}{P(X \mid \theta)} \;\le\; \frac{P(X \mid y_2, \theta)\, P_{\mathrm{true}}(y_2)}{P(X \mid \theta)} = P_{\mathrm{true}}(y_2 \mid X, \theta)$$
• Therefore the model commonly lacks the ability to predict extreme events.
(Figure: label distribution with sample points $y_1$, $y_2$, $y_3$.)
Extreme Event Problem in DNN — Overfitting Phenomenon

• Suppose we instead up-weight extreme events during training; the inequalities flip.
• For normal points, e.g. $y_1$:
$$P(y_1 \mid X, \theta) = \frac{P(X \mid y_1, \theta)\, P(y_1)}{P(X \mid \theta)} \;\le\; P_{\mathrm{true}}(y_1 \mid X, \theta)$$
• For rarely occurring extreme events, e.g. $y_3$:
$$P(y_3 \mid X, \theta) = \frac{P(X \mid y_3, \theta)\, P(y_3)}{P(X \mid \theta)} \;\ge\; P_{\mathrm{true}}(y_3 \mid X, \theta)$$
• The estimated distribution is again inaccurate, and performance on test data is poor.
Problem Analysis

The extreme event problem in DNNs arises mainly because:
• Extreme events are extremely large or small values that occur rarely, so it is hard to estimate their true distribution from limited samples.
• DNNs usually learn time series data through a light-tailed likelihood, which further increases the difficulty of estimating the distribution of extreme events.
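A small numerical illustration of the first point (my own example, not from the slides): a Gaussian kernel estimate fit to limited samples from a heavy-tailed distribution assigns far too little mass to values beyond the observed range.

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(0)
T = 200
samples = rng.standard_t(df=2, size=T)   # heavy-tailed "labels" y_1..y_T
tau = 0.5                                # kernel bandwidth, playing the role of tau-hat

threshold = samples.max() + 1.0          # an extreme value never seen in training

# Tail mass under the kernel estimate P(y) = 1/T * sum_t N(y | y_t, tau^2):
kde_tail = norm.sf(threshold, loc=samples, scale=tau).mean()
true_tail = student_t.sf(threshold, df=2)

print(f"P(Y > {threshold:.1f})  true: {true_tail:.5f}   kernel estimate: {kde_tail:.2e}")
# The kernel estimate assigns vastly less tail mass -> extreme events get underfit.
```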
Motivation: Find the Regularity Inside Irregular Extreme Events

According to previous research:
• Extreme events in time series data often show some form of temporal regularity.
• The randomness of extreme events has limited degrees of freedom (DOF).
The pattern of extreme events after a window can therefore be memorized!
(Figure: S&P 500.)
Recalling Extreme Events in History

We propose to use a Memory Network to recall extreme events in history (see the sketch below):
• For each time step $t$, we sample $M$ windows from the history.
• For window $j$, we use a GRU to compute its feature $s_j$.
• Meanwhile, we record the occurrence of an extreme event $q_j \in \{-1, 0, 1\}$ at the time step right after window $j$, using a previously set threshold.
These $(s_j, q_j)$ pairs form the memory module.
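A sketch of how such a memory could be built, assuming PyTorch; the random window sampling, sizes, and threshold are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def build_memory(x, window_gru, num_windows=32, win_len=20, threshold=2.0):
    """x: (T, 1) history; window_gru: nn.GRU(1, hidden, batch_first=True).
    Returns window features s_j and extreme-event labels q_j in {-1, 0, 1}."""
    T = x.size(0)
    feats, labels = [], []
    for _ in range(num_windows):
        start = torch.randint(0, T - win_len - 1, (1,)).item()
        win = x[start:start + win_len].unsqueeze(0)   # (1, win_len, 1)
        _, h = window_gru(win)                        # h: (1, 1, hidden)
        feats.append(h.squeeze())                     # feature s_j
        nxt = x[start + win_len].item()               # value right after window j
        labels.append(1 if nxt > threshold else (-1 if nxt < -threshold else 0))
    return torch.stack(feats), torch.tensor(labels)   # (M, hidden), (M,)
```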
Attention Mechanism

We use attention to incorporate the memory module into the prediction (a sketch follows):
• At time $t$, we first compute the hidden state $h_t$ from the GRU.
• We then query the memory module, computing the similarity between the current state and each historical window feature $s_j$.
• The final output of our model combines the GRU prediction with the attention-weighted memory of past extreme events.
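A sketch of the attention step, assuming dot-product similarity between the current hidden state and the window features; the exact scoring function and output combination in the paper may differ.

```python
import torch
import torch.nn.functional as F

def memory_prediction(h_t, mem_feats, mem_labels):
    """h_t: (hidden,); mem_feats: (M, hidden); mem_labels: (M,) in {-1, 0, 1}."""
    scores = mem_feats @ h_t                  # similarity of current state to each window
    alpha = F.softmax(scores, dim=0)          # attention weights over the M windows
    u_t = (alpha * mem_labels.float()).sum()  # soft prediction of the extreme indicator
    return u_t

# One plausible combination for the final output (an assumption, not the paper's
# exact formula): o_t_final = o_t + b * u_t, with a learned scalar b.
```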
Extreme Value Theory

If we keep the Gaussian likelihood, even the improved model still suffers from the extreme event problem:
• We should use a heavy-tailed likelihood to fit the distribution of extreme events from limited samples.
• Predicting the values of extreme events is hard; their degrees of freedom, however, are easier to model.
• We can therefore design a heavy-tailed likelihood for predicting the occurrence of extreme events.
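The EVT result this builds on (the slide's equation did not survive extraction; this standard Pickands–Balkema–de Haan form is my assumption about what it shows): exceedances of $y_t$ over a high threshold $\epsilon$ are approximately generalized Pareto, with scale function $f(\epsilon)$ and extreme value index $\gamma$:

```latex
P\bigl(y_t - \epsilon > z \,\big|\, y_t > \epsilon\bigr)
  \;\approx\; \Bigl(1 + \frac{\gamma z}{f(\epsilon)}\Bigr)^{-1/\gamma},
  \qquad z > 0 .
```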
Extreme Value Loss

• Through Extreme Value Theory (EVT), the tail of $y_t$ beyond a threshold can be approximated in a generalized Pareto form with a scale function (see the sketch after the previous slide).
• $v_t \in \{0, 1\}$ indicates whether an extremely large value occurs at time $t$.
• If we focus on predicting whether there is an extremely large value at $t$ by outputting $u_t \in [0, 1]$, we can add EVT-derived weights for extreme events to the binary cross-entropy loss (a sketch follows).
• It is easy to extend the binary classification to $u_t, v_t \in \{-1, 0, 1\}$ (extreme minima, normal points, extreme maxima).
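A sketch of a binary EVL following the structure described here: binary cross entropy whose two terms are re-weighted by class proportions and EVT-motivated modulating factors. The factor form `(1 - u/gamma)**gamma` and the proportion-based weights follow my reading of the paper; treat the exact constants as assumptions.

```python
import torch

def evl_binary(u, v, extreme_rate=0.05, gamma=2.0):
    """Extreme Value Loss sketch for binary extreme-event detection.

    u: predicted probability of an extreme event, in (0, 1)
    v: ground-truth indicator in {0, 1}
    extreme_rate: empirical frequency of extreme events (assumed to set the
        class weights); gamma: extreme value index hyperparameter.
    """
    eps = 1e-7
    u = u.clamp(eps, 1 - eps)
    beta_pos = 1 - extreme_rate   # up-weights the rare positive (extreme) class
    beta_neg = extreme_rate
    pos = beta_pos * (1 - u / gamma) ** gamma * v * torch.log(u)
    neg = beta_neg * (1 - (1 - u) / gamma) ** gamma * (1 - v) * torch.log(1 - u)
    return -(pos + neg).mean()
```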
Optimization

The final loss function combines the squared prediction loss with EVL on the extreme-event indicators $v_t \in \{-1, 0, 1\}$ (a minimal sketch follows).

This addresses the two challenges in DNNs:
• We predict the labels from both the GRU and the memory module, which memorizes the regularity inside extreme events despite limited samples.
• We minimize a heavy-tailed classification loss (EVL) for detecting the occurrence of extreme events.
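Putting the two pieces together as the slide describes, reusing `evl_binary` from the sketch above; the trade-off weight `lam` is an assumed hyperparameter I introduce for illustration, not a constant from the paper.

```python
def total_loss(o, y, u, v, lam=0.1):
    """Squared prediction loss plus EVL on the extreme-event indicators.

    o, y: predicted and true series values; u, v: predicted probability and
    true indicator of extreme events (binary case for simplicity).
    """
    return ((o - y) ** 2).mean() + lam * evl_binary(u, v)
```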
Experimental Settings

• Datasets:
  • Stock dataset: 564 corporations on the Nasdaq Stock Market, one sample per week
  • Climate datasets: Greenhouse Gas Observing Network dataset and Atmospheric CO2 dataset
  • Pseudo-periodic synthetic dataset
• Baselines:
  • LSTM
  • GRU
  • Time-LSTM
• Research questions:
  • RQ1: Is our proposed framework effective in time series prediction?
  • RQ2: Does our proposed loss function EVL work for detecting extreme events?
  • RQ3: What is the influence of hyper-parameters in the framework?
Time Series Prediction (RMSE)
(results table)

Time Series Prediction (Visualization)
(figure)

Extreme Event Prediction (F1 Score)
(results table)

Influence of Hyper-parameters
(figure)