
Spatial-Temporal K Nearest Neighbors Model on MapReduce for Traffic Flow Prediction



  1. Spatial-Temporal K Nearest Neighbors Model on MapReduce for Traffic Flow Prediction. A. Agafonov, A. Yumaganov, Samara National Research University. The 19th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2018).

  2. Task definition. Forecast the traffic flow 10 minutes ahead; take into account the spatial and temporal characteristics of the traffic flow; develop a distributed forecasting model; efficiently process large-scale traffic data. The task requires real-time processing and high accuracy.

  3. Problem formulation. $G = (N, E)$ is a directed graph representing the road network; $N$ is the set of nodes (road intersections); $E$ is the set of edges (road segments); $V^j_t$ is the traffic flow characteristic observed on edge $j \in E$ at time $t$. Given the graph $G = (N, E)$ and the traffic flow data $V^j_t$, $j \in E$, $t = 1, 2, \ldots, T$, predict the traffic flow characteristic at the time interval $(t + \Delta)$ for a predefined prediction horizon $\Delta$.

  4. Proposed model. A short-term traffic flow forecasting model based on the non-parametric k nearest neighbors regression algorithm is proposed. The k nearest neighbors model is defined by three components: the feature vector, the distance metric, and the prediction function.

  5. Feature vector. Time-Domain Upstream / Downstream (TDUD) feature vector:
     $$\left(V^j_{t-T}, \ldots, V^j_{t-1}, V^j_t,\; V^{j-1}_{t-T}, \ldots, V^{j-1}_{t-1}, V^{j-1}_t,\; V^{j+1}_{t-T}, \ldots, V^{j+1}_{t-1}, V^{j+1}_t\right)$$
     Proposed feature vector: (1) partition the transportation network graph into several spatially compact clusters $\{G_i\}$ and define the cluster feature vector $\{V^j_t\}$, $j \in G_i$, $t = t_{cur} - T, \ldots, t_{cur}$; (2) reduce the dimensionality of the cluster feature vector using the PCA procedure, obtaining $\{X_n\}_i$, $n = 1, \ldots, N$; (3) define the resulting feature vector for each road segment $j \in E$:
     $$S^j = \left(\{V^j_t\}, \{X_n\}_i\right), \quad i : j \in G_i, \quad t = t_{cur} - T, \ldots, t_{cur}, \quad n = 1, \ldots, N.$$
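
The TDUD baseline vector above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `series`, the function name `tdud_feature_vector`, and the convention that segment `j - 1` is upstream and `j + 1` is downstream are hypothetical and follow the slide's index notation rather than any published code.

```python
from typing import Dict, List


def tdud_feature_vector(series: Dict[int, List[float]], j: int, t: int, T: int) -> List[float]:
    """Concatenate the last T + 1 observations of segment j, of its upstream
    segment j - 1 and of its downstream segment j + 1, ending at time index t."""
    if t - T < 0:
        raise ValueError("not enough history before time index t")
    vector: List[float] = []
    for segment in (j, j - 1, j + 1):
        # Slice covers V_{t-T}, ..., V_{t-1}, V_t for this segment.
        vector.extend(series[segment][t - T : t + 1])
    return vector


# Toy average speeds for three consecutive segments (assumed data).
speeds = {
    0: [50.0, 52.0, 51.0, 49.0],
    1: [40.0, 41.0, 39.0, 38.0],
    2: [60.0, 58.0, 59.0, 61.0],
}
features = tdud_feature_vector(speeds, j=1, t=3, T=2)
print(features)
```

The vector has 3(T + 1) entries, which is why the proposed model replaces the neighbor part with a small number of PCA components per cluster.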

  6. Graph partitioning. Partitioning by area: $G^{area}$. Partitioning by distance: $G^{dist}_i = \{ j \in E : r(i, j) \le R \}$, where $r(i, j)$ is the distance between road segments $i \in E$ and $j \in E$.
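
A minimal sketch of partitioning by distance follows. The slide does not say how $r(i, j)$ is measured; the sketch assumes it is the hop count between adjacent road segments, which is only one possible reading. The names `distance_cluster` and `adjacency` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def distance_cluster(adjacency: Dict[int, List[int]], i: int, R: int) -> Set[int]:
    """Return every segment j whose hop distance r(i, j) from segment i is <= R.

    Assumption: r(i, j) is the number of hops in the segment adjacency graph.
    """
    distance = {i: 0}
    queue = deque([i])
    while queue:
        current = queue.popleft()
        if distance[current] == R:
            continue  # neighbours of this segment would lie farther than R
        for neighbour in adjacency.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return set(distance)


# Toy chain of segments 0 - 1 - 2 - 3 - 4 (assumed data).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
cluster = distance_cluster(chain, i=2, R=1)
print(sorted(cluster))
```

With this reading, a larger R simply widens the neighborhood whose observations feed the cluster feature vector.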

  7. Proximity measure. Weighted Euclidean distance with the trend adjustment:
     $$d(S, \bar S^i) = d_{link}(V, \bar V^i) + \gamma\, d_{pca}(X, \bar X^i), \qquad d_{pca}(X, \bar X^i) = \sqrt{\sum_{n=1}^{N} \left(X_n - \bar X^i_n\right)^2}.$$
     The link term $d_{link}(V, \bar V^i)$ combines, with weights $\alpha$ and $(1 - \alpha)$, a Euclidean term over the differences $V_t - \bar V^i_t$, $t = 1, \ldots, T$, and a trend term over the differences $(V_t - V_\delta) - (\bar V^i_t - \bar V^i_\delta)$, $t = 2, \ldots, T$, $\delta = 1, \ldots, t - 1$, with a time-dependent factor $T - t + 1$.
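
The structure of the proximity measure can be illustrated with a simplified sketch. It is not the authors' exact formula: the $T - t + 1$ weighting is deliberately left out, and taking a plain square root of each sum is an assumption. All function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain (unweighted) Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trend_term(v: Sequence[float], v_bar: Sequence[float]) -> float:
    """Compare the pairwise changes (V_t - V_delta) of the two series."""
    total = 0.0
    for t in range(1, len(v)):
        for delta in range(t):
            diff = (v[t] - v[delta]) - (v_bar[t] - v_bar[delta])
            total += diff * diff
    return math.sqrt(total)


def composite_distance(v, x, v_bar, x_bar, alpha: float, gamma: float) -> float:
    """d = d_link + gamma * d_pca, with d_link mixing level and trend by alpha.

    Simplification: no time weighting is applied inside d_link.
    """
    d_link = alpha * euclidean(v, v_bar) + (1.0 - alpha) * trend_term(v, v_bar)
    d_pca = euclidean(x, x_bar)
    return d_link + gamma * d_pca


# Two series with the same trend but a constant offset of 1 (assumed data).
d = composite_distance([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, 3.0, 4.0], [3.0, 4.0],
                       alpha=0.5, gamma=0.1)
print(d)
```

In the example the trend term is zero because both series rise at the same rate, so only the level difference and the PCA part contribute.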

  8. Prediction function. Prediction function by the weighted average:
     $$\hat V_{T+1} = \frac{\sum_{k=1}^{K} d_k^{-1} V^k_{T+1}}{\sum_{k=1}^{K} d_k^{-1}}$$
     Prediction function that combines the weighted average and the trend adjustment:
     $$\hat V_{T+1} = \theta\, \frac{\sum_{k=1}^{K} d_k^{-1} V^k_{T+1}}{\sum_{k=1}^{K} d_k^{-1}} + (1 - \theta) \left( V_T + \frac{1}{KT} \sum_{k=1}^{K} \sum_{t=1}^{T} \left( V^k_{T+1} - V^k_t \right) \right)$$
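
The weighted-average prediction can be sketched as below. The function name is hypothetical, and the sketch assumes all distances are strictly positive (an exact match with distance 0 would need separate handling).

```python
from typing import Sequence


def weighted_average_prediction(distances: Sequence[float], next_values: Sequence[float]) -> float:
    """Inverse-distance weighted average of the neighbours' next values V^k_{T+1}.

    Assumption: every distance d_k is strictly positive.
    """
    weights = [1.0 / d for d in distances]
    weighted_sum = sum(w * v for w, v in zip(weights, next_values))
    return weighted_sum / sum(weights)


# Three neighbours: the closest one (d = 1) dominates the forecast (assumed data).
prediction = weighted_average_prediction([1.0, 2.0, 4.0], [40.0, 50.0, 60.0])
print(prediction)
```

Closer neighbors receive larger weights, so the forecast is pulled toward the value that followed the most similar historical pattern.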

  9. MapReduce-based implementation. Diagram of the processing pipeline: input data, preprocessing phase, Map phase, Shuffle phase, Reduce phase, output data. In the preprocessing phase the training data is split into train_split_0, ..., train_split_n and the testing data into test_split_0, ..., test_split_k; a Cartesian join pairs every training split with every testing split (train_0 with test_0, ..., train_n with test_k). Each Map procedure processes one pair and emits records of the form key: local_top_list. The Shuffle phase sorts these records by key (key1, ..., keyT). Each Reduce procedure merges the local top lists of one key (local_top_list1, local_top_list2, ...) into a single key: global_top_list record, which forms the output data.
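
The local/global top-list idea can be imitated in plain Python. This is a single-process sketch, not Apache Spark code; it assumes that a "top list" holds the k nearest training records, and it uses scalar features with an absolute-difference distance purely for brevity. All names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import product


def map_procedure(train_split, test_split, k):
    """For every test record, emit (key, local top list) built from one training split."""
    for key, query in test_split:
        # Each candidate is a (distance, target value) pair.
        candidates = ((abs(query - feature), target) for feature, target in train_split)
        yield key, heapq.nsmallest(k, candidates)


def reduce_procedure(local_top_lists, k):
    """Merge the local top lists of one key into the global top list."""
    return heapq.nsmallest(k, (item for top in local_top_lists for item in top))


# Toy splits: training records are (feature, target), test records are (key, feature).
train_splits = [[(1.0, 10.0), (5.0, 50.0)], [(2.0, 20.0), (9.0, 90.0)]]
test_splits = [[("key1", 1.5)], [("key2", 8.0)]]
k = 2

shuffled = defaultdict(list)  # stands in for the shuffle phase: group by key
for train_split, test_split in product(train_splits, test_splits):  # Cartesian join
    for key, local_top in map_procedure(train_split, test_split, k):
        shuffled[key].append(local_top)

global_top = {key: reduce_procedure(tops, k) for key, tops in shuffled.items()}
print(global_top)
```

Because each Map call keeps at most k candidates, the amount of data moved in the shuffle stays proportional to k times the number of training splits per test record, regardless of the split sizes.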

  10. Model analysis. Compared models: the proposed kNN model, TDUD-KNN, and SARIMA. Error measures:
      $$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| V_t - \hat V_t \right|, \qquad MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| V_t - \hat V_t \right|}{V_t} \times 100\%$$
      Data set: a transportation network with 26018 road segments; average speed over a period of 60 days; new data every 10 minutes.
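
Both error measures are straightforward to compute; the sketch below uses hypothetical function names and toy values, and assumes every observed value is non-zero so that MAPE is defined.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data (km/h for speeds)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Toy observed and predicted average speeds (assumed data).
observed = [50.0, 40.0, 20.0]
forecast = [45.0, 44.0, 20.0]
print(mae(observed, forecast), mape(observed, forecast))
```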

  11. Model analysis. MAE / MAPE. Figure: MAPE, % versus the number of neighbors k (k from 0 to 45).
      Table: Algorithms Comparison

      Model        MAE, km/h   MAPE, %
      R = 1        2.378       10.61
      R = 2        2.374       10.598
      R = 3        2.372       10.593
      G^{area}     2.379       10.596
      TDUD-KNN     2.387       10.611
      SARIMA       2.399       10.77

  12. Model analysis. MAE / MAPE by days. Figure: two panels, MAE (km/h) and MAPE (%) by day of year (days 213 to 227), with curves for TDUD-KNN, R = 3, and SARIMA.

  13. Model analysis. Execution time. Cluster of up to 6 PCs: Intel Core i5-3740 3.20 GHz, 8 GB RAM. Figure: two panels, execution time (sec) and scaleup value versus the number of nodes (1 to 6). Plotted execution times: 346, 176, 139, 101, 88, 74 sec. Plotted scaleup values: 1, 0.93, 0.86, 0.82, 0.81, 0.72.

  14. Conclusion. The distributed spatial-temporal model of short-term traffic flow forecasting has the following advantages: the model takes into account the spatial and temporal characteristics of the traffic flow; the implementation is based on the MapReduce processing model in the open-source cluster-computing framework Apache Spark for distributed Big Data processing; the proposed model has high prediction accuracy and a reasonable execution time, sufficient for real-time prediction.

  15. Thank you! Anton Agafonov, ant.agafonov@gmail.com. The work was supported by the Ministry of Science and Higher Education of the Russian Federation (project no. RFMEFI57518X0177).
