Cellular Network Traffic Scheduling using Deep Reinforcement Learning
Sandeep Chinchali et al., Marco Pavone, Sachin Katti
Stanford University, AAAI 2018
Can we learn to optimally manage cellular networks?
• Delay Tolerant (DT) traffic: pre-fetched content, IoT map/software updates
• Real-time mobile traffic: delay sensitive
Why is IoT/DT traffic scheduling hard? Contending goals:
• Maximize IoT/DT data delivered
• Minimize loss to regular mobile traffic
• Stay within network limits (acceptable utilization)
→ an optimal control problem
Why is IoT/DT traffic scheduling hard? Diverse city-wide cell congestion patterns.
[Figure: Congestion C over local time (09:00–21:00) for four Melbourne Central Business District cells — shopping center, office building, Southern Cross station, Melbourne Central station; rolling average = 1 min.]
Our contributions
1. Identify inefficiencies in real cellular networks: 4 weeks of data from 10 diverse cells in downtown Melbourne, Australia
2. Data-driven, deep learning network model: our live network experiments match the MDP dynamics
3. Adaptive RL scheduler (network state → IoT scheduler → IoT rate) that flexibly responds to operator reward functions
Why deep learning?
1. Learn time-variant network dynamics
2. Adapt to high-level network operation goals
3. Generalize to diverse cells
4. Abundance of network data
[Figure: same city-wide congestion traces as before (Melbourne CBD, rolling average = 1 min).]
Related Work
1. Dynamic resource allocation: electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)
2. Data-driven optimal control + forecasting: deep RL (Mnih 2013, Silver 2014, Lillicrap 2015); LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)
3. Machine learning for computer networks: cluster resource management (Mao 2016); mobile video streaming (Mao 2017, Yin 2015)
Data-driven problem formulation
1. Network state space
2. IoT scheduler actions
3. Time-variant dynamics
4. Network operator policies
[Diagram: network state + forecasts (congestion, cell efficiency, number of users) → IoT scheduler → IoT rate.]
Primer on cell networks (link quality). Goal: maximize safe IoT traffic $V_t$ over the day.
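An illustrative statement of this objective (not on the original slide; $v_t$, $C_t$, and $C_{\max}$ are assumed notation for the per-minute IoT volume, congestion, and the operator's congestion limit):

```latex
\max_{v_1,\dots,v_T} \; \sum_{t=1}^{T} v_t
\qquad \text{subject to} \qquad
C_t \le C_{\max}, \quad t = 1,\dots,T
```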
RL setup (1): State space
• Current network state
• Full state with temporal features
• Stochastic forecast (LSTM)
• Horizon: one day of T minutes
[Diagram: standard RL loop — the agent receives the network state and a reward from the environment and emits an action.]
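Illustrative sketch of assembling such a state (feature names and the sinusoidal time encoding are assumptions, not the paper's exact schema):

```python
import numpy as np

def build_state(congestion, cell_efficiency, num_users,
                minute_of_day, forecast_congestion):
    """Concatenate current network measurements, temporal features,
    and a short LSTM congestion forecast into one state vector."""
    time_feat = [np.sin(2 * np.pi * minute_of_day / 1440),
                 np.cos(2 * np.pi * minute_of_day / 1440)]
    return np.concatenate([
        [congestion, cell_efficiency, num_users],  # current network state
        time_feat,                                 # temporal features
        forecast_congestion,                       # e.g. next-k-minute forecast
    ])

# Example: current measurements plus a 5-minute congestion forecast
s = build_state(1.3, 0.8, 42, minute_of_day=610,
                forecast_congestion=np.array([1.2, 1.25, 1.4, 1.5, 1.45]))
```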
RL setup (2): Action space
• IoT traffic rate (the scheduler's control knob)
• IoT volume per minute
• Utilization gain
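A hedged sketch of the bookkeeping these quantities imply, assuming the action is an IoT rate in Mbps held for one minute and that utilization gain is defined as V_IoT relative to a baseline volume V_0 (both conversions are assumptions):

```python
def iot_volume_per_minute(rate_mbps):
    """IoT volume (MB) delivered in one minute at rate r_t (Mbps),
    assuming the rate is held constant over the 60-second slot."""
    return rate_mbps * 60 / 8  # megabits -> megabytes

def utilization_gain(v_iot_total, v_baseline):
    """Utilization gain V_IoT / V_0 in percent, relative to a baseline
    scheduler's delivered volume V_0 (assumed definition)."""
    return 100.0 * v_iot_total / v_baseline
```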
RL setup (3): Transition dynamics — controlled traffic rides on top of the background congestion dynamics.
[Figure: Congestion C over local time (20:10–20:20) contrasting controlled traffic with the background dynamics.]
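Minimal simulator sketch of this coupling (the additive form and the coupling coefficient are assumptions; the paper learns the dynamics from data):

```python
def step_congestion(background_congestion_next, iot_rate, coupling=0.05):
    """One transition of a toy environment model: next congestion is the
    exogenous background trace plus an (assumed) additive load term
    proportional to the IoT rate injected this minute."""
    return background_congestion_next + coupling * iot_rate

# Example: roll the model forward along a recorded background trace
background_trace = [1.1, 1.2, 1.5, 1.4]   # exogenous congestion per minute
iot_rates = [4.0, 4.0, 0.0, 2.0]          # scheduler's actions (Mbps)
controlled = [step_congestion(c, r) for c, r in zip(background_trace, iot_rates)]
```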
RL setup (4): Operator rewards — overall weighted reward combining:
1. IoT traffic volume
2. Loss to regular users (estimated with a what-if model)
3. Keeping traffic below the network limit
Goal: find the optimal operator policy.
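An illustrative weighted reward with these three terms (the weights and exact penalty form are assumptions; the slide names only the components):

```python
def operator_reward(iot_volume, loss_to_users, congestion, congestion_limit,
                    alpha=1.0, beta=1.0, gamma=10.0):
    """Weighted operator reward:
      + IoT traffic volume delivered
      - loss inflicted on regular users (from a what-if model)
      - penalty when congestion exceeds the network limit
    alpha, beta, gamma are hypothetical knobs the operator would tune."""
    overshoot = max(0.0, congestion - congestion_limit)
    return alpha * iot_volume - beta * loss_to_users - gamma * overshoot
```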
Evaluation
Evaluation criteria
1. Robust performance on diverse cell-day pairs
2. Ability to exploit better forecasts
3. Interpretability
1. RL generalizes to several cell-day pairs and responds to operator priorities
• Significant gains: 14.7% median utilization gain for α = 2
• Significant cost savings [simulated]; for context, the FCC spectrum auction (Reardon 2016) raised $4.5B for 10 MHz of spectrum
[Figure: utilization gain V_IoT / V_0 (%) on train and test cell-day pairs for operator priorities α = 1 and α = 2.]
2. RL effectively leverages forecasts
[Figure: RL scheduler vs. benchmark as LSTM forecasts become richer.]
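A minimal LSTM congestion forecaster sketch in PyTorch (architecture and sizes are assumptions; the talk only states that LSTM forecasts feed the RL state):

```python
import torch
import torch.nn as nn

class CongestionForecaster(nn.Module):
    """Predict the next `horizon` minutes of congestion from a window of
    per-minute measurements. Layer sizes here are illustrative."""
    def __init__(self, n_features=3, hidden=64, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):            # x: (batch, window, n_features)
        _, (h, _) = self.lstm(x)     # h: (num_layers, batch, hidden)
        return self.head(h[-1])      # (batch, horizon) congestion forecast

# Example: forecast 5 minutes ahead from a 30-minute measurement window
model = CongestionForecaster()
window = torch.randn(8, 30, 3)       # dummy batch of windows
forecast = model(window)             # shape (8, 5)
```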
3a. RL exploits transient dips in utilization
[Figures: (left) controlled congestion C over local time (9:00–16:00) for the original trace, heuristic control, and DDPG control, with a transient dip highlighted; (right) corresponding utilization gain V_IoT / V_0 (%).]
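Sketch of a threshold-style heuristic baseline like the one the figure compares against (the rule and its parameters are assumptions); unlike this rule, the DDPG policy can act on forecasted transient dips before they occur:

```python
def heuristic_iot_rate(congestion, threshold=1.2, max_rate=10.0):
    """Simple threshold baseline (assumed form): send IoT traffic at a
    fixed rate whenever current congestion is below the threshold,
    otherwise back off entirely."""
    return max_rate if congestion < threshold else 0.0
```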
3b. RL smooths network throughput
[Figures: (left) controlled congestion C over local time (9:00–16:00) for the original trace, heuristic control, and DDPG control; (right) resulting throughput B (MBps) relative to the throughput limit.]
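Hedged sketch of the fill-to-the-limit behavior visible in the throughput panel: the controller schedules just enough IoT traffic to bring total throughput up to, but not over, the limit (the limit value and the exact rule are assumptions):

```python
def fill_to_limit(background_throughput, limit=1.0):
    """Schedule just enough IoT throughput (same units as the inputs,
    e.g. MBps) to smooth the cell's total toward the limit without
    exceeding it; the limit value here is illustrative."""
    return max(0.0, limit - background_throughput)
```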
Conclusion
Modern networks are evolving:
• Delay-tolerant traffic (IoT updates, pre-fetched content)
Data-driven optimal control:
• LSTM forecasts + RL controller
• 14.7% simulated gain → significant savings
Future work:
• Operational network tests
• Decouple prediction and control
Questions: csandeep@stanford.edu
Extra slides
2. RL effectively leverages forecasts: better forecasts enhance performance.
[Figure: reward R of the discretized MDP used for the offline optimal, plotted against the number of discretized states |S̄| for action-space sizes |Ā| ∈ {5, 20, 40, 60}; the reward approaches that of the continuous MDP as the discretization becomes richer.]
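Illustrative sketch of computing such an offline optimal on a discretized MDP via value iteration (the slide gives only the grid sizes |S̄| and |Ā|; the algorithmic details below are assumptions, with P and R presumed estimated from discretized network traces):

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Offline optimal policy for a discretized MDP.
    P: (|S|, |A|, |S|) transition probabilities, R: (|S|, |A|) rewards.
    Returns the optimal value function and a greedy policy."""
    n_s, n_a, _ = P.shape
    V = np.zeros(n_s)
    while True:
        Q = R + gamma * (P @ V)        # (|S|, |A|) action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```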