
Adaptive Power Management for Energy Harvesting Sensor Nodes using Reinforcement Learning
Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura, The University of Tokyo (PowerPoint PPT Presentation)


  1. Adaptive Power Management for Energy Harvesting Sensor Nodes using Reinforcement Learning Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura The University of Tokyo

  2. CONTEXT

  3. Energy Harvesting Sensor Nodes: Sensor Node (capable of varying the duty cycle) + Battery + Energy Harvesting Module. Theoretically capable of perpetual operation.

  4. Challenge I: Say your battery is at 75% and there is plenty of sunshine. Do you ◦ use the solar power to charge your battery only, or ◦ use the solar power to both charge your battery and drive the sensor node? If so, in what proportion? [Chart: Energy Harvested and Battery level over time under Policies 1, 2, and 3]

  5. Challenge II: Moving sensors, different environments, different sensors (Environmental Sensor Networks, P. Corke et al.).

  6. Challenge III: Billions and trillions of nodes.

  7. Challenges II and III: When dealing with trillions of sensor nodes, customizing each node is impractical, if not impossible. ◦ Nodes should OPTIMIZE themselves. ◦ Nodes should ADAPT to their changing environments. Energy harvesting nodes need to be adaptable and self-calibrating.

  8. What this presentation is about: To demonstrate how to overcome the challenges by using Reinforcement Learning (RL) ◦ Brief introduction to Reinforcement Learning ◦ Our approach using RL ◦ How this strategy compares to other methods ◦ How this strategy adapts to a changing environment

  9. OBJECTIVES

  10. Objectives: Energy Neutral Operation (ENO) ◦ Energy consumed = Energy harvested. Maximize Performance ◦ Maximize duty cycle. Minimize Battery Downtime ◦ Battery should never drop to zero. Minimize Energy Waste ◦ Battery should not overcharge. Energy Waste = Energy Harvested − Energy consumed by Node − Energy to charge battery.
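The energy-waste bookkeeping above can be written out directly. A minimal sketch in Python; the function names and the assumption that all quantities are measured over the same interval are mine, not from the slides.

```python
def energy_waste(e_harvested: float, e_node: float, e_charge: float) -> float:
    """Energy Waste = Energy Harvested - Energy consumed by Node - Energy to charge battery."""
    return e_harvested - e_node - e_charge


def is_energy_neutral(e_harvested: float, e_consumed: float, tol: float = 1e-6) -> bool:
    """Energy Neutral Operation (ENO): energy consumed equals energy harvested (within a tolerance)."""
    return abs(e_harvested - e_consumed) <= tol
```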

  11. SYSTEM MODEL

  12. System Model: [Diagram] A solar panel (ideal solar energy) charges the battery and powers the sensor node. The Adaptive Power Manager observes the battery reserve level and the energy being harvested, and sets the sensor node's duty cycle.

  13. REINFORCEMENT LEARNING (RL) A BRIEF INTRODUCTION

  14. What is RL: A type of machine learning that learns by interacting with the environment. Suited for sequential decision-making tasks. Maps situations (states) to actions so as to receive as much reward as possible. Based on an iterative process of trial and error, similar to how humans learn (search and memory).

  15. Why Reinforcement Learning: By using RL, it is possible ◦ to optimize nodes with raw high-level data and minimal human input, and ◦ to adapt to changes in environmental parameters.

  16. Reinforcement Learning: The agent (power manager) asks, "What action should I take to accumulate the maximum total reward?" [Diagram] OBSERVATIONS: battery level, energy harvested. ACTION: choose duty cycle. The environment returns a REWARD and the new state.
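The observation/action/reward loop on this slide can be sketched as follows. This is only an interface sketch under assumed names (SensorNodeEnv, choose_duty_cycle, learn); the presentation does not define a concrete API.

```python
class SensorNodeEnv:
    """Placeholder environment: the real dynamics (battery, solar harvest) are not shown here."""

    def observe(self):
        """Return the current state: (battery level, energy being harvested)."""
        raise NotImplementedError

    def step(self, duty_cycle):
        """Apply the chosen duty cycle for one epoch; return (reward, new_state)."""
        raise NotImplementedError


def run(agent, env, n_epochs):
    """Agent-environment loop: observe state, choose duty cycle, receive reward and new state."""
    state = env.observe()
    for _ in range(n_epochs):
        action = agent.choose_duty_cycle(state)    # ACTION: choose duty cycle
        reward, next_state = env.step(action)      # REWARD and new state from the environment
        agent.learn(state, action, reward, next_state)
        state = next_state
```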

  17. Reinforcement Learning: The question is: which action to take when you are in a given state? EXAMPLE: Lots of sunlight | Battery at 60%. Do you ◦ drive the sensor node at full strength without recharging, or ◦ drive the sensor node at half strength with partial charging?

  18. Q-Value: Assign to every state-action pair a Q-value, Q(s,a). Q(s,a) is the total reward the agent can expect, in the best case, if it starts from state s and takes action a. The higher the Q-value, the better the action for that particular state. [Diagram: state X with actions 1, 2, 3 and their values Q(X,1), Q(X,2), Q(X,3)]
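As an illustration of how Q(s,a) can be stored and used, here is a table with one row per state and one column per action; the sizes match the state and action spaces described later in the talk, while the array representation itself is my assumption.

```python
import numpy as np

N_STATES, N_ACTIONS = 1000, 10          # 1000 states, 10 duty-cycle choices (from later slides)
Q = np.zeros((N_STATES, N_ACTIONS))     # Q[s, a] = best-case total reward from state s taking action a

def best_action(state: int) -> int:
    """The higher Q(s, a), the better action a is for state s, so pick the argmax."""
    return int(np.argmax(Q[state]))
```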

  19. Q-Learning: Challenge → determining the Q-values for all state-action pairs. The Q-table contains the Q-values of all possible state-action pairs. Accomplished by the Q-Learning algorithm ◦ Q-values are learned by interacting with the environment. ◦ Iterative process. ◦ Bootstrapping approach.

  20. Q-Learning Algorithm ◦ Start with arbitrary estimates for the Q-values. ◦ Use these estimates to decide on actions. ◦ Update the Q-table using the rewards received. ◦ Repeat until the Q-values sufficiently converge.
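The loop on this slide corresponds to the standard tabular Q-learning update; a minimal sketch, assuming a learning rate alpha and discount factor gamma whose values the presentation does not specify.

```python
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.9    # discount factor (assumed)

def q_update(Q, state, action, reward, next_state):
    """Bootstrapping update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    td_target = reward + GAMMA * Q[next_state].max()
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```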

  21. EXPERIMENTS ON ADAPTIVE POWER MANAGEMENT USING Q-LEARNING

  22. State Space: The state is defined by ◦ the amount of battery remaining (200 possible levels) and ◦ the amount of energy harvested (5 possible levels). Total possible states: 200 x 5 = 1000.
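One possible discretization that yields the 200 x 5 = 1000 states above; the bin boundaries are my assumption, since the slides only give the number of levels.

```python
N_BATTERY_LEVELS = 200
N_HARVEST_LEVELS = 5

def state_index(battery_frac: float, harvest_frac: float) -> int:
    """Map battery and harvested-energy fractions in [0, 1] to one of 1000 discrete states."""
    b = min(int(battery_frac * N_BATTERY_LEVELS), N_BATTERY_LEVELS - 1)
    h = min(int(harvest_frac * N_HARVEST_LEVELS), N_HARVEST_LEVELS - 1)
    return b * N_HARVEST_LEVELS + h
```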

  23. Action Space: Choose the duty cycle of the sensor node, a(t_k) ∈ {10%, 20%, 30%, ..., 100%}. Example node power draw: 10% → 50 mW, 50% → 250 mW, 100% → 500 mW.
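The action set and the quoted power figures are consistent with a linear 5 mW per percent of duty cycle; a small sketch under that assumption:

```python
DUTY_CYCLES = [d / 100 for d in range(10, 101, 10)]   # 10%, 20%, ..., 100%

def node_power_mw(duty_cycle: float) -> float:
    """Node power draw, assuming the linear relation implied by 10% -> 50 mW and 100% -> 500 mW."""
    return 500.0 * duty_cycle
```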

  24. Reward Function: The reward depends on ◦ the distance from energy neutrality at time t_k, Δe_neutral(t_k) = e_harvest(t_k) − e_node(t_k), and ◦ the amount of battery remaining.
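The exact reward expression is not given on the slide; the sketch below is one plausible shaping consistent with it (penalize deviation from energy neutrality, reward remaining battery), with the weights w_neutral and w_battery as assumed parameters.

```python
def reward(e_harvest: float, e_node: float, battery_frac: float,
           w_neutral: float = 1.0, w_battery: float = 1.0) -> float:
    """Illustrative reward: closer to energy neutrality and a fuller battery give a higher reward."""
    delta_e_neutral = e_harvest - e_node          # distance from energy neutrality at t_k
    return -w_neutral * abs(delta_e_neutral) + w_battery * battery_frac
```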

  25. RESULTS

  26. Training and Testing: Training: Tokyo (2000 to 2009). Testing: Tokyo (2010). ◦ Compare with other methods. ◦ Adaptation to diurnal and seasonal variations. ◦ Greedy and ε-greedy implementations.

  27. Comparison with other methods: Efficiency(%) = Actual Duty Cycle / Achievable Maximum Duty Cycle (higher is better). Energy Wasted(%) = Total Energy Wasted / Total Energy Harvested (lower is better), where Energy Waste = Energy Harvested − Node Energy − Charging Energy. [Chart: Efficiency(%) and Energy Wasted(%) for Naïve, Kansal, and our method using RL.] Naïve: fix the duty cycle for the present day by predicting the total energy for the next day. Kansal: duty cycle is proportional to the battery level.
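The two metrics in the chart can be computed as below; the function names are mine, the formulas are those given on the slide.

```python
def efficiency_pct(actual_duty_cycle: float, achievable_max_duty_cycle: float) -> float:
    """Efficiency(%) = Actual Duty Cycle / Achievable Maximum Duty Cycle."""
    return 100.0 * actual_duty_cycle / achievable_max_duty_cycle

def energy_wasted_pct(e_harvested: float, e_node: float, e_charging: float) -> float:
    """Energy Wasted(%) = Total Energy Wasted / Total Energy Harvested."""
    return 100.0 * (e_harvested - e_node - e_charging) / e_harvested
```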

  28. ADAPTATION TO SEASONAL CHANGES

  29. Performance in Summer: High duty cycle even during the night.

  30. Performance in Winter: Lower duty cycle during the night. [Chart: Duty Cycle (%), Harvested Energy (%), and Battery (%) vs. epoch]

  31. ADAPTATION TO CHANGE IN LOCATION

  32. Implementation: ε-greedy approach. Perfect Q-convergence takes too long. Instead, use an ε-greedy approach with a non-converged Q-table. ε-greedy approach: ◦ Take the best action by default. ◦ Take a random action with probability ε. ◦ Increasing ε → exploration. ◦ Decreasing ε → exploitation.
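A minimal sketch of the ε-greedy rule described above; the Q-table layout follows the earlier sketch, and nothing here is specific to the authors' implementation.

```python
import random
import numpy as np

def epsilon_greedy(Q: np.ndarray, state: int, epsilon: float) -> int:
    """Take a random action with probability epsilon (explore), otherwise the best known action."""
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])          # explore: random duty-cycle index
    return int(np.argmax(Q[state]))                  # exploit: greedy action
```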

  33. Adaptation to change in climate • Wakkanai (very little sunshine). • Compare between a greedy approach (Offline) and an ε-greedy approach (Online). • Training: 2000-2009 Tokyo. • Testing: 2010 Wakkanai.

  34. Adaptation to change in location: [Chart: Average Duty Cycle (%) per year, 2010-2015, for Wakkanai Offline (greedy, non-adaptive) vs. Wakkanai Online (ε-greedy, adaptive); the numbers in parentheses (8, 14, 14) give the total number of times the battery was completely exhausted.] With the ε-greedy implementation, the agent adapts to the environment and minimizes instances of battery exhaustion.

  35. With and Without Forecast Information

  36. CONCLUSION

  37. CONCLUSION • The proposed system is able to meet the objectives of energy neutrality and maximized performance. • It exceeds the performance of other schemes. • It is capable of adaptation. • Inclusion of weather forecast information results in smarter operation.

  38. THANK YOU FOR LISTENING ANY COMMENTS OR QUESTIONS ARE WELCOME
