Public Policy and Deep Reinforcement Learning on AWS Emily Webber | Machine Learning Specialist at Amazon Web Services | To-be-open-sourced research project
Public Policy Has Unique Challenges
• Structural Inefficiency
• Lack of single goal
• Synthesize Information
• Leadership Turnover
What if we used machine learning to optimize public policy?
[Diagram labels: Decades of Normalized Economic Data; Collaborative Data; Reinforcement Learning; Personalized, Collaborative, Transparent Policy]
Data-Driven Public Policy Analysis Is Not New
• Causal Inference
• Counterfactual analysis
• Intuitively, what would have happened if the policy (or treatment) had not been applied?
• Can we convince ourselves that the two groups were nearly identical otherwise?

                      Before   After
Treatment: Illinois   100      250
Control: New York     100      150

Did the treatment cause this difference?

$Z = \gamma_0 + \gamma_1 Y_1 + \gamma_2 Y_2 + \gamma_U Y_U + \dots + \vartheta$
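One common way to turn a before/after table like the one above into a single number is a difference-in-differences estimate. The slide does not name this estimator, so treat the sketch below as an assumption about how the Illinois and New York figures could be compared; the function name is hypothetical.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Figures from the slide: Illinois (treatment) and New York (control).
effect = difference_in_differences(100, 250, 100, 150)
print(effect)
```

The estimate is only as credible as the claim that the two states would have moved together without the policy, which is the question the slide raises.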
Learning Theory Fundamentals
[Diagram: Machine Learning (Data, Model, Use Case) compared with Reinforcement Learning (Actions, Rewards, Use Case)]
Mathematically speaking: Bellman Equation for Reinforcement Learning
Annotations on the equation:
• Utility per state, or value
• Current state
• Reward per state, a real number
• Discount factor
• Available action
• For each possible adjacent state
• Transition value
• Adjacent state, iterable
• Recursive call on utility function
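The annotations above are consistent with the textbook form of the Bellman equation, U(s) = R(s) + γ · max over a of Σ over s' of P(s' | s, a) · U(s'). A minimal value-iteration sketch under that assumption follows; the two-state problem, its state and action names, and all of its numbers are invented for illustration and do not come from the talk.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-9):
    """Repeatedly apply U(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) * U(s')."""
    utility = {s: 0.0 for s in states}
    while True:
        new_utility = {}
        for s in states:
            # Best expected utility over the actions available in state s.
            best = max(
                sum(p * utility[s2] for s2, p in transition[s][a].items())
                for a in actions[s]
            )
            new_utility[s] = reward[s] + gamma * best
        delta = max(abs(new_utility[s] - utility[s]) for s in states)
        utility = new_utility
        if delta < tol:
            return utility


# Hypothetical toy problem: an economy that is either "low" or "high".
STATES = ["low", "high"]
ACTIONS = {"low": ["stay", "invest"], "high": ["stay"]}
TRANSITION = {
    "low": {"stay": {"low": 1.0}, "invest": {"high": 0.8, "low": 0.2}},
    "high": {"stay": {"high": 1.0}},
}
REWARD = {"low": 0.0, "high": 1.0}

utility = value_iteration(STATES, ACTIONS, TRANSITION, REWARD)
print(utility)
```

Each pass recomputes every state's utility from its neighbours' current utilities, which is the "recursive call on utility function" unrolled into a loop.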
Our reward function
• A deep learning model maps the economic variables to a policy suggestion (“Pareto”)
• The simulator picks treatment and control states and runs a regression on historical data
• We use the estimated effect of the policy as our reward signal, scaled by validity of the experiment
Validity of the experiment:
• Ask, are they similar? T-test
• Use logical reasoning
• Eventually, scale with another ML model using data labelled by experts
[Diagram labels: Reinforcement Learning, Policy Estimation, Causal Inference]
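The idea of scaling an estimated effect by the validity of the experiment can be sketched as follows. The slide does not say how validity is computed, so the score used here, 1 / (1 + |t|) from a Welch t-statistic on pre-policy data, is purely an assumption, as are the function names and sample numbers.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def validity(pre_treated, pre_control):
    # Hypothetical score: 1.0 when the groups look alike before the policy,
    # shrinking toward 0 as the pre-policy t-statistic grows.
    return 1.0 / (1.0 + abs(welch_t(pre_treated, pre_control)))


def reward(estimated_effect, pre_treated, pre_control):
    """Estimated policy effect, scaled by how comparable the groups were."""
    return estimated_effect * validity(pre_treated, pre_control)


# Made-up pre-policy observations for a treatment and a control state.
similar = reward(100.0, [98, 100, 102], [98, 100, 102])
dissimilar = reward(100.0, [98, 100, 102], [120, 122, 124])
print(similar, dissimilar)
```

A well-matched control leaves the effect untouched, while a poorly matched one shrinks the reward, so the learner is not rewarded for effects that come from weak comparisons.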
But how do we pick the right way to optimize?
Philosophical Foundations
• Egalitarianism: Equality
• Kantian Rights: Universal Rights
• Utilitarianism: Personal Value
• Libertarianism: Freedom
Pareto Improvements: improve at least one person, without making anyone worse off
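The definition of a Pareto improvement translates directly into a check on two outcome tables. This is a minimal sketch; the function name, the dictionary layout, and the welfare numbers are assumptions made for illustration.

```python
def is_pareto_improvement(before, after):
    """True if nobody is worse off and at least one person is better off.

    `before` and `after` map each person to a welfare score (hypothetical units).
    """
    if before.keys() != after.keys():
        raise ValueError("before and after must cover the same people")
    nobody_worse = all(after[p] >= before[p] for p in before)
    someone_better = any(after[p] > before[p] for p in before)
    return nobody_worse and someone_better


baseline = {"ana": 1.0, "ben": 2.0}
print(is_pareto_improvement(baseline, {"ana": 1.0, "ben": 3.0}))
print(is_pareto_improvement(baseline, {"ana": 0.5, "ben": 5.0}))
```

The second call fails the check even though total welfare rises, because one person ends up worse off.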
There is no single best optimization strategy.
What we can do is use data to automatically suggest policies based on user-defined preferences.
What do you want to see in public policy?
• Personal Freedom
• Equality of outcomes
• Equality of opportunity
• Less crime
• Less waste
• Less traffic
• Access to education
• Access to social services
• Better health care
[Submit]
What do you think impacts crime the most?
"In my neighborhood, people commit crimes because there are no jobs here."
Given your views, we recommend evaluating:
• Outcomes: Crime, Income
• Indicators: Employment, Savings
[Confirm?]
These policies are impacting you today.
• Bill 789: 13.45, Reducing income
• Bill 238: 42.66, Creating jobs
• Bill 121: .05, Increasing traffic
Your policy recommendations
• Bill 789: Reduce taxation
• Bill 238: Continue investment
• Bill 121: Build more highways
Here’s how to engage your elected officials
"Please correct bill 789, it is lowering my income" [Email]
What if we could step into someone else’s shoes?
Would you like to see another point of view?
Your policy recommendations (increase personal freedom)
• Bill 789: Reduce taxation
• Bill 238: Continue investment
• Bill 121: Build more highways
Another point of view (increase overall equality)
• Bill 789: Increase taxation
• Bill 238: Continue investment
• Bill 121: More Public Transit
Technically speaking:
for ism in philosophical_frameworks:
    utility = define_utility(ism)
    data = update_data(utility)
    model = get_pareto(data)
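The loop above can be made runnable with stand-in implementations. Everything beyond the loop itself is an assumption: the bodies of `define_utility`, `update_data`, and `get_pareto`, the framework weights, and the policy effect numbers are all hypothetical placeholders, and Kantian rights is left out only to keep the sketch short.

```python
# Made-up effects of each candidate policy on two outcomes.
POLICY_EFFECTS = {
    "reduce_taxation": {"freedom": 0.8, "equality": 0.2},
    "increase_taxation": {"freedom": 0.2, "equality": 0.8},
    "build_highways": {"freedom": 0.5, "equality": 0.1},
    "public_transit": {"freedom": 0.4, "equality": 0.6},
}

# Hypothetical weights each framework places on each outcome.
philosophical_frameworks = {
    "libertarianism": {"freedom": 1.0, "equality": 0.0},
    "egalitarianism": {"freedom": 0.0, "equality": 1.0},
    "utilitarianism": {"freedom": 0.5, "equality": 0.5},
}


def define_utility(ism):
    """Look up the outcome weights for a framework."""
    return philosophical_frameworks[ism]


def update_data(utility):
    """Score every policy on every outcome, weighted by the utility."""
    return {
        policy: tuple(utility[k] * effects[k] for k in sorted(utility))
        for policy, effects in POLICY_EFFECTS.items()
    }


def dominates(a, b):
    """a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def get_pareto(data):
    """Policies whose score vector no other policy dominates."""
    return sorted(
        policy for policy, score in data.items()
        if not any(dominates(other, score) for other in data.values())
    )


fronts = {}
for ism in philosophical_frameworks:
    utility = define_utility(ism)
    data = update_data(utility)
    fronts[ism] = get_pareto(data)
print(fronts)
```

With these placeholder numbers, the single-outcome frameworks each keep one policy, while the mixed weighting keeps several policies that trade freedom against equality.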
How should we handle air traffic delays?
Kantian Rights: Uphold human rights
• Uphold the human sanctity of travelers
• Provide food, lodging, respectful notice
• Make reasonable attempts to avoid delays
Utilitarianism: Do whatever increases overall utility
• Different people value timeliness differently
• Need multiple ways of defining utility for diverse stakeholders
• Use testing and surveys to get a numerical estimate for how different people value certain outcomes
Egalitarianism: Do what increases overall equality
• Don’t prioritize airline status travelers
• Don’t let people pay more for perks
• Don’t do special favors
• Treat each traveler, airline, and airport the same
Libertarianism: Preserve Freedom
• Let people pick for themselves
• Don’t automatically make decisions for travelers
• Let travelers switch across airlines
• Ensure freedom of airlines and airports
There is fundamental overlap between the philosophical frameworks. This overlap can be scaled by reward functions
There is no single right answer
We need a computational system that can:
• Synthesize different points of view
• Weight these based on criteria, like population size
• Be transparent, collaborative, timely
• Change with the times
To efficiently support existing governing bodies
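Weighting points of view by population size could look like the sketch below. The talk gives no formula, so the population-weighted average, the group names, the policy scores, and the population counts are all assumptions chosen for illustration.

```python
def combine_scores(scores_by_group, population_by_group):
    """Population-weighted average of each policy's score across groups."""
    total = sum(population_by_group[g] for g in scores_by_group)
    if total <= 0:
        raise ValueError("total population must be positive")
    policies = next(iter(scores_by_group.values())).keys()
    return {
        policy: sum(
            population_by_group[g] * scores_by_group[g][policy]
            for g in scores_by_group
        ) / total
        for policy in policies
    }


# Hypothetical groups, each scoring the same two policies.
scores = {
    "urban": {"public_transit": 0.8, "highways": 0.2},
    "rural": {"public_transit": 0.2, "highways": 0.6},
}
population = {"urban": 3000, "rural": 1000}

combined = combine_scores(scores, population)
print(combined)
```

Population is only one possible criterion; swapping in another weight per group leaves the rest of the calculation unchanged.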
Thank you! Emily Webber | Amazon Web Services | LinkedIn effective-policies@amazon.com ← email me to collaborate!