Learning how to Active Learn: A Deep Reinforcement Learning Approach
Meng Fang, Yuan Li, Trevor Cohn
The University of Melbourne
Presenter: Jialin Song
CS 546 Machine Learning in NLP
April 05, 2018
Overview
1 Introduction
2 Model
3 Algorithms
4 Numerical Experiments
Introduction: Active Learning
1 Annotation:
⋄ select a subset of data to annotate from a large unlabelled dataset (adding labels)
⋄ then train a supervised learning model φ (a classifier) on the labelled subset
⋄ we hope to maximize the accuracy of the classification model
2 Active learning:
⋄ annotating every sentence is costly
⋄ the question is how to select raw data to label so as to maximize the accuracy of the classification model
⋄ active learning thus becomes a sequential decision problem: as each sentence arrives, annotate it or not (our action); a minimal loop is sketched below
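The annotate-or-skip view can be made concrete with a small loop. Below is a minimal sketch, assuming a hypothetical annotate() labelling oracle and a classifier phi with a scikit-learn-style fit/predict_proba interface; the decision here uses simple uncertainty thresholding, whereas the paper learns this decision as a deep RL policy.

    import numpy as np

    def stream_active_learning(stream, annotate, phi, budget, threshold=0.7):
        # phi is assumed pre-trained on a small seed set, so predict_proba
        # is usable from the first incoming sentence.
        labelled_x, labelled_y = [], []
        for x in stream:                          # sentences arrive one at a time
            if budget <= 0:
                break
            confidence = np.max(phi.predict_proba([x])[0])
            if confidence < threshold:            # model is uncertain about x
                y = annotate(x)                   # pay the annotation cost
                labelled_x.append(x)
                labelled_y.append(y)
                phi.fit(labelled_x, labelled_y)   # retrain on the grown labelled set
                budget -= 1
        return phi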
Introduction: MDP
1 Markov Decision Process (MDP):
⋄ a framework for modelling a sequential decision process
⋄ at each decision stage, the agent observes the state variables (s) and takes an action (a) to maximize its payoff
⋄ after taking the action, a reward associated with the state and action, r(s, a), is generated and the current state transitions to the next state
⋄ the agent aims to maximize the expected sum of rewards over all stages; a minimal interaction loop is sketched below
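A minimal sketch of the agent-environment loop, assuming a hypothetical env object with reset()/step() and a policy function; it makes the objective explicit as the discounted sum of rewards collected over the stages.

    def run_episode(env, policy, alpha=0.99, max_steps=100):
        # One rollout through the MDP: observe state, act, collect reward, transition.
        s = env.reset()
        total, discount = 0.0, 1.0
        for _ in range(max_steps):
            a = policy(s)                  # action taken in the current state
            s_next, r, done = env.step(a)  # reward r(s, a) and transition to s'
            total += discount * r          # accumulate the discounted reward
            discount *= alpha
            s = s_next
            if done:
                break
        return total                       # a sample of the objective the agent maximizes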
Introduction: Bellman Equation
1 The dynamics of an MDP can be modelled by the Bellman equations
⋄ Bellman equation 1: value function
    J(s) = \max_a \big[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \, J(s') \big],
    a^*_s = \arg\max_a \big[ \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \, J(s') \big]
⋄ Bellman equation 2 (more common!): Q-function
    Q(s, a) = \bar{r}(s, a) + \alpha \sum_{s'} P_{ss'}(a) \max_u Q(s', u),
    a^*_s = \arg\max_a Q(s, a)
⋄ where \bar{r}(s, a) is the expected reward, P_{ss'}(a) is the probability of transitioning from state s to s' under action a, and \alpha is the discount factor
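When \bar{r}(s, a) and P_{ss'}(a) are known, the first Bellman equation can be solved directly, e.g. by value iteration (as the next slide notes). A minimal tabular sketch, assuming R is an |S| × |A| array of expected rewards and P an |S| × |A| × |S| array of transition probabilities:

    import numpy as np

    def value_iteration(P, R, alpha=0.95, tol=1e-6):
        # Iterate J(s) <- max_a [ R[s, a] + alpha * sum_s' P[s, a, s'] * J(s') ]
        n_states, n_actions = R.shape
        J = np.zeros(n_states)
        while True:
            Q = R + alpha * P.dot(J)      # shape (|S|, |A|)
            J_new = Q.max(axis=1)
            if np.max(np.abs(J_new - J)) < tol:
                break
            J = J_new
        policy = Q.argmax(axis=1)         # a*_s = argmax_a Q(s, a)
        return J, policy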
Q-Learning
1 If P_{ss'}(a) is known, then solve the Bellman equations (via value iteration or policy iteration) to get the optimal policy. There is no need to 'learn'!
2 If P_{ss'}(a) is not known, then computing the Q-function becomes a learning problem
3 Q-learning:
⋄ Q_{t+1}(s_t, a_t) = (1 - \epsilon_t) \, Q_t(s_t, a_t) + \epsilon_t \big[ r(s_t, a_t) + \alpha \max_u Q_t(s_{t+1}, u) \big]
⋄ where t is the iteration index and \epsilon_t is the learning rate
⋄ in practice, this tabular form is unusable: |S| × |A| is huge; a minimal sketch of the update follows
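A minimal sketch of the tabular update, assuming discrete state and action indices and a Q table stored as a NumPy array; it needs only sampled transitions (s, a, r, s'), not P_{ss'}(a).

    import numpy as np

    def q_learning_step(Q, s, a, r, s_next, eps_t=0.1, alpha=0.95):
        # target = r(s_t, a_t) + alpha * max_u Q_t(s_{t+1}, u)
        target = r + alpha * np.max(Q[s_next])
        # blend the old estimate and the target with learning rate eps_t
        Q[s, a] = (1.0 - eps_t) * Q[s, a] + eps_t * target
        return Q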
Deep Q-Learning
1 Deep Q-learning:
⋄ use the output of a DNN parametrized by θ, i.e., f_θ(s, a), to approximate Q(s, a); a minimal sketch is given below
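A minimal sketch of the function-approximation idea in PyTorch: a small network outputs one Q-value per action and is trained to reduce the squared Bellman error on a sampled transition. The architecture and sizes are illustrative assumptions, not the paper's actual network; the two actions stand for annotate vs. skip.

    import torch
    import torch.nn as nn

    state_dim, n_actions, alpha = 32, 2, 0.95
    f_theta = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                            nn.Linear(64, n_actions))        # f_theta(s) -> Q(s, .)
    optimizer = torch.optim.Adam(f_theta.parameters(), lr=1e-3)

    def dqn_update(s, a, r, s_next):
        # Bellman target r + alpha * max_u f_theta(s', u), held fixed during the update
        with torch.no_grad():
            target = r + alpha * f_theta(s_next).max()
        pred = f_theta(s)[a]                                  # f_theta(s, a)
        loss = (pred - target) ** 2                           # squared Bellman error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()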