15-780 - graduate artificial intelligence: ai and education iii. Shayan Doroudi, May 1, 2017. 1
series overview
Series on applications of AI to education.
Lecture   Application   AI Topics
4/24/17   Learning      Machine Learning + Search
4/26/17   Assessment    Machine Learning + Mechanism Design
5/01/17   Instruction   Multi-Armed Bandits
2
prediction vs. intervention
Prediction:
• Predicting performance in a learning environment
• Predicting performance on a test
Intervention:
• Changing instruction based on a refined cognitive model
• Computerized Adaptive Testing
• Choosing the best instruction
3
randomized weighted majority and bandits
• Recall the Randomized Weighted Majority Algorithm.
• After each decision, we know if each expert got it right or wrong.
• Multi-Armed Bandits: Choose only one arm (expert/action); only know if that arm was good or bad.
4
multi-armed bandits
• Set of K actions A = {a_1, ..., a_K}.
• At each time step t, we choose one action a_t ∈ A.
• Observe reward for that action, coming from some unknown distribution with mean µ_a.
• Want to minimize regret:
  R(T) = T · max_{a ∈ A} µ_a − E[ ∑_{t=1}^{T} µ_{a_t} ]
5
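To make the regret definition concrete, here is a minimal simulation sketch in Python (not from the slides): it plays an arbitrary policy against K Bernoulli arms and computes R(T) against the best fixed arm. The helper name run_bandit and the arm means are illustrative assumptions.

```python
import numpy as np

def run_bandit(policy, means, T, rng=None):
    """Simulate a K-armed Bernoulli bandit and return total reward and regret.

    policy(t, history) -> index of the arm to pull at step t,
    where history is a list of (arm, reward) pairs seen so far.
    """
    rng = rng or np.random.default_rng(0)
    history, total_reward = [], 0.0
    for t in range(T):
        arm = policy(t, history)
        reward = float(rng.random() < means[arm])  # Bernoulli reward (assumed model)
        history.append((arm, reward))
        total_reward += reward
    # Regret: best fixed arm in expectation vs. the means of the arms actually pulled
    regret = T * max(means) - sum(means[a] for a, _ in history)
    return total_reward, regret

# Example: a uniformly random policy on three hypothetical arms
means = [0.9, 0.8, 0.1]
random_policy = lambda t, hist: np.random.randint(len(means))
print(run_bandit(random_policy, means, T=1000))
```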
poll (multi-armed bandits)
[Figure: bar chart of average reward per action: about 0.9 for action 1, 0.8 for action 2, and 0.1 for action 3.]
Suppose action 1 was taken 20 times, action 2 was taken 10 times, and action 3 was taken once. Which action should we take next?
• Action 1
• Action 2
• Action 3
• Some distribution over the actions.
6
exploration vs. exploitation
• Exploration: Trying different actions to discover what's good.
• Exploitation: Doing (exploiting) what we believe to be best.
7
explore-then-commit
• Explore-then-Commit: Take each action n times, then commit to the action with the best sample average reward.
8
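A minimal sketch of Explore-then-Commit under the same assumed Bernoulli-arm model; the arm means, n, and T below are illustrative values, not from the slides.

```python
import numpy as np

def explore_then_commit(means, n, T, rng=None):
    """Pull each arm n times, then commit to the best empirical arm.

    Assumes T >= n * len(means).
    """
    rng = rng or np.random.default_rng(0)
    K = len(means)
    sums = np.zeros(K)
    total = 0.0
    # Exploration phase: pull each arm exactly n times (round-robin)
    for i in range(n * K):
        arm = i % K
        r = float(rng.random() < means[arm])  # Bernoulli reward (assumed)
        sums[arm] += r
        total += r
    best = int(np.argmax(sums / n))  # commit to the best sample-average arm
    # Commitment phase: play the chosen arm for the remaining steps
    for _ in range(T - n * K):
        total += float(rng.random() < means[best])
    return best, total

# Illustrative arms with unknown means 0.9, 0.8, 0.1 (hypothetical values)
print(explore_then_commit([0.9, 0.8, 0.1], n=10, T=1000))
```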
upper confidence bound (ucb)
optimism in the face of uncertainty
[Figure: average reward per action (actions 1, 2, 3), with an uncertainty range for each estimate.]
After taking action 3 two more times and seeing 0.1 both times:
[Figure: the same plot, updated with the two additional observations of action 3.]
9
ucb1
UCB1 Algorithm:
1. Take each action once.
2. Take action
   arg max_{a_j ∈ A}  (1/n_j) ∑_{i=1}^{n_j} r_{j,i} + √( 2 ln(n) / n_j )
• n is the total number of actions taken so far
• n_j is the number of times we took a_j
• r_{j,i} is the reward from the i-th time we took a_j
10
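A minimal UCB1 sketch under the same assumed Bernoulli rewards; it follows the two steps above, taking each action once and then maximizing sample mean plus √(2 ln(n) / n_j). The simulation setup (means, horizon) is illustrative.

```python
import numpy as np

def ucb1(means, T, rng=None):
    """UCB1: take each action once, then pick the arm maximizing
    sample mean + sqrt(2 ln(n) / n_j)."""
    rng = rng or np.random.default_rng(0)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    total = 0.0
    for t in range(T):
        if t < K:
            arm = t  # initialization: take each action once
        else:
            n = t  # total number of actions taken so far
            ucb = sums / counts + np.sqrt(2 * np.log(n) / counts)
            arm = int(np.argmax(ucb))
        r = float(rng.random() < means[arm])  # Bernoulli reward (assumed)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

print(ucb1([0.9, 0.8, 0.1], T=1000))
```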
thompson sampling
thompson sampling
Thompson Sampling Algorithm: Choose actions according to the probability that we think they are best.
• Take action a_j with probability
  ∫ I( E[r | a_j, θ] = max_{a ∈ A} E[r | a, θ] ) P(θ | D) dθ
• Can just sample θ according to P(θ | D), and take arg max_{a ∈ A} E[r | a, θ]
11
thompson sampling with beta prior
• Suppose each action a_j gives rewards according to a Bernoulli distribution with some unknown probability p_j.
• Use Conjugate Prior (Beta Distribution): P(p_j | α, β) ∝ p_j^α (1 − p_j)^β
• After we take a_j, if we see reward r_j:
  P(p_j | α, β, r_j) ∝ P(p_j | α, β) P(r_j | p_j) ∝ p_j^α (1 − p_j)^β · p_j^{r_j} (1 − p_j)^{1 − r_j}
                     ∝ p_j^{α + r_j} (1 − p_j)^{β + 1 − r_j}
• After any sequence of actions, the posterior distribution will be as follows:
  P(p_j | D) ∝ p_j^{α + s_j} (1 − p_j)^{β + f_j}
  where s_j and f_j are the numbers of successes and failures observed for action a_j.
12
thompson sampling with beta prior
Thompson Sampling Algorithm with Bernoulli Actions and Beta Prior:
• Sample p̃_1, ..., p̃_K, where each p̃_j is drawn from its posterior P(p_j | D) ∝ p_j^{α + s_j} (1 − p_j)^{β + f_j}
• Choose arg max_{a_j ∈ A} E[r | p_j = p̃_j]
13
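A minimal sketch of the Beta-Bernoulli Thompson sampling algorithm above; the prior α = β = 1 (a uniform prior on each p_j) and the arm means are illustrative assumptions.

```python
import numpy as np

def thompson_sampling(means, T, alpha=1.0, beta=1.0, rng=None):
    """Thompson sampling for Bernoulli arms with a Beta(alpha, beta) prior on each p_j."""
    rng = rng or np.random.default_rng(0)
    K = len(means)
    successes = np.zeros(K)  # s_j: observed rewards of 1 for each arm
    failures = np.zeros(K)   # f_j: observed rewards of 0 for each arm
    total = 0.0
    for _ in range(T):
        # Sample one plausible p_j per arm from its Beta posterior ...
        samples = rng.beta(alpha + successes, beta + failures)
        # ... and act greedily with respect to the sampled values
        arm = int(np.argmax(samples))
        r = float(rng.random() < means[arm])  # Bernoulli reward (assumed model)
        successes[arm] += r
        failures[arm] += 1 - r
        total += r
    return total

print(thompson_sampling([0.9, 0.8, 0.1], T=1000))
```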
poll (thompson sampling)
How can we increase exploration using Thompson Sampling with Beta Prior?
• Choose a large α
• Choose a large β
• Choose an equally large α and β
• Beats me
14
example: axis
[Figures on slides 15-18.]
What's missing? 19
contextual bandits
linucb
• Obtain some context x_{t,a}
• Assume a linear payoff function: E[r_{t,a} | x_{t,a}] = x_{t,a}^T θ_a
• Solve for θ_a using linear regression, build confidence intervals over the mean, and apply UCB.
20
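A minimal sketch of the LinUCB idea (one linear model per arm, ridge regression, UCB on the predicted mean). The exploration parameter alpha, the context generator, and the reward model below are illustrative assumptions, not details from the slides.

```python
import numpy as np

def linucb_choose(contexts, A_inv, b, alpha=1.0):
    """Pick an arm by upper confidence bound under a linear payoff model.

    contexts: list of d-dimensional context vectors x_{t,a}, one per arm
    A_inv[a]: inverse of the ridge-regression design matrix for arm a
    b[a]:     accumulated x * r vector for arm a
    """
    scores = []
    for a, x in enumerate(contexts):
        theta = A_inv[a] @ b[a]                    # ridge-regression estimate of theta_a
        mean = x @ theta                           # predicted payoff x^T theta_a
        width = alpha * np.sqrt(x @ A_inv[a] @ x)  # confidence width for this context
        scores.append(mean + width)
    return int(np.argmax(scores))

def linucb_update(A_inv, b, arm, x, r):
    """Sherman-Morrison rank-one update of the chosen arm's statistics."""
    Ax = A_inv[arm] @ x
    A_inv[arm] -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
    b[arm] += r * x

# Illustrative run: 3 arms, 5-dimensional contexts, hypothetical linear rewards
rng = np.random.default_rng(0)
K, d, T = 3, 5, 1000
true_theta = rng.normal(size=(K, d))
A_inv = [np.eye(d) for _ in range(K)]  # starts as identity (ridge prior)
b = [np.zeros(d) for _ in range(K)]
for t in range(T):
    contexts = [rng.normal(size=d) for _ in range(K)]
    arm = linucb_choose(contexts, A_inv, b)
    r = contexts[arm] @ true_theta[arm] + 0.1 * rng.normal()  # noisy linear reward (assumed)
    linucb_update(A_inv, b, arm, contexts[arm], r)
print("estimated theta for arm 0:", A_inv[0] @ b[0])
```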