

  1. A Contextual-Bandit Approach to Personalized News Article Recommendation Lihong Li, Wei Chu, John Langford, Robert E. Schapire Presenter: Qingyun Wu

  2. News Recommendation Cycle

  3. A K-armed Bandit Formulation • A gambler must decide which of K non-identical slot machines (we call them arms) to play in a sequence of trials in order to maximize total reward. News website <—> gambler; candidate news articles <—> arms; user click <—> reward. How should we pull arms to maximize reward?

  4. A K-armed Bandit Formulation: Setting • A set of K choices (arms) • Each arm $a$ is associated with an unknown reward distribution $p_a$ supported on $[0,1]$ • We play the game for $T$ rounds • In each round $t$: (1) we pick an arm $j$; (2) we observe a random sample $X_t$ drawn from $p_j$ • Our goal: maximize $\sum_{t=1}^{T} X_t$
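
To make the setting concrete, here is a minimal simulation sketch of the setup above, assuming Bernoulli arms (click / no-click) and a placeholder uniform-random policy; the arm count, horizon, and click probabilities are illustrative assumptions, not from the slides.

    import random

    # Minimal K-armed bandit environment for the setting above: each arm
    # has an unknown reward distribution supported on [0, 1] (Bernoulli
    # here), and we play for T rounds, observing one sample X_t per round.
    K = 5
    T = 10_000
    true_means = [random.random() for _ in range(K)]  # unknown to the player

    def pull(arm: int) -> int:
        """Observe a random sample X_t from the chosen arm's distribution."""
        return 1 if random.random() < true_means[arm] else 0

    total_reward = 0
    for t in range(T):
        arm = random.randrange(K)   # placeholder policy: uniform at random
        total_reward += pull(arm)   # goal: maximize the sum of X_t

    print(f"total reward: {total_reward}; best arm mean: {max(true_means):.3f}")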

  5. Ideal Solution • Pick $a = \arg\max_a \mu_a$. But we DO NOT know the means.

  6. Feasible Solution • Every time we pull an arm, we learn a bit more about its reward distribution.

  7. Exploitation vs. Exploration • Exploitation: pull the arm for which we currently have the highest estimate of mean reward. • Exploration: pull an arm we have never pulled before. Extreme examples: • Greedy strategy: always take the arm with the highest average reward (too confident). • Random strategy: randomly choose an arm (too unconfident).

  8. How to Make the Trade-off: Exploitation vs. Exploration • Don't just look at the mean (that's the expected reward), but also at the confidence!
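
For contrast with the extremes on the previous slide, one common baseline that blends them is epsilon-greedy: explore a random arm with probability eps, otherwise exploit the empirical best. This baseline is not in the deck and is shown only for comparison with UCB below; the sketch reuses the pull() environment assumed earlier.

    import random

    def epsilon_greedy(K: int, T: int, pull, eps: float = 0.1) -> int:
        """eps-greedy: with probability eps pull a random arm (explore),
        otherwise pull the arm with the best empirical mean (exploit)."""
        counts = [0] * K     # n_a: number of pulls of each arm
        sums = [0.0] * K     # cumulative reward of each arm
        total = 0
        for _ in range(T):
            untried = [a for a in range(K) if counts[a] == 0]
            if untried:
                arm = untried[0]              # try every arm once first
            elif random.random() < eps:
                arm = random.randrange(K)     # explore
            else:                             # exploit the empirical best
                arm = max(range(K), key=lambda a: sums[a] / counts[a])
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            total += r
        return total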

  9. UCB (Upper Confidence Bound) Algorithm • Pick $\arg\max_a (\hat{\mu}_a + \alpha \cdot \text{variance})$ • A confidence interval is a range of values within which we are sure the mean lies with a certain probability. • Pick $\arg\max_a (\hat{\mu}_a + \alpha \cdot \text{UCB})$ • UCB1: $\arg\max_a \left( \hat{\mu}_a + \sqrt{\frac{2 \ln T}{n_a}} \right)$ • Reference: Finite-time Analysis of the Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer, http://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
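
A minimal sketch of UCB1 from the Auer et al. reference, matching the formula above: after playing each arm once, pick $\arg\max_a (\hat{\mu}_a + \sqrt{2 \ln t / n_a})$, where $t$ is the number of rounds played so far; the pull() environment is the one assumed earlier.

    import math

    def ucb1(K: int, T: int, pull) -> int:
        """UCB1: pick the arm maximizing its empirical mean plus the
        confidence radius sqrt(2 ln t / n_a)."""
        counts = [0] * K     # n_a: number of pulls of each arm
        sums = [0.0] * K     # cumulative reward of each arm
        total = 0
        for t in range(1, T + 1):
            if t <= K:
                arm = t - 1  # initialization: play each arm once
            else:
                arm = max(range(K),
                          key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            total += r
        return total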

  10. Make Use of Contextual Information • User features: demographic information, geographic features, behavioral categories • Article features: URL categories, topic categories • Assumption about the reward: the expected reward of an arm is linear in its $d$-dimensional feature $x_{t,a}$, with some unknown coefficient vector $\theta_a^*$; namely, for all $t$, $E[r_{t,a} \mid x_{t,a}] = x_{t,a}^T \theta_a^*$

  11. UCB (Upper Confidence Bound) Algorithm • Assumption: $E[r_{t,a} \mid x_{t,a}] = x_{t,a}^T \theta_a^*$ • Parameter estimation (ridge regression): $\hat{\theta}_a = (D_a^T D_a + I_d)^{-1} D_a^T c_a$ • Bound on the deviation (the bound we need!): $\left| x_{t,a}^T \hat{\theta}_a - E[r_{t,a} \mid x_{t,a}] \right| \le \alpha \sqrt{x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}}$ • Pick $\arg\max_a \left( x_{t,a}^T \hat{\theta}_a + \alpha \sqrt{x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}} \right)$
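
The formulas above translate directly into the paper's per-arm linear model: maintain $A_a = D_a^T D_a + I_d$ and $b_a = D_a^T c_a$ incrementally, and score each arm by its upper confidence bound. A sketch in NumPy; the class and function names are mine, not the paper's.

    import numpy as np

    class LinUCBArm:
        """One arm's linear model: A = D^T D + I_d, b = D^T c."""
        def __init__(self, d: int):
            self.A = np.eye(d)    # D_a^T D_a + I_d
            self.b = np.zeros(d)  # D_a^T c_a

        def ucb(self, x: np.ndarray, alpha: float) -> float:
            A_inv = np.linalg.inv(self.A)
            theta_hat = A_inv @ self.b  # ridge-regression estimate
            return float(x @ theta_hat + alpha * np.sqrt(x @ A_inv @ x))

        def update(self, x: np.ndarray, reward: float) -> None:
            self.A += np.outer(x, x)  # add x x^T to D^T D
            self.b += reward * x      # add r * x to D^T c

    def choose_arm(arms, contexts, alpha=1.0):
        """Pick argmax over arms of the upper confidence bound above."""
        return int(np.argmax([arm.ucb(x, alpha)
                              for arm, x in zip(arms, contexts)]))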

  12. Performance Evaluation
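
The paper evaluates policies offline by replaying Yahoo! Front Page click logs collected under a uniformly random article-selection policy: an event counts only when the candidate policy picks the same article the log recorded, which yields an unbiased estimate of online click-through rate. A minimal sketch; the event format and the choose/update interface are assumptions for illustration.

    def replay_evaluate(policy, logged_events):
        """Offline replay estimator: average the clicks over the logged
        events where the policy agrees with the logged (random) choice.
        Each event is assumed to be a (contexts, logged_arm, click) tuple."""
        matched, clicks = 0, 0
        for contexts, logged_arm, click in logged_events:
            if policy.choose(contexts) == logged_arm:
                matched += 1
                clicks += click
                policy.update(contexts, logged_arm, click)  # learn as we go
        return clicks / matched if matched else 0.0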

  13. Summary • Model news recommendation as a K-armed bandit problem • UCB-type algorithm • Take contextual information into consideration

  14. Q&A
