Improving Search Through Efficient A/B Testing: A Case Study
Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen
Nokia Maps for Everyone!
Nokia Maps Team, Berlin
Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”
Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”
Easily discover places nearby with a tap wherever you are. View them on the map or in a list view. Tap on a list item to see detail information.
Possible user actions:
• SaveAsFavorite
• CallThePlace
• DriveTo
• …
Problem: Which Places to Show?
• Restaurants? Hotels? Shopping? …
• Rank by ratings? Distance? Usage? Trending? …
Approach: A/B-Test Different Versions!
Here is classical Web A/B testing:
A/B-Test for Nearby Places
Version A: Best of Eat’n’Drink
Version B: Best of Hotels
The versions compete for user engagement = number of actions performed on places.
There Is a Better Approach for Ranked Lists
[Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”
• Classical A/B testing converges slowly for ranked lists
• Classical A/B testing often doesn’t reflect actual relevance
• A/B tests for ranked result lists: rank interleaving
• Use rank interleaving for faster statistical significance
Efficient A/B Testing: Rank Interleaving
Rank interleaving: Version A (Best of Eat’n’Drink) + Version B (Best of Hotels) = interleaved Version A + B
A/B Interleaving: Randomized Mixing of Result Lists
• The interleaved list is filled with pairs of results, one item from each version. A coin toss decides which of the pair comes first.
• Duplicates below the current item are removed from the source lists.
• Leftover results are appended at the end, but clicks on them are not counted.

Example:
Version A: 1. alpha, 2. beta, 3. gamma, 4. delta, 5. epsilon
Version B: 1. beta, 2. kappa, 3. tau

Building the interleaved list step by step:
1. alpha (from A), 2. beta (from B) (beta is then removed from Version A's remaining items)
3. gamma (from A), 4. kappa (from B)
5. tau (from B), 6. delta (from A)
7. epsilon (from A, extra; leftover, shown but clicks not counted)

Final list shown to the user: alpha, beta, gamma, kappa, tau, delta, epsilon
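As a concrete illustration, here is a minimal Python sketch of the pairwise mixing described above. It is not the production code (which runs in a processing layer above SOLR); the function and variable names are our own.

```python
import random

def interleave(results_a, results_b, rng=random):
    """Pairwise mixing of two ranked lists: each round takes the next unseen
    result from A and from B, and a coin toss decides which of the pair comes
    first. Results already placed are skipped (duplicate removal). Leftovers
    are appended, tagged 'extra' so that clicks on them are not counted."""
    seen = set()
    mixed = []                      # list of (result, source) tuples
    pos_a = pos_b = 0

    def skip_seen(results, pos):
        while pos < len(results) and results[pos] in seen:
            pos += 1
        return pos

    while True:
        pos_a = skip_seen(results_a, pos_a)
        pos_b = skip_seen(results_b, pos_b)
        if pos_a >= len(results_a) or pos_b >= len(results_b):
            break
        pair = [(results_a[pos_a], "A"), (results_b[pos_b], "B")]
        if rng.random() < 0.5:      # coin toss: who comes first
            pair.reverse()
        for item, source in pair:
            if item not in seen:    # A and B may propose the same item
                mixed.append((item, source))
                seen.add(item)

    for item in results_a[pos_a:] + results_b[pos_b:]:
        if item not in seen:        # leftover results from the longer list
            mixed.append((item, "extra"))
            seen.add(item)
    return mixed

# The example from the slides (the order within each pair depends on the coin tosses):
mixed = interleave(["alpha", "beta", "gamma", "delta", "epsilon"],
                   ["beta", "kappa", "tau"])
```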
Declaring a Winner
• Statistical significance test
• Input (after Hadoop-based log processing...):
  • Number of clicks on version A
  • Number of clicks on version B
• G-test:
  • An improved version of Pearson's chi-squared test
  • G > 6.635 corresponds to a 99% confidence level
• Null hypothesis: clicks are equally distributed over both versions
• Test statistic:
  G = 2 \sum_{i \in \{A, B\}} \mathrm{counts}_i \, \ln \frac{\mathrm{counts}_i}{\mathrm{total\ counts} / 2}
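A minimal Python sketch of this statistic; the click counts below are made up for illustration, whereas the production pipeline derives them from the Hadoop-processed logs.

```python
import math

def g_test(clicks_a, clicks_b):
    """G statistic for two click counts under the null hypothesis that
    clicks are split evenly between version A and version B."""
    expected = (clicks_a + clicks_b) / 2.0
    return 2.0 * sum(observed * math.log(observed / expected)
                     for observed in (clicks_a, clicks_b)
                     if observed > 0)

# 6.635 is the chi-squared critical value for 1 degree of freedom at 99%.
# These click counts are hypothetical.
significant = g_test(clicks_a=1300, clicks_b=1100) > 6.635
```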
Managing Multiple Versions
[Architecture diagram: users query the Search API (REST frontend); a servlet container hosts Federation/Ranking, Spelling, Discovery, Place, and Address components and talks to Zookeeper; queries go to two SOLR instances whose cores (of several types) are replicated; data providers feed the index via a QA/indexing cluster, with batch updates for recovery.]
Managing Multiple Versions
• Every incoming query is replicated and routed to versions A and B
• Each version is implemented as a specific type of SOLR query
• We deploy more than 2 versions to production and switch between them using Zookeeper
• Result mixing of A and B is implemented in a processing layer above SOLR
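The slides do not show how the Zookeeper switch is wired up. As a rough sketch of the idea, here is a Python version using the kazoo client; the production stack is a Java servlet container, and the znode path and payload format below are invented for illustration.

```python
from kazoo.client import KazooClient

# Hypothetical znode holding the ranking versions currently active for
# interleaving, e.g. b"best_of_eat_n_drink,best_of_hotels".
CONFIG_PATH = "/search/ab-test/active-versions"

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

active_versions = []

@zk.DataWatch(CONFIG_PATH)
def on_config_change(data, stat):
    """Invoked whenever the znode changes: the search layer can switch the
    deployed A/B versions without a redeploy."""
    global active_versions
    if data is not None:
        active_versions = data.decode("utf-8").split(",")
```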
Caveat 1: Randomization
• Don’t confuse users with changing results, i.e. provide a consistent user experience
• Solution:
  • The random generator is seeded with the USER-ID for each query
  • Each user gets their own personal random generator
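A minimal sketch of that per-user seeding; the hashing scheme and the example user ID are our own choices, not necessarily what the production system does.

```python
import hashlib
import random

def rng_for_user(user_id):
    """Deterministic random generator per user: the same user always gets
    the same coin-toss sequence, so the interleaved order stays stable
    across repeated queries."""
    seed = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16)
    return random.Random(seed)

# Passed into the interleaving step sketched earlier, e.g.:
mixed = interleave(["alpha", "beta", "gamma", "delta", "epsilon"],
                   ["beta", "kappa", "tau"],
                   rng=rng_for_user("user-4711"))
```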
Caveat 2: Healthy Click Data
• We rely on the integrity of transmitted user actions
• Sensitive to log contamination (unidentified QA traffic, spam)
• User-clicks plot: [chart not reproduced here]
Caveat 3: A/B Clicks vs. Coverage
• Coverage = percentage of non-empty responses
• Example: A/B interleaving of eat&drink vs. eat&drink + going out
  • Click difference is not significant
  • But coverage differs (percentage of responses with POIs nearby): 60% for eat&drink vs. 62% for eat&drink + going out
• When there is no statistically significant click difference, the version with higher coverage wins (see the sketch below)
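A small sketch of that decision rule, reusing the g_test function from above; the coverage values come from the slide, while the click counts are invented for illustration.

```python
def pick_winner(clicks_a, clicks_b, coverage_a, coverage_b):
    """Significance first, coverage as tie-breaker: a statistically
    significant click difference decides; otherwise the version with the
    higher coverage (fraction of non-empty responses) wins."""
    if g_test(clicks_a, clicks_b) > 6.635:      # 99% confidence
        return "A" if clicks_a > clicks_b else "B"
    return "A" if coverage_a >= coverage_b else "B"

# Coverage numbers from the slide, click counts hypothetical:
winner = pick_winner(clicks_a=980, clicks_b=1005, coverage_a=0.60, coverage_b=0.62)
```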
Case Study, Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!
[Bar chart: number of user actions per action type (Rate, Save (Fav’s), Contact: Call, Contact: URL, Share, Navigate: Drive, Navigate: Walk, Navigate: Add Info Provider), scale 0 to 1500]
Case Study, Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!
Some users select their driving destination with the help of Nearby Places. Hotels are a common destination in the car navigation use case.
[Same bar chart of user actions per action type as on the previous slide]
Summary
• Use A/B rank interleaving to optimize result relevance
• Rank interleaving is easy to implement, and it works
• In a distributed search architecture, manage your A/B test configurations conveniently using Zookeeper
• Harness your Hadoop/search analytics stack for A/B test evaluations
• Don’t make assumptions about your users!
• [Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”
Thanks! Get in touch: hannes.kruppa@nokia.com
Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen