Improving Search Through Efficient A/B Testing: A Case Study
Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen
Nokia Maps for Everyone!
Nokia Maps Team, Berlin
Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”
Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”
Easily discover places nearby with a tap wherever you are. View them on the map or in a list view. Tap on a list item to see detail information.
Possible user actions:
• SaveAsFavorite
• CallThePlace
• DriveTo
• …
Problem: Which Places to Show?
• Restaurants? Hotels? Shopping? …
• Rank by ratings? Distance? Usage? Trending? …
Approach: A/B-Test Different Versions!
Here is classical Web A/B testing:
A/B-Test for Nearby Places
Version A: Best of Eat’n’Drink
Version B: Best of Hotels
The versions compete for user engagement = number of actions performed on places.
There Is a Better Approach for Ranked Lists
[Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”
• Classical A/B testing converges slowly for ranked lists
• Classical A/B testing often doesn’t reflect actual relevance
• A/B tests for ranked result lists: rank interleaving
• Use rank interleaving for faster statistical significance
Efficient A/B Testing: Rank Interleaving
Rank interleaving: Version A (Best of Eat’n’Drink) + Version B (Best of Hotels) = interleaved Version A + B
A/B Interleaving: Randomized Mixing of Result Lists
• The interleaved list is filled with pairs of results, one item from each version. A coin toss decides which of the pair comes first.
• Duplicates below the current item are removed from the source lists.
• Leftover results are appended at the end, but clicks on them are not counted.

Example:
Version A: 1. alpha, 2. beta, 3. gamma, 4. delta, 5. epsilon
Version B: 1. beta, 2. kappa, 3. tau

Building the interleaved list step by step:
1. alpha (from A), 2. beta (from B) (beta is then removed from Version A's remaining items)
3. gamma (from A), 4. kappa (from B)
5. tau (from B), 6. delta (from A)
7. epsilon (from A, extra; leftover, shown but clicks not counted)

Final list shown to the user: alpha, beta, gamma, kappa, tau, delta, epsilon
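As a concrete illustration, here is a minimal Python sketch of the pairwise mixing described above. It is not the production code (which runs in a processing layer above SOLR); the function and variable names are our own.

```python
import random

def interleave(results_a, results_b, rng=random):
    """Pairwise mixing of two ranked lists: each round takes the next unseen
    result from A and from B, and a coin toss decides which of the pair comes
    first. Results already placed are skipped (duplicate removal). Leftovers
    are appended, tagged 'extra' so that clicks on them are not counted."""
    seen = set()
    mixed = []                      # list of (result, source) tuples
    pos_a = pos_b = 0

    def skip_seen(results, pos):
        while pos < len(results) and results[pos] in seen:
            pos += 1
        return pos

    while True:
        pos_a = skip_seen(results_a, pos_a)
        pos_b = skip_seen(results_b, pos_b)
        if pos_a >= len(results_a) or pos_b >= len(results_b):
            break
        pair = [(results_a[pos_a], "A"), (results_b[pos_b], "B")]
        if rng.random() < 0.5:      # coin toss: who comes first
            pair.reverse()
        for item, source in pair:
            if item not in seen:    # A and B may propose the same item
                mixed.append((item, source))
                seen.add(item)

    for item in results_a[pos_a:] + results_b[pos_b:]:
        if item not in seen:        # leftover results from the longer list
            mixed.append((item, "extra"))
            seen.add(item)
    return mixed

# The example from the slides (the order within each pair depends on the coin tosses):
mixed = interleave(["alpha", "beta", "gamma", "delta", "epsilon"],
                   ["beta", "kappa", "tau"])
```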
Declaring a Winner
• Statistical significance test
• Input (after Hadoop-based log processing...):
  • Number of clicks on version A
  • Number of clicks on version B
• G-test:
  • An improved version of Pearson's chi-squared test
  • G > 6.635 corresponds to a 99% confidence level
• Null hypothesis: clicks are equally distributed over both versions
• Test statistic:
  G = 2 \sum_{i \in \{A, B\}} \mathrm{counts}_i \, \ln \frac{\mathrm{counts}_i}{\mathrm{total\ counts} / 2}
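A minimal Python sketch of this statistic; the click counts below are made up for illustration, whereas the production pipeline derives them from the Hadoop-processed logs.

```python
import math

def g_test(clicks_a, clicks_b):
    """G statistic for two click counts under the null hypothesis that
    clicks are split evenly between version A and version B."""
    expected = (clicks_a + clicks_b) / 2.0
    return 2.0 * sum(observed * math.log(observed / expected)
                     for observed in (clicks_a, clicks_b)
                     if observed > 0)

# 6.635 is the chi-squared critical value for 1 degree of freedom at 99%.
# These click counts are hypothetical.
significant = g_test(clicks_a=1300, clicks_b=1100) > 6.635
```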
Managing Multiple Versions
[Architecture diagram: users query the Search API (REST frontend); a servlet container hosts Federation/Ranking, Spelling, Discovery, Place, and Address components and talks to Zookeeper; queries go to two SOLR instances whose cores (of several types) are replicated; data providers feed the index via a QA/indexing cluster, with batch updates for recovery.]
Managing Multiple Versions
• Every incoming query is replicated and routed to versions A and B
• Each version is implemented as a specific type of SOLR query
• We deploy more than 2 versions to production and switch between them using Zookeeper
• Result mixing of A and B is implemented in a processing layer above SOLR
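The slides do not show how the Zookeeper switch is wired up. As a rough sketch of the idea, here is a Python version using the kazoo client; the production stack is a Java servlet container, and the znode path and payload format below are invented for illustration.

```python
from kazoo.client import KazooClient

# Hypothetical znode holding the ranking versions currently active for
# interleaving, e.g. b"best_of_eat_n_drink,best_of_hotels".
CONFIG_PATH = "/search/ab-test/active-versions"

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

active_versions = []

@zk.DataWatch(CONFIG_PATH)
def on_config_change(data, stat):
    """Invoked whenever the znode changes: the search layer can switch the
    deployed A/B versions without a redeploy."""
    global active_versions
    if data is not None:
        active_versions = data.decode("utf-8").split(",")
```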
Caveat 1: Randomization
• Don’t confuse users with changing results, i.e. provide a consistent user experience
• Solution:
  • The random generator is seeded with the USER-ID for each query
  • Each user gets their own personal random generator
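A minimal sketch of that per-user seeding; the hashing scheme and the example user ID are our own choices, not necessarily what the production system does.

```python
import hashlib
import random

def rng_for_user(user_id):
    """Deterministic random generator per user: the same user always gets
    the same coin-toss sequence, so the interleaved order stays stable
    across repeated queries."""
    seed = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16)
    return random.Random(seed)

# Passed into the interleaving step sketched earlier, e.g.:
mixed = interleave(["alpha", "beta", "gamma", "delta", "epsilon"],
                   ["beta", "kappa", "tau"],
                   rng=rng_for_user("user-4711"))
```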
Caveat 2: Healthy Click Data
• We rely on the integrity of transmitted user actions
• Sensitive to log contamination (unidentified QA traffic, spam)
• User-clicks plot: [chart not reproduced here]
Caveat 3: A/B Clicks vs. Coverage
• Coverage = percentage of non-empty responses
• Example: A/B interleaving of eat&drink vs. eat&drink + going out
  • Click difference is not significant
  • But coverage differs (percentage of responses with POIs nearby): 60% for eat&drink vs. 62% for eat&drink + going out
• When there is no statistically significant click difference, the version with higher coverage wins (see the sketch below)
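A small sketch of that decision rule, reusing the g_test function from above; the coverage values come from the slide, while the click counts are invented for illustration.

```python
def pick_winner(clicks_a, clicks_b, coverage_a, coverage_b):
    """Significance first, coverage as tie-breaker: a statistically
    significant click difference decides; otherwise the version with the
    higher coverage (fraction of non-empty responses) wins."""
    if g_test(clicks_a, clicks_b) > 6.635:      # 99% confidence
        return "A" if clicks_a > clicks_b else "B"
    return "A" if coverage_a >= coverage_b else "B"

# Coverage numbers from the slide, click counts hypothetical:
winner = pick_winner(clicks_a=980, clicks_b=1005, coverage_a=0.60, coverage_b=0.62)
```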
Case Study, Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!
[Bar chart: number of user actions per action type (Rate, Save (Fav’s), Contact: Call, Contact: URL, Share, Navigate: Drive, Navigate: Walk, Navigate: Add Info Provider), scale 0 to 1500]
Case Study, Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!
Some users select their driving destination with the help of Nearby Places. Hotels are a common destination in the car navigation use case.
[Same bar chart of user actions per action type as on the previous slide]
Summary
• Use A/B rank interleaving to optimize result relevance
• Rank interleaving is easy to implement, and it works
• In a distributed search architecture, manage your A/B test configurations conveniently using Zookeeper
• Harness your Hadoop/search analytics stack for A/B test evaluations
• Don’t make assumptions about your users!
• [Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”
Thanks! Get in touch: hannes.kruppa@nokia.com
Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen