SCALABLE ORDINAL EMBEDDING TO MODEL USER BEHAVIOR


  1. SCALABLE ORDINAL EMBEDDING TO MODEL USER BEHAVIOR. Jesse Anderton. Advisor: Javed Aslam. Committee members: Fernando Diaz, David Smith, Byron Wallace.

  2. [Image-only slide]

  3. [Image-only slide]

  4. PAIRWISE CITY DISTANCES (miles)

              Boston    NYC   Seattle     SF
     Boston        –    190     2,485  2,692
     NYC           –      –     2,401  2,565
     Seattle       –      –         –    679
     SF            –      –         –      –

  5. TOTAL DISTANCE ORDER (all six pairs ranked by distance, closest first)

              Boston    NYC   Seattle     SF
     Boston        –    1st       4th    6th
     NYC           –      –       3rd    5th
     Seattle       –      –         –    2nd
     SF            –      –         –      –

  6. DISTANCE RANKINGS (each row ranks the other cities by distance from that anchor)

     Anchor     Boston    NYC   Seattle     SF
     Boston          –    1st       2nd    3rd
     NYC           1st      –       2nd    3rd
     Seattle       3rd    2nd         –    1st
     SF            3rd    2nd       1st      –

  7. DISTANCE RANKINGS

     Anchor     Boston    NYC   Seattle     SF
     Boston          –    1st       2nd    3rd
     NYC           1st      –       2nd    3rd
     Seattle       3rd    2nd         –    1st
     SF            3rd    2nd       1st      –

     Perfect? [Figure: Boston, NYC, Seattle, SF embedded as points]

  8. DISTANCE RANKINGS

     Anchor     Boston    NYC   Seattle     SF  Dallas
     Boston          –    1st       3rd    4th     2nd
     NYC           1st      –       3rd    4th     2nd
     Seattle       4th    3rd         –    1st     2nd
     SF            4th    3rd       1st      –     2nd
     Dallas        3rd    1st       4th    2nd       –

     Perfect? No! [Figure: Boston, NYC, Seattle, SF embedded as points]

  9. WHAT IS ORDINAL EMBEDDING? ASSIGNING ORDER-PRESERVING POSITIONS
     ▸ An embedding positions a set of objects within some vector space (like ℝ^d) to satisfy some objective.
     ▸ An ordinal embedding focuses on satisfying some given ordering constraints.
     ▸ Constraints can be expressed as triples like:
       "Boston is closer to New York City than to Seattle"
       "The Matrix is more like Star Wars than it is like La La Land"
       "People who like steak tend to prefer chicken over tofu"
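A triple constraint of this kind can be checked directly against a candidate embedding. A minimal sketch (the 2-D city coordinates and the `satisfied` helper are illustrative, not taken from the slides):

```python
import math

def dist(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def satisfied(embedding, triples):
    """Fraction of triples (a, b, c), read as 'a is closer to b than to c',
    that the embedding preserves."""
    ok = sum(1 for a, b, c in triples
             if dist(embedding[a], embedding[b]) < dist(embedding[a], embedding[c]))
    return ok / len(triples)

# Rough 2-D positions for the four cities (illustrative, not real coordinates).
emb = {"Boston": (0.0, 0.0), "NYC": (0.2, -0.2),
       "Seattle": (-2.0, 2.5), "SF": (-2.6, 0.5)}
triples = [("Boston", "NYC", "Seattle"),  # Boston is closer to NYC than to Seattle
           ("Seattle", "SF", "NYC")]      # Seattle is closer to SF than to NYC
print(satisfied(emb, triples))  # → 1.0
```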

  10. EVALUATING ORDINAL EMBEDDING: EVALUATE BY RANK CORRELATION
      Mean Kendall's τ – mean rank correlation across anchors
      Mean τ_AP – mean top-heavy rank correlation across anchors

      GROUND TRUTH RANKINGS
      Anchor     Boston    NYC   Seattle     SF
      Boston          –    1st       2nd    3rd
      NYC           1st      –       2nd    3rd
      Seattle       3rd    2nd         –    1st
      SF            3rd    2nd       1st      –

      EMBEDDING RANKINGS
      Anchor     Boston    NYC   Seattle     SF
      Boston          –    1st       3rd    2nd
      NYC           1st      –       2nd    3rd
      Seattle       1st    2nd         –    3rd
      SF            3rd    1st       2nd      –
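Mean Kendall's τ can be computed straight from pair counts. A sketch using the slide's ground-truth and embedding rankings (the `kendall_tau` helper and dictionary layout are mine):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items (most similar
    first): (# concordant pairs - # discordant pairs) / (n choose 2)."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # The pair is concordant when rank_b orders it the same way as rank_a.
        if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# The slide's ground-truth and embedding rankings, one list per anchor.
truth = {"Boston":  ["NYC", "Seattle", "SF"],
         "NYC":     ["Boston", "Seattle", "SF"],
         "Seattle": ["SF", "NYC", "Boston"],
         "SF":      ["Seattle", "NYC", "Boston"]}
embed = {"Boston":  ["NYC", "SF", "Seattle"],
         "NYC":     ["Boston", "Seattle", "SF"],
         "Seattle": ["Boston", "NYC", "SF"],
         "SF":      ["NYC", "Seattle", "Boston"]}
mean_tau = sum(kendall_tau(truth[a], embed[a]) for a in truth) / len(truth)
print(round(mean_tau, 3))  # → 0.167
```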

  11. WHY USE ORDINAL EMBEDDING? HUMAN-BASED PREFERENCE/SIMILARITY
      ▸ It is easier for assessors to say "The Matrix is more like Star Wars than it is like La La Land."
      ▸ The field's focus on lab studies and crowdsourcing limits research interest in scalability.
      ▸ Limited scalability, in turn, prohibits a focus on similarity expressed through logged user behavior.
      [Figure: ordinal embedding of faces, Tamuz et al., ICML 2011]
      [3] O. Tamuz, C. Liu, S. Belongie, O. Shamir, and A. T. Kalai, "Adaptively Learning the Crowd Kernel," ICML, 2011.

  12. ROAD MAP: MY PROPOSED WORK. IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
      ▸ Active Learning – Which triples should we collect?
      ▸ Embedding – How can we embed accurately, at scale?
      ▸ Contextual Embeddings – Can we make embeddings that adapt to context?

  13. ROAD MAP: MY PROPOSED WORK. IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
      ▸ Active Learning – Which triples should we collect?
      ▸ Embedding – How can we embed accurately, at scale?
      ▸ Contextual Embeddings – Can we make embeddings that adapt to context?

  14. ACTIVE LEARNING: SIMPLE METHODS ⦿○○○○○○○○○○○○
      HOW MANY COMPARISONS TO LEARN ALL RANKINGS?
      "a IS MORE LIKE b THAN LIKE c"  ⇒  δ_ab < δ_ac  ⇒  TRIPLE (a, b, c)
      ▸ O(n³) total triples (with n total objects).
      ▸ O(n² log n) triples to get all rankings.
      ▸ O(d n log n) triples if a perfect embedding exists in ℝ^d (we think).
      ▸ On a limited budget, we want to adaptively pick next triples to improve the embedding the most.

      DISTANCE RANKINGS
      Anchor     Boston    NYC   Seattle     SF
      Boston          –    1st       2nd    3rd
      NYC           1st      –       2nd    3rd
      Seattle       3rd    2nd         –    1st
      SF            3rd    2nd       1st      –
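The O(n² log n) bound comes from learning each anchor's ranking with a comparison sort driven by triple queries: n rankings times O(n log n) comparisons each. A sketch with a simulated oracle (`points`, `closer`, and `ranking` are illustrative names, not the author's code):

```python
import random
from functools import cmp_to_key

# Simulated ground truth: n objects as points on a line.
random.seed(42)
points = {i: random.random() for i in range(100)}
queries = 0  # counts how many triples the oracle has been asked

def closer(anchor, b, c):
    """Oracle answering the triple (anchor, b, c): is anchor closer to b than to c?"""
    global queries
    queries += 1
    return abs(points[anchor] - points[b]) < abs(points[anchor] - points[c])

def ranking(anchor):
    """Sort all other objects by distance to `anchor` using only triple queries;
    a comparison sort asks O(n log n) triples per anchor."""
    others = [o for o in points if o != anchor]
    return sorted(others, key=cmp_to_key(
        lambda b, c: -1 if closer(anchor, b, c) else 1))

for a in points:
    ranking(a)
print(queries)  # total triples asked: far fewer than the O(n^3) of all triples
```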

  15. RELATED WORK: CROWD KERNEL (ICML 2011)

  16. ACTIVE LEARNING: RELATED WORK ⦿⦿○○○○○○○○○○○
      ICML 2011: "ADAPTIVELY LEARNING THE CROWD KERNEL" [T,B,S,K]
      ▸ By "kernel" they mean "embedding."
      ▸ Assumes that assessors disagree more when similar distances are compared.
      ▸ Probability that an assessor says δ_ab < δ_ac:

        Pr((a, b, c) | X) = (λ + δ²_ac(X)) / (2λ + δ²_ab(X) + δ²_ac(X))

      ▸ They pick triples that (approximately) maximize expected information gain.
      ▸ The model uses an intermediate embedding to find triples where (a, b, c) and (a, c, b) are both likely.

        δ_ab(X)   δ_ac(X)   Pr((a,b,c)|X)
          1         2           0.75
          2         1           0.25
          1.4       1.5         0.53
          1.5       1.5         0.50

      [3] O. Tamuz, C. Liu, S. Belongie, O. Shamir, and A. T. Kalai, "Adaptively Learning the Crowd Kernel," ICML, 2011.
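The reconstructed probability model can be checked against the slide's table. A sketch, assuming λ = 0.5 (the slide does not state which λ it used; 0.5 reproduces its numbers, and `crowd_kernel_prob` is my naming):

```python
def crowd_kernel_prob(d_ab, d_ac, lam):
    """Tamuz et al.'s model of the probability that an assessor answers
    'a is closer to b than to c', given current embedding distances."""
    return (lam + d_ac ** 2) / (2 * lam + d_ab ** 2 + d_ac ** 2)

# Reproduce the slide's table with lambda = 0.5 (my assumption).
for d_ab, d_ac in [(1, 2), (2, 1), (1.4, 1.5), (1.5, 1.5)]:
    print(d_ab, d_ac, round(crowd_kernel_prob(d_ab, d_ac, 0.5), 2))
# → 1 2 0.75 / 2 1 0.25 / 1.4 1.5 0.53 / 1.5 1.5 0.5
```

Note that as λ grows, the probability is pushed toward 0.5: λ models assessor noise, so larger λ means noisier answers on close calls.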

  17. ACTIVE LEARNING: RELATED WORK ⦿⦿⦿○○○○○○○○○○
      SCORE CARD: CROWD KERNEL
      After a year trying to use this tool, I decided to write a thesis on better tools.

                           CK
      Active Learning      🥊 Good for small budgets
      Num. Objects         🥊 Hundreds
      Num. Dimensions      🥊 <10
      Accuracy             🥊 Medium
      Speed                🐍 Prohibitively slow

  18. MY METHOD: FRFT ADAPTIVE SORT

  19. [Figure: unsorted sequence 3 1 5 2 4]

  20. ACTIVE LEARNING: FRFT ADAPTIVE SORT ⦿⦿⦿⦿○○○○○○○○○
      FARTHEST-RANK-FIRST TRAVERSAL ADAPTIVE SORT
      1. Pick an anchor far from all previous anchors (first time: use a point on the boundary).
      2. Guess the anchor's ranking using an embedding of the data collected so far.
      3. Sort the guessed ranking adaptively: O(n) triples if the guess was good, O(n log n) if the guess was bad.
      4. If the guess was very good, stop; else, go to 1.
      [8] J. Anderton, V. Pavlu, J. Aslam, "Triple Selection for Ordinal Embedding," unpublished, 2016.
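Step 3 is where the saving comes from: a correct guess can be confirmed with only n − 1 adjacent-pair triples, because distance-from-anchor order is transitive. A hedged sketch of that step alone (not the authors' implementation; `closer` stands in for the triple oracle):

```python
from functools import cmp_to_key

def adaptive_sort(anchor, guess, closer):
    """Sketch of 'sort the guessed ranking adaptively': spend O(n) triples
    checking adjacent pairs of the guess, and fall back to a full O(n log n)
    comparison sort only if some check fails."""
    good = all(not closer(anchor, guess[i + 1], guess[i])
               for i in range(len(guess) - 1))
    if good:
        # Adjacent pairs in order implies the whole ranking is in order.
        return guess, True
    full = sorted(guess, key=cmp_to_key(
        lambda b, c: -1 if closer(anchor, b, c) else 1))
    return full, False

# Toy oracle: objects are numbers, distance from the anchor is |x - anchor|.
closer = lambda a, b, c: abs(b - a) < abs(c - a)
print(adaptive_sort(0, [1, 2, 3, 4], closer))  # → ([1, 2, 3, 4], True)
print(adaptive_sort(0, [2, 1, 3, 4], closer))  # → ([1, 2, 3, 4], False)
```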

  21. ACTIVE LEARNING: FRFT ADAPTIVE SORT ⦿⦿⦿⦿⦿○○○○○○○○
      EMPIRICAL COMPARISON: τ_AP IS A TOP-HEAVY RANK CORRELATION MEASURE
      [Plot: τ_AP (0 to 1) vs. number of comparisons (0 to 7×10⁴) on 3-D GMM data]
      ▸ FRFT Ranking – My algorithm, using rankings from features – O(n) triples per ranking.
      ▸ FRFT Adaptive Sort – My algorithm, using no prior knowledge – O(n log n) then O(n).
      ▸ Crowd Kernel – Active learning baseline.
      ▸ Random Tails – Random baseline.
      ▸ kNN – Gradually add next NN for each obj.
      ▸ Landmarks – Gradually add objects to all rankings.
      [8] J. Anderton, V. Pavlu, J. Aslam, "Triple Selection for Ordinal Embedding," unpublished, 2016.
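τ_AP is the AP rank correlation of Yilmaz, Aslam, and Robertson (SIGIR 2008), which penalizes errors near the top of the ranking more heavily than Kendall's τ. A sketch implementation (the `tau_ap` name and argument layout are mine):

```python
def tau_ap(truth, system):
    """Top-heavy rank correlation tau_AP: for each rank i >= 2 of the system
    ranking, count C(i) = items ranked above it that truth also ranks above
    it, then average C(i)/(i-1) and rescale to [-1, 1]."""
    pos = {item: i for i, item in enumerate(truth)}
    n = len(system)
    total = 0.0
    for i in range(1, n):
        c = sum(1 for j in range(i) if pos[system[j]] < pos[system[i]])
        total += c / i
    return 2 * total / (n - 1) - 1

print(tau_ap([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
print(tau_ap([1, 2, 3, 4], [4, 3, 2, 1]))  # → -1.0
```

Because each term is normalized by its rank, a swap at the top of the list lowers τ_AP more than the same swap near the bottom, which is what "top-heavy" means on the slide.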

  22. ACTIVE LEARNING: FRFT ADAPTIVE SORT ⦿⦿⦿⦿⦿⦿○○○○○○○
      SCORE CARD: FRFT ADAPTIVE SORT
      Active learning beats CK, but we still have work to do.

                           CK                         AS
      Active Learning      🥊 Good for small budgets  🥉 Approaches lower bound
      Num. Objects         🥊 Hundreds                🥉 10,000's
      Num. Dimensions      🥊 <10                     🥊 <10
      Accuracy             🥊 Medium                  🥉 Very good
      Speed                🐍 Prohibitively slow      🐈 Medium

  23. PROPOSED WORK

  24. ACTIVE LEARNING: CAN WE DO BETTER? ⦿⦿⦿⦿⦿⦿⦿○○○○○○
      ▸ Empirically, FRFT Adaptive Sort approaches the lower bound [4] of Ω(d n log n).
      ▸ The intermediate embedding step is slow and error-prone.
      ▸ When our guess is already correct, we still waste (?) triples to confirm it.
      ▸ I believe we can avoid the embedding step and reduce redundancy using the geometry implied by the triples.
      [4] K. G. Jamieson and R. D. Nowak, "Low-Dimensional Embedding Using Adaptively Selected Ordinal Data," IEEE, 2011, pp. 1077–1084.

  25. ACTIVE LEARNING: WHAT DO TRIPLES TELL US? ⦿⦿⦿⦿⦿⦿⦿⦿○○○○○
      "a IS MORE LIKE b THAN c": TRIPLE (a, b, c)  ⇒  δ_ab < δ_ac
      THE THREE VIEWS OF A TRIPLE CONSTRAINT:
      ▸ a is inside a half-space (on b's side of the hyperplane bisecting b and c)
      ▸ b is inside a sphere (of radius δ_ac centered at a)
      ▸ c is outside a sphere (of radius δ_ab centered at a)
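The equivalence of the three views follows from expanding δ²_ab < δ²_ac. A quick numerical check on random 2-D points (illustrative code, not from the proposal; `d` is a plain Euclidean distance helper):

```python
import random

def d(u, v):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

# Check that the triple constraint d(a,b) < d(a,c) coincides with each of
# the three geometric views on random points.
random.seed(0)
for _ in range(1000):
    a, b, c = [(random.random(), random.random()) for _ in range(3)]
    triple = d(a, b) < d(a, c)
    # View 1: a lies on b's side of the bisecting hyperplane of b and c,
    # i.e. <a - (b+c)/2, b - c> > 0, which expands to d(a,b)^2 < d(a,c)^2.
    half_space = sum((ai - (bi + ci) / 2) * (bi - ci)
                     for ai, bi, ci in zip(a, b, c)) > 0
    # View 2: b lies inside the sphere of radius d(a, c) centered at a.
    b_in_sphere = d(a, b) < d(a, c)
    # View 3: c lies outside the sphere of radius d(a, b) centered at a.
    c_out_sphere = d(a, c) > d(a, b)
    assert triple == half_space == b_in_sphere == c_out_sphere
print("all three views agree")
```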

  26. ACTIVE LEARNING: WHAT DO TRIPLES TELL US? ⦿⦿⦿⦿⦿⦿⦿⦿⦿○○○○
      COMBINING TRIPLE CONSTRAINTS
      δ_ab < δ_ac  ∧  δ_ac < δ_ad  ⇒  δ_ab < δ_ac < δ_ad
      ▸ b, c are inside a sphere (of radius δ_ad centered at a)
      ▸ c, d are outside a sphere (of radius δ_ab centered at a)
      ▸ c is inside a spherical shell (between radii δ_ab and δ_ad around a)

  27. ACTIVE LEARNING: WHAT DO TRIPLES TELL US? ⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿○○○
      COMBINING SPHERICAL SHELLS
      [Figures: shell intersections – two shells in ℝ², three shells in ℝ²]

  28. ACTIVE LEARNING: WHAT DO TRIPLES TELL US? ⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿○○
      PARTIAL ORDERING ON VECTOR PROJECTIONS
      [Figures: points r, s, t projected onto the line through p and q as r', s', t']
      ▸ Inferring order in blue ball intersection: p, r', s', t', q
      ▸ Inferring order near blue ball intersection: p, q, r', s', t'

  29. ACTIVE LEARNING: PROPOSED METHOD ⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿○
      GUESSING ORDER WITH LINE PROJECTION
      ▸ Line projection preserves approximate order. [6]
      ▸ The rankings for a pair of points give a partial order on projections onto their connecting line.
      ▸ Idea: Don't waste time on an intermediate embedding; guess order by majority vote of partial orders!
      [6] K. Li and J. Malik, "Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing," ICML, 2016.
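The partial order works because a point's projection onto the line through p and q is a monotone function of δ²_px − δ²_qx, and the pair's two rankings constrain exactly those comparisons. A sketch check on random points (the function names are mine, not the proposal's):

```python
import random

def proj(p, q, x):
    """Scalar projection of x onto the line through p and q
    (0 at p, 1 at q)."""
    num = sum((xi - pi) * (qi - pi) for xi, pi, qi in zip(x, p, q))
    den = sum((qi - pi) ** 2 for pi, qi in zip(p, q))
    return num / den

def d(u, v):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# If x is closer to p than y is (readable from p's ranking) AND farther
# from q than y is (readable from q's ranking), then x must project
# before y on the line from p to q: this is the inferable partial order.
random.seed(1)
pts = [(random.random(), random.random()) for _ in range(50)]
p, q = pts[0], pts[1]
for x in pts[2:]:
    for y in pts[2:]:
        if d(p, x) < d(p, y) and d(q, x) > d(q, y):
            assert proj(p, q, x) < proj(p, q, y)
print("partial order matches projection order")
```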

  30. ACTIVE LEARNING: PROPOSED METHOD ⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿⦿
      GUESSING ORDER WITH LINE PROJECTION
      [Figures: projections of s, t, u onto the lines through p and q as s', t', u']

      TWO RANKINGS
      Point   NN   Maj. Vote
      s       t    u (1/1)
      t       s    u (1/1)
      u       t    t (1/1)

      THREE RANKINGS
      Point   NN   Maj. Vote
      s       t    t (2/3)
      t       s    u (2/3)
      u       t    t (2/3)

  31. ROAD MAP: MY PROPOSED WORK. IMPROVE ORDINAL EMBEDDING TECHNIQUES FOR TEXT SIMILARITY APPLICATIONS
      ▸ Active Learning – Which triples should we collect?
      ▸ Embedding – How can we embed accurately, at scale?
      ▸ Contextual Embeddings – Can we make embeddings that adapt to context?
