
Promotion Analysis in Multi-Dimensional Space, Tianyi Wu (UIUC)

Promotion Analysis in Multi-Dimensional Space. Tianyi Wu (UIUC), Dong Xin (Microsoft Research), Qiaozhu Mei (University of Michigan), Jiawei Han (UIUC).


  1. Promotion Analysis in Multi-Dimensional Space. Tianyi Wu (UIUC), Dong Xin (Microsoft Research), Qiaozhu Mei (University of Michigan), Jiawei Han (UIUC)

  2. Outline
     - Introduction
     - Query execution algorithms
     - Spurious promotion
     - Experiment
     - Conclusion

  3. Outline
     - Introduction
     - Query execution algorithms
     - Spurious promotion
     - Experiment
     - Conclusion

  4. Promotion analysis: introduction
     - Formulate and study a useful function: promotion analysis through ranking
     - General goal: promote a given object by leveraging subspace ranking
     - Motivating example: a marketing manager of a book retailer ("Let's promote our brand!")
       - Basic fact: book sales rank 30th out of 100 retailers. Not particularly interesting!
       - After promotion analysis, he discovered that the retailer is ranked 1st in the {college students, science and technology} area
       - This supports further advertising and marketing decisions
     - Another example: person promotion

  5. Promotion query
     - Observation:

       | Global rank | Local rank |
       |---|---|
       | May not be interesting | Can be more interesting |
       | Compare to all other objects in all aspects | Compare objects in certain areas |
       | Full-space | Subspaces |
       | Single SQL query: low cost | Many subspaces: high cost |

     - The Promotion Query Problem
       - Given: an object (e.g., product, person)
       - Goal: discover the most interesting subspaces where the object is highly ranked

  6. Subspace rank: why interesting
     - Discover merit and competitive strengths (e.g., a bestselling car model among hybrid cars)
     - Enhance image (e.g., Fortune 500 company)
     - Facilitate decision making (e.g., a marketing plan that focuses on college students)
     - Deliver specific information (e.g., "top-3 university in biomedical research" vs. "top-20 university")
     - Extensively practiced in marketing
       - Market segmentation
       - Customer targeting and product positioning

  7. Challenges
     - Current systems
       - Given a condition, find top-k objects
       - Sophisticated early termination and pruning algorithms
     - Promotion query: not well-supported
       - User: manual search and navigation ("It should be good at …", "Let me try some queries…")
       - Trial-and-error
     - Computationally expensive
       - The rank measure: holistic
       - A blow-up of subspaces

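  8. Promotion analysis: multidimensional data model
     - Fact table:

       | Location | Time | Object | Score |
       |---|---|---|---|
       | Lyon | July | T | 0.5 |
       | Chicago | July | T | 0.8 |
       | Chicago | August | S | 1.0 |
       | Chicago | July | S | 1.0 |
       | Lyon | August | V | 0.3 |
       | Chicago | August | V | 0.6 |
       | Chicago | July | V | 0.7 |

     - Location and Time are the subspace dimensions, Object is the object dimension, and Score is the score dimension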

  9. Subspaces
     - Given a target object T, aggregate the fact table from the previous slide and compute the target object's rank in each subspace
     - Subspaces of T:

       | Subspace | SUM(T) | Rank(T) |
       |---|---|---|
       | {*} | 1.3 | 3rd / 3 |
       | {Lyon} | 0.5 | 1st / 2 |
       | {Chicago} | 0.8 | 3rd / 3 |
       | {July} | 1.3 | 1st / 3 |
       | {Lyon, July} | 0.5 | 1st / 1 |
       | {Chicago, July} | 0.8 | 2nd / 3 |

     - {*} is the special case: full-space
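The aggregation on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `subspace_ranks` and the encoding of a subspace as a tuple of `(dimension, value)` pairs are assumptions made for illustration, and rank is taken as one plus the number of objects with a strictly larger SUM.

```python
from collections import defaultdict
from itertools import combinations

# Fact table from the slide: (Location, Time, Object, Score).
FACTS = [
    ("Lyon", "July", "T", 0.5),
    ("Chicago", "July", "T", 0.8),
    ("Chicago", "August", "S", 1.0),
    ("Chicago", "July", "S", 1.0),
    ("Lyon", "August", "V", 0.3),
    ("Chicago", "August", "V", 0.6),
    ("Chicago", "July", "V", 0.7),
]
DIMS = ("Location", "Time")


def subspace_ranks(facts, target):
    """Map each subspace containing `target` to (SUM(target), rank, object count).

    Hypothetical helper: a subspace is a tuple of (dimension, value) pairs,
    and the empty tuple stands for the full space {*}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for *values, obj, score in facts:
        # A row contributes to every subspace that fixes a subset of its dimensions.
        for n in range(len(values) + 1):
            for idx in combinations(range(len(values)), n):
                key = tuple((DIMS[i], values[i]) for i in idx)
                sums[key][obj] += score
    result = {}
    for key, by_obj in sums.items():
        if target not in by_obj:
            continue  # not a subspace of the target
        t = by_obj[target]
        rank = 1 + sum(1 for s in by_obj.values() if s > t)
        result[key] = (round(t, 10), rank, len(by_obj))
    return result


ranks = subspace_ranks(FACTS, "T")
for subspace, (total, rank, count) in sorted(ranks.items()):
    print(subspace or "{*}", total, f"rank {rank} of {count}")
```

Enumerating every subset of dimensions per row is exactly the blow-up the earlier "Challenges" slide warns about; it is fine for two dimensions but is what the pruning techniques later in the deck try to avoid.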

  10. Query model
     - Given a target object T, find the top subspaces which are promotive
     - "Promotiveness": a class of measures to quantify how well a subspace S can promote T
       - P(S, T) = f(Rank(S, T)) * g(Sig(S))
       - Higher rank ~ more promotive
       - More significant subspace (e.g., more objects) ~ more promotive
     - Example instantiations
       - Simple ranking: P(S, T) = Rank^{-1}(S, T)
       - Iceberg condition: P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)
       - Percentile ranking: P(S, T) = ObjCount(S) / Rank(S, T)
       - …
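The three example instantiations translate directly into code. The sketch below is illustrative only: the function names are made up here, and the sample ranks, object counts, and `min_sig` value are chosen just to show how the measures can disagree.

```python
def simple_ranking(rank, obj_count):
    """P(S, T) = Rank^{-1}(S, T); the subspace size is ignored."""
    return 1.0 / rank


def iceberg(rank, obj_count, min_sig):
    """P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)."""
    return 1.0 / rank if obj_count > min_sig else 0.0


def percentile_ranking(rank, obj_count):
    """P(S, T) = ObjCount(S) / Rank(S, T)."""
    return obj_count / rank


# Two hypothetical subspaces in which T is ranked first:
# one with 2 objects and one with 3 objects.
small, large = (1, 2), (1, 3)

print(simple_ranking(*small), simple_ranking(*large))          # equal scores
print(iceberg(*small, min_sig=2), iceberg(*large, min_sig=2))  # the small one is filtered out
print(percentile_ranking(*small), percentile_ranking(*large))  # the larger one scores higher
```

Simple ranking cannot tell the two subspaces apart, while the iceberg and percentile variants both favor the subspace with more objects, which is the role of the g(Sig(S)) factor.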

  11. Query model (continued)
     - The Promotion Query Problem
       - Input: a target object T
       - Output: top-R subspaces with the largest P(S, T) scores
       - Simple ranking, P(S, T) = Rank^{-1}(S, T), is assumed

  12. Outline
     - Introduction
     - Query execution algorithms
       - (1) PromoRank framework
         - (a) Subspace pruning
         - (b) Object pruning
       - (2) Promotion cubes
     - Spurious promotion
     - Experiment
     - Conclusion

  13. The PromoRank framework
     - Idea: use a recursive process to partition and aggregate the data to compute the target object's rank in each subspace
     - The bottom-up method [Beyer99]
     - Target object's subspace lattice:
       - {*}
       - {A} {B} {C} {D}
       - {AB} {AC} {AD} {BC} {BD} {CD}
       - {ABC} {ABD} {ACD} {BCD}
       - {ABCD}

  14. PromoRank: recursive process
     - Compute T's rank in {*}. Method: create a hash table, HashTable[object] = AggregateScore
     - Partition the data based on A. Method: sorting
     - Compute T's rank in {A}
     - Recursively repeat…
     - Top-R promotive subspaces: priority queue
     - Order in which the lattice is visited: 1 {*}, 2 {A}, 3 {AB}, 4 {ABC}, 5 {ABCD}, 6 {ABD}, 7 {AC}, 8 {ACD}, 9 {AD}, 10 {B}, 11 {BC}, 12 {BCD}, 13 {BD}, 14 {C}, 15 {CD}, 16 {D}
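The recursion on this slide can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' code: the name `promo_rank` and the row layout are invented here, partitioning uses grouping rather than sorting, promotiveness is the simple-ranking measure 1 / Rank, and no pruning is applied.

```python
import heapq
from collections import defaultdict


def promo_rank(rows, dims, target, top_r):
    """Return (top_r best (promotiveness, subspace) pairs, visiting order).

    rows: list of (dimension_values, obj, score) with dimension_values a tuple
    aligned with `dims`. Hypothetical layout chosen for this sketch.
    """
    heap = []      # min-heap used as the top-R priority queue
    visited = []   # subspaces in the order they are aggregated

    def recurse(part, subspace, start):
        # Aggregate the partition with a hash table: object -> SUM(score).
        agg = defaultdict(float)
        for _, obj, score in part:
            agg[obj] += score
        if target not in agg:
            return  # the target is absent here and in every refinement
        visited.append(subspace)
        rank = 1 + sum(1 for s in agg.values() if s > agg[target])
        heapq.heappush(heap, (1.0 / rank, subspace))
        if len(heap) > top_r:
            heapq.heappop(heap)  # drop the least promotive entry
        # Partition on each remaining dimension and recurse depth first.
        for d in range(start, len(dims)):
            groups = defaultdict(list)
            for row in part:
                groups[row[0][d]].append(row)
            for value, group in sorted(groups.items()):
                recurse(group, subspace + ((dims[d], value),), d + 1)

    recurse(rows, (), 0)
    return sorted(heap, reverse=True), visited


ROWS = [
    (("Lyon", "July"), "T", 0.5), (("Chicago", "July"), "T", 0.8),
    (("Chicago", "August"), "S", 1.0), (("Chicago", "July"), "S", 1.0),
    (("Lyon", "August"), "V", 0.3), (("Chicago", "August"), "V", 0.6),
    (("Chicago", "July"), "V", 0.7),
]
top, order = promo_rank(ROWS, ("Location", "Time"), "T", top_r=3)
for score, subspace in top:
    print(score, subspace)
```

The `start` index ensures each combination of dimensions is generated once, which gives the same depth-first order as the numbered lattice: a dimension's whole subtree is finished before the next dimension is partitioned.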

  15. (1.1) Subspace pruning
     - Idea: reuse previous results
     - Goal: prune out unseen subspaces by bounding their promotiveness scores
       - Sig(S): bounded
       - Rank(S, T): bounded
     - Figure: the subspace lattice from the PromoRank slides, with {A} and {AB} marked

  16. Subspace pruning
     - Question: given a seen (aggregated) subspace, how can an unseen one be pruned?
     - Key: compute T's highest possible rank in the unseen subspace, LBRank(T); any unseen subspace whose LBRank(T) is too poor to enter the top-R can be pruned
     - Use the monotonicity of the aggregate measure (e.g., SUM, MAX)
     - Example:
       - Seen subspace {AB} (traversal step 3): SUM(V) = 5.5, SUM(S) = 2.2, SUM(T) = 1.1, Rank(T) = 3rd / 3
       - Unseen subspace {B} (traversal step 10): SUM(T) = 1.9; the scores SUM(V) = 5.5 and SUM(S) = 2.2 from {AB} carry over as lower bounds, so SUM(V) > SUM(T) and SUM(S) > SUM(T)
       - Thus, LBRank(T) = |{V, S}| + 1 = 3rd
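The bound in this example can be written down in a few lines. The sketch assumes non-negative scores, so that SUM in the more general subspace {B} is at least SUM in the already aggregated {AB}; the helper names `lb_rank` and `can_prune`, and the pruning test against the current R-th best promotiveness under simple ranking, are illustrative assumptions rather than the authors' exact procedure.

```python
def lb_rank(seen_scores, target, target_score_unseen):
    """Best rank the target can reach in an unseen, more general subspace.

    seen_scores: object -> aggregate score in a seen subspace that refines the
    unseen one. With a monotone aggregate these are lower bounds on the
    objects' scores in the unseen subspace.
    """
    surely_ahead = [
        obj for obj, score in seen_scores.items()
        if obj != target and score > target_score_unseen
    ]
    return len(surely_ahead) + 1


def can_prune(lb, rth_best_promotiveness):
    """Simple ranking: the subspace can score at most 1 / LBRank."""
    return 1.0 / lb <= rth_best_promotiveness


# Numbers from the slide: {AB} is seen, {B} is unseen, SUM(T) in {B} is 1.9.
seen_ab = {"V": 5.5, "S": 2.2, "T": 1.1}
bound = lb_rank(seen_ab, "T", target_score_unseen=1.9)
print(bound)                   # V and S are both ahead of T
print(can_prune(bound, 0.5))   # top-R already holds rank-2 subspaces
print(can_prune(bound, 0.25))  # top-R still contains a rank-4 subspace
```

Only T's own aggregate score in {B} is needed up front; the other objects in {B} never have to be aggregated when the bound already rules the subspace out.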

  17. (1.2) Object pruning
     - Idea: avoid computing objects which do not affect the rank
     - Power-law distribution: objects at the long tail can be pruned
     - Goal: reduce the partitioning and aggregation cost
     - Example:
       - Seen (aggregated) subspace {A}: SUM(S) = 6.5, SUM(T) = 2.2, SUM(U) = 1.5, SUM(W) = 1.0, SUM(Z) = 0.8
       - Unseen subtree of subspaces ({AB}, {AC}, {ABC}): SUM(T) = 1.9, 1.2, and 1.1, so MinScore(T) = 1.1
       - W and Z can be pruned: SUM(W) < MinScore(T) and SUM(Z) < MinScore(T)
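The object-pruning test on this slide amounts to a single comparison against MinScore(T). The following sketch assumes a monotone aggregate over non-negative scores, so an object's score anywhere in the unseen subtree cannot exceed its score in the seen subspace {A}. The function name `prune_objects` is hypothetical, and the assignment of T's three subtree scores to {AB}, {AC}, and {ABC} is assumed from the order on the slide (only their minimum matters).

```python
def prune_objects(seen_scores, target, target_scores_in_subtree):
    """Split objects into those to keep and those safe to drop.

    An object whose score in the seen subspace is below MinScore(T), the
    smallest score of T in the unseen subtree, cannot outrank T in any
    subspace of that subtree.
    """
    min_score = min(target_scores_in_subtree.values())
    kept = {
        obj for obj, score in seen_scores.items()
        if obj == target or score >= min_score
    }
    pruned = set(seen_scores) - kept
    return min_score, kept, pruned


# Numbers from the slide.
seen_a = {"S": 6.5, "T": 2.2, "U": 1.5, "W": 1.0, "Z": 0.8}
t_in_subtree = {"AB": 1.9, "AC": 1.2, "ABC": 1.1}  # assumed assignment

min_score, kept, pruned = prune_objects(seen_a, "T", t_in_subtree)
print(min_score, sorted(kept), sorted(pruned))
```

Dropped objects no longer need to be partitioned or aggregated further down the subtree, which is where the cost saving comes from when most objects sit in the long tail.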

  18. (2) Promotion cubes
     - Observation: (1) T tends to be highly ranked in a top subspace; (2) a top subspace is likely to contain many objects
     - Method: promotion cube
       - Offline materialization
     - Structure
       - For each subspace with Sig(S) > MinSig (parameter: MinSig)
       - Materialize a selected sample of top-k aggregate scores in each subspace (parameters: k and k')

  19. Promotion cell
     - For each "significant" subspace S (one passing the MinSig threshold), create a "promotion cell" PCell(S)
     - Promotion cell: store aggregate scores; no object IDs
     - Parameters MinSig, k, and k': chosen to yield a space-time tradeoff; application dependent
       - Does not restrict query processing
     - Figure: PCell(S) built from the sorted objects of subspace S, with k = 9 and k' = 3

  20. Query execution using promotion cube
     - Step 1: Compute T's aggregate scores
     - Step 2: Compute LBRanks and UBRanks and do pruning, using the promotion cube
     - Step 3: Call PromoRank
     - Figure (step 1): SUM(T) in the subspace lattice, by level and in the order listed
       - {*}: 3.0
       - {A}, {B}, {C}, {D}: 2.2, 2.2, 1.9, 1.6
       - {AB}, {AC}, {AD}, {BC}, {BD}, {CD}: 1.2, 1.9, 1.8, 1.9, 1.5, 0.9
       - {ABC}, {ABD}, {ACD}, {BCD}, {ABCD}: 1.1, 0.9, 0.5, 0.3, 0.5

  21. Query execution using promotion cube
     - Step 1: Compute T's aggregate scores
     - Step 2: Compute LBRanks and UBRanks and do pruning, using the promotion cube
     - Step 3: Call PromoRank
     - Figure (step 2): [LBRank, UBRank] in the subspace lattice, by level and in the order listed
       - {*}: [11, 19]
       - {A}, {B}, {C}, {D}: [51, 59], [20, 20], [21, 29], [31, 39]
       - {AB}, {AC}, {AD}, {BC}, {BD}, {CD}: [11, 19], [61, ∞), [31, 39], [11, 19], [21, 29], [31, 39]
       - {ABC}, {ABD}, {ACD}, {BCD}: [21, 29], [61, ∞), [11, 19], [50, 50]
       - {ABCD}: [51, 59]
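One way to obtain intervals of this shape is sketched below. The slides do not spell out which scores a promotion cell samples, so this is an assumption: the cell is taken to hold the scores at ranks k', 2k', …, k, all aggregate scores in a subspace are taken to be distinct, and T is assumed to be one of the subspace's objects. The names `build_cell` and `rank_bounds` are hypothetical.

```python
def build_cell(scores, k, k_prime):
    """Sample the scores at ranks k', 2k', ..., k (assumed sampling scheme).

    Returns a list of (rank, score) pairs; no object IDs are stored.
    """
    ordered = sorted(scores, reverse=True)
    last = min(k, len(ordered))
    return [(r, ordered[r - 1]) for r in range(k_prime, last + 1, k_prime)]


def rank_bounds(cell, target_score):
    """Return (LBRank, UBRank); UBRank is None when the rank is unbounded."""
    lb = 1
    for rank, score in cell:
        if target_score == score:
            return rank, rank      # T is the sampled object itself
        if target_score > score:
            return lb, rank - 1    # T lies strictly above this sample
        lb = rank + 1              # at least `rank` objects are ahead of T
    return lb, None                # T lies below every sampled score


# Toy subspace: 100 objects whose scores are 100, 99, ..., 1.
cell = build_cell(range(1, 101), k=60, k_prime=10)

print(rank_bounds(cell, 85))  # between the samples at ranks 10 and 20
print(rank_bounds(cell, 81))  # equal to the sample at rank 20
print(rank_bounds(cell, 30))  # below the last sample at rank 60
```

A subspace whose LBRank is already worse than the ranks held in the top-R queue can be discarded without touching the base data, and a subspace with LBRank equal to UBRank needs no further computation at all.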

  22. Outline
     - Introduction
     - Query execution algorithms
     - Spurious promotion
     - Experiment
     - Conclusion

  23. The spurious promotion problem
     - Spurious promotion: the target object is highly ranked in a subspace due to random perturbation, which is not meaningful
     - Example: Michael Jordan (NBA player)

       | Rank | Subspace | Assessment |
       |---|---|---|
       | #1 | {Year = 1995} | OK |
       | #1 | {MonthOfBirth = February} | Spurious (due to random perturbation) |
       | #1 | {Weather = Sunny} | Spurious (due to random perturbation) |
