Promotion Analysis in Multi-Dimensional Space Tianyi Wu (UIUC) Dong Xin (Microsoft Research) Qiaozhu Mei (University of Michigan) Jiawei Han (UIUC)
2 Outline Introduction Query execution algorithms Spurious promotion Experiment Conclusion
4 Promotion analysis: introduction
  Formulate and study a useful function: promotion analysis through ranking
  General goal: promote a given object by leveraging subspace ranking
  Motivating example: a marketing manager of a book retailer
    Basic fact: book sales rank 30th out of 100 retailers. Not particularly interesting!
    After promotion analysis, he discovered: ranked 1st in the {college students, science and technology} area
    This supports further advertising and marketing decisions
  Another example: person promotion
  "Let's promote our brand!"
5 Promotion query
  Observation
    Global rank: may not be interesting
      Full-space: compare to all other objects in all aspects
      Single SQL query; low cost
    Local rank: can be more interesting
      Subspaces: compare objects in certain areas
      Many subspaces; high cost
  THE PROMOTION QUERY PROBLEM
    Given: an object (e.g., product, person)
    Goal: discover the most interesting subspaces where the object is highly ranked
6 Subspace rank: why interesting
  Discover merit and competitive strengths. E.g., a bestselling car model among hybrid cars
  Enhance image. E.g., Fortune 500 company
  Facilitate decision making. E.g., a marketing plan that focuses on college students
  Deliver specific information. E.g., "top-3 university in biomedical research" vs. "top-20 university"
  Extensively practiced in marketing
    Market segmentation
    Customer targeting and product positioning
7 Challenges
  Current systems
    Given a condition, find top-k objects
    Sophisticated early termination and pruning algorithms
  Promotion query: not well-supported
    User: manual search and navigation; trial-and-error ("It should be good at …", "Let me try some queries…")
    Computationally expensive
      The rank measure: holistic
      A blow-up of subspaces
8 Promotion analysis
  Multidimensional data model: fact table

    Location   Time     Object   Score
    Lyon       July     T        0.5
    Chicago    July     T        0.8
    Chicago    August   S        1.0
    Chicago    July     S        1.0
    Lyon       August   V        0.3
    Chicago    August   V        0.6
    Chicago    July     V        0.7

  Subspace dimensions: Location, Time
  Object dimension: Object
  Score dimension: Score
9 Subspaces
  Given a target object T: aggregate and compute the target object's rank in each subspace (same fact table as on the previous slide)
  Subspaces of T:

    Subspace           SUM(T)   Rank(T)
    {*}                1.3      3rd / 3
    {Lyon}             0.5      1st / 2
    {Chicago}          0.8      3rd / 3
    {July}             1.3      1st / 3
    {Lyon, July}       0.5      1st / 1
    {Chicago, July}    0.8      2nd / 3

  {*} is the special case: full-space
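The ranks on this slide can be reproduced with a brute-force sketch: aggregate SUM(score) per object in every subspace, then count how many objects score higher than T. The names `FACTS`, `DIMS`, and `subspace_ranks` are illustrative and not from the talk, and this naive enumeration ignores the pruning introduced later.

```python
from collections import defaultdict
from itertools import combinations

# Fact table from the slides: (Location, Time, Object, Score)
FACTS = [
    ("Lyon", "July", "T", 0.5),
    ("Chicago", "July", "T", 0.8),
    ("Chicago", "August", "S", 1.0),
    ("Chicago", "July", "S", 1.0),
    ("Lyon", "August", "V", 0.3),
    ("Chicago", "August", "V", 0.6),
    ("Chicago", "July", "V", 0.7),
]
DIMS = ("Location", "Time")  # subspace dimensions (illustrative constant)


def subspace_ranks(facts, target):
    """Return {subspace: (SUM(target), rank of target, object count)}.

    A subspace is a tuple of (dimension, value) pairs; () is the full space {*}.
    Only subspaces that contain the target object are reported.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for *dim_values, obj, score in facts:
        # A row belongs to every subspace that fixes a subset of its dimensions.
        for r in range(len(dim_values) + 1):
            for idx in combinations(range(len(dim_values)), r):
                key = tuple((DIMS[i], dim_values[i]) for i in idx)
                sums[key][obj] += score

    result = {}
    for key, per_obj in sums.items():
        if target not in per_obj:
            continue
        t = per_obj[target]
        rank = 1 + sum(1 for s in per_obj.values() if s > t)
        result[key] = (t, rank, len(per_obj))
    return result


ranks = subspace_ranks(FACTS, "T")
```

For target T this yields six subspaces; `ranks[()]` holds the full-space entry, where T ranks 3rd of 3.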
10 Query model
  Given a target object T, find the top subspaces which are promotive
  "Promotiveness": a class of measures to quantify how well a subspace S can promote T
    P(S, T) = f(Rank(S, T)) * g(Sig(S))
    Higher rank ~ more promotive
    More significant subspace (e.g., more objects) ~ more promotive
  Example instantiations
    Simple ranking: P(S, T) = Rank^{-1}(S, T)
    Iceberg condition: P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)
    Percentile ranking: P(S, T) = ObjCount(S) / Rank(S, T)
    …
11 Query model
  THE PROMOTION QUERY PROBLEM
    Input: a target object T
    Output: top-R subspaces with the largest P(S, T) scores (assume simple ranking)
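The three example instantiations of P(S, T) and the top-R selection can be sketched as follows. The function names and the `CANDIDATES` list are illustrative; the ranks and object counts are the ones given for target T on the earlier Subspaces slide.

```python
import heapq


def simple_ranking(rank, obj_count):
    """P(S, T) = 1 / Rank(S, T)."""
    return 1.0 / rank


def iceberg(min_sig):
    """Build P(S, T) = (1 / Rank(S, T)) * I(ObjCount(S) > MinSig)."""
    def measure(rank, obj_count):
        return (1.0 / rank) if obj_count > min_sig else 0.0
    return measure


def percentile_ranking(rank, obj_count):
    """P(S, T) = ObjCount(S) / Rank(S, T)."""
    return obj_count / rank


def top_r_subspaces(candidates, r, measure):
    """candidates: iterable of (subspace, rank of T, object count).

    Returns the r highest-scoring (P, subspace) pairs.
    """
    scored = [(measure(rank, n), s) for s, rank, n in candidates]
    return heapq.nlargest(r, scored, key=lambda pair: pair[0])


# (subspace, Rank(T), ObjCount) taken from the Subspaces slide
CANDIDATES = [
    ("{*}", 3, 3),
    ("{Lyon}", 1, 2),
    ("{Chicago}", 3, 3),
    ("{July}", 1, 3),
    ("{Lyon, July}", 1, 1),
    ("{Chicago, July}", 2, 3),
]

best = top_r_subspaces(CANDIDATES, 2, percentile_ranking)
```

Under simple ranking, {Lyon}, {July}, and {Lyon, July} all tie at P = 1. Percentile ranking separates them by favoring subspaces with more objects, and an iceberg threshold of 1 drops {Lyon, July} altogether.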
12 Outline Introduction Query execution algorithms (1) PromoRank framework (a) Subspace pruning (b) Object pruning (2) Promotion cubes Spurious promotion Experiment Conclusion
13 The PromoRank framework
  Idea: use a recursive process to partition and aggregate the data to compute the target object's rank in each subspace
  The bottom-up method [Beyer99]
  Target object's subspace lattice:
    {*}
    {A} {B} {C} {D}
    {AB} {AC} {AD} {BC} {BD} {CD}
    {ABC} {ABD} {ACD} {BCD}
    {ABCD}
14 PromoRank: recursive process
  Compute T's rank in {*}
    Method: create a hash table: HashTable[object] = AggregateScore
  Partition the data based on A
    Method: sorting
  Compute T's rank in {A}
  Recursively repeat…
  Top-R promotive subspaces: priority queue
  Order in which the subspace lattice is visited:
    1 {*}
    2 {A}, 3 {AB}, 4 {ABC}, 5 {ABCD}, 6 {ABD}, 7 {AC}, 8 {ACD}, 9 {AD}
    10 {B}, 11 {BC}, 12 {BCD}, 13 {BD}
    14 {C}, 15 {CD}
    16 {D}
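A minimal sketch of the recursive process, assuming SUM as the aggregate and simple ranking (P = 1/rank) as the promotiveness measure. The function name `promo_rank` and the row layout are illustrative, not from the talk, and the partition step groups rows with a dictionary rather than sorting, purely for brevity.

```python
import heapq
from collections import defaultdict


def promo_rank(rows, n_dims, target, top_r):
    """rows: tuples (dim_0, ..., dim_{n-1}, object, score).

    Returns (top_r best subspaces as [(1/rank, subspace)], visit order).
    A subspace is a tuple of (dimension index, value) pairs; () is {*}.
    """
    heap = []     # min-heap holding the best top_r subspaces seen so far
    visited = []  # subspaces in the order they were aggregated

    def recurse(part, subspace, start):
        visited.append(subspace)
        # Hash-based aggregation: HashTable[object] = AggregateScore
        sums = defaultdict(float)
        for row in part:
            sums[row[n_dims]] += row[n_dims + 1]
        t = sums[target]
        rank = 1 + sum(1 for s in sums.values() if s > t)
        # The visit number breaks ties so subspace tuples are never compared.
        entry = (1.0 / rank, len(visited), subspace)
        if len(heap) < top_r:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)
        # Partition on each remaining dimension and recurse depth-first.
        for d in range(start, n_dims):
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                # Only partitions that contain the target are its subspaces.
                if any(row[n_dims] == target for row in sub):
                    recurse(sub, subspace + ((d, value),), d + 1)

    recurse(rows, (), 0)
    top = sorted(heap, reverse=True)
    return [(p, s) for p, _, s in top], visited


# Fact table from the earlier slides: (Location, Time, Object, Score)
ROWS = [
    ("Lyon", "July", "T", 0.5), ("Chicago", "July", "T", 0.8),
    ("Chicago", "August", "S", 1.0), ("Chicago", "July", "S", 1.0),
    ("Lyon", "August", "V", 0.3), ("Chicago", "August", "V", 0.6),
    ("Chicago", "July", "V", 0.7),
]
top, order = promo_rank(ROWS, 2, "T", 3)
```

The depth-first recursion visits a subspace, then everything below it that extends it with later dimensions, which is the same visiting pattern as the numbered lattice on this slide.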
15 (1.1) Subspace pruning
  Idea: reuse previous results
  Goal: prune out unseen subspaces by bounding their promotiveness scores
    Sig(S): bounded
    Rank(S, T): bounded
  [Figure: the subspace lattice from {*} to {ABCD}]
16 Subspace pruning
  Keys:
    Compute T's highest possible rank: LBRank
    Any unseen subspace whose LBRank(T) is too poor can be pruned
  Given a seen (aggregated) subspace {AB} (visited 3rd):
    SUM(V) = 5.5, SUM(S) = 2.2, SUM(T) = 1.1, Rank(T) = 3rd / 3
  How to prune an unseen one, {B} (visited 10th), where SUM(T) = 1.9?
    Use the monotonicity of the aggregate measure (e.g., SUM, MAX): in {B}, SUM(V) is at least 5.5 and SUM(S) is at least 2.2
    So SUM(V) > SUM(T) and SUM(S) > SUM(T)
    Thus, LBRank(T) = |{V, S}| + 1 = 3rd
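The bound on this slide can be sketched as below, assuming non-negative scores so that an object's SUM in the unseen subspace {B} is at least its SUM in the seen subspace {AB}. The helper names `lb_rank` and `can_prune` are illustrative, and the pruning test assumes simple ranking (P = 1/rank).

```python
def lb_rank(seen_sums, target, target_sum_unseen):
    """Best (numerically lowest) rank the target can reach in the unseen subspace.

    seen_sums: object -> SUM in a seen subspace whose data is contained in the
    unseen one. By monotonicity, each object scores at least this much there.
    target_sum_unseen: the target's own SUM in the unseen subspace.
    """
    surely_better = sum(
        1 for obj, s in seen_sums.items()
        if obj != target and s > target_sum_unseen
    )
    return surely_better + 1


def can_prune(lb, worst_rank_in_top_r):
    """Under simple ranking, a subspace whose best possible rank is worse than
    the worst rank currently kept in the top-R cannot enter the result."""
    return lb > worst_rank_in_top_r


# Seen subspace {AB} and the target's SUM in unseen {B}, as on the slide
seen_ab = {"V": 5.5, "S": 2.2, "T": 1.1}
bound = lb_rank(seen_ab, "T", 1.9)
```

With the slide's numbers, V and S are both guaranteed to beat T in {B}, so `bound` is 3 and {B} can be skipped whenever the current top-R already contains only subspaces with rank 2 or better.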
17 (1.2) Object pruning
  Idea: avoid computing objects
    Power-law distribution: objects at the long tail which do not affect the rank can be pruned
  Goal: reduce the partitioning and aggregation cost
  Seen (aggregated) subspace {A}:
    SUM(S) = 6.5, SUM(T) = 2.2, SUM(U) = 1.5, SUM(W) = 1.0, SUM(Z) = 0.8
  Unseen subtree of subspaces under {A}: {AB}, {AC}, {ABC}
    SUM(T) takes the values 1.9, 1.2, and 1.1 there, so MinScore(T) = 1.1
  W and Z can be pruned: SUM(W) < MinScore(T) and SUM(Z) < MinScore(T)
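A minimal sketch of the object-pruning test, assuming non-negative scores so that an object's SUM anywhere in the subtree under {A} cannot exceed its SUM in {A}. The function name `prunable_objects` is illustrative.

```python
def prunable_objects(seen_sums, target, target_sums_in_subtree):
    """Objects that can never outrank the target anywhere in the unseen subtree.

    seen_sums: object -> SUM in the seen subspace (an upper bound on the
    object's SUM in every subspace of the subtree).
    target_sums_in_subtree: the target's SUM in each unseen subspace.
    """
    min_score = min(target_sums_in_subtree)  # MinScore(T)
    return {
        obj for obj, s in seen_sums.items()
        if obj != target and s < min_score
    }


# Seen subspace {A} and T's scores in the unseen subtree, as on the slide
seen_a = {"S": 6.5, "T": 2.2, "U": 1.5, "W": 1.0, "Z": 0.8}
pruned = prunable_objects(seen_a, "T", [1.9, 1.2, 1.1])
```

Here `pruned` contains W and Z: their scores in {A} are already below MinScore(T) = 1.1, so their rows can be dropped before partitioning the subtree.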
18 (2) Promotion cubes
  Observation: (1) T tends to be highly ranked in a top subspace; (2) a top subspace is likely to contain many objects
  Method: promotion cube (offline materialization)
  Structure
    For each subspace with Sig(S) > MinSig (parameter: MinSig)
    Materialize a selected sample of top-k aggregate scores in each subspace (parameters: k and k')
19 Promotion cell
  For each "significant" subspace S (one passing the MinSig threshold), create a "promotion cell" PCell(S)
  Promotion cell: stores aggregate scores; no object IDs
  Parameters MinSig, k, and k': chosen to yield a space-time tradeoff; application dependent
    Does not restrict query processing
  [Figure: the sorted objects of subspace S and the corresponding PCell(S), with k = 9, k' = 3]
20 Query execution using promotion cube
  Step 1: Compute T's aggregate scores
  Step 2: Compute LBRanks and UBRanks and do pruning, using the promotion cube
  Step 3: Call PromoRank
  [Figure: the subspace lattice from {*} to {ABCD}, each node annotated with SUM(T), e.g., SUM(T) = 3.0 at {*}]
21 Query execution using promotion cube
  Step 1: Compute T's aggregate scores
  Step 2: Compute LBRanks and UBRanks and do pruning, using the promotion cube
  Step 3: Call PromoRank
  [Figure: the same lattice, each node annotated with its [LBRank, UBRank] interval; the intervals shown include [11, 19], [20, 20], [21, 29], [31, 39], [50, 50], [51, 59], and [61, ∞)]
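One way Step 2 could derive such intervals is sketched below. It rests on an assumption that the slides do not confirm: that a promotion cell keeps the aggregate scores found at ranks k', 2k', ..., k of its subspace, and that scores are distinct. The name `rank_bounds` and the sample values are hypothetical.

```python
import math


def rank_bounds(t_score, samples, k_prime):
    """Bound the target's rank from a promotion cell.

    samples[i] is ASSUMED to be the aggregate score at rank (i + 1) * k_prime,
    in descending order, with no ties. Returns (LBRank, UBRank).
    """
    lb, ub = 1, math.inf
    for i, s in enumerate(samples):
        r = (i + 1) * k_prime
        if t_score < s:
            lb = r + 1      # at least r objects score higher than T
        elif t_score == s:
            return r, r     # with distinct scores, T is the object at rank r
        else:
            ub = r - 1      # T beats the object at rank r
            break
    return lb, ub


# Hypothetical cell with k = 60, k' = 10: scores at ranks 10, 20, ..., 60
cell = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
between = rank_bounds(8.5, cell, 10)
exact = rank_bounds(8.0, cell, 10)
beyond = rank_bounds(3.0, cell, 10)
```

Under these assumptions a score between two stored samples gives an interval such as [11, 19], a score equal to a sample pins the rank exactly, and a score below the last sample leaves the upper bound open. Subspaces whose LBRank is already too poor can then be pruned before PromoRank is called.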
22 Outline Introduction Query execution algorithms Spurious promotion Experiment Conclusion
23 The spurious promotion problem
  Spurious promotion: the target object is highly ranked in a subspace due to random perturbation; not meaningful
  Example: Michael Jordan (NBA player)

    Rank   Subspace
    #1     {Year = 1995}                OK
    #1     {MonthOfBirth = February}    Spurious
    #1     {Weather = Sunny}            Spurious

  The spurious cases are due to random perturbation