Collaborative Embedding Features and Diversified Ensemble for E-Commerce Repeat Buyer Prediction
Zhanpeng Fang*, Zhilin Yang*, Yutao Zhang
Tsinghua Univ. (* equal contribution)
Results
• Team “FAndy&kimiyoung&Neo”
• 2nd place in Stage 1
• 3rd place in Stage 2
• The only team ranking in the top 3 of both stages
Team Members
• Zhanpeng Fang – Master student, Tsinghua Univ. & Carnegie Mellon Univ.
• Zhilin Yang – Bachelor student, Tsinghua Univ.
• Yutao Zhang – PhD student, Tsinghua Univ.
Task
• Input:
  – User behavior logs
    • user, item, category, merchant, brand, timestamp, action
  – User profile
    • age, gender
• Output:
  – The probability that a new buyer of a merchant is a repeat buyer
Challenges
• Heterogeneous data
  – Users, merchants, categories, brands, items
• Repeat buyer modeling
  – What are the characteristic features for modeling repeat buyers?
• Collaborative information
  – How to leverage the collaborative information between users and merchants in a shared space?
Framework
• Two novel feature sets: Repeat features and Embedding features
• Three individual models
• Diversified ensemble
Feature Engineering – Basic Features
• User-Related Features
  – Age, gender, # of different actions
  – # of items/merchants/… that the user clicked/purchased/favored
  – Omitting add-to-cart from all action-related features increases performance (add-to-cart is almost identical to purchase)
• Merchant-Related Features
  – Merchant ID
  – # of actions and # of distinct users that clicked/purchased/favored (only in Stage 1)
Feature Engineering – Basic Features
• User-Merchant Features
  – # of different actions
  – Category IDs and brand IDs of the purchased items
• Post Processing
  – Feature binning in Stage 1
  – log(1+x) conversion in Stage 2
  – The two perform similarly; both are much better than raw values (see the sketch below)
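To make this concrete, here is a minimal sketch (not the authors' code) of count-style user features with the log(1+x) conversion; the column names (user_id, item_id, merchant_id, action) are assumptions of this illustration.

```python
import numpy as np
import pandas as pd

def basic_user_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Per-user counts of distinct items/merchants for each action type."""
    feats = logs.groupby(["user_id", "action"]).agg(
        n_items=("item_id", "nunique"),
        n_merchants=("merchant_id", "nunique"),
    ).unstack(fill_value=0)
    # Flatten the (statistic, action) column MultiIndex into plain names.
    feats.columns = [f"{stat}_{act}" for stat, act in feats.columns]
    # log(1+x) conversion, as used in Stage 2.
    return np.log1p(feats)
```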
Repeat Features
• User Repeat Features
  – Average span between any two actions
  – Average span between two purchases
  – Number of days since the last purchase (sketched below)
[Timeline illustration: the time span between Action 1 and Action 2 on a 2014.1–2014.12 axis]
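A hedged sketch of the last two user repeat features, assuming a log DataFrame with user_id, action, and a datetime timestamp column (these names, and measuring recency against a cutoff date, are assumptions):

```python
import pandas as pd

def user_repeat_features(logs: pd.DataFrame, cutoff: pd.Timestamp) -> pd.DataFrame:
    """Average span between purchases and days since the last purchase."""
    buys = logs[logs["action"] == "purchase"].sort_values("timestamp")
    grouped = buys.groupby("user_id")["timestamp"]
    return pd.DataFrame({
        # Mean number of days between two consecutive purchases.
        "avg_purchase_span": grouped.apply(lambda ts: ts.diff().dt.days.mean()),
        # Days elapsed between the last purchase and the cutoff date.
        "days_since_last_purchase": (cutoff - grouped.max()).dt.days,
    })
```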
Repeat Features
• User-Merchant/Category/Brand/Item Repeat Features
  – Average active days for one merchant/category/brand/item
  – Maximum active days for one merchant/category/brand/item
  – Average span between any two actions for one merchant/category/brand/item
  – Ratio of merchants/categories/brands/items with repeated actions
Repeat Features
• Category/Brand/Item Repeat Features
  – Average active days on the given category/brand/item over all users
  – Ratio of repeated active users on the given category/brand/item
  – Maximum active days on the given category/brand/item over all users
  – Average days of purchasing the given category/brand/item over all users
  – Ratio of users who purchase the given category/brand/item more than once
  – Maximum days of purchasing the given category/brand/item over all users
  – Average span between two purchases of the given category/brand/item over all users
Embedding Features
• Build a heterogeneous interaction graph over users (u1, u2, u3, …) and merchants (m1, m2, …)
• Run random walks on the graph to generate a corpus W = ……
• Train a Skipgram model on W to obtain embedded vectors for users and merchants
[Pipeline illustration: heterogeneous interaction graph → random walk corpus → Skipgram model → embedded vectors]
Embedding Features: Interaction Graph
• Let the graph G = (V, E)
  – V is the vertex set
  – E is the edge set
• V contains all users and merchants
• If user u interacts with merchant m, then add an edge <u, m> into E (see the sketch below)
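A minimal sketch of the interaction graph as an adjacency list; prefixing ids (u_/m_) to keep user and merchant vertices distinct is an assumption of this illustration:

```python
from collections import defaultdict

def build_graph(interactions):
    """interactions: iterable of (user_id, merchant_id) pairs."""
    adj = defaultdict(set)
    for u, m in interactions:
        u_node, m_node = f"u_{u}", f"m_{m}"
        adj[u_node].add(m_node)  # the edge <u, m> is undirected,
        adj[m_node].add(u_node)  # so store it in both directions
    return adj
```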
Embedding Features: Random Walk
• Repeat a given number of times
  – For each vertex v in V
    • Generate a random-walk sequence starting from v
    • Append the sequence to the corpus (see the sketch below)
[Illustration: generated random-walk corpus W = ……]
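A sketch of the corpus-generation loop above, using the adjacency list from the previous sketch; num_walks and walk_length are assumed hyperparameters, not values from the slides:

```python
import random

def generate_corpus(adj, num_walks=10, walk_length=40):
    corpus = []
    for _ in range(num_walks):      # repeat a given number of times
        for start in adj:           # one walk starting from each vertex
            walk = [start]
            while len(walk) < walk_length:
                # every vertex has at least one neighbor by construction
                walk.append(random.choice(tuple(adj[walk[-1]])))
            corpus.append(walk)
    return corpus
```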
Embedding Features: Skipgram
[Illustration: context window W(j-2), W(j-1), W(j), W(j+1), W(j+2)]
• Use the current word W(j) to predict the context.
• Objective function (the standard Skipgram log-likelihood, maximized over the corpus):
  (1/|W|) Σ_j Σ_{-c ≤ k ≤ c, k ≠ 0} log p(W(j+k) | W(j))
• Use SGD to optimize the above objective and obtain embeddings for users and merchants.
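The slides do not name a Skipgram implementation; one way to train it on the walk corpus is gensim's Word2Vec with sg=1 (vector_size, window, and epochs below are assumptions of this sketch):

```python
from gensim.models import Word2Vec

walks = generate_corpus(adj)  # corpus from the previous sketch
model = Word2Vec(sentences=walks, vector_size=64, window=5,
                 sg=1, min_count=0, workers=4, epochs=5)
user_vec = model.wv["u_123"]     # embedding of a (hypothetical) user vertex
merchant_vec = model.wv["m_45"]  # embedding of a (hypothetical) merchant vertex
```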
Embedding Features: Dot Products
• Now we have embeddings of all users and merchants.
• Given a pair <u, m>, we derive a feature f_u · f_m to represent the semantic similarity between u and m.
• f denotes the embedding vector.
Embedding Features: Diversification
• Simply applying the dot product of embeddings is not powerful enough.
• Recall that we use SGD to learn the embeddings.
• We use embeddings at different iterations of SGD (sketched below).
• An example
  – Run 100 iterations of SGD.
  – Read out embeddings at iterations 10, 20, …, 100.
  – Obtain a 10-dim feature vector of dot products.
• Intuition: similar to ensembling models with different regularization strengths
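A sketch of the diversification trick: train in chunks of SGD epochs and take one dot product per checkpoint, yielding the 10-dim feature vector from the example above. The epoch-by-epoch gensim loop is an illustration, not the authors' exact setup:

```python
import numpy as np
from gensim.models import Word2Vec

def diversified_dot_products(walks, pairs, checkpoints=10, epochs_per_step=10):
    """pairs: list of (user_vertex, merchant_vertex) ids, e.g. ("u_123", "m_45")."""
    model = Word2Vec(vector_size=64, window=5, sg=1, min_count=0)
    model.build_vocab(walks)
    feats = []
    for _ in range(checkpoints):
        # Continue training for another chunk of epochs ...
        model.train(walks, total_examples=model.corpus_count,
                    epochs=epochs_per_step)
        # ... then read out one dot-product feature per pair.
        feats.append([model.wv[u] @ model.wv[m] for u, m in pairs])
    return np.array(feats).T  # shape: (n_pairs, checkpoints)
```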
Individual Models
• Logistic regression
  – Use the implementation of Liblinear
• Factorization machine
  – Use the implementation of LibFM
• Gradient boosted decision trees
  – Use the implementation of XGBoost

Method                  Implementation  Best AUC in Stage 1 (%)
Logistic Regression     Liblinear       69.782
Factorization Machine   LibFM           69.509
GBDT                    XGBoost         69.196
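A hedged sketch of two of the three base models using the named libraries (hyperparameters and the synthetic data are assumptions; libFM is a standalone command-line tool and is omitted here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import xgboost as xgb

# Tiny synthetic stand-in for the real feature matrices.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)
X_valid = rng.normal(size=(50, 10))

# Logistic regression with the liblinear solver, as in the slides.
lr = LogisticRegression(solver="liblinear", C=1.0).fit(X_train, y_train)
p_lr = lr.predict_proba(X_valid)[:, 1]

# Gradient boosted decision trees via XGBoost.
gbdt = xgb.XGBClassifier(n_estimators=100, max_depth=6,
                         learning_rate=0.05).fit(X_train, y_train)
p_gbdt = gbdt.predict_proba(X_valid)[:, 1]
```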
Diversified Ensemble
[Architecture diagram: feature sets F0, F1, F2, …, Fn combined with the model set {M1, M2, M3}; a ridge regression merges the member predictions into the final results]
Diversified Ensemble: Appending New Features
• Feature set F0: Basic features
• Feature set F1: Basic features + Repeat features (new)
• Feature set F2: Basic features + Repeat features + Embedding features (new)
• Each feature set appends one new feature group to the previous one.
Diversified Ensemble: Cartesian Product
• Take the Cartesian product of feature sets and models: each (feature set, model) pair yields one ensemble member (see the sketch after the table).

                 LR          GBDT        FM
Feature set F0   Ensemble 1  Ensemble 2  Ensemble 3
Feature set F1   Ensemble 4  Ensemble 5  Ensemble 6
Feature set F2   Ensemble 7  Ensemble 8  Ensemble 9
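A sketch of how the nine ensemble members could feed a ridge-regression stacker; oof_predict is a hypothetical helper that trains one model on one feature set and returns out-of-fold validation predictions, and the F0/F1/F2 and y variables are placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

# One column of meta-features per (feature set, model) pair.
meta_X = np.column_stack([
    oof_predict(model, features)        # hypothetical helper
    for features in (F0, F1, F2)        # the three feature sets
    for model in ("lr", "fm", "gbdt")   # the three base models
])
stacker = Ridge(alpha=1.0).fit(meta_X, y_valid)
final_scores = stacker.predict(meta_X_test)  # scores for the test pairs
```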
Diversified Ensemble Results
• Simple ensemble: ensemble only the top 3 models
• Diversified ensemble outperforms simple ensemble

Method                  Implementation  Best AUC in Stage 1 (%)
Logistic Regression     Liblinear       69.782
Factorization Machine   LibFM           69.509
GBDT                    XGBoost         69.196
Simple Ensemble         Sklearn Ridge   70.329
Diversified Ensemble    Sklearn Ridge   70.476
Factor Contribution Analysis
• Clear performance increase after adding each feature set
• Both embedding features and repeat features provide unique information that helps the prediction
• The results are based on Logistic Regression

No.  Feature Sets              Stage 1 AUC (%)  Gain
1    Basic features            69.369           -
2    1 + Embedding features    69.495           0.126
3    2 + Repeat features       69.782           0.287
Stage 2 Performance
• Repeat features are consistent across both stages
• Data cleaning is important – duplicated/inconsistent records exist in this stage
• The results are based on Logistic Regression

No.  Method                              AUC (%)  Gain
1    Basic features                      70.346   -
2    1 + Repeat features                 70.589   0.243
3    2 + Data cleaning & more features   70.898   0.309
4    3 + Fine-tuning parameters          71.016   0.118
Summary
• “Tricks” that helped us finish in the top 3 in both stages
  – Diversified ensemble
  – Novel embedding features
Thank you! Questions?