ONLINE LEARNING OF WEBSITE EMBEDDINGS
for Accurate Prediction of User Behavior Even When Data Are Scarce
Amelia White, Director of Data Science Research
Nov 13, 2019
Expanding Digital Survey Data
Figure: small survey panel → custom Dstillery model → device universe (~200MM devices).
Data Used for Modeling
Figure: sample browsing records, each dated 4/11/19 (vanillaicecream.com, buzzfeed.com, nytimes.com; chocicecream.com, buzzfeed.com, nytimes.com), shown with the small survey panel → custom Dstillery model → device universe (~200MM devices) flow. 1.5B website visits daily.
Figure: 10 million URLs by millions of users.
Need for a Reduced-Dimensional Feature Space
Figure: 10 million URLs versus only thousands of users.
REDUCED DIMENSIONAL FEATURE SPACE
Taking Ideas from Natural Language Processing
● Similar data:
  ○ Sentences of words
  ○ Sequences of websites visited
● High-dimensional categorical features
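To make the analogy concrete, the sketch below turns raw visit logs into "sentences": each device's visits, ordered by time, become a sequence of domains. It assumes a hypothetical log record of (device_id, timestamp, domain); the record layout and the name `browsing_sentences` are illustrative, not Dstillery's actual pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record layout: (device_id, unix_timestamp, domain)
Visit = Tuple[str, int, str]


def browsing_sentences(visits: Iterable[Visit]) -> Dict[str, List[str]]:
    """Group visits by device and order them in time, so each device's
    history reads like a sentence whose words are domains."""
    by_device: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for device_id, ts, domain in visits:
        by_device[device_id].append((ts, domain))
    # Sort each device's visits by timestamp, then keep only the domains
    return {
        device: [domain for _, domain in sorted(events)]
        for device, events in by_device.items()
    }
```

Sorting by timestamp matters because word2vec learns from which sites appear near each other in a sequence, just as it learns from neighboring words in a sentence.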
Need for a Reduced-Dimensional Feature Space
Figure: thousands of users × 10 million URLs, reduced to thousands of users × a 128-dimensional embedding space.
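Website Embeddings V1: word2vec
Figure: the target URL is looked up in a dictionary, Dictionary(www.short-hairstyles.co) = i, which selects row $B_i$ of the $K \times 128$ embedding matrix $B$ ($i = 0, \dots, K-1$, with $K = 50{,}000$). Fully connected edges link $B_i$ to the output layer, which predicts $P(\text{context URL} \mid \text{target URL})$ for neighboring URLs such as www.hairstyle.com and www.pophaircuts.com.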
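A minimal sketch of the two data-preparation steps behind this diagram, assuming standard skip-gram training: build the dictionary from the K most frequent URLs (K = 50,000 on the slide), then emit (target, context) index pairs within a sliding window. The window size and the helper names `build_dictionary` and `skipgram_pairs` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_dictionary(sentences: Sequence[Sequence[str]], k: int) -> Dict[str, int]:
    """Map the k most frequent URLs to indices 0..k-1 (the Dictionary(url) = i lookup)."""
    counts = Counter(url for sentence in sentences for url in sentence)
    return {url: i for i, (url, _) in enumerate(counts.most_common(k))}


def skipgram_pairs(
    sentence: Sequence[str], dictionary: Dict[str, int], window: int
) -> List[Tuple[int, int]]:
    """Emit (target index, context index) pairs used to model P(context URL | target URL).
    URLs outside the dictionary are dropped before pairing."""
    ids = [dictionary[url] for url in sentence if url in dictionary]
    pairs = []
    for pos, target in enumerate(ids):
        lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:  # a URL is not its own context
                pairs.append((target, ids[ctx_pos]))
    return pairs
```

Capping the dictionary at K entries keeps B at K × 128, but every URL outside the top K is simply dropped, which is the limitation the later versions address.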
Training Word2vec
● Trained word2vec on the browsing history of all devices seen in a 2-week time period:
  ○ Browsing histories of 430,648,822 devices
  ○ Sequences totaling 15,077,897,800 site visits
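A quick check on the scale of this corpus, assuming the 2-week period is 14 days: the average device contributes a short sequence.

```latex
\frac{15{,}077{,}897{,}800\ \text{visits}}{430{,}648{,}822\ \text{devices}} \approx 35.0\ \text{visits per device},
\qquad
\frac{35.0\ \text{visits}}{14\ \text{days}} \approx 2.5\ \text{visits per device per day}
```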
Visualizing Embeddings
Websites assigned to cluster #512: www.boardingarea.com, www.thepointsguy.com, www.taxifarefinder.com, www.theflightdeal.com, www.uberestimate.com, www.sleepinginairports.net, www.frugaltravelguy.com, www.airchina.us, www.cathaypacific.com, www.travelskills.com, www.travelsort.com, www.skyteam.com, www.seatmaestro.com, www.flyertalk.com, www.expertflyer.com, www.singaporeair.com, www.estimatefares.com
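The slide does not say how the clusters were produced. As one possibility, the sketch runs plain k-means (Lloyd's algorithm) on L2-normalized embeddings, so Euclidean distance tracks cosine similarity; the function name, iteration count, and seed are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means on L2-normalized rows; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit vectors: Euclidean ~ cosine
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)]  # copy of random rows
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, n_clusters)
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties out
                centers[c] = members.mean(axis=0)
    return labels
```

The pairwise-distance step materializes an n × n_clusters × d array, which is fine for illustration; at the scale of millions of sites, a mini-batch or library implementation would be the practical choice.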
BEYOND WORD2VEC:
● Embedding millions of URLs with a manageable number of parameters
● Online learning of embeddings
EMBEDDING MORE URLS WITH FEWER PARAMETERS
Hash Embeddings
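Website Embeddings V2: Hash embeddings
Figure: a dictionary maps each URL to an index m, e.g. Dictionary('www.kohls.com') = m, with $m = 0, \dots, N-1$ and $N = 10\text{M}$. Two hash functions select rows of the shared $K \times 128$ embedding matrix $B$: $H_1(m) = i$ and $H_2(m) = j$, with $i, j = 0, \dots, K-1$. The $N \times 2$ importance parameters $P$ supply row $P_m$, and a convolution layer combines $P_m$ with $B_i$ and $B_j$ to form the URL's embedding, which feeds the output layer.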
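A minimal sketch of the lookup this diagram describes, assuming the two component vectors are combined by the weighted sum used in the original hash-embedding formulation: e_m = P[m,0]·B[H1(m)] + P[m,1]·B[H2(m)]. The slide labels this step a convolution layer, so the exact combination is an assumption, as are the hypothetical universal hash functions H(m) = ((a·m + b) mod p) mod K and the class name `HashEmbedding`.

```python
import numpy as np

P_PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, larger than any URL index used here


def make_hash(seed: int, k: int):
    """Hypothetical universal hash H(m) = ((a*m + b) mod p) mod k."""
    rng = np.random.default_rng(seed)
    a = int(rng.integers(1, P_PRIME))
    b = int(rng.integers(0, P_PRIME))
    return lambda m: ((a * m + b) % P_PRIME) % k


class HashEmbedding:
    def __init__(self, n_urls: int, k_buckets: int, dim: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.B = rng.normal(scale=0.1, size=(k_buckets, dim))  # shared K x 128 matrix
        self.P = rng.normal(scale=0.1, size=(n_urls, 2))       # N x 2 importance parameters
        self.h1 = make_hash(seed + 1, k_buckets)
        self.h2 = make_hash(seed + 2, k_buckets)

    def lookup(self, m: int) -> np.ndarray:
        """Embedding of URL index m: P[m,0] * B[H1(m)] + P[m,1] * B[H2(m)]."""
        i, j = self.h1(m), self.h2(m)
        return self.P[m, 0] * self.B[i] + self.P[m, 1] * self.B[j]
```

Only P grows with the number of URLs (two numbers per URL), while the 128-dimensional vectors live in the shared K × 128 matrix B.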
Hash Embedding Requires Fewer Parameters
Figure: number of parameters by embedding method.
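A rough count of the embedding-table parameters shows where the savings come from. This counts only the input-side tables, with d = 128 and the two hash functions of the V2 diagram; the slide's own comparison may include other layers.

```latex
\underbrace{N \cdot d}_{\text{full table}}
\quad \text{vs.} \quad
\underbrace{N \cdot 2}_{\text{importance parameters } P} + \underbrace{K \cdot d}_{\text{shared matrix } B}
```

For N = 10^7 URLs and d = 128, a full table has N·d = 1.28 × 10^9 parameters, while P has only N·2 = 2 × 10^7; the remaining term K·d does not grow with N, so it stays small as long as K ≪ N.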
Measuring Embedding Quality for Parameter Selection
● Selected a 'ground truth' clustering, made from a known high-quality embedding
● Used the silhouette score to measure how well test embeddings converged to the ground-truth clustering as the network trained
Reference: Jim Bremner, April 09, 2019, https://platform.ai/blog/page/11/the-silhouette-loss-function-metric-learning-with-a-cluster-validity-index/
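The silhouette score of point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its mean distance to the nearest other cluster. A minimal NumPy sketch, assuming Euclidean distance, with the test embeddings as points and the ground-truth clustering as labels (the function name is hypothetical):

```python
import numpy as np


def silhouette_samples(x: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-point silhouette s(i) with Euclidean distances.

    x: test embeddings, one row per website.
    labels: ground-truth cluster id per row (from the reference embedding).
    Points in singleton clusters get s(i) = 0 by convention.
    """
    if len(np.unique(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    s = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        if not same.any():
            continue
        a = dist[i, same].mean()
        # b(i): smallest mean distance to any other cluster
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s
```

Averaging s(i) over all sites gives a single number to track during training: values near 1 mean the test embedding reproduces the ground-truth clusters, while values near 0 or below mean it does not. The full distance matrix is O(n²) in memory, so in practice one would score a sample of sites.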
Good Performance with 100x Fewer Parameters
Figure: silhouette score s(i) versus number of parameters.
ONLINE LEARNING OF EMBEDDINGS
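Website Embeddings V3: Online Learning of Hash Embeddings
Figure: the same structure as V2, except that the dictionary lookup is replaced by a hash function, $H_0(\text{'www.kohls.com'}) = m$, with $m = 0, \dots, N-1$. As before, $H_1(m) = i$ and $H_2(m) = j$ select rows $B_i$ and $B_j$ of the $K \times 128$ embedding matrix $B$ ($i, j = 0, \dots, K-1$), the $N \times 2$ importance parameters $P$ supply $P_m$, and a convolution layer combines them before the output layer.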
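Replacing the dictionary with H_0 is what makes online learning possible: any URL string, including one first seen mid-stream, maps to a slot without rebuilding a vocabulary. A minimal stdlib sketch, assuming salted SHA-256 as the hash family (Python's built-in `hash` is randomized per process, so it would not give stable slots across runs); `stable_hash`, `url_to_indices`, and the salts are hypothetical.

```python
import hashlib
from typing import Tuple


def stable_hash(text: str, seed: int, modulus: int) -> int:
    """Deterministic hash of a string into [0, modulus), stable across processes."""
    digest = hashlib.sha256(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % modulus


def url_to_indices(url: str, n_slots: int, k_buckets: int) -> Tuple[int, int, int]:
    """H0 maps the URL string straight to a row m of P (no dictionary);
    H1 and H2 then map m to two rows i, j of the shared matrix B."""
    m = stable_hash(url, seed=0, modulus=n_slots)
    i = stable_hash(str(m), seed=1, modulus=k_buckets)
    j = stable_hash(str(m), seed=2, modulus=k_buckets)
    return m, i, j
```

The trade-off is collisions: URLs that land on the same m share P_m and, because H_1 and H_2 are applied to m, the same two rows of B as well, so they end up with identical embeddings. Choosing N large relative to the number of active URLs keeps such collisions rare.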
Online Learning Optimizes Faster than Batch Learning
Figure: silhouette score s(i) (higher means higher-quality embeddings) for W2V (batch) embeddings and hash (online) embeddings.
Training the Online Embeddings
Figure: the embedding matrix B during training.
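The slides do not specify the training objective or optimizer. As one plausible reading, the sketch trains the V3 hash embedding online with a skip-gram, negative-sampling style logistic loss and plain SGD, one (target, context, label) pair at a time. Everything here is an assumption: the class `OnlineHashSkipGram`, the hashed output vectors `C`, the table sizes, learning rate, and initialization.

```python
import hashlib

import numpy as np


def stable_hash(text: str, seed: int, modulus: int) -> int:
    """Deterministic string hash into [0, modulus) (hypothetical: salted SHA-256)."""
    digest = hashlib.sha256(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % modulus


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OnlineHashSkipGram:
    """Hypothetical online trainer: hash-embedded target URLs, logistic loss,
    one SGD step per pair as it streams in."""

    def __init__(self, n_slots=100_000, k_buckets=4_096, dim=128, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.P = np.ones((n_slots, 2))                         # importance params, start as a plain sum
        self.B = rng.normal(scale=0.1, size=(k_buckets, dim))  # shared embedding matrix B
        self.C = rng.normal(scale=0.1, size=(k_buckets, dim))  # hashed output (context) vectors
        self.n, self.k, self.lr = n_slots, k_buckets, lr

    def _target(self, url):
        m = stable_hash(url, 0, self.n)                                        # H0: URL -> row of P
        i, j = stable_hash(str(m), 1, self.k), stable_hash(str(m), 2, self.k)  # H1, H2
        return m, i, j, self.P[m, 0] * self.B[i] + self.P[m, 1] * self.B[j]

    def score(self, target_url, context_url):
        """Model probability that context_url appears near target_url."""
        e = self._target(target_url)[3]
        return sigmoid(e @ self.C[stable_hash(context_url, 3, self.k)])

    def update(self, target_url, context_url, label):
        """One SGD step on the logistic loss (label 1 = observed pair, 0 = sampled negative)."""
        m, i, j, e = self._target(target_url)
        ci = stable_hash(context_url, 3, self.k)
        c = self.C[ci].copy()
        g = sigmoid(e @ c) - label  # d loss / d logit
        grad_e, grad_c = g * c, g * e
        grad_p = np.array([grad_e @ self.B[i], grad_e @ self.B[j]])
        # All gradients above use pre-update values; apply them now.
        self.B[i] -= self.lr * self.P[m, 0] * grad_e
        self.B[j] -= self.lr * self.P[m, 1] * grad_e
        self.P[m] -= self.lr * grad_p
        self.C[ci] -= self.lr * grad_c
```

Because each update touches only P_m, two rows of B, and one row of C, the model can keep learning from the daily stream of visits without a full retraining pass; how negatives are sampled is left to the caller.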
Distance in Embedding Space
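Distances between sites can be read directly from the learned vectors. A minimal sketch, assuming cosine similarity (the slide does not name a metric); `nearest_sites` is hypothetical, and any vectors passed to it in examples are toy values, not real embeddings.

```python
from typing import List, Tuple

import numpy as np


def nearest_sites(
    query: str, sites: List[str], vectors: np.ndarray, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other sites by cosine similarity to `query` in embedding space."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = unit[sites.index(query)]
    sims = unit @ q              # cosine similarity to every site
    order = np.argsort(-sims)    # most similar first
    return [(sites[k], float(sims[k])) for k in order if sites[k] != query][:top_n]
```

Normalizing once up front turns every similarity into a dot product, so ranking all sites against a query is a single matrix-vector multiply.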
MODELING USERS IN EMBEDDING SPACE
From URL Embeddings to Models
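One simple way to go from URL embeddings to per-device features, assumed here since the slides do not describe the aggregation, is to average the embeddings of the URLs a device visited into a single 128-dimensional "cookie embedding". The function name and the zero-vector fallback are hypothetical.

```python
from typing import Dict, Sequence

import numpy as np


def cookie_embedding(
    history: Sequence[str], url_vectors: Dict[str, np.ndarray], dim: int = 128
) -> np.ndarray:
    """Average the embeddings of the URLs a device visited.
    URLs without an embedding are skipped; an empty result maps to zeros."""
    known = [url_vectors[u] for u in history if u in url_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)
```

Averaging with repeats weights frequently visited sites more heavily; deduplicating the history first is an equally simple alternative. Either way, each device is described by 128 dense numbers instead of a 10-million-column sparse vector.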
Embedding Features Outperform Sparse Web Features for Small Data Sets
Figure: % gain in AUC from using embedding features instead of sparse web features, with ~1,000 training examples and with ~1M training examples.
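The slide does not define "% gain in AUC". A minimal sketch, assuming it is the relative gain of the embedding model's AUC over the sparse-feature model's AUC, with AUC computed as the probability that a randomly chosen positive example scores above a randomly chosen negative one. Both helper names are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counting one half.
    O(n_pos * n_neg), fine for survey-sized data."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pct_gain(auc_new: float, auc_base: float) -> float:
    """Relative gain in AUC over the baseline, in percent."""
    return 100.0 * (auc_new - auc_base) / auc_base
```

Under this assumption, the case-study figures of 75.8 (cookie embeddings) and 64.1 (raw web behavior) correspond to a gain of about 18.3%.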
MODELING SURVEY DATA
Figure: small survey panel → custom Dstillery model → device universe (~200MM devices).
Case Study: Predicting Ad Influence for Ice Cream Brand
● The Problem:
  ○ A survey company models which people are likely to be influenced by an advertisement for an ice cream brand
  ○ 5.5K survey respondents
  ○ 500 high-scoring respondents
● Our Goal:
  ○ Predict the high-scoring respondents
  ○ Produce an audience of devices that are predicted to be influenceable by the ad for the ice cream brand
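With only 500 high-scoring respondents out of 5.5K (about 9%), the positive class is small, so a held-out test set should preserve that rate. The slides report test AUC without describing the split; the sketch assumes a stratified hold-out, and the function name, 20% test fraction, and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test while keeping each class's share,
    so the high-scoring rate is the same in both parts."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [k for k, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test
```

With 5,500 respondents and a 20% hold-out, this yields 1,100 test rows containing exactly 100 of the 500 high scorers, leaving 400 positives for training.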
Case Study: Predicting Ad Influence for Ice Cream Brand
● Test AUC on predicting high-scoring respondents:
  ○ Raw web behavior (sparse web features): 64.1
  ○ Summarized web behavior (clusters of websites): 63.5
  ○ Cookie embeddings (website embeddings): 75.8
THANK YOU
Presented by Amelia White, awhite@dstillery.com
Contributors: Christopher Jenness, Melinda Han Williams
MLE team: Wickus Martin, Roger Cost, Justin Moynihan, Patrick McCarthy