Helixa Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian approach Gianmario Spacagna Chief Scientist @ Helixa O’Reilly AI Conference London, 16th October 2019
About Me: 7+ years of experience in Data Science and Machine Learning. Currently leading a team of ML Scientists and ML Engineers. Background in Telematics and Software Engineering of Distributed Systems. Ongoing MBA Student. Co-author of Python Deep Learning. Contributor to the Professional Data Science Manifesto. Blogger at Data Science Vademecum. Founder of the Data Science Milan community (1.4k members). Gianmario Spacagna, Chief Scientist, Helixa (Stockholm, London, Milan), gspacagna@helixa.ai
Helixa is a Market Research platform that uses AI to integrate disparate data sources into an enriched view of the consumers who matter to your business. DEMOGRAPHICS: Female, 18 - 24, HHI < 40K. PSYCHOGRAPHICS: Entertainment Junkies, Fashion Enthusiasts, Fast Food Fans. INTERESTS and INFLUENCERS: Listen to Podcasts, Kylie Cosmetics Fan, Cardi B, ODESZA, Chipotle, Shane Dawson, James Charles, Starbucks.
In the next 40 minutes... OUR GOAL: Discuss some of the current challenges of traditional market research and propose a novel solution based on Named Entity Recognition (NER) and Bayesian Inference.
Challenges in Market Research
What is Market Research? Information about individuals and organizations; Applied Social Science; Statistical Inference; Gain Insights for Strategic Decisions
Why does Market Research matter? Consumer Preferences and Behaviors; Market Segmentation; Buyer Personas; Identify Opportunities; Brand Perceptions; Market Trends
Approaches to Market Research. Qualitative: Opinions and individual experiences; In-depth interviews; Smaller sample. Quantitative: Numbers and Data; Statistics; Larger sample.
Quantitative Market Research is conducted with Surveys. Survey cycle: Define, Design, Distribute, Collect, Analyze
Limitations of Surveys: Predefined questions; Expensive; Response Bias; Narrow coverage; Invasive
Market Research using “Implicit Consumer Feedback”: survey cycle (Define, Design, Distribute, Collect, Analyze) vs. implicit feedback, e.g. Social Listening
Inferring Interests from Twitter Interactions
Advantages of Implicit Consumer Feedback Approaches: Flexible costs; Wide view; Spontaneous; Mass coverage; Opportunities for Big Data and AI
What about other information? Amazon Purchases Twitter Interactions ? Beer Consumption Brand ?
The Universe of Consumers Datasets First Party (CSM) Consumer Research Social Media Surveys Financial and Behaviors Properties
Individual Consumer Datasets are Far From Being Exhaustive: SCATTERED, PARTIAL, SKEWED. [Figure: population grid split by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]
The Holy Grail of Market Research: ALL IN ONE, COMPLETE, REPRESENTATIVE. [Figure: population grid split by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]
What is the baseline algorithm for “completing” datasets?
Look-alike Fusion
What is look-alike fusion? Left: Social Network Panel. Right: Consumptions Survey Panel.
Assignment Optimization Problem. Well-known solutions: Hungarian method; Simplex; Auction algorithm
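To make the assignment problem concrete, here is a minimal sketch that pairs each left-panel user with one right-panel user so that the total dissimilarity is smallest. It uses brute force over all permutations, not the Hungarian method, Simplex, or the Auction algorithm named on the slide, so it only suits a handful of users. The cost matrix and the function name `best_assignment` are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimising sum(cost[i][assignment[i]]).

    Brute force over every permutation: exponential in the number of users,
    whereas the Hungarian method solves the same problem in polynomial time.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical dissimilarities between 3 left-panel users (rows)
# and 3 right-panel users (columns); lower means more alike.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, match = best_assignment(cost)
print(total, match)  # match[i] is the right-panel user paired with left user i
```

The quality of the fusion depends entirely on how the cost matrix is built, which is where the feature-space and objective-function issues discussed later come in.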
Datasets Fusion. [Figure: Left User and Right User matrices joined side by side; columns are left-only entities and right-only entities; X marks an observed entity; the Target Audience is a subset of the rows]
Look-alike Fusion Requires a Main Panel: Centrality
Look-alike Fusions Don’t Scale Well: Differences in feature space; Craftsmanship required at each change of data; Universal objective function to optimize
Is there a more scalable way to “fuse” datasets?
The Audience Projection
Audience Projection defined as “User Binary Classification”. Source: Social Network Panel, Target Audience of 1.6M / 70M Social accounts. Destination: Consumptions Survey Panel, 26M / 200M U.S. consumers. PROJECTION labels each destination user TRUE or FALSE. Example destination variables: Ben & Jerry’s: bought in last 6 months? Affinity: 1.80x. Angry Orchard: drunk in last 6 months? Affinity: 1.50x. Venmo: paid in last 30 days? Affinity: 1.6x.
Solution = Named Entity Recognition (NER) + Bayesian Model. Source: Social Network Panel (Social Pages, then NER). Destination: Consumptions Survey Panel (Consumption Questions, then NER). ENTITY LINKING (NEL) connects the two sets of entities; the BAYESIAN MODEL turns the Target Audience into Projected Users Probabilities.
Entities Represent a Universal Feature Space. Social Pages: NER. Consumption Questions: NER. Listed Products: NER.
Named Entity Recognition (NER) in each Domain: Social Pages, Consumption Questions, Listed Products. Example texts: “Adidas Originals Men's Relaxed Strapback Cap”; “The Coca-Cola Company is a total beverage company, offering over 500 brands in more than 200 countries and territories.”; “Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red”
NLP Libraries with NER capability: Polyglot, DeepPavlov
Why for Production? Industry-grade maturity Fast Accurate
example of NER usage
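The slide's own example was not captured in this transcript. As a stand-in, the sketch below shows the shape of an NER call with a toy dictionary (gazetteer) matcher written in plain Python. It is not the statistical NER library used in the talk; the gazetteer contents, the `ORG` label, and the function name `find_entities` are all assumptions for illustration.

```python
import re

# Hypothetical gazetteer: known surface form -> entity label.
GAZETTEER = {
    "The Coca-Cola Company": "ORG",
    "Coca-Cola": "ORG",
    "Adidas Originals": "ORG",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, label, start offset) for each gazetteer match."""
    # Try longer forms first so "The Coca-Cola Company" wins over "Coca-Cola".
    forms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    return [
        (m.group(0), gazetteer[m.group(0)], m.start())
        for m in pattern.finditer(text)
    ]


social_page = "The Coca-Cola Company is a total beverage company"
product = "Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red"
print(find_entities(social_page))
print(find_entities(product))
```

A production NER model generalises to unseen names and ambiguous contexts, which a fixed dictionary cannot do; the point here is only the input and output contract: free text in, entity spans out.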
Same Entity May Exist with Different Spellings: Interacted with “Coca-Cola Company” on Social Networks vs. “Have you consumed Coca-Cola last week?”
Linking and Normalizing Entities via Wikipedia pages: en.wikipedia.org/wiki/Coca-Cola and en.wikipedia.org/wiki/The_Coca-Cola_Company, connected by an Entity Relationship
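One way to picture the normalization step is a lookup from surface forms to a canonical page, followed by an optional hop along an entity relationship. The sketch below assumes a hand-written alias table and a single hypothetical relationship that collapses the company page onto the brand page; a real entity linker would resolve mentions against a knowledge base rather than a fixed dictionary.

```python
# Hypothetical alias table: normalised surface form -> Wikipedia page title.
ALIASES = {
    "coca-cola": "Coca-Cola",
    "the coca-cola company": "The_Coca-Cola_Company",
    "coca-cola company": "The_Coca-Cola_Company",
}

# Hypothetical entity relationships used to merge related pages into one node.
RELATED_TO = {"The_Coca-Cola_Company": "Coca-Cola"}


def link_entity(mention, collapse_related=True):
    """Map a mention to a canonical Wikipedia URL, or None if it is unknown."""
    key = " ".join(mention.lower().split())  # lower-case and squeeze whitespace
    page = ALIASES.get(key)
    if page is None:
        return None
    if collapse_related:
        page = RELATED_TO.get(page, page)
    return "en.wikipedia.org/wiki/" + page


print(link_entity("Coca-Cola Company"))
print(link_entity("Coca-Cola"))
```

With `collapse_related=True` both spellings land on the same identifier, which is what lets a social interaction and a survey question share one feature column.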
Normalized Entities mean a Common Feature Space
Stacked Heterogeneous Feature Space. [Figure: Source Users and Destination Users stacked as rows; columns grouped into source-only entities, common entities, and destination-only entities; X marks an observed entity and ? an unobserved block (latent interests); the Target Audience is a subset of the Source Users]
Common Entities translate Source to Destination. Source: Social Network Panel. Destination: Consumptions Survey Panel. The Target Audience's Share of Interests on the Common Entities feeds the Bayesian Model. Source Target Size: 1.6M / 70M = 2.3%
“Share of interests” encodes the DNA of the Target Audience. Common Entities, Target Audience slice. Target audience share of interests: 17%, 50%, 50%. Global share of interests: 100%
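The slide does not spell out how a share of interests is computed. A minimal sketch, assuming it is the fraction of target-audience users who interacted with each common entity, is shown below. The user data, entity names, and the function name `share_of_interests` are hypothetical; the six toy users are chosen so that the shares come out near the 17%, 50%, 50% shown on the slide.

```python
def share_of_interests(user_entities, common_entities):
    """Fraction of users who interacted with each common entity.

    user_entities: dict of user id -> set of entities the user interacted with.
    ASSUMPTION: this definition is illustrative, not confirmed by the talk.
    """
    if not user_entities:
        raise ValueError("need at least one user")
    n = len(user_entities)
    return {
        entity: sum(entity in ents for ents in user_entities.values()) / n
        for entity in sorted(common_entities)
    }


# Hypothetical target audience of six source users.
target_audience = {
    "u1": {"A", "B"},
    "u2": {"B", "C"},
    "u3": {"B"},
    "u4": {"C"},
    "u5": {"C", "only_in_source"},
    "u6": set(),
}
shares = share_of_interests(target_audience, {"A", "B", "C"})
print(shares)
```

Entities that exist only in the source domain are ignored, since only common entities can carry information across to the destination panel.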
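Bayesian Model. Posterior = Likelihood · Prior / Evidence: $P(u \in T \mid E) = \frac{P(E \mid u \in T) \cdot P(u \in T)}{P(E)}$. Here $u$ is a user, $T$ the projected target, and $E$ the evidence (these letters stand in for the slide's icons). Prior $P(u \in T)$: Source Target Size = 2.3%. Likelihood $P(E \mid u \in T)$: Share of Interests on common entities. Posterior $P(u \in T \mid E)$: probability of user belonging to projected target given the Evidence.
Evidence Decomposition: $P(E) = P(E \mid u \in T) \cdot P(u \in T) + P(E \mid u \notin T) \cdot P(u \notin T)$
Marginal Positive Likelihood: $P(e_1 \mid u \in T) \approx$ Binomial distribution with $p = 17\%$, where $e_1$ is the evidence on a single common entity
Joint Likelihood under Naive Assumption: $P(e_1, e_2, e_3 \mid u \in T) = P(e_1 \mid u \in T) \cdot P(e_2 \mid u \in T) \cdot P(e_3 \mid u \in T)$, for three common entities with shares of interests 17%, 50%, 50%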
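The prior, the evidence decomposition, and the naive joint likelihood can be combined in a few lines. The sketch below treats each common entity as a single yes/no observation (a Binomial with one trial), which is a simplifying assumption. The target shares (17%, 50%, 50%) and the 2.3% prior come from the slides; the shares for users outside the target and the function name `posterior` are hypothetical.

```python
def posterior(observed, p_target, p_other, prior):
    """Posterior probability that a user belongs to the projected target.

    observed: set of common entities the destination user is linked to.
    p_target / p_other: dict of entity -> probability of seeing that entity
        for users inside / outside the target (same keys in both,
        values strictly between 0 and 1).
    prior: share of the source population that is in the target.
    """
    def likelihood(probabilities):
        # Naive assumption: entities are independent given the class.
        result = 1.0
        for entity, p in probabilities.items():
            result *= p if entity in observed else 1.0 - p
        return result

    positive = likelihood(p_target) * prior
    negative = likelihood(p_other) * (1.0 - prior)
    return positive / (positive + negative)  # denominator is the evidence


p_target = {"A": 0.17, "B": 0.50, "C": 0.50}  # shares from the slides
p_other = {"A": 0.05, "B": 0.20, "C": 0.50}   # hypothetical non-target shares
prior = 0.023                                  # 1.6M / 70M

engaged = posterior({"A", "B"}, p_target, p_other, prior)
silent = posterior(set(), p_target, p_other, prior)
print(engaged, silent)
```

A user who matches the entities that are over-represented in the target ends up well above the 2.3% prior, while a user with no matching evidence falls below it.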
Predicted Probabilities provide Insights on the Projected Users. The PROJECTION of the Target Audience yields Projected Users Probabilities (the posterior of the Bayesian Model), which give Insights on Destination Variables. Affinity by destination variable: TeenNick 8.9x; Robot Chicken 7.27x; Bob’s Burgers 2.36x; Ben & Jerry’s 1.80x; Venmo 1.62x; Angry Orchard 1.55x; Nintendo DSi XL 1.47x; Video Games 1.45x; Audio or Video Chat 1.23x
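The talk does not define how affinity is computed. A common reading is a lift: the rate of a destination variable within the projected audience divided by its rate across all destination users. The sketch below follows that assumption and weights each user by their predicted probability; the data and the function name `affinity` are hypothetical.

```python
def affinity(probabilities, answers):
    """Lift of a destination variable within the projected audience.

    probabilities: predicted probability of belonging to the target, per user.
    answers: True/False per user for one destination variable.
    ASSUMPTION: affinity = probability-weighted rate / overall rate.
    """
    weighted_rate = (
        sum(p for p, a in zip(probabilities, answers) if a) / sum(probabilities)
    )
    base_rate = sum(answers) / len(answers)
    return weighted_rate / base_rate


# Four hypothetical destination users.
probabilities = [0.8, 0.6, 0.1, 0.1]
answers = [True, True, False, True]
lift = affinity(probabilities, answers)
print(f"{lift:.2f}x")
```

A value above 1x means the variable is over-represented among likely members of the projected audience; a value below 1x means it is under-represented.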
Audience Projection in a Nutshell: the Target Audience in the Social Panel is projected through the Common Entities and the Bayesian Model onto the Consumptions Survey Panel, yielding insights such as Affinity: 1.80x, Affinity: 1.55x, Affinity: 1.62x
Cool! How do you know this is accurate?
Evaluation Techniques
Binary Classifier Evaluation: the Bayesian Model outputs Projected Users Probabilities; the open question is where to find a Ground Truth to compare them against. Evaluation techniques follow.
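Once a ground truth is available, the projected probabilities can be scored like any binary classifier. The talk does not name a metric in this transcript, so the sketch below picks ROC AUC as one reasonable choice and computes it directly from its definition: the chance that a randomly chosen true member scores higher than a randomly chosen non-member. The labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC from pairwise comparisons; ties count as half a win."""
    positives = [s for y, s in zip(y_true, scores) if y]
    negatives = [s for y, s in zip(y_true, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth and projected probabilities for five users.
y_true = [1, 0, 1, 0, 0]
scores = [0.90, 0.40, 0.35, 0.20, 0.10]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of users, so it is meant for small checks; rank-based implementations compute the same quantity for large panels.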
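Validate via Common Entities. When the Target Audience is defined only through common entities (e.g. one entity OR another), an Exact Query Replica on the Destination Users gives the Ground Truth, which is compared with the Projected Audience. [Figure: Source Users and Destination Users over the common entities, X marks an observed entity]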
Validate via Self Reconstruction Within the Same Domain. [Figure: Source Users and Destination Users stacked over source-only, common, and destination-only entities; X marks an observed entity; the Target Audience and the Ground Truth are both highlighted]
Validate via Double-step Reconstruction: Ground Truth, then PROJECTION, then PROJECTION, then Predicted probabilities
Repeat Test Cases Stratifying by Category
Demographics Skewness PROJECTION
Golden Benchmarks Comparison on Aggregated Insights
Opportunities
Many Linked Views of the Same Global Population Audience Projection
Multiple Perspectives Reinforce Reliability. Target Audience from the Social Panel. Interacted with Game Informer social page: Affinity: 2.17x. Have you read any Game Informer issue? Affinity: 1.73x. Game Informer Single Issue Magazine purchased online: Affinity: 2.51x
Generalize Audience Projection as a Domain Adaptation Problem
Final Remarks
Many Datasets but only Partial Views
Look-alike fusions don’t scale well
Audience Projection adapts to any “entity domain” Bayesian Model
Accuracy and Biases can be quantified
Strategists now have a complete view of their Target Audience
Gianmario Spacagna Chief Scientist at Helixa.ai gspacagna@helixa.ai @gm_spacagna