

Helixa Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian approach. Gianmario Spacagna, Chief Scientist @ Helixa. O’Reilly AI Conference London, 16th October 2019.


  1. Helixa Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian approach. Gianmario Spacagna, Chief Scientist @ Helixa. O’Reilly AI Conference London, 16th October 2019.

  2. About Me: 7+ years of experience in Data Science and Machine Learning; currently leading a team of ML Scientists and ML Engineers; background in Telematics and Software Engineering of Distributed Systems; ongoing MBA student; co-author of Python Deep Learning; contributor to the Professional Data Science Manifesto; blogger at Data Science Vademecum; founder of the Data Science Milan community (1.4k members). Gianmario Spacagna, Chief Scientist, Helixa. Stockholm, London, Milan. gspacagna@helixa.ai

  3. Helixa is a Market Research platform that uses AI to integrate disparate data sources into an enriched view of the consumers who matter to your business. DEMOGRAPHICS: Female, 18 - 24, HHI < 40K. PSYCHOGRAPHICS: Entertainment Junkies, Fashion Enthusiasts, Fast Food Fans. INTERESTS and INFLUENCERS: Listen to Podcasts, Kylie Cosmetics Fan, Cardi B, ODESZA, Chipotle, Shane Dawson, James Charles, Starbucks.

  4. In the next 40 minutes... OUR GOAL: Discuss some of the current challenges of traditional market research and propose a novel solution based on Named Entity Recognition (NER) and Bayesian Inference.

  5. Challenges in Market Research

  6. What is Market Research? Information about individuals and organizations; Applied Social Science; Statistical Inference; Gain Insights for Strategic Decisions

  7. Why does Market Research matter? Consumer Preferences and Behaviors; Market Segmentation; Buyer Personas; Identify Opportunities; Brand Perceptions; Market Trends

  8. Approaches to Market Research. Qualitative: opinions and individual experiences; in-depth interviews; smaller sample. Quantitative: numbers and data; statistics; larger sample.

  9. Quantitative Market Research is conducted with Surveys. Survey cycle: Define → Design → Distribute → Collect → Analyze

  10. Limitations of Surveys: Predefined questions; Expensive; Response Bias; Narrow coverage; Invasive

  11. Market Research using “Implicit Consumer Feedback”: the survey cycle (Define, Design, Distribute, Collect, Analyze) vs. implicit feedback, e.g. Social Listening

  12. Inferring Interests from Twitter Interactions

  13. Advantages of Implicit Consumer Feedback Approaches: Flexible costs; Wide view; Spontaneous; Mass coverage; Opportunities for Big Data and AI

  14. What about other information? Amazon Purchases Twitter Interactions ? Beer Consumption Brand ?

  15. The Universe of Consumer Datasets: First Party (CSM) Consumer Research Social Media Surveys Financial and Behaviors Properties

  16. Individual Consumer Datasets are Far From Being Exhaustive: SCATTERED, PARTIAL, SKEWED. [Figure: dataset coverage by gender (male, female) and age band (18-30, 31-43, 44-56, 57-70)]

  17. The Holy Grail of Market Research: ALL IN ONE, COMPLETE, REPRESENTATIVE. [Figure: dataset coverage by gender (male, female) and age band (18-30, 31-43, 44-56, 57-70)]

  18. What is the baseline algorithm for “completing” datasets?

  19. Look-alike Fusion

  20. What is look-alike fusion? Left: Social Network Panel. Right: Consumptions Survey Panel.

  21. Assignment Optimization Problem. Well-known solutions: Hungarian method; Simplex; Auction algorithm
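
To make the assignment problem on this slide concrete, here is a minimal sketch that matches each user of a left panel to exactly one user of a right panel so that the total distance is as small as possible. The panels, feature vectors and function names are hypothetical, and the exhaustive search is only a stand-in that is practical for a handful of users; it is not one of the methods named on the slide. A real fusion would use one of those solvers instead (SciPy's `scipy.optimize.linear_sum_assignment` solves the same problem).

```python
from itertools import permutations

# Hypothetical toy panels: each user is a feature vector on a shared feature space.
left_users = {"L1": (1.0, 0.0), "L2": (0.0, 1.0), "L3": (0.5, 0.5)}
right_users = {"R1": (0.1, 0.9), "R2": (0.9, 0.1), "R3": (0.4, 0.6)}


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def best_assignment(left, right):
    """Search every one-to-one matching and keep the one with the lowest total distance."""
    left_ids = list(left)
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(right):
        cost = sum(distance(left[l], right[r]) for l, r in zip(left_ids, perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, dict(zip(left_ids, perm))
    return best_pairs, best_cost


pairs, total_cost = best_assignment(left_users, right_users)
print(pairs, round(total_cost, 4))
```

The search visits n! matchings, which is why dedicated assignment solvers are used once the panels grow beyond toy size.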

  22. Datasets Fusion. [Figure: matrix of Left Users and Right Users against left-only entities and right-only entities, with X marking observed entities and the Target Audience marked]

  23. Look-alike Fusions Require a Main Panel: Centrality

  24. Look-alike Fusions Don’t Scale Well: Differences in feature space; Craftsmanship required at each change of data; Universal objective function to optimize

  25. Is there a more scalable way to “fuse” datasets?

  26. The Audience Projection

  27. Audience Projection defined as “User Binary Classification”. Source: Social Network Panel (Target Audience: 1.6M / 70M Social accounts). Destination: Consumptions Survey Panel (26M / 200M U.S. consumers), where the PROJECTION labels each user TRUE or FALSE. Destination variables: Ben & Jerry’s: bought in last 6 months? Affinity: 1.80x. Angry Orchard: drunk in last 6 months? Affinity: 1.50x. Venmo: paid in last 30 days? Affinity: 1.6x.

  28. Solution = Named Entity Recognition (NER) + Bayesian Model. Source: Social Network Panel; NER runs on Social Pages. Destination: Consumptions Survey Panel; NER runs on Consumption Questions. ENTITY LINKING (NEL) connects the entities of the two panels, and the BAYESIAN MODEL turns the Target Audience into Projected Users Probabilities.

  29. Entities Represent a Universal Feature Space: Social Pages → NER; Consumption Questions → NER; Listed Products → NER

  30. Named Entity Recognition (NER) in each Domain: Social Pages, Consumption Questions, Listed Products. Example texts: “Adidas Originals Men's Relaxed Strapback Cap”; “The Coca-Cola Company is a total beverage company, offering over 500 brands in more than 200 countries and territories.”; “Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red”

  31. NLP Libraries with NER capability: Polyglot, DeepPavlov

  32. Why for Production? Industry-grade maturity; Fast; Accurate

  33. example of NER usage
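
The code shown on this slide is not part of the transcript. As a rough illustration of what an NER step returns, here is a minimal sketch that tags entities with a dictionary (gazetteer) lookup. The gazetteer, the labels and the `extract_entities` helper are hypothetical; a trained NER model from one of the libraries above would replace the lookup in practice. The sample texts are taken from slide 30.

```python
import re

# Hypothetical gazetteer: surface form -> entity label.
# A trained NER model would replace this lookup in a real pipeline.
GAZETTEER = {
    "The Coca-Cola Company": "ORG",
    "Coca-Cola": "BRAND",
    "Adidas Originals": "BRAND",
}

# Longest surface forms first, so "The Coca-Cola Company" wins over "Coca-Cola".
_PATTERN = re.compile(
    "|".join(re.escape(form) for form in sorted(GAZETTEER, key=len, reverse=True))
)


def extract_entities(text):
    """Return (surface form, label, start offset) for each non-overlapping match."""
    return [(m.group(0), GAZETTEER[m.group(0)], m.start()) for m in _PATTERN.finditer(text)]


texts = [
    "The Coca-Cola Company is a total beverage company, offering over 500 brands "
    "in more than 200 countries and territories.",
    "Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red",
    "Adidas Originals Men's Relaxed Strapback Cap",
]
for text in texts:
    print(extract_entities(text))
```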

  34. Same Entity May Exist with Different Spellings: “Interacted with Coca-Cola Company on Social Networks” vs. “Have you consumed Coca-Cola last week?”

  35. Linking and Normalizing Entities via Wikipedia: en.wikipedia.org/wiki/Coca-Cola ↔ (Entity Relationship) ↔ en.wikipedia.org/wiki/The_Coca-Cola_Company
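
A minimal sketch of the linking and normalizing step, assuming a hand-written alias table in place of a real entity linker. `ALIASES`, `RELATED`, `link_entity` and `normalize` are hypothetical names, and collapsing the company page onto the brand page is only one possible reading of the "Entity Relationship" arrow on the slide.

```python
# Hypothetical alias table: lower-cased surface form -> Wikipedia page title.
ALIASES = {
    "coca-cola": "Coca-Cola",
    "coca-cola company": "The_Coca-Cola_Company",
    "the coca-cola company": "The_Coca-Cola_Company",
}

# Hypothetical entity relationships: page title -> page it is normalized to.
RELATED = {"The_Coca-Cola_Company": "Coca-Cola"}


def link_entity(surface_form):
    """Map a surface form to a Wikipedia page title, or None if it is unknown."""
    return ALIASES.get(surface_form.strip().lower())


def normalize(surface_form):
    """Link the surface form, then follow the relationship so related pages share one identifier."""
    page = link_entity(surface_form)
    return RELATED.get(page, page)


print(normalize("Coca-Cola Company"))  # spelling seen on social networks
print(normalize("Coca-Cola"))          # spelling seen in a survey question
```

With this table both spellings from slide 34 end up on the same identifier, which is what allows them to be counted as one feature.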

  36. Normalized Entities Mean a Common Feature Space

  37. Stacked Heterogeneous Feature Space. [Figure: Source Users and Destination Users stacked over source-only entities, common entities and destination-only entities; X marks observed entities, ? marks latent interests; the Target Audience is marked among the Source Users]
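
The stacked feature space can be sketched with plain sets, assuming each user is represented by the set of normalized entities observed for them. The user ids and entity names below are hypothetical toy data.

```python
# Hypothetical toy panels: user id -> set of normalized entities.
source_users = {
    "s1": {"Coca-Cola", "Cardi_B"},
    "s2": {"Starbucks", "ODESZA"},
}
destination_users = {
    "d1": {"Coca-Cola", "Venmo"},
    "d2": {"Starbucks", "Angry_Orchard"},
}


def entity_universe(users):
    """All entities observed for at least one user of a panel."""
    return set().union(*users.values())


source_entities = entity_universe(source_users)
destination_entities = entity_universe(destination_users)

common_entities = source_entities & destination_entities
source_only = source_entities - destination_entities
destination_only = destination_entities - source_entities


def restrict(users, entities):
    """Keep only the given entities for every user."""
    return {user: observed & entities for user, observed in users.items()}


print(sorted(common_entities))
print(restrict(destination_users, common_entities))
```

Only the common entities are observed on both sides; everything a destination user might share with the source-only columns is the latent part the model has to infer.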

  38. Common Entities translate Source to Destination. Source: Social Network Panel (Target Audience). Destination: Consumptions Survey Panel (?). The Share of Interests on the Common Entities and the Source Target Size (1.6M / 70M = 2.3%) feed the Bayesian Model.

  39. “Share of interests” encodes the DNA of the Target Audience. Common Entities, target audience share of interests (Target Audience slice): 17%, 50%, 50%. Global share of interests: 100%.
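
The slides do not spell out how a share of interests is computed. One plausible reading, used here purely as an assumption, is the fraction of an audience that interacted with each common entity. The sketch below computes that fraction for a toy target audience, alongside the prior from the source target size; the user ids, entity names and `share_of_interests` helper are hypothetical.

```python
def share_of_interests(users, audience_ids, entities):
    """Fraction of the audience that interacted with each entity (assumed definition)."""
    audience = [users[user_id] for user_id in audience_ids]
    return {
        entity: sum(entity in observed for observed in audience) / len(audience)
        for entity in sorted(entities)
    }


# Hypothetical source panel restricted to three common entities A, B, C.
source_users = {
    "s1": {"A", "B"},
    "s2": {"A"},
    "s3": {"B", "C"},
    "s4": {"C"},
    "s5": {"A", "C"},
    "s6": set(),
}
target_audience = ["s1", "s2"]
common_entities = {"A", "B", "C"}

target_share = share_of_interests(source_users, target_audience, common_entities)
global_share = share_of_interests(source_users, list(source_users), common_entities)
toy_prior = len(target_audience) / len(source_users)

# Prior quoted on the slides: 1.6M target accounts out of 70M social accounts.
prior_from_slides = 1.6e6 / 70e6

print(target_share, global_share, toy_prior, round(prior_from_slides, 3))
```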

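  40. Bayesian Model. Posterior = Likelihood · Prior / Evidence: $P(u \in T \mid E) = \frac{P(E \mid u \in T) \cdot P(u \in T)}{P(E)}$, where $u$ is a user, $T$ the projected target and $E$ the evidence on the common entities. Posterior: probability of the user belonging to the projected target given the evidence. Likelihood: Share of Interests on common entities. Prior: Source Target Size = 2.3%.

  41. Evidence Decomposition: $P(E) = P(E \mid u \in T) \cdot P(u \in T) + P(E \mid u \notin T) \cdot P(u \notin T)$

  42. Marginal Positive Likelihood: $P(e_1 \mid u \in T) \approx$ Binomial distribution with p = 17%, where $e_1$ is a single common entity.

  43. Joint Likelihood under Naive Assumption: $P(e_1, e_2, e_3 \mid u \in T) = P(e_1 \mid u \in T) \cdot P(e_2 \mid u \in T) \cdot P(e_3 \mid u \in T)$, with shares of interests of 17%, 50% and 50% for $e_1$, $e_2$ and $e_3$.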
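
Slides 40 to 43 can be put together in a few lines. This is a minimal sketch, not the production model: it assumes each common entity contributes a Bernoulli term (a binomial with a single trial) whose parameter is the share of interests, and the shares for users outside the target are hypothetical numbers, since the slides only give the 17%, 50%, 50% shares for the target and the 2.3% prior.

```python
from math import prod


def joint_likelihood(user_entities, shares):
    """Naive assumption: product of independent per-entity terms.

    Each term is p if the user interacted with the entity, 1 - p otherwise.
    """
    return prod(p if entity in user_entities else 1.0 - p for entity, p in shares.items())


def posterior(user_entities, prior, target_shares, non_target_shares):
    """P(user in target | evidence) via Bayes' rule with the evidence decomposed over both classes."""
    positive = joint_likelihood(user_entities, target_shares) * prior
    negative = joint_likelihood(user_entities, non_target_shares) * (1.0 - prior)
    evidence = positive + negative
    return positive / evidence


prior = 1.6e6 / 70e6                                       # source target size, about 2.3%
target_shares = {"e1": 0.17, "e2": 0.50, "e3": 0.50}       # shares shown on the slides
non_target_shares = {"e1": 0.05, "e2": 0.30, "e3": 0.30}   # hypothetical, not from the slides

engaged_user = posterior({"e1", "e2", "e3"}, prior, target_shares, non_target_shares)
silent_user = posterior(set(), prior, target_shares, non_target_shares)
print(round(engaged_user, 3), round(silent_user, 3))
```

A destination user who interacted with all three common entities ends up well above the prior, while a user with none of them falls below it; when the two sets of shares are identical the posterior equals the prior.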

  44. Predicted Probabilities provide Insights on the Projected Users. PROJECTION: Target Audience → Projected Users Probabilities $P(u \in T \mid E)$, the probability that user $u$ belongs to the projected target $T$ given the evidence $E$. Insights on Destination Variables (Affinity): TeenNick 8.9x; Robot Chicken 7.27x; Bob’s Burgers 2.36x; Ben & Jerry’s 1.80x; Venmo 1.62x; Angry Orchard 1.55x; Nintendo DSi XL 1.47x; Video Games 1.45x; Audio or Video Chat 1.23x.
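
The slides do not define how affinity is computed. A common definition, assumed here, is a lift: the prevalence of a destination variable among the projected users, weighted by their predicted probabilities, divided by its prevalence in the whole destination panel. The users, probabilities and the `affinity` helper below are hypothetical.

```python
def affinity(probabilities, has_variable):
    """Lift of a destination variable in the projected audience (assumed definition).

    probabilities: user id -> predicted probability of belonging to the target.
    has_variable:  user id -> True if the destination variable holds for the user.
    """
    total_weight = sum(probabilities.values())
    weighted_prevalence = (
        sum(p for user, p in probabilities.items() if has_variable[user]) / total_weight
    )
    baseline_prevalence = sum(has_variable[user] for user in probabilities) / len(probabilities)
    return weighted_prevalence / baseline_prevalence


# Hypothetical destination panel of four users.
predicted = {"d1": 0.8, "d2": 0.6, "d3": 0.1, "d4": 0.1}
bought_ice_cream = {"d1": True, "d2": False, "d3": True, "d4": False}

lift = affinity(predicted, bought_ice_cream)
print(f"Affinity: {lift:.3f}x")
```

A value above 1x means the variable is over-represented among the users the model considers likely members of the target audience.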

  45. Audience Projection In a Nutshell: Social Panel (Target Audience) → Common Entities → Bayesian Model → Consumptions Survey Panel (Affinity: 1.80x, Affinity: 1.55x, Affinity: 1.62x)

  46. Cool! How do you know this is accurate?

  47. Evaluation Techniques

  48. Binary Classifier Evaluation: evaluation techniques compare the Projected Users Probabilities from the Bayesian Model with a Ground Truth (?)

  49. Validate via Common Entities. The Target Audience is defined on the Source Users by an OR query over common entities; the Exact Query Replica on the Destination Users gives the Ground Truth, to compare with the Projected Audience.
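
A minimal sketch of this validation, assuming the ground truth is obtained by replaying the same OR query on the destination users and that ranking quality is scored with ROC AUC. The metric is an assumption (the slides do not name one), and the users, query and probabilities are hypothetical.

```python
def ground_truth_from_query(users, query_entities):
    """Exact query replica: a user is positive if they match any entity of the OR query."""
    return {user: bool(observed & query_entities) for user, observed in users.items()}


def roc_auc(probabilities, ground_truth):
    """Chance that a random positive user is scored above a random negative one (ties count half)."""
    positives = [p for user, p in probabilities.items() if ground_truth[user]]
    negatives = [p for user, p in probabilities.items() if not ground_truth[user]]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical destination users and an OR query over two common entities.
destination_users = {
    "d1": {"A", "X"},
    "d2": {"B"},
    "d3": {"C"},
    "d4": {"C", "Y"},
}
query = {"A", "B"}

# Hypothetical probabilities produced by the projection.
predicted = {"d1": 0.9, "d2": 0.4, "d3": 0.5, "d4": 0.1}

truth = ground_truth_from_query(destination_users, query)
auc = roc_auc(predicted, truth)
print(truth, auc)
```

For the test to be informative, the entities used in the query would have to be withheld from the evidence given to the model; otherwise the projection simply reads the answer back.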

  50. Validate via Self Reconstruction Within the Same Domain. [Figure: Source Users and Destination Users over source-only entities, common entities and destination-only entities, with the Target Audience and the Ground Truth marked]

  51. Validate via Double-step Reconstruction: Ground Truth → PROJECTION → PROJECTION → Predicted probabilities

  52. Repeat Test Cases Stratifying by Category

  53. Demographics Skewness PROJECTION

  54. Golden Benchmarks Comparison on Aggregated Insights

  55. Opportunities

  56. Many Linked Views of the Same Global Population: Audience Projection

  57. Multiple Perspectives Reinforce Reliability. Social Panel, Target Audience. Interacted with Game Informer social page: Affinity 2.17x. “Have you read any Game Informer issue?”: Affinity 1.73x. Game Informer Single Issue Magazine purchased online: Affinity 2.51x.

  58. Generalize Audience Projection as a Domain Adaptation Problem

  59. Final Remarks

  60. Many Datasets but only Partial Views

  61. Look-alike fusions don’t scale well

  62. Audience Projection adapts to any “entity domain” Bayesian Model

  63. Accuracy and Biases can be quantified

  64. Strategists now have a complete view of their Target Audience

  65. Gianmario Spacagna, Chief Scientist at Helixa.ai. gspacagna@helixa.ai, @gm_spacagna
