Embeddings @ Twitter: Making ML easy with Embeddings. Sept 2018 - PowerPoint PPT Presentation

  1. Embeddings @ Twitter Making ML easy with Embeddings !!! Sept 2018

  2. Agenda 1 Team 2 What's an Embedding? 3 Why Embeddings? 4 Embeddings Pipeline 5 What's Next


  4. Team: Cortex

  5. Team. Mission: to unify and advance recommendation systems.

  6. Team Recommendation Systems

  7. Team Home Explore

  8. Team Email

  9. Team Notifications

  10. Team Twitter

  11. Agenda 1 Team and Product 2 What's an Embedding? 3 Why Embeddings? 4 Embeddings Pipeline 5 What's Next

  12. What is an Embedding? A model maps a discrete word or entity into a continuous space:
      twitter: [ 0.07, -0.001, -0.208 ]
      @jack: [ 0.427, 0.225, -0.082 ]
      SF: [ 0.541, 0.496, -0.362 ]
      #TwitterNBA: [ 0.414, 0.068, -0.196 ]
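
The mapping on this slide can be sketched as a plain lookup table from discrete tokens to dense vectors (the vectors are the 3-dimensional examples from the slide; the `embedding` dict and `embed` function names are illustrative, not Twitter's API):

```python
import numpy as np

# Embedding table: each discrete token maps to a dense, continuous vector.
# Values are the 3-dimensional examples shown on the slide.
embedding = {
    "twitter":     np.array([0.07, -0.001, -0.208]),
    "@jack":       np.array([0.427, 0.225, -0.082]),
    "SF":          np.array([0.541, 0.496, -0.362]),
    "#TwitterNBA": np.array([0.414, 0.068, -0.196]),
}

def embed(token):
    """Look up the dense vector for a discrete token."""
    return embedding[token]
```

In a real model the table is a learned matrix and the lookup is a row index, but the interface is the same: discrete in, continuous out.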

  13. Agenda 1 Team and Product 2 What's an Embedding? 3 Why Embeddings? 4 Embeddings Pipeline 5 What's Next

  14. Why Embeddings?
      Model Features: lead to improved model performance when used as input features
      Feature Compression: reduced infrastructure cost and improved efficiency
      Nearest Neighbor Search: similarity search on the embedding space
      Transfer Learning: knowledge exchange between related domains while reducing training time and boosting performance

  15. Why Embeddings? Model Features
      ● ML practitioners typically use one-hot encoding to represent categorical inputs
        ○ Incapable of encoding relationships
        ○ Sparsity issues make it less useful for large dimensions
      ● Embeddings are outputs of ML models
        ○ Conserve relationships amongst entities
        ○ Compress the sparse input space into dense vectors

  16. Why Embeddings? Model Features

  17. Why Embeddings? Feature Compression

  18. Why Embeddings? Feature Compression

  19. Why Embeddings? Feature Compression
      ● Generate embeddings from a sub-network offline
      ● Update at the same frequency as the raw features
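
A minimal sketch of the offline step described here, under assumed shapes: a pretrained sub-network (reduced to a single projection matrix `W`, with random stand-in weights) compresses the active ids of a wide sparse feature vector into a short dense one, which can then be materialized on the same refresh schedule as the raw features. All names and sizes are illustrative:

```python
import numpy as np

SPARSE_DIM, DENSE_DIM = 10_000, 32  # illustrative sizes

# Weights of the pretrained sub-network (random stand-in here).
rng = np.random.default_rng(42)
W = rng.normal(scale=0.01, size=(SPARSE_DIM, DENSE_DIM))

def compress(sparse_indices):
    """Offline job: turn active sparse feature ids into a dense embedding.
    Equivalent to multiplying the summed one-hot vector by W, but cheaper."""
    return W[sparse_indices].sum(axis=0)

# Run whenever the raw features are refreshed, and store the dense result.
dense = compress([3, 17, 4096])
```

Downstream models then consume the 32-dimensional vector instead of the 10,000-dimensional sparse input, which is the infrastructure saving the earlier slide refers to.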

  20. Why Embeddings? Feature Compression

  21. Why Embeddings? Nearest Neighbor Search

  22. Why Embeddings? Nearest Neighbor Search

  23. Why Embeddings? Nearest Neighbor Search
      ● Essential component for Candidate Generation pipelines
        ○ Co-embed users and items
        ○ Given a user, look up neighbors
        ○ Use approximate methods to scale
      ● Finds application in many other areas
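
The user-to-item lookup can be sketched with exact cosine similarity; at Twitter's scale an approximate nearest neighbor (ANN) method would replace the brute-force scan, but the exact version shows the idea. The embeddings here are random stand-ins, with the user placed deliberately close to item 7:

```python
import numpy as np

rng = np.random.default_rng(1)
item_embeddings = rng.normal(size=(1000, 32))                     # co-embedded items
user_embedding = item_embeddings[7] + 0.01 * rng.normal(size=32)  # near item 7

def nearest_items(user, items, k=5):
    """Exact cosine-similarity search; ANN libraries approximate this at scale."""
    items_n = items / np.linalg.norm(items, axis=1, keepdims=True)
    user_n = user / np.linalg.norm(user)
    scores = items_n @ user_n          # cosine similarity to every item
    return np.argsort(-scores)[:k]     # indices of the k most similar items

candidates = nearest_items(user_embedding, item_embeddings)
```

Because the user vector was constructed next to item 7, that item comes back as the top candidate; in a real candidate-generation pipeline the proximity comes from co-embedding users and items during training.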

  24. Why Embeddings? Transfer Learning
      ● A model trained for one task is used in another
        ○ Typically by initializing network weights and fine-tuning
      ● Very attractive from a business point of view
        ○ Reduced development time
        ○ Cross-domain information sharing
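
The weight-initialization pattern can be sketched as follows. This is a schematic, not Twitter's implementation: `TargetModel` and all shapes are invented, and the fine-tuning loop itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Embedding weights learned on a source task (random stand-in values).
pretrained_embeddings = rng.normal(size=(100, 16))

class TargetModel:
    """Target-task model whose embedding layer starts from the source task's
    weights instead of from scratch; fine-tuning then adjusts both parts."""
    def __init__(self, init_embeddings):
        self.embeddings = init_embeddings.copy()  # transferred weights
        self.head = np.zeros(16)                  # task-specific layer, trained fresh

    def score(self, entity_id):
        return float(self.embeddings[entity_id] @ self.head)

model = TargetModel(pretrained_embeddings)
```

Starting from transferred weights rather than random ones is what buys the reduced development time mentioned on the slide: the target task only has to learn the task-specific head and small corrections to the embeddings.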

  25. Agenda 1 Team and Product 2 What's an Embedding? 3 Why Embeddings? 4 Embeddings Pipeline 5 What's Next

  26. Embedding pipeline: Goals
      Creation & consumption with ease: enable teams to learn embeddings at scale using the appropriate algorithm; enable teams to consume embeddings at scale across product ML models
      Quality and Relevance: enable adapting to evolving data distributions over time; if applicable, the learnt embeddings should be of value
      Sharing & discoverability: enable cross-team collaboration; improvements/learning in one domain can drive improvements elsewhere

  27. Embedding pipeline: Item Selection & Data Preprocessing
      ● Identify the set of entities to learn embeddings for
      ● Assemble a dataset that represents the relationships between these entities
        ○ Data representation defined by the learning algorithm
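
The two preprocessing steps can be sketched with a toy example. The interaction logs, session structure, and co-occurrence-pair representation below are all hypothetical; the slide only says the representation is dictated by whichever learning algorithm is used:

```python
from collections import Counter
from itertools import combinations

# Hypothetical interaction logs: each row lists entities that co-occurred
# (e.g. accounts engaged with in one session).
sessions = [
    ["@jack", "twitter", "#TwitterNBA"],
    ["@jack", "twitter"],
    ["SF", "twitter"],
]

# Step 1: identify the set of entities to learn embeddings for.
entities = sorted({e for s in sessions for e in s})

# Step 2: assemble a dataset of relationships; here, co-occurrence counts
# over unordered entity pairs. The exact representation depends on the
# learning algorithm that will consume it.
pair_counts = Counter()
for s in sessions:
    for a, b in combinations(sorted(s), 2):
        pair_counts[(a, b)] += 1
```

A skip-gram-style algorithm would consume the pairs directly, while a matrix-factorization approach would turn the same counts into a co-occurrence matrix.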

  28. Embedding pipeline: Model Fitting
      ● Fit a model on the collected data
        ○ Use pre-built algorithms
        ○ Option to plug in a custom algorithm
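
The "pre-built or custom algorithm" option suggests a pluggable interface; here is one hypothetical minimal shape (every class and function name below is invented for illustration, not Twitter's actual API):

```python
import numpy as np

class EmbeddingAlgorithm:
    """Hypothetical plug-in interface: anything that maps a dataset of entity
    relationships to an entity -> vector table can be swapped in."""
    def fit(self, pairs, entities, dim):
        raise NotImplementedError

class RandomProjectionBaseline(EmbeddingAlgorithm):
    """Trivial pre-built stand-in: deterministic random vectors per entity."""
    def fit(self, pairs, entities, dim):
        rng = np.random.default_rng(0)
        return {e: rng.normal(size=dim) for e in sorted(entities)}

def run_pipeline(pairs, entities, algorithm, dim=8):
    # The pipeline is agnostic to which algorithm it calls.
    return algorithm.fit(pairs, entities, dim)

vectors = run_pipeline([("a", "b")], {"a", "b"}, RandomProjectionBaseline())
```

A team with a custom objective would subclass the same interface, and the rest of the pipeline (benchmarking, publishing) would not need to change.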

  29. Embedding pipeline: Benchmarking
      ● Developed a variety of standard benchmarking tasks for each type of embedding
        ○ User Topic Prediction: predictive performance of a logistic regression model learnt on the user's embedding

  30. Embedding pipeline: Benchmarking
      ● Developed a variety of standard benchmarking tasks for each type of embedding
        ○ User Metadata Prediction: predictive performance of a logistic regression model learnt on the user's embedding

  31. Embedding pipeline: Benchmarking
      ● Developed a variety of standard benchmarking tasks for each type of embedding
        ○ User Follow Jaccard: Jaccard index of the users' embedding similarity and their follow sets
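
The Jaccard half of this benchmark is just set overlap; a minimal sketch, with invented follow sets (how the deck pairs this against embedding similarity is not specified beyond the slide):

```python
def jaccard(a, b):
    """Jaccard index: |intersection| / |union| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical follow sets for two users.
follows_u = {"@jack", "@nba", "@twitter"}
follows_v = {"@jack", "@nba", "@sf"}

overlap = jaccard(follows_u, follows_v)  # 2 shared accounts out of 4 total
```

The benchmark then asks how well similarity in the embedding space tracks this follow-set overlap across user pairs.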

  32. Embedding pipeline: Feature Store
      ● Publish embeddings to the "feature store", Twitter's shared feature repository
      ● Enables ML teams throughout Twitter to easily discover, access, and utilize freshly trained embeddings
        ○ Easy offline & online access
        ○ Discovery through UX

  33. Agenda 1 Team and Product 2 What's an Embedding? 3 Why Embeddings? 4 Embeddings Pipeline 5 What's Next

  34. What's Next?
      ● New embedding learning algorithms
      ● Increasing number of datasets available as embeddings
      ● Large-scale approximate nearest neighbor (ANN) solution
      ● Further exploration of embeddings as a means for feature compression

  35. Thank you! @tayal_abhishek September 2018

  36. Abhishek Tayal @tayal_abhishek We are Hiring !!! #TwitterCortex #MLX 09 Sep 2018
