MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Arijit Biswas, Mukul Bhutani and Subhajit Sanyal
Machine Learning, Amazon, Bangalore, India
{barijit,mbhutani,subhajs}@amazon.com
The Collaborators
• Mukul Bhutani (Machine Learning, Amazon)
• Subhajit Sanyal (Machine Learning, Amazon)
A Product in an E-commerce Company
Product attributes:
• Title
• Color
• Size
• Material
• Category
• Item Type
• Hazardous indicator
• Batteries required
• High Value
• Target Gender
• Weight
• Offer
• Review
• Price
• View Count
Motivation
• Billions of products in the inventory
• Diverse set of ML problems involving products:
  • Product recommendation
  • Duplicate product detection
  • Product safety classification
  • Price estimation
  • ...
• Any ML application needs a good set of features
• What is a good and useful featurization for products?
A Naïve Featurization
• Bag-of-words: TF-IDF representations of
  • Title
  • Description
  • Bullet points, etc.
• Although effective, these are often difficult to use in practice:
  • Overfitting
  • Computationally and storage inefficient
  • Not semantically meaningful
  • Increase the number of parameters in downstream ML algorithms
• Dense, low-dimensional features could alleviate these issues
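A minimal sketch of the bag-of-words TF-IDF baseline featurization referred to above, using scikit-learn. The example titles and the 5000-feature cap are made up for illustration; they are not taken from the paper's setup.

```python
# Sketch: sparse TF-IDF featurization of product titles (the "naïve" baseline).
from sklearn.feature_extraction.text import TfidfVectorizer

titles = [
    "stainless steel water bottle 1 litre",
    "kids cotton t-shirt blue size m",
    "wooden coffee table with storage shelf",
]
vectorizer = TfidfVectorizer(max_features=5000)  # high-dimensional, sparse representation
X = vectorizer.fit_transform(titles)
print(X.shape)  # (3, vocabulary size up to 5000)
```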
Summary of Contributions
• We propose a novel product representation approach:
  • Dense, low-dimensional, generic
  • As good as the TF-IDF representation
• A discriminative multi-task neural network is trained
• Different signals pertaining to a product are explicitly injected:
  • Static: color, material, weight, size, sub-category
  • Dynamic: price, popularity, views
• The learned representations should be generic
• The title of a product is fed into a bidirectional LSTM
  • The hidden representation is the "product embedding" or "product feature"
• Training: the embedding is fed to multiple classification/regression/decoding units
  • Trained jointly
• Referred to as a Multi-task Recurrent Neural Network (MRNet)
Prior Work
• Word/document embeddings
  • Word2Vec [Mikolov, 2013]
  • Paragraph2Vec/Doc2Vec [Mikolov, 2014]
• Product embeddings
  • Prod2Vec [Grbovic, KDD 2015]
  • Meta-Prod2Vec [Vasile, RecSys 2016]
  • Designed for product recommendation
• Traditionally, multi-task learning is used for correlated tasks
• We use multi-task learning to make the product representations generic!
MRNet: Our Approach
• Different product signals are injected into MRNet to make the embedding generic
• Architecture (see the sketch below):
  • Input words from the product title are fed into a bidirectional LSTM
  • The embedding layer on top of the LSTM is the product representation (task-invariant)
  • The embedding feeds multiple task heads (Task 1 ... Task 5): classification, regression, and decoding
• Task signals:
  • Static: color, size, weight, material, category, item type, hazardous, high-value, target gender
  • Dynamic: offers, reviews, price, # views
  • Decoding: TF-IDF representation of the title (5000 dim.)
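A minimal sketch of this architecture in PyTorch. The layer sizes, vocabulary size, and the particular task heads shown (a category classifier, a price regressor, a TF-IDF decoder) are assumptions for illustration; the actual MRNet attaches one head per injected signal.

```python
# Sketch: bidirectional LSTM over the title, shared embedding, multiple task heads.
import torch
import torch.nn as nn

class MRNetSketch(nn.Module):
    def __init__(self, vocab_size=50000, embed_dim=128, hidden_dim=128,
                 num_categories=20, tfidf_dim=5000):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM over the product title tokens.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        rep_dim = 2 * hidden_dim  # shared product embedding (task-invariant layer)
        # Task-specific heads all read the same product embedding.
        self.category_head = nn.Linear(rep_dim, num_categories)  # classification head
        self.price_head = nn.Linear(rep_dim, 1)                  # regression head
        self.tfidf_head = nn.Linear(rep_dim, tfidf_dim)          # decoding of the title TF-IDF vector

    def forward(self, title_token_ids):
        x = self.word_embed(title_token_ids)
        _, (h, _) = self.lstm(x)
        # Concatenate final forward and backward hidden states -> product embedding.
        product_embedding = torch.cat([h[0], h[1]], dim=-1)
        return {
            "embedding": product_embedding,
            "category_logits": self.category_head(product_embedding),
            "price": self.price_head(product_embedding),
            "tfidf_reconstruction": self.tfidf_head(product_embedding),
        }

# Usage: titles tokenized and padded to a fixed length.
model = MRNetSketch()
batch = torch.randint(0, 50000, (4, 12))  # 4 titles, 12 tokens each
outputs = model(batch)
print(outputs["embedding"].shape)  # torch.Size([4, 256])
```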
Loss and Optimization
• Joint optimization
  • The gradient is computed w.r.t. the full loss
• Alternating optimization
  • One task loss is selected at random
  • Backpropagation is performed with that loss
  • Only the weights of that task and the task-invariant layers are updated
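A minimal sketch of the alternating optimization described above, reusing the hypothetical MRNetSketch model from the earlier sketch. The per-task loss functions and the batch field names are assumptions for illustration.

```python
# Sketch: alternating optimization -- one randomly chosen task loss per step.
import random
import torch
import torch.nn as nn

model = MRNetSketch()
# Plain SGD: heads whose loss was not selected receive zero gradient and are not updated.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

task_losses = {
    "category": lambda out, batch: nn.functional.cross_entropy(out["category_logits"], batch["category"]),
    "price":    lambda out, batch: nn.functional.mse_loss(out["price"].squeeze(-1), batch["price"]),
    "tfidf":    lambda out, batch: nn.functional.mse_loss(out["tfidf_reconstruction"], batch["tfidf"]),
}

def training_step(batch):
    # Randomly pick one task; backpropagating only its loss updates that task head
    # plus the shared (task-invariant) LSTM and embedding layers.
    task = random.choice(list(task_losses))
    optimizer.zero_grad()
    out = model(batch["title_ids"])
    loss = task_losses[task](out, batch)
    loss.backward()
    optimizer.step()
    return task, loss.item()
```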
Product Group Agnostic Embeddings
• Products are organized into Product Groups (PGs): Furniture, Jewelry, Books, Home, Clothes, etc.
• Signals are often product-group specific:
  • Weights of Home items are different from Jewelry
  • Sizes of clothes (XL, XXL, etc.) are different from furniture (king, queen)
• Embeddings are therefore learned for each product group (PG 1 ... PG N)
• A sparse autoencoder (fully connected linkages, sparsity enforced) maps each PG-specific embedding to a PG-agnostic embedding
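A minimal sketch of the sparse autoencoder idea on this slide, under the assumption that sparsity is enforced with an L1 penalty on the hidden activations; the dimensions and penalty weight are made up for illustration.

```python
# Sketch: sparse autoencoder mapping a PG-specific embedding to a PG-agnostic one.
import torch
import torch.nn as nn

class SparsePGAutoencoder(nn.Module):
    def __init__(self, pg_dim=256, agnostic_dim=128):
        super().__init__()
        self.encoder = nn.Linear(pg_dim, agnostic_dim)   # PG-specific -> PG-agnostic embedding
        self.decoder = nn.Linear(agnostic_dim, pg_dim)   # reconstruct the PG-specific embedding

    def forward(self, pg_embedding):
        agnostic = torch.relu(self.encoder(pg_embedding))
        return agnostic, self.decoder(agnostic)

def sparse_ae_loss(pg_embedding, model, l1_weight=1e-3):
    # Reconstruction loss plus an L1 sparsity penalty on the hidden (PG-agnostic) code.
    agnostic, recon = model(pg_embedding)
    return nn.functional.mse_loss(recon, pg_embedding) + l1_weight * agnostic.abs().mean()
```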
Datasets
• Plugs: whether a product has an electrical plug or not
  • Binary, 205K samples
• SIOC: whether a product ships in its own container
  • Binary, 296K samples
• Browse category classification
  • Multi-class, 150K samples
• Ingestible classification
  • Binary, 1,500 samples
• SIOC (unseen population)
  • Binary, 150K training and 271 test samples
Experimental Results
• Baseline: TF-IDF-LR
• The proposed MRNet is comparable to TF-IDF-LR in most scenarios!
Qualitative Results
Language Agnostic MRNet-Product2Vec
• Products from different marketplaces have their metadata in the language native to that region.
• We train a multi-modal autoencoder to link representations of products pertaining to different marketplaces.
  • Architecture: [Embedding: UK, Embedding: FR] → hidden layer → [Embedding: UK, Embedding: FR]
• Training data split (see the sketch below):
  • 1/3 input: [Embedding:UK, Embedding:FR], output: [Embedding:UK, Embedding:FR]
  • 1/3 input: [Embedding:UK, (0,0,...,0)], output: [(0,0,...,0), Embedding:FR]
  • 1/3 input: [(0,0,...,0), Embedding:FR], output: [Embedding:UK, (0,0,...,0)]
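A minimal sketch of how the three-way training split above could be constructed, using NumPy. The function name and the assumption that aligned UK/FR embeddings for the same products are available as dense arrays are for illustration only.

```python
# Sketch: build masked input/target pairs for the multi-modal (UK/FR) autoencoder.
import numpy as np

def make_multimodal_batch(emb_uk, emb_fr):
    """emb_uk, emb_fr: (N, D) embeddings of the same products from two marketplaces."""
    n, d = emb_uk.shape
    zeros = np.zeros((n, d))
    thirds = np.array_split(np.arange(n), 3)
    inputs, targets = [], []
    # 1/3: both marketplaces present on input and output.
    inputs.append(np.hstack([emb_uk[thirds[0]], emb_fr[thirds[0]]]))
    targets.append(np.hstack([emb_uk[thirds[0]], emb_fr[thirds[0]]]))
    # 1/3: UK only on input, FR only on output.
    inputs.append(np.hstack([emb_uk[thirds[1]], zeros[thirds[1]]]))
    targets.append(np.hstack([zeros[thirds[1]], emb_fr[thirds[1]]]))
    # 1/3: FR only on input, UK only on output.
    inputs.append(np.hstack([zeros[thirds[2]], emb_fr[thirds[2]]]))
    targets.append(np.hstack([emb_uk[thirds[2]], zeros[thirds[2]]]))
    return np.vstack(inputs), np.vstack(targets)
```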
Qualitative Results (Language Agnostic)
• Nearest neighbors of French products in the UK marketplace.
Conclusion and Future Work
• Propose a method for generic e-commerce product representation
  • Inject various product signals into its embedding
• Comparable results w.r.t. the sparse, high-dimensional baseline
• Product group agnostic embeddings
• Language agnostic embeddings
• Future work:
  • Incorporate more signals to make the embeddings even more generic
  • Include product image information