Large-scale Product Categorization with Deep Models in Rakuten. May 8, 2017. Ali Cevahir / Denis Miller. Rakuten Institute of Technology / Rakuten, Inc. https://rit.rakuten.co.jp / https://global.rakuten.com
About Rakuten: https://global.rakuten.com/corp/about/strength/data.html
Rakuten Group Services: E-Commerce, FinTech, Digital Content, Travel & Reservation, Pro Sports, Others. https://global.rakuten.com/corp/about/business/internet.html
Rakuten Ichiba: an online marketplace connecting shoppers and merchants (EC, branding, consulting, marketing), with over 230,000,000 items in 30,000+ categories.
Problem and Solution
Introduction
• Problem: Given product information, automatically classify it into its correct category.
Example: the title "MACPHEE (マカフィー) 切り替え V ネックニット" (a paneled V-neck knit) maps to Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck.
Proposed Solutions
• 2 different models
  – Deep Belief Nets
  – Deep Autoencoders + kNN
• 2 different data sources
  – Titles
  – Descriptions
• Overall results aggregated
• GPU implementation
Proposed Solutions
• 2-step classification (see the sketch below)
  – First classify into Level-1 categories
  – Then into leaf-level categories
• 81% match with merchant-assigned categories ('others' excluded)
  – Merchants are not always correct
Example: "MACPHEE (マカフィー) 切り替え V ネックニット" → Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck
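A minimal sketch of this two-step inference in Python; the model objects and method names (`predict_proba`, `leaf_models`) are illustrative stand-ins, not CUDeep's actual interface:

```python
import numpy as np

def two_step_classify(features, l1_model, leaf_models, top_n=5):
    """Step 1: pick a Level-1 category (35 classes).
    Step 2: classify into leaves with the model trained only
    on products of that Level-1 category."""
    l1_probs = l1_model.predict_proba(features)        # shape (35,)
    l1_id = int(np.argmax(l1_probs))
    leaf_probs = leaf_models[l1_id].predict_proba(features)
    top = np.argsort(leaf_probs)[::-1][:top_n]
    return l1_id, [(int(i), float(leaf_probs[i])) for i in top]
```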
CUDeep: A CUDA-based Deep Learning Framework
• In-house command-line tool for training DBNs and DAEs
• Written in CUDA, using cuBLAS and cuSPARSE
CUDeep: A CUDA-based Deep Learning Framework
• Deep Belief Nets (supervised): map input features (~1 million dimensions) to class probabilities; the network has billions of connections
• Deep Autoencoders: reconstruct the input X as X'; the bottleneck code serves as a semantic hash
CUDeep: A CUDA-based Deep Learning Framework
• Selective reconstruction (Dauphin et al., 2011), sketched below
• Applied for both
  – Layer-wise training
  – Backpropagation
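Reconstruction sampling reconstructs all nonzero input units plus a small random fraction of the zero units, reweighting the sampled zeros so the expected gradient matches full reconstruction. A minimal Python sketch under that reading of Dauphin et al. (2011); the function names are mine:

```python
import numpy as np

def reconstruction_sample(x_batch, zero_sample_rate=0.01, rng=np.random):
    """Pick which input units to reconstruct for a sparse 0-1 batch:
    all nonzero units, plus a random fraction of the zero units.
    Returns a mask and importance weights that keep the expected
    gradient equal to that of full reconstruction."""
    nonzero = x_batch > 0
    sampled_zeros = (~nonzero) & (rng.random(x_batch.shape) < zero_sample_rate)
    mask = nonzero | sampled_zeros
    # Re-weight the sampled zeros by 1/rate so their contribution
    # is unbiased despite being subsampled.
    weights = np.where(sampled_zeros, 1.0 / zero_sample_rate, 1.0) * mask
    return mask, weights

# Usage: multiply the per-unit reconstruction error by `weights` and
# skip units where `mask` is False, in both layer-wise pre-training
# and backpropagation.
```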
CUDeep: Some Design Decisions
• Keep neural-net weights on the GPU: W[vis, hid1] alone is 4 GB (1M × 1000 floats)
  – Faster: no need to communicate weights between CPU and GPU
  – Alternative: store weights in main memory and copy the weights to be updated to the GPU for each minibatch
• Sparse input feature vectors are stored in main memory
  – Device memory is limited
  – Disk streaming is possible, but slower
CUDeep: Some Design Decisions
• During layer-wise pre-training, do not store the intermediate outputs of hidden layers; run feedforward computations instead (sketched below)
• Intermediate outputs are dense, so storing them is impractical: for 200 million products, a 2000-d layer would take 1.6 TB, a 1000-d layer 800 GB, and even a 64-d layer 51.2 GB
• The 200 million sparse inputs (~10 nonzeros per feature vector) fit in 8 GB
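A sketch of how these decisions fit together during layer-wise pre-training, with CuPy standing in for CUDeep's CUDA/cuBLAS/cuSPARSE kernels (the library choice and all names are my assumption, not the actual implementation):

```python
import cupy as cp
import cupyx.scipy.sparse as cpsp

def pretrain_layer(W, X_host, batch_size=1024):
    """W (e.g. 1M x 1000 floats, ~4 GB) stays resident on the GPU.
    X_host is a scipy.sparse CSR matrix kept in host memory; only
    one minibatch is copied to the device at a time."""
    n = X_host.shape[0]
    for start in range(0, n, batch_size):
        batch = X_host[start:start + batch_size]   # host-side CSR slice
        x = cpsp.csr_matrix(batch)                 # copy minibatch to GPU
        h = 1.0 / (1.0 + cp.exp(-x.dot(W)))        # dense hidden activations
        # ...contrastive-divergence update of W happens here, on GPU...
        # h is discarded rather than stored: for 200M products even a
        # 64-d layer output would need 51.2 GB, so it is recomputed
        # on demand by repeating the feedforward pass.
```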
CUDA-kNN
• Vector search engine
CUDA-kNN
• Preprocessing: multi-level k-means clustering
• 2-step search (sketched below)
  1. Closest-cluster search
  2. kNN within the closest cluster
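An illustrative single-level version of the cluster-pruned search (CUDA-kNN itself clusters at multiple levels and runs on the GPU; this Python sketch only shows the search logic):

```python
import numpy as np

def build_index(vectors, n_clusters, iters=10, rng=np.random):
    """One level of k-means for cluster-pruned kNN search."""
    centroids = vectors[rng.choice(len(vectors), n_clusters,
                                   replace=False)].astype(float)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        assign = np.argmin(((vectors[:, None] - centroids) ** 2).sum(-1),
                           axis=1)
        for c in range(n_clusters):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, assign

def search(query, vectors, centroids, assign, k=5):
    # Step 1: find the closest cluster.
    c = np.argmin(((centroids - query) ** 2).sum(-1))
    # Step 2: exact kNN restricted to that cluster's members.
    ids = np.where(assign == c)[0]
    d = ((vectors[ids] - query) ** 2).sum(-1)
    return ids[np.argsort(d)[:k]]
```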
2-Step Classification
• Step 1 (Level 1: 35 categories): 2 DBNs & kNN
• Step 2 (Level 5: ~30,000 categories): 2 × 35 DBNs & kNN
• 2 DAE models
  – The same encoding is used for step 1 and step 2
Feature Extraction
• Features: 0-1 word vectors
• Mostly Japanese text
• Normalize letters: アイフォン 4 S → アイフォン 4s
• Strip all HTML tags: <a href> link </a> → link
• Regular expressions for (see the sketch below):
  – Product codes: iPhone-4S → iphone4s
  – Japanese counters: 4 枚 (do not tokenize)
  – Sizes and dimensions: 12Cm x 3 Cm → 12cmx3cm
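A hedged Python approximation of these normalization rules; the production pipeline's exact regular expressions are not public, so the patterns below are illustrative:

```python
import re

def normalize(text: str) -> str:
    """Illustrative version of the normalization rules above."""
    text = text.lower()                                  # 4 S -> 4 s
    text = re.sub(r"<[^>]+>", " ", text)                 # strip HTML tags
    # Sizes and dimensions: 12cm x 3 cm -> 12cmx3cm
    text = re.sub(r"(\d+)\s*cm\s*x\s*(\d+)\s*cm", r"\1cmx\2cm", text)
    # Product codes: join letter/number runs split by '-' or spaces.
    text = re.sub(r"\b([a-z]+)[\s\-]+(\d+[a-z]*)\b", r"\1\2", text)
    # Trailing single letters after digits: 4 s -> 4s
    text = re.sub(r"(\d+)\s+([a-z])\b", r"\1\2", text)
    # Japanese counters such as 4枚: keep number and counter together.
    text = re.sub(r"(\d+)\s*(枚|個|本|台)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize("iPhone-4S <a href='x'> link </a> 12Cm x 3 Cm"))
# -> "iphone4s link 12cmx3cm"
```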
Feature Extraction
• Title dictionary: 26M tokens; description dictionary: 47M tokens
• Use only the 1M most frequent tokens (see the vocabulary sketch below)
  – Good enough for L1 classification
  – Fewer tokens occur within subcategories for L2 classification (total dictionary size within a subcategory: ~800K)
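A minimal sketch of building the 1M-token vocabulary and the 0-1 feature vectors from it (function names are mine):

```python
from collections import Counter

def build_vocab(token_streams, max_size=1_000_000):
    """Keep only the most frequent tokens; everything else is
    dropped from the 0-1 feature vectors."""
    counts = Counter()
    for tokens in token_streams:       # one token list per product
        counts.update(tokens)
    return {tok: i
            for i, (tok, _) in enumerate(counts.most_common(max_size))}

def to_feature_indices(tokens, vocab):
    """0-1 bag of words: the sparse vector is just the sorted set
    of vocabulary indices present in the text."""
    return sorted({vocab[t] for t in tokens if t in vocab})
```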
Dataset Properties and Hardware Setup
• 280 million (active and inactive) products
  – Rakuten Data Release (https://rit.rakuten.co.jp/opendata.html)
• Deduplicated by title: 280 million → 172 million
  – Multiple merchants may sell the same items
• 28,338 active categories
  – ~40% of products are assigned to leaf categories named "others"
• 90% of products, selected at random, used for training
• A Linux server with 4 Titan X GPUs, 2 × 12-core Intel CPUs, and 96 GB main memory
Level-1 Genre Prediction Results (Step 1)
[Two charts: Recall@N (%) vs. top-N predictions (N = 1-10), one including and one excluding the "others" categories, for Title-DBN, Description-DBN, Title-KNN, Description-KNN, and the combined model]
Overall Taxonomy Matching (Step 2)
[Two charts: L5 Recall@N (%) vs. top-N predictions (N = 1-10), one including and one excluding the "others" categories, for Title-DBN, Description-DBN, Title-KNN, Description-KNN, the combined model, and DBNs combined]
Sample Results: Merchant Correct / Algorithm Incorrect
Product: Sweet Mother - Isaac Andrews
Merchant Category: Books, Magazines & Comics > Western Books > Books For Kids
Predicted Category: Books, Magazines & Comics > Western Books > Fiction & Literature
Sample Results: Merchant Incorrect / Algorithm Correct
Product: トヨトミ [KS-67H] 電子点火式対流型石油ストーブ KS67H (a Toyotomi electronic-ignition convection oil stove)
Merchant Category: Flowers, Garden & DIY > DIY & Tools > Others
Predicted Category: Consumer Electronics > Seasonal Home Appliances > Heating > Oil Stove > 14+ tatami (wooden) / 19+ tatami (rebar)
Sample Results: Merchant and Algorithm Both Correct
Product: レンタル【RG87】袴 フルセット / 大学生 / 小学生 / 高校生 / 中学生 (hakama full-set rental for university, elementary, high-school, and junior-high students)
Merchant Category: Women's Fashion > Japanese > Kimono > Hakama
Predicted Category: Women's Fashion > Japanese > Rental
Summary
• Large-scale product categorization
• A multi-modal deep learning approach
• CUDA-based tools: CUDeep, CUDA-kNN
• Despite noisy data, high agreement with manual labeling
• Engineering challenges
  – Large data
  – Dynamic data: products and categories keep changing
  – Not easy to replicate research results in these settings
Engineering Work
• Architecture
• Tuning for different GPU cards
• Dealing with a large data set
• Improving prediction accuracy
• Future work
System architecture
• Designed for high scalability and availability
• Supports requests with both single and multiple input items
• Based on Docker; nvidia-docker is used for GPU-based components (https://github.com/NVIDIA/nvidia-docker)
Classification data flow diagram
PROBLEMS & SOLUTIONS
GPU memory size difference
• Research environment: Titan X, 12,287 MiB
• Production environment: Tesla K80, 11,519 MiB
• 768 MiB less memory in production
GPU memory size difference
• The smaller memory size required a series of experiments to find a new model configuration
• Reduce the input layer size, e.g. from 1M to 900K (layers: 900K → 1K → 2K → … → N output classes), sacrificing some information; a back-of-the-envelope check follows
• Future work: use the latest GPUs with more memory to recover this information loss
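A back-of-the-envelope check, with my own arithmetic and assumptions about buffer counts, of why trimming the input layer from 1M to 900K roughly absorbs the 768 MiB gap:

```python
BYTES_PER_FLOAT = 4
MIB = 2 ** 20

def weight_mib(visible, hidden):
    """Size of one dense weight matrix in MiB."""
    return visible * hidden * BYTES_PER_FLOAT / MIB

titan = weight_mib(1_000_000, 1000)   # ~3814.7 MiB on Titan X
k80   = weight_mib(  900_000, 1000)   # ~3433.2 MiB on Tesla K80
saved = titan - k80                    # ~381.5 MiB per copy
# Assuming the first-layer weights are held twice (weights plus a
# gradient/update buffer), the saving doubles to ~763 MiB, close to
# the 768 MiB difference between the two cards.
print(titan, k80, 2 * saved)
```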
Extra large data amount
• 200 GB of raw data (~230 million items)
• 260 GB of tokenized items
• 200+ GB of 70+ model files
• 4 days to prepare the training data
• More than one week to train the models on a single server with Tesla K80 cards
• Extremely large memory usage during training and classification
Extra large data amount
• Issue
  – File operations and data processing are very time-consuming
• Solution
  – Multiprocessing everywhere
  – High-speed storage
Accuracy worse than in experiments
• Research reported 74% accuracy, up to 88% in some categories
• After first building the models from the latest data, accuracy was only 51%
• Further investigation revealed a few significant defects
Shuffling input data
• Issue
  – Consecutive samples are highly correlated, which biases the gradient and leads to poor convergence
• Solution
  – Add a shuffling step to the data-preparation pipeline (sketched below)
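Since the data set is far too large to shuffle in memory, a two-pass external shuffle is one common approach; this sketch is my illustration, not necessarily the pipeline's exact method:

```python
import random

def external_shuffle(records, n_buckets=64, seed=0):
    """Two-pass shuffle for data that does not fit in memory:
    scatter records into random buckets, then shuffle each bucket
    locally. In production each bucket would be a temporary file
    rather than an in-memory list."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(n_buckets)]
    for rec in records:                 # pass 1: random scatter
        buckets[rng.randrange(n_buckets)].append(rec)
    for bucket in buckets:              # pass 2: local shuffle
        rng.shuffle(bucket)
        yield from bucket
```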
Tuning training parameters
• Issue
  – Models trained on the latest data had low accuracy
    • Smaller input layer size
    • Unbalanced item distribution across categories
• Solution
  – Increase the number of backpropagation epochs 2.5× and decrease the bias multiplier 10×
Grouping of categories
• Issue
  – Low prediction accuracy for similar categories when they are split across separate models
• Solution
  – Group similar categories together
Accuracy improvement result
• Recovered the expected results: from 51% up to 80-98%
  – 80% overall accuracy
  – 98% in popular categories
• Cost several months of work
Most successful categories
FUTURE WORK
Next steps
• 80% is not enough: need to improve the accuracy as much as possible
  – Data analysis
  – New experiments