Deep Learning for Semantic Search in E-commerce - Somnath Banerjee - PowerPoint PPT Presentation



SLIDE 1

Deep Learning for Semantic Search in E-commerce

Somnath Banerjee, Head of Search Algorithms at Walmart Labs
https://www.linkedin.com/in/somnath-banerjee/

March 19, 2019

SLIDE 2

Walmart E-commerce search problem

100M+ customers, 100M+ queries, 100M+ items

E-commerce search provides the functionality of a store associate, but at scale.

SLIDE 3

Flash Drive, USB Drive, Thumb Drive, Jump Drive, Pen Drive, Zip Drive, Memory Stick, USB Stick, USB Flash Drive, USB Memory, USB Storage Device

Misspelled queries: Flush Drive, USC Drive, Thamb Drive, Jmp Drive, Pin Drive, Zap Drive, Memory Steak, USB Stock, USB Flash Drve

SLIDE 4

Query "saddle": Horse Saddle or Bike Saddle?

SLIDE 5

Outline

  • Core problems of e-commerce search
  • Semantic search in e-commerce
  • Deep Learning for semantic search

  – Query classification
  – Query token tagging
  – Neural IR
  – Image understanding (sneak peek)

SLIDE 6

Core of E-commerce Search

Text query → find items in the catalog

SLIDE 7

Core problems of E-commerce search

  • Ambiguity: "Learning book", "Ziploc"
  • Missing catalog values: neck style? fabric? no. of pockets?
  • Open vocabulary in query and catalog: "Tide 100 oz" / "Tide 100 fl oz" / "Tide 100 ounce"; "Levi's" / "Levi Strauss" / "Signature by Levi Strauss and Co."

SLIDE 8

Buying decision is influenced by item attractiveness

Query "pump shoes": attractiveness signals include tags, the presence of expensive items ($300!!!), and image quality.

SLIDE 9

Core technical problems of e-commerce search

  • Matching query to items
  • Ranking items

Example query "pump shoes": some retrieved items match (✅) and some do not (❌); ranking decides which matched item takes position 1 vs. position 2.

SLIDE 10

Text matching is not enough

  • "Lemon" text-matches both Lemon Balm and Lemon Fruit
  • "Nivea 16oz" should match Nivea 15.5oz, not Tire sealant 16oz

SLIDE 11

Query understanding

  • Attribute understanding

Matching query and item

  • Text matching
  • Attribute matching

Ranking Items

Semantic Search

SLIDE 12

Deep learning for semantic search

Deep Learning for query understanding

Matching query and item

  • Text matching
  • Attribute matching

Ranking Items

SLIDE 13

Deep learning for semantic search

Deep Learning for query understanding

Matching query and item

  • Text matching
  • Attribute matching

Ranking Items

Neural IR: end-to-end matching and ranking

Image understanding: not just text search

SLIDE 14

Outline

  • Core problems of e-commerce search
  • Semantic search in e-commerce
  • Deep Learning for semantic search

  – Query classification
  – Query token tagging
  – Neural IR
  – Image understanding (sneak peek)

SLIDE 15

Query Classification

Text query → { product type 1 : confidence level, product type 2 : confidence level, product type 3 : confidence level }

Product Type

  • A predefined list
  • Indicates a specific product in the catalog
  • Every item in the catalog is tagged with a product type

SLIDE 16

Query classification examples

  • nvidia gpu → Computer Video Cards : 0.85, Laptop Computers : 0.08, Desktop Computers : 0.06
  • ziploc bags → Food Storage Bags : 1.0
  • bedroom furniture → maps to a large number of product types; hard to balance precision vs. recall

SLIDE 17

Query classification challenges

  • Short text: queries are of 2-3 tokens
  • Large-scale classification: thousands of product types (classes)
  • Multi-class, multi-label problem: the same query can have multiple product types
  • Needs to respond in a few milliseconds: classifies queries at runtime
  • Unbalanced class distribution: some product types are much more popular

SLIDE 18

Data and Model

Training data from the historical search log:

  • <query, product type ordered>
  • <query, item ordered>

Model: word2vec input embeddings → BiLSTM → output layer (softmax/sigmoid)

https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
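The model above (word2vec inputs → BiLSTM → softmax/sigmoid output) can be sketched as a forward pass in plain numpy. This is a toy illustration with untrained random weights and made-up dimensions, not the production TensorFlow model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID, N_CLASSES = 8, 16, 5   # toy sizes; the real model uses word2vec dims

def init_lstm(emb, hid):
    # gate weights stacked in one matrix: input, forget, output, candidate
    return (rng.normal(0, 0.1, (4 * hid, emb)),
            rng.normal(0, 0.1, (4 * hid, hid)),
            np.zeros(4 * hid))

def run_lstm(xs, params):
    W, U, b = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in xs:
        z = W @ x + U @ h + b
        i, f, o = (1.0 / (1.0 + np.exp(-z[k * H:(k + 1) * H])) for k in range(3))
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h   # final hidden state summarizes the query

def classify(token_vecs, fwd, bwd, W_out, b_out):
    # BiLSTM: read the query left-to-right and right-to-left, concat, softmax
    h = np.concatenate([run_lstm(token_vecs, fwd),
                        run_lstm(token_vecs[::-1], bwd)])
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()

fwd, bwd = init_lstm(EMB, HID), init_lstm(EMB, HID)
W_out = rng.normal(0, 0.1, (N_CLASSES, 2 * HID))
b_out = np.zeros(N_CLASSES)
query = [rng.normal(size=EMB) for _ in range(3)]   # stand-in word2vec vectors
probs = classify(query, fwd, bwd, W_out, b_out)    # one score per product type
```

The sigmoid variant would replace the final softmax when a query may carry multiple product types, per the multi-label point on slide 17.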

SLIDE 19

Usage of query classification

Query "lemon": without query classification vs. after we understand the query "lemon" as a fruit.

20% reduction of irrelevant items in certain query segments

SLIDE 20

Key Learnings

Accuracy:

  • Deep Learning vs. Logistic Regression: 6% higher accuracy
  • BiLSTM vs. CNN: more accurate

Training time:

  • 1 K80 GPU vs. 8-core CPU: 6x faster
  • 1 K80 GPU vs. 48-core CPU: equal

SLIDE 21

Key Learnings - Instability in Prediction

Query "samsung 850 evo 250gb 2.5 inch":

  • Old model: Television Stands : 0.32; Laptop Computers : 0.27; Hard Drives : 0.11
  • New model: Hard Drives : 1.00

SLIDE 22

Instability in Prediction

Training data N → Model N; training data (N + 1) → Model (N + 1). With only a 2.5% difference in training data, the top predicted class differs for 10% of the test set.

SLIDE 23

Instability in prediction – different seeds

Training data N, trained with seed 1 → Model N; seed 2 → Model N'. The top predicted class differs for 7% of the test set (different tensorflow and numpy seeds).

SLIDE 24

Sources of Instability

  • Overfitting: the deep learning model has high variance, particularly on low-traffic queries; simpler models could be more stable but less accurate
  • Sigmoid (1-vs-all) classifier is more unstable; softmax scores are interdependent across classes and less stable
  • Noisy training data: item order data is less noisy than click data
  • Rounding errors in arithmetic operations: CPU is more stable than GPU

SLIDE 25

Reduction of Instability

40% reduction of instability, moving from (softmax, clicks, CNN) to (sigmoid, orders, BiLSTM).

SLIDE 26
  • Product Type
  • Brand
  • Color
  • Gender
  • Age Group
  • Size (value & unit)

  – Pack Size
  – Screen Size
  – Shoe Size
  – …

  • Character
  • Style
  • Material

Attributes to match

Not feasible: a separate classifier for each attribute

  • Too many classes (e.g. 100K+ brand values)
  • Sparse attributes; most attribute predictions should be NA
  • Creating <query, attribute> training data is noisier and less accurate

SLIDE 27

Query token tagging

Query: "faded glory long sleeve shirts for women"

Query tokens tagged with attribute names:

  • Faded Glory → Brand
  • Long sleeve → Sleeve length
  • shirts → Product type
  • women → Gender
  • for → NULL

SLIDE 28

Training data

  • "blue women levis jeans": blue → Color, women → Gender, levis → Brand, jeans → Product Type
  • "toys for girls 3 – 6 years": toys → Product Type, girls → Gender, 3 – 6 → Age Value, years → Age Unit

Human-curated data; it is a hard task for humans:

  • Is "outside" a product type token in the query "canopy tents for outside"?

Disagreement between taggers is high (~30%). Fortunately, 10K training examples is a good start.

SLIDE 29

Model – BiLSTM-CRF

Char embeddings and word2vec word embeddings feed a BiLSTM, whose outputs are features for a linear-chain CRF over the query tokens, modeling P(tag_1, ..., tag_n).

https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html
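At inference time the linear-chain CRF picks the highest-scoring tag sequence with Viterbi decoding. A minimal numpy sketch of only the decoding step, with made-up emission and transition scores for the query "samsung tv" (in the real model these scores come from the trained BiLSTM-CRF):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Most likely tag path for a linear-chain CRF.

    emissions:   (seq_len, n_tags) per-token tag scores (from the BiLSTM)
    transitions: (n_tags, n_tags)  score for moving from tag i to tag j
    """
    n, t = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, t), dtype=int)
    for k in range(1, n):
        # for every current tag, pick the best previous tag
        cand = score[:, None] + transitions + emissions[k][None, :]
        back[k] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for k in range(n - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]

# made-up scores for the query "samsung tv"
tags = ["Brand", "ProductType"]
emissions = np.array([[2.0, 0.1],    # "samsung" looks like a brand
                      [0.2, 1.5]])   # "tv" looks like a product type
transitions = np.array([[0.0, 0.5],  # Brand -> ProductType is a common pattern
                        [0.0, 0.0]])
best = [tags[i] for i in viterbi(emissions, transitions)]
# best == ["Brand", "ProductType"]
```

The transition matrix is what lets the CRF enforce tag-sequence patterns (e.g. brand before product type) on top of the per-token BiLSTM scores.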

SLIDE 30

Char Embeddings

Characters G, P, U are fed through a character embedding network to build a word vector, which is combined with word2vec and fed into the BiLSTM-CRF network. The character embedding is word2vec-style, learnt on character sequences.

SLIDE 31

Char Embedding

  • Maps a sequence of characters to a fixed-size vector
  • Handles out-of-vocabulary words
  • Handles misspellings

Query "sansung tv":

  • With char embedding: sansung → Brand, tv → Product Type
  • Without char embedding: sansung → NULL, tv → Product Type
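A back-of-the-envelope way to see why character-level representations survive misspellings: even simple character trigram overlap (not the learned char-BiLSTM embedding from the slide) keeps "sansung" close to "samsung":

```python
def char_ngrams(word, n=3):
    # '#' marks word boundaries so prefixes and suffixes form their own n-grams
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b):
    """Jaccard overlap of character trigram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)

sim_brand = similarity("sansung", "samsung")  # misspelling vs. intended brand
sim_other = similarity("sansung", "vizio")    # misspelling vs. unrelated brand
```

A learned character embedding generalizes this idea: nearby character sequences map to nearby vectors, so the tagger can still recognize "sansung" as a brand token.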

SLIDE 32

Improving search results using query tagging

Query "women citizen eco drive watch": before vs. after understanding the Gender token.

A regex match will be incorrect for queries like: pioneer women dinnerware, wonder women bedding, spider man car seats.

SLIDE 33

Other use cases of query tagging

From the search query log:

  • TV queries: samsung tv 32 in, 32 in vizio tv, sanyo flat screen tv, led tv sony 55”
  • Not TV queries: samsung tv stand, sony tv remote

Customer demand analysis

  • Most searched brand of TV

Attribute filter suggestion

  • Suggest top attributes (e.g. brand, screen size) that customers look for in a product type query (e.g. TV)

SLIDE 34

Neural IR

  • Traditional IR: token and synonym match; learning to rank
  • Semantic Search: attribute extraction; token, synonym and attribute match; learning to rank
  • Neural IR: end-to-end matching and ranking

SLIDE 35

Neural IR – Design 1

Query and item title → input embedding → concatenation → neural transformation → transformed feature → relevance score

  • Runtime computation
  • Not scalable for a large number of items

SLIDE 36

Neural IR – Design 2

Query and item title → input embedding → neural transformation (shared weights) → query and item embeddings → relevance score

Item embeddings can be computed offline and indexed.
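Design 2 is essentially a two-tower retrieval setup. A minimal sketch, with a random hashed-token embedding standing in for the learned shared tower; the property it illustrates is that item vectors are built once offline, while a query costs one embedding plus a dot product per indexed item:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 16
# stand-in for a learned embedding table; the real tower is a trained network
W = rng.normal(size=(1000, DIM))

def embed(text):
    """Shared tower applied identically to a query or an item title."""
    vecs = [W[hash(tok) % len(W)] for tok in text.lower().split()]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

# item side, computed offline: embed every catalog title and index the vectors
titles = ["samsung 55 inch led tv", "nivea 15.5oz lotion", "tire sealant 16oz"]
item_index = np.stack([embed(t) for t in titles])

def rank(query):
    # query side, at runtime: one embedding, then a dot product per indexed item
    scores = item_index @ embed(query)
    return list(scores.argsort()[::-1])
```

With a trained tower, the dot-product scoring also allows approximate nearest-neighbor indexes over the item vectors, which is what makes the design scalable to a large catalog.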

SLIDE 37

Input Embedding

Two options for the input embedding of query and item title tokens:

  • AVG over word2vec token vectors
  • CNN over word2vec token vectors

Both give comparable accuracy.

SLIDE 38

Training Data

Historical search log: <query, item title, click-through rate (ctr)*>

*Position-bias correction for the ctr of a (query, item) pair:

ctr = Σ_r clicks_corrected_r / Σ_r impressions_r

clicks_corrected_r = clicks_r + (impressions_r − clicks_r) · P(click | r)

r = rank at which the item was displayed
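The slide's position-bias correction can be implemented directly. A small sketch; the P(click | rank) prior values here are made up for illustration:

```python
def corrected_ctr(stats, p_click_at_rank):
    """Position-bias-corrected ctr for one (query, item) pair.

    stats:           {rank: (clicks, impressions)} from the search log
    p_click_at_rank: global prior P(click | rank)  (values below are made up)
    """
    clicks_corr = sum(clicks + (imps - clicks) * p_click_at_rank[r]
                      for r, (clicks, imps) in stats.items())
    total_imps = sum(imps for _, imps in stats.values())
    return clicks_corr / total_imps

# an item shown 100 times at rank 1 and 100 times at rank 5
ctr = corrected_ctr({1: (10, 100), 5: (2, 100)}, {1: 0.20, 5: 0.02})
```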

SLIDE 39

Training Loss

Point-wise: x_q = query features, x_p = item features; learn f(x_q, x_p) → ctr. A regression problem with sigmoid cross-entropy loss.

Pair-wise: for a query (e.g. "brooks shoes") with a relevant item x_p and a less relevant item x_n, require f(x_q, x_p) > f(x_q, x_n) when ctr(x_q, x_p) > ctr(x_q, x_n). Minimize pair inversions with a pair-wise logistic loss.
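The pair-wise objective can be written as a logistic loss on the score margin. A minimal sketch with made-up scores; in training, the scores come from the neural scorer f and gradients flow through it:

```python
import numpy as np

def pairwise_logistic_loss(score_pos, score_neg):
    # small when the relevant item scores higher, log(2) at a tie,
    # large on an inversion: loss = log(1 + exp(-(f(xq, xp) - f(xq, xn))))
    return float(np.log1p(np.exp(-(score_pos - score_neg))))

correct  = pairwise_logistic_loss(2.0, 0.0)   # relevant item ranked higher
tie      = pairwise_logistic_loss(1.0, 1.0)
inverted = pairwise_logistic_loss(0.0, 2.0)   # inversion draws a big penalty
```

Minimizing this loss over sampled (relevant, less relevant) pairs directly penalizes pair inversions, which is what the pair-accuracy metric on the next slide measures.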

SLIDE 40

Accuracy on pair-wise loss

NDCG captures the quality of the overall ranking; pair accuracy captures whether higher-ctr (more relevant) items are ranked above lower-ctr items.

[Charts: NDCG@10 lift against baseline and pair accuracy lift against baseline, for Design 1 vs. Design 2]

SLIDE 41

Neural IR

Pros

  • End-to-end approach
  • Enables semantic matching implicitly
  • Handles different data types (text, image)

Cons

  • Not scalable (yet)
  • Not so successful (yet)

SLIDE 42

Image understanding

Applications: attribute prediction, visual search, compatible outfit.

Predicted attributes:

  • Product type
  • Style
  • Material
  • Color

SLIDE 43

Image understanding key learnings

  • Attribute prediction: multi-task learning is more accurate; predicting style is harder than predicting product type
  • Visual search: A/B test on hayneedle.com; comparable results against a well-established startup
  • Compatible outfit: under exploration; early results beating the token-based approach

SLIDE 44

Future

Evolution of the mobile phone

SLIDE 45

Future

Web Search → E-commerce Search

  • Conversational commerce
  • Seamless search and personalized results
  • V-Commerce
