

  1. OpenTag: Open Attribute Value Extraction From Product Profiles. Guineng Zheng*, Subhabrata Mukherjee^Δ, Xin Luna Dong^Δ, Feifei Li*. ^Δ Amazon.com, *University of Utah. KDD 2018.

  2. Motivation. "Alexa, what are the flavors of Nescafe?" "Nescafe Coffee flavors include caramel, mocha, vanilla, coconut, cappuccino, original/regular, decaf, espresso, and cafe au lait."

  3. Attribute value extraction from product profiles. (Figure: example product profile with Flavor and Brand values highlighted.)

  4. Characteristics of Attribute Extraction
  • Open World Assumption: no predefined attribute values; new attribute values must be discovered (e.g., 1. beef flavor, 2. lamb flavor, 3. venison flavor)
  • Limited semantics, irregular syntax: most titles have 10-15 words and most bullets 5-6 words; they are phrases, not sentences, and titles and bullets lack regular grammatical structure
  • Attribute stacking: e.g., 1. Rachael Ray Nutrish Just 6 Natural Dry Dog Food, Lamb Meal & Brown Rice Recipe; 2. Lamb Meal is the #1 Ingredient

  5. Prior Work and Our Contributions. Desiderata: open world assumption; no lexicon, active learning; no hand-crafted features. Prior work satisfies only some of these: Ghani et al. 2003, Putthividhya et al. 2011, Ling et al. 2012, Petrovski et al. 2017, Huang et al. 2015, Kozareva et al. 2016, Lample et al. 2016, Ma et al. 2016. OpenTag (this work) addresses all three.

  6. Outline • Problem Definition • Models • Experiments • Active Learning • Experiments

  7. Recap: Problem Statement. Given product profiles (e.g., titles, descriptions, bullets) and a set of attributes, extract values of those attributes from the profile text.
  Input (product profile):
  • Title: CESAR Canine Cuisine Variety Pack Filet Mignon & Porterhouse Steak Dog Food (Two 12-Count Cases)
  • Description: A Delectable Meaty Meal for a Small Canine Looking for the right food ... This delicious dog treat contains tender slices of meat in gravy and is formulated to meet the nutritional needs of small dogs. ...
  • Bullets: Filet Mignon Flavor; Porterhouse Steak Flavor; CESAR Canine Cuisine provides complete and balanced nutrition
  Output (extractions): Flavor: 1. filet mignon, 2. porterhouse steak. Brand: cesar canine cuisine.

  8. Attribute Extraction as Sequence Tagging. Input sequence x = {w1, w2, ..., wn}; tagging decisions y = {t1, t2, ..., tn}. Tags: B = Beginning of attribute value, I = Inside, O = Outside, E = End.
  Example: x = beef meal & ranch raised lamb recipe, tagged y = B E O B I E O, yielding Flavor extractions {beef meal} and {ranch raised lamb}.
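The decoding step implied by the tag scheme above can be sketched in a few lines of Python. `decode_bioe` is a hypothetical helper name (not from the paper), and the sketch assumes every value spans at least two tokens (B ... E), as in the slide's example:

```python
def decode_bioe(tokens, tags):
    """Extract attribute values from a BIOE tag sequence.

    B = beginning, I = inside, E = end of a value; O = outside.
    Assumes each value spans at least two tokens (B ... E); a
    single-token value would need an extra tag type.
    """
    values, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            current = [token]          # start a new value
        elif tag == "I" and current:
            current.append(token)      # continue the current value
        elif tag == "E" and current:
            current.append(token)      # close and emit the value
            values.append(" ".join(current))
            current = []
        else:                          # O, or a malformed transition
            current = []
    return values

# The slide's example sequence and its gold tags:
tokens = ["beef", "meal", "&", "ranch", "raised", "lamb", "recipe"]
tags   = ["B", "E", "O", "B", "I", "E", "O"]
print(decode_bioe(tokens, tags))  # ['beef meal', 'ranch raised lamb']
```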

  9. Outline • Introduction • Models • BiLSTM • BiLSTM + CRF • Attention Mechanism • OpenTag Architecture • Active Learning

  10. OpenTag Architecture

  11. OpenTag Architecture (1/4): Word Embedding. Map 'beef', 'chicken', 'pork' (all values of Flavor) to nearby points in the embedding space.

  12. OpenTag Architecture (2/4): Bidirectional LSTM. Capture long- and short-range dependencies in the input sequence via forward and backward hidden states.

  13. OpenTag Architecture (3/4): CRF • Bi-LSTM captures dependencies between tokens in the input sequence, but not between output tags • A Conditional Random Field (CRF) enforces tagging consistency

  14. OpenTag Architecture (4/4): Attention • Focus on important hidden concepts, downweight the rest => attention! • An attention matrix A attends to important BiLSTM hidden states h_t • α_{t,t'} ∈ A captures the importance of h_t w.r.t. h_{t'} • Attention-focused representation l_t of token x_t: l_t = Σ_{t'} α_{t,t'} · h_{t'}
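A minimal sketch of this attention step, assuming plain dot-product scoring (the slide does not specify the scoring function): each output l_t is a softmax-weighted combination of all hidden states, with weights α_{t,t'}.

```python
import math

def attention(hidden):
    """Toy attention over hidden states (lists of floats).

    Scores are plain dot products (an assumption, not the paper's exact
    scoring); alpha[t][t'] is the softmax-normalised importance of state
    t' for state t, and l[t] is the weighted sum of all states.
    """
    n, d = len(hidden), len(hidden[0])
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    l = []
    for t in range(n):
        scores = [dot(hidden[t], hidden[s]) for s in range(n)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        z = sum(exps)
        alpha = [e / z for e in exps]              # weights sum to 1
        l.append([sum(alpha[s] * hidden[s][j] for s in range(n))
                  for j in range(d)])
    return l

# three toy 2-d hidden states; each l[t] is a convex combination of them
l = attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```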

  15. OpenTag Architecture

  16. Experimental Discussions: Datasets

  17. Results. Overall, OpenTag obtains a high F-score of 82.8%.

  18. Results. The highest improvement in F-score, 5.3% over BiLSTM-CRF, is on product descriptions; however, extraction from descriptions remains less accurate than from titles.

  19. OpenTag discovers new attribute values not seen during training, with an 82.4% F-score (no overlap in attribute values between the train and test splits).

  20. Interpretability via Attention

  21. OpenTag achieves better concept clustering. (Figures: distribution of word vectors before attention vs. after attention.)

  22. Semantically related words come closer in the embedding space.

  23. Outline • Introduction • Models • BiLSTM • BiLSTM + CRF • Attention Mechanism • OpenTag Architecture • Active Learning

  24. Active Learning: Motivation • Annotating training data is expensive and time-consuming • Does not scale to thousands of verticals with hundreds of attributes and thousands of values in each domain

  25. Active Learning (Settles, 2009) • A query selection strategy like uncertainty sampling selects the sample with the highest uncertainty for annotation • But it ignores the difficulty of estimating individual tags
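The uncertainty-sampling baseline above can be sketched as least-confidence selection; `least_confidence_query` is a hypothetical helper name, and the confidence numbers are toy values standing in for a model's probability of its best tag sequence:

```python
def least_confidence_query(pool):
    """Uncertainty sampling by least confidence.

    pool: list of (sample_id, confidence) pairs, where confidence is the
    model's probability for its single best tag sequence on that sample.
    Returns the id of the sample the model trusts least -- a generic
    sketch of the baseline strategy, not OpenTag's tag-flip method.
    """
    return min(pool, key=lambda item: item[1])[0]

pool = [("x1", 0.92), ("x2", 0.41), ("x3", 0.77)]
print(least_confidence_query(pool))  # 'x2'
```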

  26. Tag Flip as Query Strategy • Simulate a committee of OpenTag learners over multiple epochs • Most informative sample => major disagreement among committee members on the tags of its tokens across epochs • Use the dropout mechanism to simulate the committee of learners
  Example (two epochs tagging "duck , fillet mignon and ranch raised lamb flavor"):
  Epoch 1: B O B E O B I E O
  Epoch 2: B O B O O O O B O
  Tag flips = 4
  • The most informative sample has the highest number of tag flips across all epochs
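Counting tag flips can be sketched directly from the slide's example; `tag_flips` is a hypothetical helper name:

```python
def tag_flips(epoch_tags):
    """Count tag flips across epochs for one sample.

    epoch_tags: list of tag sequences, one per epoch, all over the same
    tokens.  A flip is a position whose tag changes between consecutive
    epochs; the most informative sample is the one with the most flips.
    """
    flips = 0
    for prev, curr in zip(epoch_tags, epoch_tags[1:]):
        flips += sum(1 for a, b in zip(prev, curr) if a != b)
    return flips

# The slide's example: two epochs' taggings of
# "duck , fillet mignon and ranch raised lamb flavor"
epoch1 = ["B", "O", "B", "E", "O", "B", "I", "E", "O"]
epoch2 = ["B", "O", "B", "O", "O", "O", "O", "B", "O"]
print(tag_flips([epoch1, epoch2]))  # 4
```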

  27. Tag Flip (red) outperforms Uncertainty Sampling (blue). (Figures: TF vs. LC on detergent data; TF vs. LC on multi-attribute extraction.)

  28. OpenTag reduces the burden of human annotation by 3.3x. (Figures: learning from scratch on detergent data; learning from scratch on multi-attribute extraction.) • OpenTag requires only 500 training samples to obtain > 90% precision-recall • Active learning brings this down to 150 training samples for similar performance

  29. Production Impact
  Attribute | Previous Existing System (%) | Coverage of OpenTag (%) | Increase in Production Coverage (%)
  Attribute_1 | 23 | 78 | 53
  Attribute_2 | 21 | 72 | 45
  Attribute_3 | < 1 | 56 | 50
  Attribute_4 | < 1 | 49 | 48

  30. Summary • OpenTag models the open world assumption (OWA) and multi-word, multi-attribute value extraction with sequence tagging • Word embeddings + Bi-LSTM + CRF + attention • OpenTag + active learning reduces the burden of human annotation (by 3.3x) • Tag flip as query strategy • Interpretability: better concept clustering, attention heatmaps, etc.

  31. Thank you for your attention!

  32. Backup Slides

  33. Word Embedding • Map words co-occurring in similar contexts to nearby points in the embedding space • Pre-trained embeddings learn a single representation for each word • But 'duck' as a Flavor should have a different embedding than 'duck' as a Brand • OpenTag learns word embeddings conditioned on attribute tags
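"Nearby points in the embedding space" is conventionally measured with cosine similarity. A minimal sketch, using made-up 3-d toy vectors (not real embeddings): two Flavor-like words should score closer to each other than to a Brand-like word.

```python
import math

def cosine(u, v):
    """Cosine similarity: close to 1 for nearby embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy, made-up vectors: two Flavor-like words and one Brand-like word.
beef    = [0.9, 0.1, 0.0]
chicken = [0.8, 0.2, 0.1]
cesar   = [0.0, 0.1, 0.9]
print(cosine(beef, chicken) > cosine(beef, cesar))  # True
```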

  34. Bi-directional LSTM • LSTMs (Hochreiter & Schmidhuber, 1997) capture long- and short-range dependencies between tokens, making them suitable for modeling token sequences • Bi-directional LSTMs improve over LSTMs by capturing both forward (f_t) and backward (b_t) states at each timestep t • The hidden state h_t at each timestep is generated from the concatenation: h_t = g([b_t, f_t]) for a learned function g
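The bidirectional state flow (not the LSTM cell itself) can be sketched as two scans plus a concatenation; `bidirectional_states` and the scalar `step` function are illustrative stand-ins for the recurrent cell, not the paper's model:

```python
def bidirectional_states(embeddings, step):
    """Sketch of the Bi-LSTM state flow (not a real LSTM cell).

    `step` stands in for the recurrent cell: it maps
    (previous_state, input) -> new_state.  The forward pass scans
    left-to-right, the backward pass right-to-left, and each h_t
    concatenates [b_t, f_t] as on the slide.
    """
    n = len(embeddings)
    f, b = [None] * n, [None] * n
    state = 0.0
    for t in range(n):                       # forward scan
        state = step(state, embeddings[t])
        f[t] = state
    state = 0.0
    for t in reversed(range(n)):             # backward scan
        state = step(state, embeddings[t])
        b[t] = state
    return [(b[t], f[t]) for t in range(n)]  # h_t = [b_t, f_t]

# toy "cell": a decayed running sum, purely for illustration
h = bidirectional_states([1.0, 2.0, 3.0], lambda s, x: 0.5 * s + x)
print(h)  # [(2.75, 1.0), (3.5, 2.5), (3.0, 4.25)]
```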

  35. Bi-directional LSTM (architecture, bottom to top):
  Word Index: w1 w2 w3 w4 (ranch raised beef flavor)
  Word Embedding: e1 e2 e3 e4 (GloVe embeddings, dimension 50)
  Forward LSTM: f1 f2 f3 f4 (100 units)
  Backward LSTM: b1 b2 b3 b4 (100 units)
  Hidden Vector: h1 h2 h3 h4 (100 + 100 = 200 units)
  Per-token tag scores over {B, I, O, E}, trained with cross-entropy loss

  36. Conditional Random Fields (CRF) • Bi-LSTM captures dependencies between tokens in the input sequence, but not between output tags • The likelihood of a token's tag being 'E' (end) or 'I' (inside) increases if the previous token's tag was 'I' • Given an input sequence x = {x1, x2, ..., xn} with tags y = {y1, y2, ..., yn}, a linear-chain CRF models (in the standard form) p(y|x) = (1/Z(x)) · exp( Σ_t Ψ(y_{t-1}, y_t, x_t) ), where Z(x) normalizes over all possible tag sequences.
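Decoding the best tag sequence under a linear-chain CRF is done with the standard Viterbi algorithm. The sketch below uses toy emission and transition scores (the variable names and numbers are illustrative, not from the paper); note how a forbidding transition score keeps an E tag from following an O tag, which is the tagging consistency the slide describes:

```python
def viterbi(emissions, transitions, tags):
    """Viterbi decoding for a linear-chain CRF (standard algorithm).

    emissions: per-token dict of tag -> score (e.g. from the Bi-LSTM).
    transitions: dict of (prev_tag, tag) -> score; a large negative
    score forbids a transition.  Returns the best-scoring tag sequence.
    """
    # best score and path ending in each tag, for the first token
    best = {t: (emissions[0][t], [t]) for t in tags}
    for em in emissions[1:]:
        new = {}
        for t in tags:
            # pick the predecessor tag that maximises score + transition
            prev, (s, path) = max(
                ((p, best[p]) for p in tags),
                key=lambda kv: kv[1][0] + transitions[(kv[0], t)])
            new[t] = (s + transitions[(prev, t)] + em[t], path + [t])
        best = new
    return max(best.values(), key=lambda sp: sp[0])[1]

tags = ["B", "E", "O"]
NEG = -1e9
trans = {(a, b): 0.0 for a in tags for b in tags}
trans[("O", "E")] = NEG   # an E tag cannot follow an O tag
trans[("B", "B")] = NEG   # toy constraint: no B immediately after B
emis = [{"B": 2.0, "E": 0.0, "O": 1.0},   # per-token tag scores
        {"B": 0.0, "E": 1.5, "O": 1.0}]
print(viterbi(emis, trans, tags))  # ['B', 'E']
```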
