OpenTag: Open Attribute Value Extraction From Product Profiles
Guineng Zheng*, Subhabrata Mukherjee Δ, Xin Luna Dong Δ, Feifei Li*
Δ Amazon.com, * University of Utah
KDD 2018
Motivation
"Alexa, what are the flavors of Nescafe?"
"Nescafe coffee flavors include caramel, mocha, vanilla, coconut, cappuccino, original/regular, decaf, espresso, and cafe au lait decaf."
Attribute Value Extraction From Product Profiles
[Figure: a product profile annotated with Flavor and Brand values]
Characteristics of Attribute Extraction

Open World Assumption
• No predefined attribute values
• New attribute value discovery, e.g.:
  1. beef flavor
  2. lamb flavor
  3. venison flavor

Limited semantics, irregular syntax
• Most titles have 10-15 words; most bullets have 5-6 words
• Phrases, not sentences: lack of regular grammatical structure in titles and bullets
• Attribute stacking, e.g.:
  1. Rachael Ray Nutrish Just 6 Natural Dry Dog Food, Lamb Meal & Brown Rice Recipe
  2. Lamb Meal is the #1 Ingredient
Prior Work and Our Contributions
• Open World Assumption: Ghani et al. 2003, Putthividhya et al. 2011, Ling et al. 2012, Petrovski et al. 2017
• No Lexicon, No Hand-crafted Features: Huang et al. 2015, Kozareva et al. 2016, Lample et al. 2016, Ma et al. 2016
• Active Learning: Kozareva et al. 2016
• OpenTag (this work) combines all three: open world assumption, no lexicon or hand-crafted features, and active learning
Outline
• Problem Definition
• Models
• Experiments
• Active Learning
• Experiments
Recap: Problem Statement
Given product profiles (e.g., titles, descriptions, bullets) and a set of attributes, extract values of those attributes from the profile text.

Input: Product Profile
• Title: CESAR Canine Cuisine Variety Pack Filet Mignon & Porterhouse Steak Dog Food (Two 12-Count Cases)
• Description: A Delectable Meaty Meal for a Small Canine Looking for the right food … This delicious dog treat contains tender slices of meat in gravy and is formulated to meet the nutritional needs of small dogs. …
• Bullets: Filet Mignon Flavor; Porterhouse Steak Flavor; CESAR Canine Cuisine provides complete and balanced nutrition

Output: Extractions
• Flavor: 1. filet mignon, 2. porterhouse steak
• Brand: cesar canine cuisine
Attribute Extraction as Sequence Tagging
• Input sequence x = {w1, w2, …, wn}; tagging decisions y = {t1, t2, …, tn}
• Tags: B = Beginning of attribute value, I = Inside, O = Outside, E = End of attribute value

Example (Flavor):
x: beef (w1)  meal (w2)  & (w3)  ranch (w4)  raised (w5)  lamb (w6)  recipe (w7)
y: B          E          O       B           I            E          O
Flavor extractions: {beef meal}, {ranch raised lamb}
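As an illustration (not from the slides), here is a minimal Python sketch of decoding extractions from a BIOE tag sequence; the tokens and tags mirror the example above, and the function name decode_bioe is ours.

```python
def decode_bioe(tokens, tags):
    """Collect attribute values from a BIOE tag sequence.

    B = beginning, I = inside, E = end of a value; O = outside.
    A value is a maximal span of the form B (I)* E.
    """
    values, span = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            span = [token]          # start a new span
        elif tag == "I" and span:
            span.append(token)      # continue the current span
        elif tag == "E" and span:
            span.append(token)      # close the span and emit it
            values.append(" ".join(span))
            span = []
        else:                       # "O" or a malformed sequence
            span = []
    return values

tokens = ["beef", "meal", "&", "ranch", "raised", "lamb", "recipe"]
tags   = ["B",    "E",    "O", "B",     "I",      "E",    "O"]
print(decode_bioe(tokens, tags))   # ['beef meal', 'ranch raised lamb']
```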
Outline
• Introduction
• Models
  • BiLSTM
  • BiLSTM + CRF
  • Attention Mechanism
  • OpenTag Architecture
• Active Learning
OpenTag Architecture
OpenTag Architecture (1/4): Word Embedding
Map 'beef', 'chicken', 'pork' to nearby points in the embedding space for the Flavor attribute
OpenTag Architecture (2/4): Bidirectional LSTM
Capture long- and short-range dependencies in the input sequence via forward and backward hidden states
OpenTag Architecture (3/4): CRF
• Bi-LSTM captures dependencies between tokens in the input sequence, but not between output tags
• A Conditional Random Field (CRF) enforces tagging consistency
OpenTag Architecture (4/4): Attention
• Focus on important hidden concepts and downweight the rest: attention
• An attention matrix A attends to important BiLSTM hidden states h_t
• α_{t,t'} ∈ A captures the importance of h_t w.r.t. h_{t'}
• The attention-focused representation l_t of token x_t is given by: l_t = Σ_{t'=1..n} α_{t,t'} · h_{t'}
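For concreteness, a numpy sketch of the combination step l_t = Σ_{t'} α_{t,t'} · h_{t'}; the dot-product scoring used to fill A is an assumption on our part, since the slide defines A and l_t but not how the scores are computed.

```python
import numpy as np

def attention(H):
    """Self-attention over BiLSTM hidden states.

    H: (n, d) matrix, one d-dim hidden state h_t per token.
    Returns L: (n, d), where row t is the attention-focused
    representation l_t = sum over t' of alpha[t, t'] * h_t'.
    """
    scores = H @ H.T                              # (n, n) similarity of h_t and h_t' (assumed scoring)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # each row sums to 1: alpha[t, t']
    return A @ H                                  # l_t = sum_t' alpha[t, t'] * h_t'

H = np.random.randn(7, 200)   # 7 tokens, 200-dim BiLSTM states
L = attention(H)
print(L.shape)                # (7, 200)
```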
OpenTag Architecture
Experimental Discussions: Datasets
Results
Overall, OpenTag obtains a high F-score of 82.8%
Results
• Highest improvement in F-score (5.3%) over BiLSTM-CRF on product descriptions
• However, descriptions are less accurate than titles
OpenTag discovers new attribute values not seen during training with 82.4% F-score
• No overlap in attribute values between the train and test splits
Interpretability via Attention
OpenTag achieves better concept clustering
[Figures: distribution of word vectors before attention vs. after attention]
Semantically related words come closer in the embedding space
Outline
• Introduction
• Models
  • BiLSTM
  • BiLSTM + CRF
  • Attention Mechanism
  • OpenTag Architecture
• Active Learning
Active Learning: Motivation
• Annotating training data is expensive and time-consuming
• Manual annotation does not scale to thousands of verticals, with hundreds of attributes and thousands of values in each domain
Active Learning (Settles, 2009)
• A query selection strategy such as uncertainty sampling selects the sample with the highest uncertainty for annotation
• But it ignores the difficulty of estimating individual tags
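As a reference point, a minimal sketch of least-confidence uncertainty sampling, the classic strategy described above; the per-token probability input format is an assumption for illustration.

```python
import numpy as np

def least_confidence(probs):
    """Pick the unlabeled sample whose most likely tag sequence
    has the lowest model confidence (classic uncertainty sampling).

    probs: list of (seq_len, n_tags) arrays of per-token tag
    probabilities, one per unlabeled sample. A sample's confidence
    is approximated by the product of its per-token maxima, which
    ignores how hard individual tags are to estimate -- the
    weakness the tag-flip strategy on the next slide addresses.
    """
    confidences = [np.prod(p.max(axis=1)) for p in probs]
    return int(np.argmin(confidences))

# 100 unlabeled samples, 7 tokens each, 4 tags (B/I/O/E)
probs = [np.random.dirichlet(np.ones(4), size=7) for _ in range(100)]
print(least_confidence(probs))
```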
Tag Flip as Query Strategy
• Simulate a committee of OpenTag learners over multiple epochs, using the dropout mechanism to vary the committee members
• The most informative sample is the one with major disagreement among committee members on the tags of its tokens across epochs
• That is, the most informative sample has the highest number of tag flips across all the epochs (see the sketch below)

Example: duck , fillet mignon and ranch raised lamb flavor
Epoch i tags: B O B E O B I E O
Epoch j tags: B O B O O O O B O
Tag flips = 4
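A small sketch of counting tag flips across epochs, reproducing the flip count of 4 from the example above; the list-of-tag-sequences interface is ours, not the paper's.

```python
def tag_flips(epoch_tags):
    """Count tag flips for one sample across training epochs.

    epoch_tags: list of tag sequences for the same sample, one per
    epoch (the "committee" simulated via dropout). A flip is a
    position whose tag changes between consecutive epochs; the
    sample with the most flips is queried for annotation.
    """
    flips = 0
    for prev, curr in zip(epoch_tags, epoch_tags[1:]):
        flips += sum(p != c for p, c in zip(prev, curr))
    return flips

# Two epochs' taggings of "duck , fillet mignon and ranch raised lamb flavor"
epoch1 = ["B", "O", "B", "E", "O", "B", "I", "E", "O"]
epoch2 = ["B", "O", "B", "O", "O", "O", "O", "B", "O"]
print(tag_flips([epoch1, epoch2]))   # 4
```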
Tag Flip (red) outperforms Uncertainty Sampling (blue)
[Plots: tag flip (TF) vs. least confidence (LC) on the detergent data and on multi-attribute extraction]
OpenTag reduces the burden of human annotation by 3.3x
[Plots: learning from scratch on the detergent data and on multi-attribute extraction]
• OpenTag requires only 500 training samples to obtain > 90% precision and recall
• Active learning brings this down to 150 training samples with similar performance
Production Impact

              Previous Coverage of            Coverage of    Increase in
              Existing Production System (%)  OpenTag (%)    Coverage (%)
Attribute_1   23                              78             53
Attribute_2   21                              72             45
Attribute_3   < 1                             56             50
Attribute_4   < 1                             49             48
Summary
• OpenTag models the open world assumption (OWA), multi-word values, and multi-attribute value extraction with sequence tagging
• Word embeddings + Bi-LSTM + CRF + attention
• OpenTag + active learning reduces the burden of human annotation (by 3.3x)
• Tag flip as a query strategy
• Interpretability: better concept clustering, attention heatmaps, etc.
Thank you for your attention!
Backup Slides
Word Embedding
• Map words co-occurring in similar contexts to nearby points in the embedding space
• Pre-trained embeddings learn a single representation for each word
• But 'duck' as a Flavor should have a different embedding than 'duck' as a Brand
• OpenTag learns word embeddings conditioned on attribute tags
Bi-directional LSTM
• LSTMs (Hochreiter, 1997) capture long- and short-range dependencies between tokens, making them suitable for modeling token sequences
• Bidirectional LSTMs improve over LSTMs by capturing both forward (f_t) and backward (b_t) states at each timestep t
• The hidden state h_t at each timestep combines the two directional states: h_t = σ([b_t, f_t])
Bi-directional LSTM

Architecture, bottom to top (example input: ranch raised beef flavor):
• Word Index: w1 w2 w3 w4
• Word Embedding (GloVe, 50 dimensions): e1 e2 e3 e4
• Forward LSTM (100 units): f1 f2 f3 f4
• Backward LSTM (100 units): b1 b2 b3 b4
• Hidden Vector (100 + 100 = 200 units): h1 h2 h3 h4
• Per-token B/I/O/E tag predictions, trained with cross-entropy loss
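A PyTorch sketch of this stack with the dimensions shown on the slide (50-dim embeddings, 100 units per direction, B/I/O/E outputs); the vocabulary size and class name are placeholders, and the CRF layer from the next slide is omitted.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """50-dim word embeddings -> BiLSTM with 100 units per direction
    (200-dim hidden vectors) -> per-token scores for the B/I/O/E tags,
    trained with cross-entropy. Vocabulary size is a placeholder.
    """
    def __init__(self, vocab_size=10000, emb_dim=50, hidden=100, n_tags=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # 100 + 100 = 200 units in

    def forward(self, word_ids):                  # (batch, seq_len)
        emb = self.embedding(word_ids)            # (batch, seq_len, 50)
        h, _ = self.bilstm(emb)                   # (batch, seq_len, 200)
        return self.out(h)                        # (batch, seq_len, 4)

model = BiLSTMTagger()
scores = model(torch.randint(0, 10000, (1, 4)))  # e.g. "ranch raised beef flavor"
print(scores.shape)                              # torch.Size([1, 4, 4])
```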
Conditional Random Fields (CRF)
• Bi-LSTM captures dependencies between tokens in the input sequence, but not between output tags
• The likelihood of a token's tag being 'E' (end) or 'I' (inside) increases if the previous token's tag was 'I' (inside)
• Given an input sequence x = {x1, x2, …, xn} with tags y = {y1, y2, …, yn}, a linear-chain CRF models p(y | x) = (1/Z(x)) ∏_{t=1..n} ψ_t(y_{t-1}, y_t, x), where Z(x) sums the potentials over all possible tag sequences
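To make the formula above concrete, a brute-force numpy sketch of the linear-chain CRF probability, with log-potentials split into per-token emission scores and tag-to-tag transition scores; real implementations compute Z(x) with the forward algorithm, and the random scores here are only illustrative.

```python
import numpy as np
from itertools import product

def crf_log_prob(emissions, transitions, tags):
    """Log-probability of a tag sequence under a linear-chain CRF,
    normalized by brute-force enumeration (fine for a sketch).

    emissions:   (n, n_tags) per-token tag scores (e.g. from the BiLSTM)
    transitions: (n_tags, n_tags) score of tag b following tag a
    tags:        the candidate sequence y = (y_1, ..., y_n)
    """
    def score(seq):
        s = sum(emissions[t, y] for t, y in enumerate(seq))
        s += sum(transitions[a, b] for a, b in zip(seq, seq[1:]))
        return s

    n, n_tags = emissions.shape
    # log Z(x): log-sum-exp of scores over all n_tags**n sequences
    log_z = np.logaddexp.reduce([score(seq) for seq
                                 in product(range(n_tags), repeat=n)])
    return score(tags) - log_z

emissions = np.random.randn(4, 4)    # 4 tokens, tags B/I/O/E
transitions = np.random.randn(4, 4)  # e.g. a low score for O -> E
print(crf_log_prob(emissions, transitions, (0, 1, 3, 2)))  # B I E O
```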