Recursive Neural Structural Correspondence Network for Cross-domain Aspect and Opinion Co-extraction
Wenya Wang†‡ and Sinno Jialin Pan†
†Nanyang Technological University, Singapore   ‡SAP Innovation Center Singapore
{wa0001ya, sinnopan}@ntu.edu.sg
July 18, 2018
Outline
1 Introduction: Background; Definition & Motivation; Overview & Contribution
2 Model Architecture
3 Experiments
4 Conclusion
Background: What is Aspect/Opinion Extraction
Fine-grained Opinion Mining
Figure 1: An example of review outputs.
◮ Our focus: Aspect and Opinion Terms Co-extraction
◮ Challenge: Limited resources for fine-grained annotations ⇒ Cross-domain extraction
Problem Definition
1 Task formulation: Sequence labeling (see the tag-decoding sketch below)
◮ Input x: "The phone has a good screen size", with token features x_1, …, x_7
◮ Labels: N N N N BO BA IA, where BA = beginning of aspect, IA = inside of aspect, BO = beginning of opinion, IO = inside of opinion, N = none
Figure 2: A deep learning model for sequence labeling.
2 Domain Adaptation
◮ Given: labeled data in the source domain D_S = {(x_i^S, y_i^S)}_{i=1}^{n_S} and unlabeled data in the target domain D_T = {x_j^T}_{j=1}^{n_T}
◮ Idea: build bridges across domains, learn a shared space
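To make the label scheme concrete, here is a minimal Python sketch that decodes a BA/IA/BO/IO/N tag sequence for the example sentence into aspect and opinion phrases; the helper name `decode_spans` is illustrative and not part of the authors' code.

```python
# Minimal illustration of the BIO-style label scheme from the slide.
# Tags: BA/IA = beginning/inside of an aspect term, BO/IO = beginning/inside
# of an opinion term, N = none. The helper name is illustrative only.

tokens = ["The", "phone", "has", "a", "good", "screen", "size"]
tags   = ["N",   "N",     "N",   "N", "BO",   "BA",     "IA"]

def decode_spans(tokens, tags):
    """Turn a BA/IA/BO/IO/N tag sequence into (type, phrase) spans."""
    spans, current, kind = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag in ("BA", "BO"):                  # a new span starts
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [tok], ("aspect" if tag == "BA" else "opinion")
        elif tag in ("IA", "IO") and current:    # continue the open span
            current.append(tok)
        else:                                    # outside any span
            if current:
                spans.append((kind, " ".join(current)))
            current, kind = [], None
    if current:
        spans.append((kind, " ".join(current)))
    return spans

print(decode_spans(tokens, tags))
# [('opinion', 'good'), ('aspect', 'screen size')]
```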
Motivation: Domain Adaptation
1 Domain shift & bridges
Figure 3: Domain shift for different domains. Figure 4: Syntactic patterns.
2 Related work
◮ Adaptive bootstrapping [Li et al., 2012]
◮ Auxiliary task with a recurrent neural network [Ding et al., 2017]
Overview & Contribution
Recursive Neural Structural Correspondence Network (RNSCN)
◮ Structural correspondences are built based on common syntactic structures
◮ Relation vectors with auxiliary labels are used to learn a shared space across domains
Label denoising auto-encoder
◮ Deals with auxiliary label noise
◮ Groups relation vectors into their intrinsic clusters in an unsupervised manner
A joint deep model combining both components
Model Architecture: Recursive Neural Network
Relation vectors: relations are embedded as vectors in the feature space. For the amod edge between "appetizers" (x_4, head) and "good" (h_3, dependent) in "they offer good appetizers":
r_43 = tanh(W_h h_3 + W_x x_4)
h_4 = tanh(W_amod r_43 + W_x x_4 + b)
Auxiliary task: dependency relation prediction
ŷ^R_43 = softmax(W_R r_43 + b_R)
Figure 5: A recursive neural network over the dependency tree of "they offer good appetizers" (relations: nsubj, amod, dobj, root).
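The two update rules and the auxiliary relation classifier above can be sketched directly. The following numpy snippet runs them for the single amod edge in the example; the dimensions, the random initialization, and the leaf rule h_3 = tanh(x_3) are illustrative assumptions, not the authors' implementation.

```python
# Minimal numpy sketch of one recursive update on the amod edge
# (good -> appetizers) from the slide. Shapes and initialization are assumed.
import numpy as np

d = 50                                    # hidden / embedding dimension (assumed)
rng = np.random.default_rng(0)
W_h, W_x = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
W_amod   = rng.normal(0, 0.1, (d, d))     # one matrix per dependency relation type
b        = np.zeros(d)
n_rel    = 40                             # number of dependency relation labels (assumed)
W_R, b_R = rng.normal(0, 0.1, (n_rel, d)), np.zeros(n_rel)

x3, x4 = rng.normal(size=d), rng.normal(size=d)   # word vectors for "good", "appetizers"
h3 = np.tanh(x3)                                  # leaf hidden state (assumed leaf rule)

# Relation vector for the edge, then the parent's hidden state (slide equations).
r43 = np.tanh(W_h @ h3 + W_x @ x4)
h4  = np.tanh(W_amod @ r43 + W_x @ x4 + b)

# Auxiliary task: predict the dependency relation label from the relation vector.
logits = W_R @ r43 + b_R
y_hat  = np.exp(logits - logits.max()); y_hat /= y_hat.sum()   # softmax
print(h4.shape, y_hat.shape)   # (50,) (40,)
```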
Model Architecture: Learn Shared Representations
Recursive Neural Structural Correspondence Network (RNSCN)
◮ Sentences from the source and target domains ("The laptop has a nice screen" / "they offer good appetizers") share the same syntactic pattern: an amod relation links an opinion word to an aspect word ("nice" → "screen", "good" → "appetizers").
◮ The corresponding relation vectors (r_65 and r_43) are trained with the same auxiliary relation-prediction outputs ŷ^R, which ties structurally corresponding words from both domains into a shared feature space.
◮ A GRU layer on top of the recursive hidden states h_1, …, h_n produces the final token representations h̃_1, …, h̃_n used for sequence labeling (a minimal sketch follows this slide).
Figure 6: An example of how RNSCN learns the correspondences.
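As a concrete reading of the last step, here is a minimal PyTorch sketch of a GRU running over per-token recursive hidden states to produce tag scores; the unidirectional GRU, the linear tagger, and all dimensions are assumptions made for illustration, not the authors' exact configuration.

```python
# Minimal PyTorch sketch: GRU over recursive hidden states, then a per-token
# classifier over the BA/IA/BO/IO/N tag set.
import torch
import torch.nn as nn

d_hidden, d_gru, n_tags = 50, 50, 5      # 5 tags: BA, IA, BO, IO, N (sizes assumed)

gru    = nn.GRU(input_size=d_hidden, hidden_size=d_gru, batch_first=True)
tagger = nn.Linear(d_gru, n_tags)

# h: recursive-network hidden states for one sentence of 7 tokens
# (in practice these would come from the RNSCN tree computation above).
h = torch.randn(1, 7, d_hidden)          # (batch, seq_len, d_hidden)

h_tilde, _ = gru(h)                      # (1, 7, d_gru): final token representations
tag_scores = tagger(h_tilde)             # (1, 7, n_tags): per-token label scores
pred_tags  = tag_scores.argmax(dim=-1)   # predicted tag indices
print(pred_tags.shape)                   # torch.Size([1, 7])
```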
Model Architecture: Auxiliary Label Denoising
Reduce label noise: the auxiliary dependency labels can be noisy, e.g., the parser may assign different relation labels (amod vs. dobj) to structurally similar phrases such as "nice screen" and "good appetizers" across the source and target domains. Auto-encoders group the relation vectors into their intrinsic groups.
◮ Encoding: g_nm = f_enc(W_enc, r_nm)
◮ Decoding: r′_nm = f_dec(W_dec, g_nm)
◮ Auxiliary task (predicted from the grouped representation): ŷ^R_nm = softmax(W_R g_nm)
Figure 7: An autoencoder for label denoising.
Model Architecture: Auxiliary Label Denoising h n auto�encoder y � nm g � g 2 g j � j group y � auto�encoder nm em�edding r � g nm nm W enc W dec r nm r nm h m x n encode decode Figure 8: An autoencoder for relation grouping. exp( r ⊤ nm W enc g i ) p ( G nm = i | r nm ) = (1) � r nm − W dec g nm � 2 = ℓ R 1 � exp( r ⊤ nm W enc g j ) 2 j ∈ G K � − y R y R ℓ R 2 = nm [ k ] log ˆ | G | nm [ k ] � = p ( G nm = i | r nm ) g i (2) k =1 g nm 2 � G ⊤ ¯ � i =1 � I − ¯ = ℓ R 3 G � � = ℓ R 1 + αℓ R 2 + βℓ R 3 (3) � ℓ R F 12 / 19
Experiments
Table 1: Data statistics with number of sentences.
Dataset | Description | # Sentences | Training | Testing
R       | Restaurant  | 5,841       | 4,381    | 1,460
L       | Laptop      | 3,845       | 2,884    | 961
D       | Device      | 3,836       | 2,877    | 959
Table 2: Comparisons with different baselines.
Experiments
Injecting noise into syntactic relations
Table 3: Effect of auto-encoders for auxiliary label denoising (AS = aspect F1, OP = opinion F1).
Models          | R→L: AS, OP  | R→D: AS, OP  | L→R: AS, OP  | L→D: AS, OP  | D→R: AS, OP  | D→L: AS, OP
RNSCN-GRU       | 37.77, 62.35 | 33.02, 57.54 | 53.18, 71.44 | 35.65, 60.02 | 49.62, 69.42 | 45.92, 63.85
RNSCN-GRU (r)   | 32.97, 50.18 | 26.21, 53.58 | 35.88, 65.73 | 32.87, 57.57 | 40.03, 67.34 | 40.06, 59.18
RNSCN+-GRU      | 40.43, 65.85 | 35.10, 60.17 | 52.91, 72.51 | 40.42, 61.15 | 48.36, 73.75 | 51.14, 71.18
RNSCN+-GRU (r)  | 39.27, 59.41 | 33.42, 57.24 | 45.79, 69.96 | 38.21, 59.12 | 45.36, 72.84 | 50.45, 68.05
Word groupings learned from the auto-encoders
Table 4: Case studies on word clustering.
Group 1 | this, the, their, my, here, it, I, our, not
Group 2 | quality, jukebox, maitre-d, sauces, portions, volume, friend, noodles, calamari
Group 3 | in, slightly, often, overall, regularly, since, back, much, ago
Group 4 | handy, tastier, white, salty, right, vibrant, first, ok
Group 5 | get, went, impressed, had, try, said, recommended, call, love
Group 6 | is, are, feels, believes, seems, like, will, would
Experiments
Figure 9: Sensitivity studies for L → D: (a) aspect and opinion F1 as the trade-off parameter γ varies from 0.1 to 1.0; (b) aspect and opinion F1 as the number of groups |G| varies from 5 to 40.