Semisupervised Autoencoder for Sentiment Analysis


  1. Semisupervised Autoencoder for Sentiment Analysis. Shuangfei Zhai, Zhongfei Zhang. Presented by 이종진, Seoul National University, ga0408@snu.ac.kr, July 06, 2018.

  2. ◮ Traditional autoencoders suffer in at least two respects:
     – Scalability with the high dimensionality of the vocabulary.
     – Dealing with task-irrelevant words.
     ◮ The proposed autoencoder is devised to learn highly discriminative feature maps.

  3. ◮ $x$: n-gram count data, $y$: label, $\tilde{x}$: reconstruction of $x$.
     ◮ Traditional autoencoder's loss function:
       $D(\tilde{x}, x) = (\tilde{x} - x)^2 \quad (1)$
       – Forces the reconstruction to be accurate on frequent words.
     ◮ Proposed autoencoder's loss function:
       $D(\tilde{x}, x) = (\theta^T(\tilde{x} - x))^2 \quad (2)$
       – $\theta$ are the weights of the linear classifier for the label.
       – Reconstruction needs to be accurate only along directions to which the linear classifier is sensitive.
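As a rough illustration, here is a minimal numpy sketch of the two losses; function and variable names are mine, not from the paper, and Eq. (1) is read as a squared error over the count vector:

```python
import numpy as np

def ae_loss(x_tilde, x):
    """Traditional autoencoder loss, Eq. (1): plain squared error.
    Frequent words dominate, so reconstruction chases them."""
    return np.sum((x_tilde - x) ** 2)

def sbdae_loss(x_tilde, x, theta):
    """Proposed loss, Eq. (2): error projected onto the classifier
    weights theta; deviations along directions the classifier
    ignores cost nothing."""
    return float(theta @ (x_tilde - x)) ** 2
```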

  4. ◮ $D(\tilde{x}, x) = (\theta^T(\tilde{x} - x))^2$ is rationalized from the perspective of Bregman divergence.
     ◮ SVM2 (squared hinge loss):
       $L(\theta) = \sum_i (\max(0, 1 - y_i \theta^T x_i))^2 + \lambda \|\theta\|^2 \quad (3)$
     ◮ With $\theta$ fixed:
       $f(x_i) = (\max(0, 1 - y_i \theta^T x_i))^2 \quad (4)$
     ◮ Reconstruct $\tilde{x}_i$ so that $f(\tilde{x}_i)$ stays as small as $f(x_i)$.
       – We would like $\tilde{x}_i$ to still be correctly classified by the pretrained linear classifier.
       – Deriving the Bregman divergence from $f(x_i)$ and using it as the loss function of the subsequent autoencoder training guides the autoencoder to give reconstruction errors that do not confuse the classifier.
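A hedged sketch of Eqs. (3)–(4), assuming labels in {-1, +1} and rows of X as count features; all names are illustrative:

```python
import numpy as np

def svm2_objective(theta, X, y, lam):
    """Squared hinge loss with L2 regularization, Eq. (3)."""
    margins = 1.0 - y * (X @ theta)
    return np.sum(np.maximum(0.0, margins) ** 2) + lam * (theta @ theta)

def f(x_i, y_i, theta):
    """Per-example squared hinge, Eq. (4), with theta held fixed
    after pretraining."""
    return max(0.0, 1.0 - y_i * (theta @ x_i)) ** 2
```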

  5. ◮ Bregman divergence with respect to $f$:
       $D_f(\tilde{x}, x) = f(\tilde{x}) - (f(x) + \nabla f(x)^T(\tilde{x} - x)) \quad (5)$
     ◮ $f(x_i)$ is piecewise quadratic in $x_i$; its Hessian follows as
       $H(x_i) = \begin{cases} 2\theta\theta^T & \text{if } 1 - y_i \theta^T x_i > 0 \\ 0 & \text{otherwise} \end{cases} \quad (6)$
     ◮ For SVM2 the Bregman divergence is simply the quadratic form $\frac{1}{2}(\tilde{x} - x)^T H (\tilde{x} - x)$:
       $D_f(\tilde{x}, x) = \begin{cases} (\theta^T(\tilde{x}_i - x_i))^2 & \text{if } 1 - y_i \theta^T x_i > 0 \\ 0 & \text{otherwise} \end{cases} \quad (7)$
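Because $f$ is piecewise quadratic, Eq. (7) admits a direct implementation; a minimal sketch under my naming:

```python
import numpy as np

def bregman_divergence(x_tilde, x, y, theta):
    """Eq. (7): on margin-active examples the divergence is the
    projected squared error; on the flat region of f it is zero."""
    if 1.0 - y * (theta @ x) > 0:   # Hessian 2*theta*theta^T is active
        return float(theta @ (x_tilde - x)) ** 2
    return 0.0
```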

  6. The Bayesian Marginalization
     ◮ Estimating $\theta$ with one single classifier can introduce bias.
     ◮ Bayesian approach, borrowing the idea of energy-based models:
       $p(\theta) = \frac{\exp(-\beta L(\theta))}{\int \exp(-\beta L(\theta)) \, d\theta} \quad (8)$
     ◮ Rewrite $D(\tilde{x}, x) = \int (\theta^T(\tilde{x} - x))^2 \, p(\theta) \, d\theta$ and evaluate it with a sampling method such as MCMC.
     ◮ Approximating $p(\theta)$ by a Gaussian, $\tilde{p}(\theta) = \mathcal{N}(\hat{\theta}, \Sigma)$, gives
       $D(\tilde{x}, x) = (\hat{\theta}^T(\tilde{x} - x))^2 + (\Sigma^{\frac{1}{2}}(\tilde{x} - x))^T (\Sigma^{\frac{1}{2}}(\tilde{x} - x)) \quad (9)$
     ◮ $\Sigma = \frac{1}{\beta}\left(\mathrm{diag}\left(\sum_i I(1 - y_i \theta^T x_i > 0)\, x_i^2\right)\right)^{-1}$
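Reading $\Sigma$ as the diagonal matrix given on the slide, Eq. (9) can be sketched as follows; this is a toy illustration under my naming, and it assumes every feature occurs in at least one margin-active example so the inverse exists:

```python
import numpy as np

def sigma_diagonal(X, y, theta_hat, beta):
    """Diagonal of Sigma: inverse of beta times the elementwise sum
    of squared features over margin-active examples."""
    active = (1.0 - y * (X @ theta_hat)) > 0
    return 1.0 / (beta * np.sum(X[active] ** 2, axis=0))

def marginalized_loss(x_tilde, x, theta_hat, sigma_diag):
    """Eq. (9): MAP projection term plus the uncertainty penalty
    (x_tilde - x)^T Sigma (x_tilde - x) for diagonal Sigma."""
    d = x_tilde - x
    return float(theta_hat @ d) ** 2 + float(d @ (sigma_diag * d))
```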

  7. Experiments
     ◮ Datasets: IMDB dataset / Amazon review data of five item categories.
     ◮ Method
       – Bag of words with uni-grams or bi-grams.
       – Normalization (sketched in code below):
         $x_{i,j} = \frac{\log(1 + c_{i,j})}{\max_j \log(1 + c_{i,j})} \quad (10)$
       – Compared models: DAE / DAE with fine-tuning / NN / logistic regression with dropout / Semisupervised Bregman Divergence Autoencoder (SBDAE) / SBDAE with fine-tuning.
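The normalization in Eq. (10) damps raw counts with a log and scales each document by its own largest value; a short sketch, where C is an n-by-V count matrix and the naming is mine:

```python
import numpy as np

def normalize_counts(C):
    """Eq. (10): log-damped counts, scaled per document (row) by the
    maximum log-count in that document. Assumes each document has at
    least one nonzero count, so the row maximum is positive."""
    logc = np.log1p(C)                 # log(1 + c_ij)
    return logc / logc.max(axis=1, keepdims=True)
```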

  8. Experiments
     ◮ Example record from the Amazon book review data:
       – id1: lost credability,quickly!!:chalupa, id2: 4423
       – asin: 055380121X
       – product name / product type
       – helpful: 12 of 15
       – rating: 2.0
       – title / date / reviewer / reviewer location
       – review text: "I admit, I haven’t finished this book. A friend recommended it to me as I have been having problems with insomnia. I was interested in reading a book about women’s health issues and this one sounded intriguing UNTIL she started in with her tarot cards, interest in astrology and angels. Granted, I am not a firm believer in just "the hard facts" but its really hard to believe anything this woman writes after it is clear that common sense isn’t alternative enough for her!"

  9. Experiments [results figures]

  10. Experiments [results figures]
