NLP Course Term Project
Aspect Extraction and Opinion Mining of Product Reviews
Supervised by: Asst. Prof. Pawan Goyal
Mentored by: Mr. Abishek
Submitted by: Karanam Sai Ravi Teja, Chandini Baratam, K.L.S. Koutilya Varma, Lokesh Dokara, D. Anudeep, M. Akshay
Abstract: Our project mines product reviews to build a model of important product features, how reviewers evaluate them, and their relative quality across products. The sub-tasks are: identify product features and the opinions expressed about them, determine opinion polarity, and rank opinions by strength. The necessary data is scraped from Flipkart and Amazon.
Three Parts of the Project
01 Aspect Extraction
02 Sentiment Analysis
03 Evaluation of Results
Approach: Building blocks of the solution
Data Sets: Data is extracted from a live website, from the URL given by the user.
Preprocessing (applying NLP): Sentence segmentation, tokenization, lemmatization, POS tagging.
Aspect Extraction: A rule-based model is used.
Sentiment Analysis: Sentence-level sentiment analysis is applied to couple aspects with their sentiment.
Data Extraction and Cleaning: Reviews of a product have to be extracted live from the e-commerce websites. A scraping program is written to extract the reviews of the product. The extracted reviews are stored in lists to facilitate the next steps. Different data sets were used for testing and evaluation.
POS Tagging, Lemmatization and Dependency Tree: The reviews are first segmented into sentences. The segmented sentences are then POS tagged and their dependency trees are generated. The words in the dependency tree are lemmatized. The rules are then applied to the lemmatized dependency words to extract aspects.
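The first two preprocessing steps can be sketched with the Python standard library alone. This is a minimal illustration, not the project's code: the regular expressions and the function names `split_sentences` and `tokenize` are assumptions, and POS tagging, lemmatization and dependency parsing are left to a full parser.

```python
import re

def split_sentences(review):
    """Naively split a review at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical review text used only for illustration.
review = "I like the lens of the camera. The battery drains fast!"
sentences = split_sentences(review)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens[0])
```

A regex splitter like this breaks on abbreviations and on reviews with missing spaces after a full stop, so a trained sentence segmenter is the safer choice for real review text.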
Rule-based Aspect Extraction: A set of 11 rules is applied to the segmented sentences. The sentences are assumed to be grammatically correct. The advantage of this approach is that it relies on the standard structure of English sentences, so aspects can be extracted independently of the product category.
A Rule-based Approach: Implementation
Example sentence: I like the lens of the camera.
Dependencies:
nsubj(like-2, I-1)
root(ROOT-0, like-2)
det(lens-4, the-3)
dobj(like-2, lens-4)
case(camera-7, of-5)
det(camera-7, the-6)
nmod(lens-4, camera-7)
Annotations:
I: active token (h)
like: t
lens: (n), in obj relation with like
A Rule-based Approach: Implementation
Example sentence: I like the lens of the camera.
Rule: If an active token h is in a subject-noun relationship with a word t, and t has a direct object relation with a token n whose POS is noun and which is not in SentiWordNet, then n is extracted as an aspect.
In the example, like is in a direct object relation with lens, so the aspect lens is extracted.
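The rule can be sketched over the dependency relations listed for the example sentence. This is an illustrative sketch under stated assumptions, not the project's implementation: the function name `extract_direct_object_aspects`, the POS tags in `POS`, and the small `SENTIMENT_WORDS` set (standing in for a SentiWordNet lookup) are all hypothetical.

```python
# Dependency triples (relation, head, dependent) for
# "I like the lens of the camera.", taken from the parse in the slides.
DEPENDENCIES = [
    ("nsubj", "like", "I"),
    ("root", "ROOT", "like"),
    ("det", "lens", "the"),
    ("dobj", "like", "lens"),
    ("case", "camera", "of"),
    ("det", "camera", "the"),
    ("nmod", "lens", "camera"),
]

# Hypothetical POS tags and a tiny stand-in for the SentiWordNet lookup.
POS = {"I": "PRP", "like": "VBP", "the": "DT", "lens": "NN",
       "of": "IN", "camera": "NN"}
SENTIMENT_WORDS = {"like", "good", "great", "bad"}

def extract_direct_object_aspects(dependencies, pos, sentiment_words):
    """Return nouns n such that nsubj(t, h) and dobj(t, n) both hold
    and n is not a sentiment word."""
    aspects = []
    for relation, t, _h in dependencies:
        if relation != "nsubj":
            continue
        # t is the word whose subject is the active token h.
        for relation2, head, n in dependencies:
            is_object_of_t = relation2 == "dobj" and head == t
            is_noun = pos.get(n, "").startswith("NN")
            if is_object_of_t and is_noun and n.lower() not in sentiment_words:
                aspects.append(n)
    return aspects

aspects = extract_direct_object_aspects(DEPENDENCIES, POS, SENTIMENT_WORDS)
print(aspects)
```

Only this single rule is shown; the remaining rules would be further checks over the same triples.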
Input-Output
Input: This camera has lots of great and easy to access settings , takes great pictures , and is small enough to travel with comfortably. the advanced features and physical controls also make it a great starter camera for amateur photographers.
Output: small enough to travel | settings | lots | pictures | camera | small | has | make | features | controls | make it a great starter camera |
Results of the Rule-based Aspect Extraction:
DATA SET | ASPECT RECALL | ASPECT PRECISION
Selected sentences from 300 corpus | 45.8% | 50%
300 review corpus | 42% | 48%
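The slides do not state how aspect precision and recall were computed. A minimal sketch, assuming the usual set-based definitions against hand-annotated gold aspects, is given here; the function name `precision_recall` and both example lists are hypothetical.

```python
def precision_recall(extracted, gold):
    """Set-based precision and recall of extracted aspects against gold aspects."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical system output and gold annotation for one review.
extracted = ["lens", "camera", "lots", "pictures"]
gold = ["lens", "pictures", "settings", "controls"]
precision, recall = precision_recall(extracted, gold)
print(precision, recall)
```

Precision penalizes spurious extractions such as "lots", while recall penalizes gold aspects the rules miss.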
Rule-based Sentiment Extraction (Approach 1): A rule-based opinion extraction approach is used to extract the sentiment words in each sentence. The rules are applied to each sentence, and the extracted sentiment words are attached to the aspects found earlier. This approach performed poorly, with low recall and accuracy.
Naïve Sentence-level Sentiment Extraction (Approach 2): Each review is segmented into sentences, and the sentiment of each sentence is calculated using a multiplication rule. The sentiment of each word that is an adverb, adjective or verb is looked up in SentiWordNet, and these values are multiplied together. Words with any other part of speech are given a sentiment of 1. Assumption: each sentence has only one aspect.
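The multiplication rule can be sketched as follows. This is a minimal illustration under stated assumptions: the word scores in `SCORES` are hypothetical placeholders for SentiWordNet lookups, the Penn Treebank tag prefixes are an assumed tag set, and giving unknown words a factor of 1 is also an assumption.

```python
# Hypothetical word scores standing in for SentiWordNet lookups.
SCORES = {"good": 0.75, "very": 0.5}

# Penn Treebank tag prefixes for adjectives, adverbs and verbs (assumed tag set).
SCORED_TAG_PREFIXES = ("JJ", "RB", "VB")

def sentence_sentiment(tagged_tokens, scores):
    """Multiply the scores of adjectives, adverbs and verbs in one sentence."""
    product = 1.0
    for word, tag in tagged_tokens:
        if tag.startswith(SCORED_TAG_PREFIXES):
            # Words missing from the lexicon contribute a factor of 1 (assumption).
            product *= scores.get(word.lower(), 1.0)
        # Every other part of speech contributes a factor of 1.
    return product

tagged = [("the", "DT"), ("photo", "NN"), ("quality", "NN"),
          ("is", "VBZ"), ("very", "RB"), ("good", "JJ"), (".", ".")]
score = sentence_sentiment(tagged, SCORES)
print(score)
```

Because the factors are multiplied, a sentence with many scored words drifts toward zero when the scores lie below 1, which is one reason the per-sentence scores can vary widely in magnitude.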
Input-Output
Input: the photo quality is very good. not dslr good, but that is to be expected. i feel that the camera takes pretty usable pictures up to iso 400, but if you plan on making very large prints (above an 8x10) it may be better to stay below iso 200.i wish the camera had fewer megapixels and better iso performance, but no camera is perfect. i highly recommend this to anyone looking for a portable high quality point shoot camera.
Output: photo_quality | good | quality | . . camera | it | . i | recommend | .
Sentiment score: 0.875__0.875__0.112890625__0.25__
Results of the Sentiment Analysis part:
DATA SET | APPROACH | SENTIMENT RECALL
300 review corpus | Approach 1 | 23.5%
300 review corpus | Approach 2 | 56.1%
Graphical User Interface: The graphical user interface is developed using the Django framework. It takes the URL of the product as input and returns the top ten aspect words by frequency of occurrence, along with their sentiment.
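The ranking step behind this output can be sketched with `collections.Counter`. This is an illustrative sketch, not the project's code: the function name `top_aspects` and the example pairs are hypothetical, and averaging the sentence scores per aspect is an assumption, since the slides do not say how an aspect's sentiment is aggregated.

```python
from collections import Counter

def top_aspects(aspect_sentiments, k=10):
    """Return (aspect, count, mean score) for the k most frequent aspects.

    aspect_sentiments is a list of (aspect, sentence score) pairs.
    """
    counts = Counter(aspect for aspect, _ in aspect_sentiments)
    totals = {}
    for aspect, score in aspect_sentiments:
        totals[aspect] = totals.get(aspect, 0.0) + score
    return [(aspect, count, totals[aspect] / count)
            for aspect, count in counts.most_common(k)]

# Hypothetical (aspect, sentence score) pairs collected from reviews.
pairs = [("camera", 0.5), ("lens", 0.75), ("camera", 0.25),
         ("lens", 0.25), ("battery", 0.125), ("camera", 0.75)]
ranking = top_aspects(pairs, k=2)
print(ranking)
```

In a Django view, a list like `ranking` could be passed to a template for display.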
Contribution of Team Members:
NAME | WORK CONTRIBUTION
K. Sai Ravi Teja | Web scraping code, integration and testing, Sentiment Analysis (Approach 2)
B. Chandini | Rules implementation, Aspect-Category lexicon, Sentiment Analysis (Approach 2)
M. Akshay | Additional rules implementation and Java code for the Stanford parser, Sentiment Analysis (improvement of Approach 1)
Koutilya Varma | Java code and code connecting Java and Python, Sentiment Analysis (Approach 1)
Lokesh Dokara | Web application development
D. Anudeep | GUI development, rule modification, Sentiment Analysis rules (Approach 1)
References:
Aspect Extraction: Soujanya Poria, Erik Cambria, Lun-Wei Ku, Chen Gui, Alexander Gelbukh. A rule-based approach to aspect extraction from product reviews.