

  1. Detecting Product Review Spammers using Rating Behaviors Itay Dressler

  2. • What is Spam? • Why should you care? • How to detect Spam?

  3. What is Spam?

  4. What is Spam? • All forms of malicious manipulation of user-generated data so as to influence usage patterns of the data. • Examples of web spam include search-engine spam (SEO spam), email spam, and opinion spam (e.g., talk-backs).

  5. Search Spam Keyword stuffing

  6. Mail Spam (before mail spam detection)

  7. Our Focus • Spam found in online product review sites: review spam / opinion spam.

  8. • Review spam is designed to give an unfair view of some products so as to influence consumers' perception of those products by directly or indirectly inflating or damaging the products' reputation. • Under-rating / over-rating. • Unfair treatment of products.

  9. Review Spam Example

  10. Review Spam Examples

  11. Review Spamming is a Profession today

  12. Why should you care? • Customers rely on reviews today more than ever. • In general, every decision we make is heavily dependent on reviews.

  13. What is Amazon? • Largest Internet-based company in the United States. • Revenue totaled $61 billion in 2012. • 244 million users. • Books, Kindle, and Fire Phone.

  14. What is Amazon? • Amazon's warehouses have more square footage than 700 Madison Square Gardens and could hold more water than 10,000 Olympic Pools.

  15. Example • “Mr Unhappy” • Lack of seriousness (identical reviews). • His reviews are very different from other reviews of the same products (96 reviews in total).

  16. Extremely Hard to Detect • Spam reviews usually look perfectly normal until one compares them with other reviews. • A tedious and non-trivial task. • Amazon allows users to vote on reviews, but the votes themselves can be spammed.

  17. Review Spammer Detection • Detecting spammers vs. detecting spammed reviews (here spammers are detected via characteristic spamming behaviors). • For a single review the amount of evidence is limited (one review and one product). • A scalable approach: incorporate new spamming behaviors as they emerge. • Each model assigns a numeric spamming-behavior score to each reviewer.

  18. Spamming Behavior Models • (TP) Targeting Products. • (TG) Targeting Groups (brands). • (GD) General Rating Deviation. • (ED) Early Rating Deviation. • An overall numeric spam score is assigned to each user as a linear weighted combination of the model scores (see the sketch below). • Deep natural-language understanding is avoided (high computational cost).
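
To make the scoring framework above concrete, here is a minimal Python sketch of the linear weighted combination; the function name and the equal weights are illustrative assumptions, not values given in the presentation.

```python
# Minimal sketch of the overall scoring framework: each behavior model
# (TP, TG, GD, ED) yields a numeric score in [0, 1] per reviewer, and the
# final spam score is a linear weighted combination of the four.

def combined_spam_score(scores, weights=None):
    """scores: dict like {"TP": 0.9, "TG": 0.4, "GD": 0.7, "ED": 0.2}."""
    if weights is None:
        # Assumed equal weights, for illustration only.
        weights = {"TP": 0.25, "TG": 0.25, "GD": 0.25, "ED": 0.25}
    total = sum(weights.values())
    return sum(weights[m] * scores.get(m, 0.0) for m in weights) / total

print(combined_spam_score({"TP": 0.9, "TG": 0.4, "GD": 0.7, "ED": 0.2}))  # 0.55
```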

  19. Related Work • Opinion and sentiment mining: extracting and aggregating positive and negative opinions (text mining). These works do not address spam detection unless they are used to derive other, more relevant features. • Item spam detection. • Singleton reviewers: users who contribute only one review each. • Helpful review prediction (votes): not all unhelpful reviews are spam.

  20. Amazon Dataset • Product: has a brand, attributes (a book product has ‘author’, ‘publisher’, and ‘price’), and one-to-many reviews. • User: can contribute one or multiple reviews. • Review: a textual comment, a numeric rating (normalized to [0, 1]), and anonymous helpfulness votes from users.

  21. Preprocessing of Amazon’s Dataset (“MProducts”) • Removal of anonymous users (each anonymous user id may be used by multiple persons). • Removal of duplicated products (some products have only minor variations from others, e.g. color); one product is chosen randomly and all reviews are attached to it. • Removal of inactive users and unpopular products; the threshold is 3 reviews per product and 3 reviews per user. • Resolution of brand-name synonyms, done manually (only a few hundred brand names in the DB), e.g. HP is the same brand as Hewlett Packard.
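
A minimal pandas sketch of the preprocessing steps above; the column names (user_id, product_id, brand), the anonymous-user id, and the synonym map are illustrative assumptions, and duplicate-product merging is omitted here because it needs extra product metadata.

```python
import pandas as pd

def preprocess(reviews: pd.DataFrame, anonymous_id: str = "A_CUSTOMER") -> pd.DataFrame:
    """reviews: one row per review, with assumed columns user_id, product_id, brand."""
    # 1. Remove anonymous users (a single anonymous id may hide many people).
    reviews = reviews[reviews.user_id != anonymous_id]

    # 2. Resolve brand-name synonyms with a small, manually built map (illustrative).
    brand_synonyms = {"Hewlett Packard": "HP"}
    reviews = reviews.assign(brand=reviews.brand.replace(brand_synonyms))

    # 3. Keep only active users and popular products (at least 3 reviews each).
    #    Filtering one side can invalidate the other, so repeat until stable.
    while True:
        user_counts = reviews.user_id.value_counts()
        prod_counts = reviews.product_id.value_counts()
        keep = (reviews.user_id.map(user_counts) >= 3) & \
               (reviews.product_id.map(prod_counts) >= 3)
        if keep.all():
            return reviews
        reviews = reviews[keep]
```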

  22. Notations

  23. Notations

  24. Target Based Spamming • A spammer will direct most of his efforts to promote or victimize a few products or product lines. • Targeted products. • Targeted product groups.

  25. Targeting Products • Easily observed by the number of reviews (and ratings) on the product (as seen in the previous table). • In MProducts, 2,874 reviewer-product pairs involve multiple reviews/ratings (a small number compared to roughly 50k reviews in total). • Most of these pairs, 1,220 of them, involve only ratings of 5, compared with 624 involving only ratings of 1 or 2.

  26. Targeting Products • Rating spamming: reviewers who are involved in reviewer-product pairs with a large number of ratings are likely to be spammers, especially when those ratings are similar. • The UserRatingSpam score is defined so that reviewers with a large proportion of their ratings involved as multiple similar ratings on the same products are assigned high spam scores.
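
The exact UserRatingSpam formula appears on the slide as an equation image; the sketch below is only one plausible reading of the description above: a reviewer's score is the fraction of their ratings that belong to reviewer-product pairs containing multiple, near-identical ratings (the similarity threshold is an assumption).

```python
from collections import defaultdict

def rating_spam_scores(ratings, sim_threshold=0.1):
    """ratings: list of (user_id, product_id, rating), ratings normalized to [0, 1].

    Returns {user_id: fraction of the user's ratings that sit in a
    (user, product) pair containing multiple similar ratings}.
    """
    by_pair = defaultdict(list)
    for user, product, r in ratings:
        by_pair[(user, product)].append(r)

    suspicious = defaultdict(int)
    total = defaultdict(int)
    for (user, _), rs in by_pair.items():
        total[user] += len(rs)
        # Multiple ratings on one product whose values are close together.
        if len(rs) > 1 and max(rs) - min(rs) <= sim_threshold:
            suspicious[user] += len(rs)

    return {user: suspicious[user] / total[user] for user in total}
```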

  27. Targeting Products • Review text spamming: similar to rating spamming. • Such review texts are likely to be identical or look similar so as to conserve spamming effort, but we need to distinguish them from genuine text reviews. • Similarity of text reviews: cosine(v_k, v_k′), the cosine similarity of the bi-gram TF-IDF (Term Frequency-Inverse Document Frequency) vectors of reviews v_k and v_k′. • The text spam score is defined analogously to the rating spam score, using the similarity of a reviewer’s multiple reviews on the same product.
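
A short sketch of the bi-gram TF-IDF cosine similarity used to flag near-duplicate review texts, using scikit-learn; the 0.9 duplicate threshold is an assumption for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def review_similarity(texts):
    """Pairwise cosine similarity of word bi-gram TF-IDF vectors."""
    vectors = TfidfVectorizer(ngram_range=(2, 2)).fit_transform(texts)
    return cosine_similarity(vectors)

reviews = [
    "Great phone, amazing battery life and screen.",
    "Great phone, amazing battery life and screen!",   # near-duplicate
    "Disappointing camera, returned it after a week.",
]
sim = review_similarity(reviews)
print(sim[0, 1] > 0.9)  # near-duplicates score close to 1.0 -> True
print(sim[0, 2] > 0.9)  # unrelated texts score near 0.0 -> False
```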

  28. Targeting Products • Combined spam score: the rating-spamming and text-spamming scores are combined into a single targeting-products (TP) score.

  29. Targeting Product Groups • The pattern of spammers manipulating ratings of a set of products sharing some common attribute(s) within a short span of time (saves the spammer from re-logging in). • Ratings given to these target groups of products are either very high or very low, so we derive two different scores: • Single Product Group Multiple High Ratings. • Single Product Group Multiple Low Ratings.

  30. Targeting Product Groups Single Product Group Multiple High Ratings • We divide the whole time period into small disjoint time windows of fixed size and derive clusters of very high ratings. • A high-rating cluster of user u_i for a product group b_k is the set of very high ratings u_i gave to products of b_k within one window. • Only large clusters are assumed to be spam (at least min = 3 ratings). • The window size “w” was empirically chosen to be a one-day interval. • The product attribute used for the MProducts dataset is ‘brand’.
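
A sketch, under assumptions about the input format, of how these high-rating clusters can be derived: ratings are bucketed into disjoint one-day windows, grouped by (reviewer, brand, window), and only groups of very high ratings with at least min_size members are kept (the "very high" threshold is an assumption).

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

def high_rating_clusters(ratings, high=1.0, window=SECONDS_PER_DAY, min_size=3):
    """ratings: list of (user_id, brand, rating, timestamp_seconds), rating in [0, 1].

    Returns {(user_id, brand, window_index): [ratings]} for clusters of very high
    ratings that one reviewer gave to one brand within one disjoint time window.
    """
    clusters = defaultdict(list)
    for user, brand, r, ts in ratings:
        if r >= high:                       # keep only "very high" ratings
            clusters[(user, brand, ts // window)].append(r)
    return {key: rs for key, rs in clusters.items() if len(rs) >= min_size}
```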

  31. Targeting Product Groups Single Product Group Multiple High Ratings • The spam score of a user u_i based on the single-product-group multiple-high-ratings behavior is then defined from these high-rating clusters.

  32. Targeting Product Groups Single Product Group Multiple Low Ratings • The motive here is to create a negative perception of the affected products so as to reduce their sales. • The minimum cluster size here is 2, due to the smaller number of low ratings in the database. • The spam score is defined analogously to the high-ratings case.
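
The exact group spam-score formulas are shown on the slides as equation images; as an illustrative stand-in, the sketch below scores each reviewer by the share of their ratings that fall inside qualifying clusters, reusing high_rating_clusters from the previous sketch (for the low-ratings case, build the clusters with a low-rating predicate and min_size = 2 instead).

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

def group_rating_spam_scores(ratings, clusters, high=1.0, window=SECONDS_PER_DAY):
    """Illustrative only: score = share of a user's ratings that belong to a
    qualifying cluster (output of high_rating_clusters or a low-rating analogue)."""
    total = defaultdict(int)
    clustered = defaultdict(int)
    for user, brand, r, ts in ratings:
        total[user] += 1
        if r >= high and (user, brand, ts // window) in clusters:
            clustered[user] += 1
    return {user: clustered[user] / total[user] for user in total}
```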

  33. Targeting Product Groups • Combined spam score: the high-ratings and low-ratings scores are combined into a single targeting-group (TG) score.

  34. DEVIATION-BASED SPAMMING • General Deviation • Early Deviation.

  35. DEVIATION-BASED SPAMMING General Deviation • A reasonable rater is expected to give ratings similar to other raters of the same product. As spammers attempt to promote or demote products, their ratings can be quite different from those of other raters. • The deviation of a rating e_ij is its difference from the average rating on the same product.
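
A minimal sketch of the general-deviation (GD) idea: a rating's deviation is its absolute difference from the product's average rating, and a reviewer's score is their average deviation over all their ratings (the averaging step is an assumption; the slide's exact formula was an equation image).

```python
from collections import defaultdict

def general_deviation_scores(ratings):
    """ratings: list of (user_id, product_id, rating), ratings in [0, 1].

    Returns {user_id: average absolute deviation from each product's mean rating}.
    """
    by_product = defaultdict(list)
    for _, product, r in ratings:
        by_product[product].append(r)
    avg = {p: sum(rs) / len(rs) for p, rs in by_product.items()}

    dev_sum = defaultdict(float)
    count = defaultdict(int)
    for user, product, r in ratings:
        dev_sum[user] += abs(r - avg[product])
        count[user] += 1
    return {user: dev_sum[user] / count[user] for user in dev_sum}
```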

  36. DEVIATION-BASED SPAMMING Early Deviation • Early deviation captures the behavior of a spammer contributing review spam soon after a product is made available for review. • Other reviewers are highly influenced by these early reviews, which strongly affects the product. • The early-deviation model thus relies on two pieces of information: the general deviation of each rating, and a weight indicating how early the rating was given (alpha is a parameter greater than one that accelerates the decay). • The final deviation spam score combines these two.
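
The early-deviation (ED) score was also shown as an equation image; the sketch below is one plausible reading of the description above: each rating's deviation is weighted by how early it was given, with a decay controlled by alpha > 1 (the exponential form of the decay is an assumption).

```python
import math
from collections import defaultdict

SECONDS_PER_DAY = 86400

def early_deviation_scores(ratings, alpha=1.5):
    """ratings: list of (user_id, product_id, rating, timestamp_seconds), rating in [0, 1].

    Each rating's deviation from the product average is weighted by an earliness
    factor that decays with the days elapsed since the product's first rating.
    """
    by_product = defaultdict(list)
    for _, product, r, ts in ratings:
        by_product[product].append((r, ts))
    avg = {p: sum(r for r, _ in rs) / len(rs) for p, rs in by_product.items()}
    first = {p: min(ts for _, ts in rs) for p, rs in by_product.items()}

    score_sum = defaultdict(float)
    count = defaultdict(int)
    for user, product, r, ts in ratings:
        days_late = (ts - first[product]) / SECONDS_PER_DAY
        weight = math.exp(-alpha * days_late)   # earliness weight (assumed form)
        score_sum[user] += weight * abs(r - avg[product])
        count[user] += 1
    return {user: score_sum[user] / count[user] for user in score_sum}
```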

  37. User Evaluation

  38. User Evaluation • The objective is to evaluate the performance of the different methods, which are based on the spammer scores described above: single-product multiple-reviews behavior (TP), single-product-group multiple-reviews behavior (TG), general deviation (GD), and early deviation (ED) with α = 1.5. • (ALL): a newly introduced, empirically weighted combined method. • Baseline: a newly introduced method that ranks the reviewers by their unhelpfulness score.

  39. User Evaluation • The methods are compared against the judgements of real human evaluators, but there are several challenges in conducting the user evaluation experiments: there are too many reviewers, and a reviewer can have up to 349 reviews (in MProducts); the existing Amazon website is not designed for review-spammer detection (it is designed for shoppers); and the human evaluators need to be trained. • These issues were handled by using a smaller subset of the database (reviewers highly suspected as spammers by the previous methods, plus random reviewers) and by developing dedicated software for the human evaluators (the review spammer evaluation software).

  40. User Evaluation Review spammer evaluation software • Ensures that the human evaluators can easily browse the reviewer profiles and their reviews (both selected and non-selected). • The software makes the human evaluators go through all of a reviewer’s reviews before determining their judgement about them (at most 10 reviews per reviewer in this experiment). • Reduces the amount of evaluation effort and time. • Features: • Easy visualization of reviews with exact and near duplicates. • Review ratings shown among recent ratings on the same products. • Multiple reviews on the same products. • Multiple reviews on the same product groups.

  41. User Evaluation Review spammer evaluation software
