Distinguishing the Popularity Between Topics: A System for - PowerPoint PPT Presentation

Motivation | The proposed system | Experimental study (Qualitative) | Conclusions
Distinguishing the Popularity Between Topics: A System for Up-to-date Opinion Retrieval and Mining in the Web
Nikolaos Pappas, Georgios Katsimpras, Efstathios Stamatatos
March 26, 2013

Outline
1. Motivation
2. The proposed system
   → Topic-related document discovery
   → Opinion retrieval and mining
3. Experimental study (Qualitative)
   → Distinguishing topic popularity
   → Ranking of topics
4. Conclusions

Motivation
A huge amount of user-generated text is available on the Web
Many applications are based on sentiment analysis, e.g. brand analysis, marketing effectiveness
Most approaches focus on fixed collections or certain domains, e.g. Twitter, Facebook, Blogspot
Opinion analysis can differ according to the examined web genre, e.g. articles, blogs, forums
Challenges
→ Collecting domain-independent opinionated texts dynamically from the Web
→ Providing genre-based analysis of opinions
→ Comparing popularity of topics

The proposed system
Synthesis of IR and NLP components:
1. Discovery of topic-related documents dynamically from the Web
2. Detection of user-generated content regions inside the related pages
3. Identification of topic-related pages with confidence score
4. Subjectivity and polarity detection on the detected regions
Contributions
→ Up-to-date opinionated text retrieval and mining
→ Genre-based analysis of sentiment
→ Efficient estimation of total sentiment
Evaluation: Qualitative analysis with real-world experiments

Synthesis of IR and NLP components

Topic-related document discovery
Given a topic query (keyword form):
→ collection of seed URLs from major search engines
→ topic-related (T) and genre-related (G) focused crawling [PKS12a]
→ scoring of unvisited pages using link analysis
   \[ Linkscore(p) = w_T \cdot Linkscore_T(p) + w_G \cdot Linkscore_G(p) \]
→ targeting web genres (news, blogs, discussions) that are highly likely to contain opinions
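The link-scoring formula above can be illustrated with a minimal Python sketch. It only shows the weighted combination of a topic score and a genre score and how it could order a crawl frontier; the names `LinkScores`, `link_score`, and `rank_frontier`, the default weights of 0.5, and the example URLs are assumptions for illustration and do not come from the slides or from the crawler in [PKS12a].

```python
from dataclasses import dataclass


@dataclass
class LinkScores:
    """Per-page link-analysis scores (hypothetical container)."""
    topic: float  # Linkscore_T(p)
    genre: float  # Linkscore_G(p)


def link_score(scores: LinkScores, w_topic: float = 0.5, w_genre: float = 0.5) -> float:
    """Linkscore(p) = w_T * Linkscore_T(p) + w_G * Linkscore_G(p).

    The default weights are placeholders, not values from the slides.
    """
    return w_topic * scores.topic + w_genre * scores.genre


def rank_frontier(frontier: dict, w_topic: float = 0.5, w_genre: float = 0.5) -> list:
    """Order unvisited URLs by descending combined link score."""
    return sorted(
        frontier,
        key=lambda url: link_score(frontier[url], w_topic, w_genre),
        reverse=True,
    )


# Example frontier with made-up URLs and scores.
frontier = {
    "http://example.org/blog-post": LinkScores(topic=0.8, genre=0.9),
    "http://example.org/about": LinkScores(topic=0.2, genre=0.1),
    "http://example.org/forum-thread": LinkScores(topic=0.6, genre=0.7),
}
ordered = rank_frontier(frontier)
```

Raising `w_genre` relative to `w_topic` would push the crawl toward pages of the targeted genres even when their topic score is moderate.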

Opinion retrieval and mining
1. Page segmentation and filtering
2. Sentiment analysis

Page segmentation and filtering
Given a web page:
→ segmentation into coherent parts and noise removal (e.g. ads) [PKS12b]
→ rule-based classification and region extraction of three classes
→ confidence of page relevance based on the topic presence (keyword(s)) in the detected regions (weighted linear combination)
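One way to read the last step above is sketched below in Python. The slides do not name the three region classes, the weights, or how topic presence is measured, so everything here is an assumption: the placeholder class names `class_a`, `class_b`, `class_c`, the weights, and the choice to measure presence as the fraction of regions that mention at least one query keyword.

```python
def topic_presence(regions: list, keywords: list) -> float:
    """Fraction of regions mentioning at least one keyword (case-insensitive).

    This definition of "topic presence" is an assumption for illustration.
    """
    if not regions:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(1 for text in regions if any(k in text.lower() for k in lowered))
    return hits / len(regions)


def page_confidence(regions_by_class: dict, keywords: list, weights: dict) -> float:
    """Weighted linear combination of per-class topic presence."""
    return sum(
        weight * topic_presence(regions_by_class.get(region_class, []), keywords)
        for region_class, weight in weights.items()
    )


# Placeholder classes and weights; the slides do not specify either.
weights = {"class_a": 0.5, "class_b": 0.3, "class_c": 0.2}
regions = {
    "class_a": ["I love Pepsi", "Nothing relevant here"],
    "class_b": ["pepsi is ok"],
    "class_c": [],
}
confidence = page_confidence(regions, ["Pepsi"], weights)
```

A page could then be kept or discarded by comparing `confidence` against a threshold, which would also be a tunable assumption.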

Sentiment analysis
For each sentence in the detected regions:
→ subjectivity classification (bootstrap with Pattern-based learner) [RW03]
→ polarity classification (bootstrap with SVM) [WWH05, MW09]
→ total sentiment estimation
   \[ TotalScore(D) = \sum_{d_j \in D} \sum_{r_{ij} \in d_j} Score(r_{ij}) \in \mathbb{R} \quad (1) \]
   \[ NormalizedScore(D) = \sum_{d_j \in D} \sum_{r_{ij} \in d_j} \frac{Score(r_{ij})}{|r_{ij}|} \in \mathbb{R} \quad (2) \]
   \[ SentimentRatio(D) = \frac{|r_{pos}|}{|r_{pos}| + |r_{neg}|} \in [0, 1] \quad (3) \]
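The three measures can be sketched in Python as follows. The slides do not define |r_ij|, |r_pos|, or |r_neg| precisely, so this sketch assumes that each region is a `(score, length)` pair, that |r_ij| is the region length (for example, its number of sentences), and that |r_pos| and |r_neg| count regions with positive and negative scores. The function names and the sample data are illustrative only.

```python
from typing import Optional

# A document is a list of regions; a region is a (score, length) pair.
# Both the pair layout and the meaning of "length" are assumptions.


def total_score(documents: list) -> float:
    """Equation (1): sum of region scores over all documents."""
    return sum(score for doc in documents for score, _ in doc)


def normalized_score(documents: list) -> float:
    """Equation (2): sum of region scores, each divided by region length."""
    return sum(score / length for doc in documents for score, length in doc if length > 0)


def sentiment_ratio(documents: list) -> Optional[float]:
    """Equation (3): positive regions over positive plus negative regions.

    Returns None when no region carries a polarity, since the ratio
    is undefined in that case.
    """
    pos = sum(1 for doc in documents for score, _ in doc if score > 0)
    neg = sum(1 for doc in documents for score, _ in doc if score < 0)
    if pos + neg == 0:
        return None
    return pos / (pos + neg)


# Two documents with made-up region scores and lengths.
docs = [[(2.0, 4), (-1.0, 2)], [(3.0, 3)]]
total = total_score(docs)
normalized = normalized_score(docs)
ratio = sentiment_ratio(docs)
```

Dividing by region length keeps long regions from dominating the sum, which is the practical difference between equations (1) and (2).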

Case study 1: Distinguishing topic popularity

Case study 1: Opinions per detected regions

Case study 2: Ranking of topics

Rank  Soft drink   Likes       Talking  Both        TotalScore   NormalizedScore
1st   Dr. Pepper   12,093,912  187,011  12,280,923  7up          Dr. Pepper
2nd   Pepsi        11,835,244  236,105  12,071,349  Dr. Pepper   Pepsi
3rd   Sprite       8,574,563   50,192   8,624,755   Sprite       Fanta
4th   Fanta        2,650,072   84,080   2,734,152   Fanta        7up
5th   7up          785,967     75,996   861,963     Pepsi        Sprite

Rank  IM Client    Followers   -        -           TotalScore   NormalizedScore
1st   Google Talk  405,818     -        -           Google Talk  Google Talk
2nd   Skype        367,385     -        -           Skype        Skype
3rd   MSN          82,896      -        -           MSN          MSN
4th   AOL          14,431      -        -           AOL          ICQ
5th   ICQ          14,138      -        -           ICQ          AOL

NDCG:                                               0.841        0.993
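For readers unfamiliar with NDCG, the Python sketch below shows one common way to compare a predicted ranking against a reference ranking. The slides do not state how gains were assigned or how the reported values were aggregated, so the linear rank-based gain used here (top reference item gets the highest gain) is an assumption, and the sketch is not claimed to reproduce the reported 0.841 and 0.993.

```python
import math


def dcg(gains: list) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg(predicted: list, reference: list) -> float:
    """NDCG of a predicted ranking against a reference ranking.

    Gains are assumed to be linear in reference rank: the first reference
    item gets gain n, the last gets gain 1. Items missing from the
    reference get gain 0.
    """
    n = len(reference)
    gain = {item: n - position for position, item in enumerate(reference)}
    actual = dcg([gain.get(item, 0) for item in predicted])
    ideal = dcg(sorted(gain.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


# Rankings copied from the tables above.
drinks_reference = ["Dr. Pepper", "Pepsi", "Sprite", "Fanta", "7up"]
drinks_total = ["7up", "Dr. Pepper", "Sprite", "Fanta", "Pepsi"]
im_reference = ["Google Talk", "Skype", "MSN", "AOL", "ICQ"]
im_total = ["Google Talk", "Skype", "MSN", "AOL", "ICQ"]

drinks_ndcg = ndcg(drinks_total, drinks_reference)
im_ndcg = ndcg(im_total, im_reference)
```

A ranking identical to the reference scores exactly 1, and any misplacement near the top of the list is penalized more than one near the bottom.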

Conclusions
up-to-date discovery of opinionated text for given topics
genre-aware sentiment analysis of opinions
real-world studies distinguishing the popularity between topics
comparative results for several topics
efficient popularity estimation with a few hundred pages
potential application to other text analysis tasks
Implemented components available online: https://github.com/nik0spapp/icrawler

End of Presentation
Thank you! Any questions?

References
[MW09] M. Wiegand and Dietrich Klakow, Bootstrapping supervised machine-learning polarity classifiers with rule-based classification, Proc. of the ECAI-Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), 2009.
[PKS12a] Nikolaos Pappas, Georgios Katsimpras, and Efstathios Stamatatos, An agent-based focused crawling framework for topic- and genre-related Web document discovery, 24th IEEE Int. Conf. on Tools with Artificial Intelligence (Athens, Greece), 2012.
[PKS12b] Nikolaos Pappas, Georgios Katsimpras, and Efstathios Stamatatos, Extracting informative textual parts from Web pages containing user-generated content, 12th Int. Conf. on Knowledge Management and Knowledge Technologies (Graz, Austria), 2012.
[RW03] Ellen Riloff and Janyce Wiebe, Learning extraction patterns for subjective expressions, Proc. of the 2003 Conf. on Empirical Methods in Natural Language Processing, EMNLP '03, 2003, pp. 105–112.
[WWH05] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, Proc. of the Conf. on Human Language Technology and Empirical Methods in Natural
