Discovering Weblog Communities: A Content- and Topology-Based Approach

Jeroen Bulters (jbulters@science.uva.nl) and Maarten de Rijke (mdr@science.uva.nl)
ISLA, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

ICWSM 2007, Boulder, CO, USA

Abstract

Weblogs have become a leading form of self-publication on the web. Personal weblogs are often considered to represent a person, and the links between weblogs can naturally be interpreted as social interactions. Against this background, finding a community around a given weblog (i.e., identifying a set of weblogs that forms a natural group together with the starting point, because of content or social reasons) is a very natural task. Traditional community finding methods focus almost exclusively on topology analysis. In this paper we present a novel method for discovering weblog communities that incorporates both topology analysis and content analysis. We evaluate our method in a small-scale user study, analyze the contributions of the various components of our approach, and compare it against a state-of-the-art topology-based community finding algorithm.

1. Introduction

In recent years weblogs have become a dominant form of self-publication on the internet. The number of weblogs tracked by Technorati has been doubling every 5 months and it is often claimed that a new weblog is created every second. The vast and evolving nature of the blogosphere offers interesting challenges from the point of view of information access.

In this paper, we focus on the following access task: given a weblog (or blogger), return a set of other weblogs that form a community together with the starting blog. Traditional community extraction methods rely almost exclusively on an analysis of link topology around a given starting point, thereby effectively ignoring the immense amount of information given by the weblogger in his posts. For example, in the experimental evaluation in this paper one of the weblogs, appelejan, was assessed as having 18 members in its community; however, a state-of-the-art topology-based algorithm yielded only three members of the community, because members of the community did not always link back to each other or to other members of the community.

We present a novel community finding method that incorporates both topology and content analysis. In addition to a detailed description of the core algorithm, we provide the outcomes of a small-scale user study aimed at understanding the algorithm's effectiveness and at comparing it with an existing state-of-the-art solution.

We believe that our work is of interest to two types of end users: (1) the algorithm we propose lays the groundwork for a tool that can be used by individual bloggers as an exploratory search tool, and (2) our algorithm can be extended to a tool for advertisers and marketeers, for whom a global view of likes, dislikes, and interests of groups of bloggers matters.

The remainder of this paper is organized as follows. We start with a brief description of related work in Section 2. Then, in Section 3, we present our algorithm for discovering weblog communities. We follow with a description of an experimental evaluation of the algorithm in Section 4. We report on the results in Section 5 and conclude in Section 6.

2. Related work

The fact that a weblog is a web-based publication gives us the opportunity to apply traditional web-mining techniques to weblogs. A lot of work has been done on the identification of clustered websites; see, e.g., [2]. Although weblogs are just websites, they are often considered to "represent" a person, while a website represents a subject [5]. Websites can be characterized in terms of the strong distinction between authority-type and hub-type pages [4]; authority-type pages are considered to have substantially more outgoing links than incoming links, while hub-type pages have a more or less equal number of incoming and outgoing links. The analogy between authorities and subjects, and between hubs and people, is easily made. While websites can be related to both types of pages, weblogs are considered to "identify" a person (who can have many different interests, or subjects) and can thus only be related in an intuitive way to the hub-type pages of Kleinberg's HITS algorithm.

Kumar et al. [5] present a topology-based algorithm for community extraction which they later use in so-called Burst-Analysis. This algorithm is our baseline.

Lin et al. [7] focus on extracting communities based on two key insights: (a) communities form due to individual blogger actions that are mutually observable; (b) the semantics of the hyperlink structure are different from those in traditional web analysis problems. Their topology-based approach involves developing computational models for mutual awareness that incorporate the specific action type, frequency, and time of occurrence.

Merelo-Guervos et al. [8] map a weblog hosting site using Kohonen's self-organizing map and discover interesting community features; they provide a comparison between their methods and other community-discovering algorithms. Like us, they use a mixture of topology and content analysis.
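The introduction's observation that a topology-only method can miss community members who do not link back can be illustrated with a toy sketch. This is not the baseline algorithm of Kumar et al. [5] or the method proposed in this paper; the graph, the blog names, and both helper functions are hypothetical and only show how requiring reciprocal links shrinks the candidate set.

```python
# Toy directed link graph: blog -> set of blogs it links to.
# All blog names and links are hypothetical, chosen for illustration only.
links = {
    "seed": {"a", "b", "c", "d"},
    "a": {"seed", "b"},
    "b": {"seed"},
    "c": {"d"},      # c never links back to seed
    "d": set(),      # d has no outgoing links at all
}


def reciprocal_neighbours(graph, start):
    """Blogs that `start` links to and that also link back to `start`."""
    return {
        blog
        for blog in graph.get(start, set())
        if start in graph.get(blog, set())
    }


def outlink_neighbours(graph, start):
    """All blogs that `start` links to, whether or not they link back."""
    return set(graph.get(start, set()))


mutual = reciprocal_neighbours(links, "seed")
linked = outlink_neighbours(links, "seed")
missed = linked - mutual  # candidates lost by insisting on reciprocity

print(sorted(mutual))  # ['a', 'b']
print(sorted(missed))  # ['c', 'd']
```

In this toy graph, two of the four blogs the seed links to are dropped purely because they do not link back, which is the kind of gap that content analysis could in principle help close.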
