Social Networking Trends and Dynamics Detection via a Cloud-based Framework Design
Athena Vakali, Maria Giatsoglou, Stefanos Antaris
Department of Informatics, Aristotle University, Thessaloniki, Greece
{avakali, mgiatsog, santaris}@csd.auth.gr
MSND@WWW 2012, April 16th, Lyon, France
Outline
• Trend detection from social media:
  ◦ Motivation & Challenges
• Current approaches
• Problem formulation
• The Cloud4Trends 3-tier approach
• Current limitations and the Cloud as a solution
• The Cloud4Trends Cloud-based architecture
• Implementation details
• Future outlook
Social media for understanding the pulse of the public
• Social media have emerged as a popular means of communication and opinion sharing
• In social media:
  ◦ Users’ discussions range over a wide variety of topics
  ◦ Users generally express their opinions freely
• Social media can be viewed as a reflection of societal concerns, exhibiting ‘bursts’ of content generation when events occur
  ◦ popular topics / interests fluctuate with time
• It is challenging for both computer scientists and application developers to reach unbiased, meaningful conclusions about users’ trending opinions and interests
Trend detection from social media - Challenges
• Massive content sizes and unpredictable content generation rates make analysis difficult
  ◦ scalable analysis is needed
• Trending topics should be discovered when they are “fresh”
  ◦ an on-line analysis approach is demanded
• Trends should be meaningful
  ◦ need for contextual trends
• Content is dispersed in multiple sources
  ◦ trend detection needs a combined approach
Trend detection from text-based social media content
• Users daily generate multimedia content in social media
• Some approaches detect events from multimedia content (e.g. images), but they are typically limited to off-line detection
• Text-based content offers a more flexible and promising ground for online trend detection
  ◦ it is the type of content users generate most frequently;
  ◦ users’ opinion is more explicitly expressed in text.
Blogs and Microblogs as sources for trending topics detection
WHY Blogs?
• Blogs have been used for quite a long time and are still popular
• As opposed to typical online information sources, blogs are primarily opinion-oriented, reflecting the author’s freely expressed point of view
WHY Microblogs?
• Microblogging apps are key actors in real-time information broadcasting
• Content generation rates in microblogging apps are very high, with Twitter currently reaching 200 million posts per day
• Recent studies indicated that microblogging (via tweets) is a valuable source of latent information about the dynamics involved in the public’s opinions/views
  ◦ e.g. for prediction of forthcoming films’ revenue and of stock prices, real-time identification of earthquakes, analysis of users’ reactions to political debates
• Microblogging apps capture the momentum of a large-scale public
  ◦ Twitter alone has more than 300 million registered users
• Although posts are short, they usually contain links to external web pages, thus allowing content merging to enrich a post’s context
  ◦ e.g. around 25% of all tweets by Sept. 2010 included at least 1 hyperlink
Current approaches
• Typical trend analysis approach: application of traditional statistical methods based on the total number of keyword occurrences in texts to identify temporal trends; most large-scale efforts specialize in the analysis of searches (e.g. Google Hot Trends)
• Clustering has been used for trend detection in blogs, in an offline manner
• Most relevant online approaches include:
  ◦ BlogScope & TwitterMonitor: collect information from blogs & other online sources and perform spatiotemporal burst detection. Event discovery for a given term query is based on two premises: i) there is a burst when an event occurs, ii) events have a temporal & geographical scope;
  ◦ NewsStand: online news aggregator service monitoring feeds from online news sources and detecting their geographical focus by analyzing their content. Articles are grouped into news stories with an online text-based clustering technique, and each story’s geographic context is determined by its members’ geographical focus;
  ◦ TwitterStand: focuses on news detection from tweets. It manually selects Seeder users known to post news, and tweets are clustered with an online method based on similarities between tweets’ and clusters’ TF-IDF feature vectors (also accounting for their temporal similarity).
• Application scalability concerns are not adequately addressed in most research approaches
• The usefulness of tweet content expansion for trending topic detection has not been studied yet (no generic framework combining tweet & blog analysis for trend detection)
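The first approach listed above, counting keyword occurrences per time interval to spot temporal trends, can be illustrated with a minimal sketch. The function name `bursty_terms`, the window layout and the `ratio` / `min_count` thresholds are illustrative assumptions; they are not taken from Google Hot Trends or from any of the systems cited on this slide.

```python
from collections import Counter

def bursty_terms(windows, ratio=3.0, min_count=3):
    """Return terms whose count in the last window is at least `ratio`
    times their smoothed average count over the earlier windows.

    `windows` is a non-empty, time-ordered list of token lists,
    one list per time interval (hypothetical input layout).
    """
    *history, current = windows
    past = Counter()
    for tokens in history:
        past.update(tokens)
    n_past = max(len(history), 1)

    now = Counter(current)
    bursts = []
    for term, count in now.items():
        # Adding 1 to the baseline keeps unseen terms from dividing by zero.
        baseline = past[term] / n_past + 1.0
        if count >= min_count and count / baseline >= ratio:
            bursts.append(term)
    return sorted(bursts)

windows = [
    "rain in lyon today".split(),
    "lyon weather rain again".split(),
    "earthquake earthquake lyon earthquake felt earthquake".split(),
]
print(bursty_terms(windows))  # ['earthquake']
```

A term-level count like this is cheap to compute, but it yields isolated keywords rather than contextual topics, which is the limitation the clustering-based approaches try to address.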
Cloud4Trends
• Cloud4Trends is a microblogging & blogging localized content collection and analysis framework for detecting currently popular topics of users’ interest
• Cloud4Trends
  ◦ addresses the Web 2.0 large-scale reality by adopting methods for efficiently handling fast-evolving data in real time;
  ◦ supports, in a unified way, the analysis of text data from different web sources which may be generated at various rates;
  ◦ proposes a methodology for unsupervised detection of local contextual trends, combining content from different web sources;
  ◦ captures the shaping and evolution of users’ interests given their broader geographical location and the type of data source;
  ◦ follows a Cloud-based data processing methodology to support a streaming web data clustering scenario
Cloud4Trends approach
• Detects trend dynamics using Twitter and the Blogosphere as data sources.
• Applies incremental text clustering for detecting & maintaining a set of dynamic clusters
  ◦ assumes that analysis at a “document” instead of a “term” level is more promising for providing trending topics that are meaningful to users
• Clustering approach extends earlier work in TwitterStand
  ◦ we expand the original tweet content with additional information by following referenced web sources;
  ◦ we consider active clusters as active topics of users’ interest and rank them based on their observed activity to identify the most popular (trending) ones at their peak;
  ◦ our analysis pertains to certain geographical areas from the data collection phase onwards, rather than examining the geographical scope of the resulting clusters as a post-analysis process;
  ◦ we have designed a parallel Cloud-based architecture to address scalability concerns and enable our application to analyze more content (e.g. pertaining to several cities).
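The incremental text clustering idea on this slide can be sketched as a single-pass procedure: each incoming post joins the most similar existing cluster, or starts a new one. This is only an illustration under stated assumptions, not the Cloud4Trends or TwitterStand implementation: it uses raw term counts instead of TF-IDF weights, whitespace tokenization, no temporal similarity and no cluster expiry, and the class name `IncrementalClusterer` and the `threshold` value are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class IncrementalClusterer:
    """Single-pass clustering: assign each post to its nearest cluster
    if similar enough, otherwise open a new cluster."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # summed term counts per cluster
        self.members = []           # post ids per cluster

    def add(self, post_id, text):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Fold the post into the existing cluster's centroid.
            self.centroids[best].update(vec)
            self.members[best].append(post_id)
            return best
        # No sufficiently similar cluster: start a new one.
        self.centroids.append(vec)
        self.members.append([post_id])
        return len(self.centroids) - 1

clusterer = IncrementalClusterer(threshold=0.5)
posts = [
    (1, "earthquake felt in thessaloniki"),
    (2, "strong earthquake in thessaloniki"),
    (3, "new film premiere tonight"),
    (4, "film premiere tonight downtown"),
]
labels = [clusterer.add(pid, text) for pid, text in posts]
print(labels)             # [0, 0, 1, 1]
print(clusterer.members)  # [[1, 2], [3, 4]]
```

Because every post is processed once and only cluster summaries are kept, the per-post cost grows with the number of active clusters rather than with the number of posts seen so far, which is what makes this style of clustering suitable for streams.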
Trend detection: Problem formulation
Given a time-ordered stream of users’ posts $P_t$, $t \in [1, \ldots)$, arriving in real-time (tweets) or at a given time granularity (blog posts), identify topics and associated posts that are popular (“trending”) at any given time, and monitor their dynamics and evolution across time in terms of their popularity.
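One possible reading of this formulation, assuming each post has already been assigned to a topic and that popularity is measured simply as the number of posts per fixed-length time window, is sketched below. The function name `trending_per_window`, the `(timestamp, topic_id)` input layout and the post-count popularity measure are assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter, defaultdict

def trending_per_window(assignments, window, k=2):
    """Map each time-window index to its k most active topics.

    `assignments` is an iterable of (timestamp, topic_id) pairs, where the
    timestamp is a number (e.g. seconds) and topic_id labels the topic the
    post was assigned to.
    """
    counts = defaultdict(Counter)
    for ts, topic in assignments:
        counts[int(ts // window)][topic] += 1
    return {
        w: [topic for topic, _ in c.most_common(k)]
        for w, c in sorted(counts.items())
    }

stream = [(0, "quake"), (10, "quake"), (20, "film"),
          (65, "film"), (70, "film"), (90, "quake"), (100, "film")]
print(trending_per_window(stream, window=60, k=1))
# {0: ['quake'], 1: ['film']}
```

Comparing the ranked lists of consecutive windows gives a simple view of how each topic's popularity evolves over time.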
Cloud4Trends 3-tier design