CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Teams of 2-3 students (1 is also ok)
Project:
  Experimental evaluation of algorithms and models on an interesting dataset
  A theoretical project that considers a model, an algorithm or a network property and derives a rigorous result about it
  An in-depth critical survey of one of the course topics relating models, experimental results and underlying social theories and offering a novel perspective on the area
10/11/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Answer the following questions:
  What is the problem you are solving?
  What data will you use (how will you get it)?
  How will you do the project?
    Which algorithms/techniques/models do you plan to use/develop?
    Be as specific as you can!
  How will you evaluate and measure success?
  What do you expect to submit/accomplish by the end of the quarter?
The project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data
The result of the project will typically be a 10 page paper, describing the approach, the results, and the related work.
Due at midnight, Oct 18, 2010
  Upload PDF to http://coursework.stanford.edu
  TAs will assign group numbers; we will send a link to a GoogleDoc
  Name your file: <group#>_proposal.pdf
Wikipedia
IM buddy graph
Yahoo Altavista web graph
Stanford WebBase
Twitter Data
Blogs and news data
Yahoo Music Ratings
Richly labeled network containing extracted data from Wikipedia (based on infoboxes):
  Richly labeled network: multiple types of nodes and edges
  About 2.6 million concepts described by 247 million triples, including abstracts in 14 different languages
  http://dbpedia.org
Other OpenLinkedData datasets available at http://esw.w3.org/DataSetRDFDumps
Networks of positive and negative edges
Data includes:
  Trust/distrust edges
  Also Epinions product reviews and review ratings
SNAP: http://snap.stanford.edu/data/#signnets
Trustlet: http://www.trustlet.org/wiki
Prosper marketplace, peer-to-peer lending:
  Borrowers ask for loans
  People then bid (price, interest rate) on loans to fund them
Rich social structure around the website
Data at http://www.prosper.com/tools/DataExport.aspx
Turiya is a startup that collects game data from game publishers and processes it to produce business intelligence of value to its clients
Data collected includes:
  Players and their attributes
  Logs of game events
  Information about virtual items
  Information about transactions in real money or credits
Analyses include:
  Player segmentation
  Virtual goods recommendations
  Lifetime value estimation of players
If you are interested, send us an email!
What to Wear is a social game played on Facebook
  Contestants create outfits and submit these to a daily competition, which has a theme such as "an outfit for attending your ex's wedding"
  Contestants can also vote and comment on other people's submissions
  You get credit for both participating and judging
  Items for outfits are either bought from the store or reused from the contestant's closet
  ~30,000 players/month
Data about this game includes:
  Player data
  Data about previous competitions
  Fashion items data
  Data about outfits
  Much other data (~400 relations in all)
If you are interested, send us an email!
Amazon product review data:
For each product:
  Product info: name, salesrank
  Product categorization
  All reviews: user, rating, how helpful was the review
  People who bought X also bought Y: a network!
If you are interested, send us an email!
Collaboration network of computer scientists
Each CS publication is included:
  Author names
  Title
  Year
  Conference, journal name
Get the data at:
  http://dblp.uni-trier.de/xml/
  http://kdl.cs.umass.edu/data/dblp/dblp-info.html
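One way to turn per-publication author lists into a collaboration network is to connect every pair of co-authors and count how many papers they share. The sketch below assumes the author lists have already been extracted from the data; the function name `coauthor_edges` and the sample author names are hypothetical, and this is an illustration rather than the course's prescribed method.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def coauthor_edges(publications: Iterable[List[str]]) -> Counter:
    """Count shared papers for every unordered pair of co-authors.

    Each publication is a list of author names. The result maps a
    sorted (author_a, author_b) tuple to the number of joint papers.
    """
    weights: Counter = Counter()
    for authors in publications:
        # Deduplicate and sort so each pair has one canonical key.
        unique = sorted(set(authors))
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


# Hypothetical author lists, one per publication.
sample_publications = [
    ["Author A", "Author B", "Author C"],
    ["Author B", "Author A"],
    ["Author D"],  # single-author paper: contributes no edges
]
edges = coauthor_edges(sample_publications)
```

The edge weights can then be loaded into any graph library as a weighted undirected graph.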
Patents (http://www.nber.org/patents/)
  Citations between patents
  For each patent we also know:
    Time
    Patent categorization
    Patent inventor data, …
Arxiv High-energy Physics:
  Citation network between papers
  For each paper we also know:
    Author names
    Title and abstract of the paper
    Year of publication
    Journal
Data at: http://snap.stanford.edu/data/#citnets
~50 million tweets per month starting in June 2009 (6 months)
Format:
  T 2009-06-07 02:07:42
  U http://twitter.com/redsoxtweets
  W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled Jon Lester's perfecto and his shutout.. http://tinyurl.com/pyhgwy
Two important things:
  URLs
  Hash-tags
If you are interested, send us an email!
Twitter social graph and some profiles: http://an.kaist.ac.kr/traces/WWW2010.html
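The T/U/W record format above can be read with a small line-oriented parser that also pulls out the two fields of interest, URLs and hash-tags. This is a minimal sketch: it assumes each record is exactly three lines prefixed by `T`, `U` and `W` and separated from the value by whitespace. The function name `parse_tweets` and the regular expressions are illustrative choices and will not handle every edge case in real tweet text.

```python
import re
from typing import Dict, Iterable, Iterator

# Rough patterns for illustration only.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def parse_tweets(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per tweet from lines in the T/U/W format."""
    record: Dict[str, object] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip anything that is not "<T|U|W><whitespace><value>".
        if len(line) < 2 or line[0] not in "TUW" or not line[1].isspace():
            continue
        key, value = line[0], line[2:].strip()
        if key == "T":
            record = {"time": value}       # T starts a new record
        elif key == "U":
            record["user"] = value
        else:                              # W closes the record
            record["text"] = value
            record["hashtags"] = HASHTAG_RE.findall(value)
            record["urls"] = URL_RE.findall(value)
            yield record
            record = {}


sample_lines = [
    "T 2009-06-07 02:07:42",
    "U http://twitter.com/redsoxtweets",
    "W #redsox Extra Bases: Sox win, 8-1: The Rangers spoiled Jon "
    "Lester's perfecto and his shutout.. http://tinyurl.com/pyhgwy",
]
tweets = list(parse_tweets(sample_lines))
```

Because the parser is a generator, it can stream a large file line by line without loading it all into memory.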
Inferring links of the who-follows-whom network
What is the lifecycle of URLs and hash-tags?
How do hash-tags get adopted?
Multiple competing hash-tags, which one wins?
Finding early/influential users?
Community discovery
Where/how will the information propagate?
More than 1 million newsmedia and blog articles per day since August 2008
Extracted phrases (quotes) and links
http://memetracker.org
Format:
  P http://cnnpoliticalticker.wordpress.com/2008/08/31/mccain-defends-palins-experience-level
  T 2008-09-01 00:00:13
  Q dangerously unprepared to be president
  Q even more dangerously unprepared
  Q understands the challenges that we face
  Q worked and succeeded
  L http://www.cnn.com
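The P/T/Q/L record format above can be grouped into one record per article with a short parser. The sketch below assumes that a `P` line (the article URL) starts each record and that the following `T` (timestamp), `Q` (quoted phrase) and `L` (outgoing link) lines belong to it. The function name `parse_documents` and the dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def parse_documents(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per article from lines in the P/T/Q/L format."""
    doc: Optional[Dict[str, object]] = None
    for raw in lines:
        line = raw.strip()
        # Skip anything that is not "<P|T|Q|L><whitespace><value>".
        if len(line) < 3 or line[0] not in "PTQL" or not line[1].isspace():
            continue
        key, value = line[0], line[2:].strip()
        if key == "P":
            if doc is not None:
                yield doc                  # previous article is complete
            doc = {"url": value, "time": None, "quotes": [], "links": []}
        elif doc is None:
            continue                       # ignore fields before the first P
        elif key == "T":
            doc["time"] = value
        elif key == "Q":
            doc["quotes"].append(value)
        else:                              # key == "L"
            doc["links"].append(value)
    if doc is not None:
        yield doc                          # flush the last article


sample_lines = [
    "P http://cnnpoliticalticker.wordpress.com/2008/08/31/"
    "mccain-defends-palins-experience-level",
    "T 2008-09-01 00:00:13",
    "Q dangerously unprepared to be president",
    "Q even more dangerously unprepared",
    "Q understands the challenges that we face",
    "Q worked and succeeded",
    "L http://www.cnn.com",
]
documents = list(parse_documents(sample_lines))
```

The `links` lists give the edges of an article-to-article hyperlink graph, while the `quotes` lists support tracking phrases across articles over time.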