Dynamic Information Environments
  Susan Dumais, Microsoft Research
  http://research.microsoft.com/~sdumais
  In collaboration with: Jaime Teevan, Eytan Adar, Jon Elsas, Dan Liebling, Richard Hughes
  UW, CSE 454, Dec 8 2009
Outline
  Web search and context
  Temporal dynamics of information
    Characterizing change
      Content changes over time
      People re-visit and re-find
      Relationships between content change and re-access
    Improving retrieval and understanding
      Building support for understanding change (e.g., DiffIE)
      Leveraging dynamics for improved retrieval
Web Search at 15
  How it’s accessed
  What’s available
    Number of pages indexed
      7/94 Lycos – 54,000 pages
      95 – 10^6 (millions)
      97 – 10^7
      98 – 10^8
      01 – 10^9 (billions)
      05 – 10^10 …
    Types of content
      Web pages, newsgroups
      Images, videos, maps
      News, blogs, spaces
      Shopping, local, desktop
      Books, papers, many formats
      Health, finance, travel …
Support for Searchers
  The search box
  Spelling suggestions
  Query suggestions
  Auto complete
  Inline answers
  Richer snippets
  But, we can do better … by understanding context
Search and Context
  Search Today: Query Words, Ranked List
  [Diagram: Query Words and Ranked List surrounded by User Context, Document Context, and Task/Use Context]
Inter-Relationships among Documents
  Categorization and Metadata
    Reuters, spam, landmarks, web categories …
    Domain-specific features, time
  Interfaces and Interaction
    Stuff I’ve Seen, Phlat, Timelines, SWISH
    Tight coupling of browsing and search
  Redundancy
  Temporal Dynamics
Systems/Prototypes
  • New capabilities and experiences
  • Algorithms and prototypes
  • Deploy, evaluate and iterate
Modeling Users
  Short vs. long term
  Individual vs. group
  Implicit vs. explicit
Using User Models
  Stuff I’ve Seen (re-finding)
  Personalized Search
  News Junkie (novelty)
  User Behavior in Ranking
  Domain Expertise at Web-scale
Evaluation
  • Many methods, scales
  • Individual components and their combinations
Information Dynamics
  [Figure: two timelines, 1996 to 2009, labeled "Content Changes" and "User Visitation/ReVisitation"]
  Today’s Browse and Search Experiences
  But, ignores …
Digital Dynamics Easy to Capture
  Easy to capture
  Few tools support dynamics
Information Dynamics
  Characterizing change
    Content changes over time
    People re-visit and re-find
    Relationships between content change and re-access
  Improving retrieval and understanding
    Building support for understanding change (e.g., DiffIE)
    Leveraging dynamics for improved retrieval
Characterizing Change
  [Figure: two timelines, 1996 to 2009, labeled "Content Changes" and "User Visitation/ReVisitation"]
  Large-scale Web crawls, over time
    Revisited pages
      55,000 pages crawled hourly for 18+ months
      Unique users, visits/user, time between visits
    Judged pages (Relevance to a query)
      6 million pages crawled every two days for 6 months
Measuring Web Page Change
  Summary metrics
    Number of changes
    Time between changes
    Amount of change
  Change curves
    Fixed starting point
    Measure similarity over different time intervals
  [Figure: change curve with a labeled knot point; x-axis: Time from starting point; y-axis: Dice Similarity, 0 to 1]
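The change curve described on this slide compares each later snapshot of a page against a fixed starting snapshot using Dice similarity. The following is a minimal sketch, assuming each snapshot is reduced to its set of whitespace-separated words (the slides do not say how pages are tokenized). The names `dice` and `change_curve` are illustrative and do not come from the talk.

```python
def dice(a, b):
    """Dice coefficient between two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty pages are identical
    return 2 * len(a & b) / (len(a) + len(b))


def change_curve(snapshots):
    """Similarity of every snapshot to the first one (the fixed starting point).

    `snapshots` is a list of page texts ordered by crawl time.
    Word-set tokenization is an assumption made for this sketch.
    """
    base = set(snapshots[0].split())
    return [dice(base, set(text.split())) for text in snapshots]


if __name__ == "__main__":
    pages = ["a b c d", "a b c e", "a b x y"]
    print(change_curve(pages))
```

A curve that drops quickly and then flattens suggests a page with a stable core plus a fast-changing portion; the point where it flattens corresponds to the knot point labeled in the figure.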
Measuring Within-Page Change
  DOM structure changes
  Term use changes
    Divergence from norm
    “Staying power” in page
  [Figure: example terms (cookbooks, salads, cheese, ingredient, bbq) plotted over Time, Sep. to Dec.]
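One simple way to read a term's "staying power" is the fraction of a page's snapshots in which the term appears. The sketch below takes that reading as an assumption; the talk may define the measure differently. The function name `staying_power` and the lowercase, whitespace-based tokenization are illustrative choices.

```python
from collections import Counter


def staying_power(snapshots):
    """Map each term to the fraction of snapshots that contain it.

    Assumption for this sketch: a term "stays" if it is present in a
    snapshot at all, regardless of how often it occurs there.
    """
    counts = Counter()
    for text in snapshots:
        # Use a set so a term counts at most once per snapshot.
        counts.update(set(text.lower().split()))
    n = len(snapshots)
    return {term: c / n for term, c in counts.items()}


if __name__ == "__main__":
    crawl = [
        "cookbooks salads cheese",
        "cookbooks bbq",
        "cookbooks cheese ingredient",
    ]
    for term, score in sorted(staying_power(crawl).items()):
        print(term, round(score, 2))
```

Terms with a score near 1 describe what the page is persistently about, while low-scoring terms reflect transient content.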
Example Term Longevity Graphs
Revisitation on the Web
  Revisitation patterns
    Log analyses
      Toolbar logs for revisitation
      Query logs for re-finding
    User survey to understand intent in revisitations
  [Figure: two timelines, 1996 to 2009, labeled "Content Changes" and "Users Revisit"]
  What’s the last Web page you visited?
Measuring Revisitation
  Summary metrics
    Unique visitors
    Visits/user
    Time between visits
  Revisitation curves
    Histogram of revisit intervals
    Normalized
  [Figure: revisitation curve; x-axis: Time Interval; y-axis: Normalized Count, 0 to 1]
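A revisitation curve, as described on this slide, is a normalized histogram of the intervals between a user's successive visits to a page. The sketch below is one way to build it, under stated assumptions: visit times are numeric timestamps in seconds, the bin edges are hypothetical, and normalization divides by the total number of intervals (the slides say only "Normalized"). The function names are illustrative.

```python
from bisect import bisect_right


def revisit_intervals(visit_times):
    """Gaps between consecutive visits by a single user."""
    ts = sorted(visit_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def revisitation_curve(visits_by_user, bin_edges):
    """Normalized histogram of revisit intervals for one page.

    `visits_by_user` maps a user id to that user's visit timestamps.
    `bin_edges` must be sorted; bin i holds gaps g with
    bin_edges[i-1] <= g < bin_edges[i], plus one overflow bin at the end.
    """
    counts = [0] * (len(bin_edges) + 1)
    for times in visits_by_user.values():
        for gap in revisit_intervals(times):
            counts[bisect_right(bin_edges, gap)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    # Hypothetical bins: under a minute, under an hour, under a day, longer.
    edges = [60, 3600, 86400]
    visits = {"u1": [0, 30, 4000], "u2": [100, 90100]}
    print(revisitation_curve(visits, edges))
```

A curve with most of its mass in the leftmost bins would correspond to a fast revisitation pattern, and mass in the rightmost bins to a slow one.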
Four Revisitation Patterns
  Fast
    Hub-and-spoke
    Navigation within site
  Hybrid
    High quality fast pages
  Medium
    Popular homepages
    Mail and Web applications
  Slow
    Entry pages, bank pages
    Accessed via search engine
Search and Revisitation
  Repeat query (33%)
    microsoft research
  Repeat click (39%)
    http://research.microsoft.com
    Q: microsoft research, msr …
  Big opportunity (43%)
    24% “navigational revisits”

                  Repeat Click   New Click   Total
  Repeat Query        29%            4%       33%
  New Query           10%           57%       67%
  Total               39%           61%
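The repeat/new breakdown on this slide can be computed from a search log by checking, for each event, whether the same user has already issued that query or clicked that URL. The sketch below assumes a log of `(user, query, url)` tuples in time order and exact string matching; the log format and the name `classify_events` are hypothetical and not taken from the talk.

```python
from collections import Counter


def classify_events(log):
    """Count events by (query status, click status), each "repeat" or "new".

    `log` is an iterable of (user, query, url) tuples in time order.
    A query or click is a "repeat" if the same user produced the same
    string earlier in the log (exact match is an assumption here).
    """
    seen_queries, seen_clicks = set(), set()
    table = Counter()
    for user, query, url in log:
        q = "repeat" if (user, query) in seen_queries else "new"
        c = "repeat" if (user, url) in seen_clicks else "new"
        table[(q, c)] += 1
        seen_queries.add((user, query))
        seen_clicks.add((user, url))
    return table


if __name__ == "__main__":
    log = [
        ("u", "microsoft research", "http://research.microsoft.com"),
        ("u", "msr", "http://research.microsoft.com"),
        ("u", "microsoft research", "http://research.microsoft.com"),
    ]
    for key, count in classify_events(log).items():
        print(key, count)
```

Under exact matching, the query "msr" in the usage example counts as a new query with a repeat click, which is the kind of case the slide illustrates with "Q: microsoft research, msr …".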
Repeat Clicks for Repeat Queries
  Within session: Repeat query -> New click
  Across sessions: Repeat query -> Repeat click
Relationships Between Revisitation and Change
  [Figure: two timelines, 1996 to 2009; the first is labeled "Content Changes"]
  Why did you revisit the last Web page you revisited?
Possible Relationships
  Interested in change: Monitor
  Effect change: Transact
  Change unimportant: Find new
  Change can interfere: Re-find
Understanding the Relationship
  Compare summary metrics
    Revisits: Unique visitors, visits/user, interval
    Change: Number, interval, Dice

  Visits/user          Number of changes   Time between changes   Dice coefficient
  2 visits/user             172.91               133.26                 0.82
  3 visits/user             200.51               119.24                 0.82
  4 visits/user             234.32               109.59                 0.81
  5 or 6 visits/user        269.63                94.54                 0.82
  7+ visits/user            341.43                81.80                 0.81
Comparing Change and Revisit Curves
  Three pages
    New York Times
    Woot.com
    Costco
  Similar change patterns
  Different revisitation
    NYT: Fast (news, forums)
    Woot: Medium
    Costco: Slow (retail)
  [Figure: curves for NYT, Woot, and Costco; x-axis: Time; y-axis: 0 to 1.2]
Within-Page Relationship
  Page elements change at different rates
  Pages revisited at different rates
  • “Resonance” can serve as a filter for interesting content
Dynamics of Information
  Characterizing change
    Content changes over time
    People re-visit and re-find
    Relationships between content change and re-access
  Improving retrieval and understanding
    Building support for understanding change (e.g., DiffIE)
    Leveraging dynamics for improved retrieval