INFO 4300 / CS4300 Information Retrieval, IR 13: Web history

INFO 4300 / CS4300 Information Retrieval
IR 13: Web history
Paul Ginsparg
Cornell University, Ithaca, NY
19 Oct 2009

Administrativa
Assignment 3: available Fri 22 Oct, due Sun 7 Nov
Discussion 4 (28 Oct): Read and be prepared to discuss Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Seventh International World Wide Web Conference, Brisbane, Australia, 1998. http://www7.scu.edu.au/1921/com1921.htm
(Note: a second copy, with photos of the authors, is available at http://www-db.stanford.edu/~backrub/google.html)

Sometime in past month...

Overview
1. Intro to Web Search

Outline
1. Intro to Web Search

Brief History
Note: WorldWideWeb ≠ Internet
1945: Memex, V. Bush, one of many hypertext forerunners
1989: Berners-Lee, CERN, global hyperspace idea
1990: WorldWideWeb.app on NeXT computer
1991: CERN server/client released in summer of '91, http protocol and html coded pages, also linemode browser (lynx)
1991–1994: growth, mainly in Europe. First U.S. website at Stanford Linear Accelerator Center (1992).
Spring '93: Mosaic client (NCSA) added in-line graphics; NCSA also produced its own version of httpd server software. CERN still maintained a list of "all webservers in world".

Brief History, cont'd
Early '94: crawlers like "JumpStation" and "WWW Worm" (McBryan 1994: 110,000 pages and 1500 queries per day in Mar/Apr '94).
Nov '97: search engines indexed between 2M and 100M documents (1B expected by 2000). AltaVista handled 20M queries/day.
2000: hundreds of millions of queries per day expected.
(Actual: in 2004 the Google index was 4.2B pages. In 2005 Yahoo and Google each claimed to have indexed upwards of 15B pages, then stopped posting their claimed counts.)

But ...
If 10 times as many pages meant that every query brought up 10 times as many results to sort through, then search engine methodology would not scale with the size of the web. Perhaps it only worked because the amount of material on the web was still so small?
But there is a set of heuristics for ordering the search results, so that the desired page is frequently ranked in the top ten, and it doesn't matter that many thousands of other pages are also retrieved.

Historical antecedents
PageRank methodology stems from a long history of citation analysis, where a "link" is some signal of recommendation (or popularity).
It is based on a property of the link graph alone (i.e., it is query-independent), so scores can be computed in advance, which makes it efficient for serving a large volume of queries.
The underlying Markov process is also not new, but was applied in a particularly powerful way (⇒ unexpected power of simple algorithms and ample computing power applied to massive datasets).
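
To make the query-independent, Markov-process view concrete, here is a minimal sketch of a PageRank-style power iteration in Python. It is an illustration under stated assumptions, not the implementation described in the Brin and Page paper: the function name `pagerank`, the toy graph `toy_web`, the fixed iteration count, the normalisation of scores to sum to 1, and the even redistribution of rank from pages with no out-links are all choices made for this example. The damping factor 0.85 is the value Brin and Page report usually using.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly following links at random.

    links maps each page to the list of pages it links to (hypothetical
    input format chosen for this sketch).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start the random surfer with equal probability on every page.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of the out-links at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy link graph (hypothetical): D links out but nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On this toy graph the ordering comes out as C, A, B, D: C collects links from three pages, A inherits most of C's score through a single link, and D, with no in-links, keeps only the random-jump share. Nothing in the computation depends on a query, so the scores can be computed once offline and reused for every search.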



