Historic Goals (lecture slides, 10/27/10)

Historic Goals
• “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”
  – Vannevar Bush, “As We May Think,” Atlantic Monthly, July 1945 (assigned week 2)
• “Google’s mission is to organize the world’s information and make it universally accessible and useful”
  – Google’s mission statement, ~1998

Prophetic: Hypertext
• Vannevar Bush (1890-1974), Director of the Office of Scientific Research and Development (1941-1947)
• End of WW2: what is the next big challenge for scientists?
• Vannevar Bush’s 1945 vision: “associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing.”
• “This is a much larger matter than merely the extraction of data for the purposes of scientific research; it involves the entire process by which man profits by his inheritance of acquired knowledge”

Prophetic: Wikipedia et al
• “Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”

How have we achieved search capability?
• Vannevar Bush envisioned a personal index
• General open collections:
  – keyword/subject-based search → full-text search → hypertext-enhanced search

Full-text search: beginnings
• Gerald Salton (1927-1995), founding father of information retrieval
  – SMART retrieval system
• Although search has changed, classic techniques still provide foundations: our starting point
  – Gerald Salton’s SMART project, 1960’s

Think first about text documents
• Early digital searches: digital card catalog
  – subject classifications, keywords
• “Full text”: words + English structure
  – no “meta-structure”

Information Retrieval
• User wants information from a collection of “objects”: the information need
• User formulates the need as a “query”
  – language of the information retrieval system
• System finds objects that “satisfy” the query
• System presents objects to the user in a “useful form”
• User determines which objects from among those presented are relevant
• Define each of the words in quotes:
  – information object
  – query
  – satisfying objects
  – useful presentation

Information Retrieval, cont.
• Notion of relevance is critical
  – What does the user really want?
  – Insufficient structure for exact retrieval
• Develop algorithms for the search and retrieval tasks

Modeling: “satisfying”
• What determines if a document satisfies a query? That depends on:
  – the document model
  – the query model
  – the definition of “satisfying” can still vary
• START SIMPLE: better understanding; use components of the simple model later

AND Model
• Document: set of terms
• Query: set of terms
• Satisfying: a document satisfies the query if all terms of the query appear in the document
• Currently used by Web search engines
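The AND model above can be sketched in a few lines of Python. This is a minimal sketch under the slides’ assumptions: documents and queries are modeled as plain lists of words, and the example document is a made-up stand-in, not taken from the slides.

```python
def satisfies_and(document, query):
    """AND model: a document satisfies the query only if
    every query term appears in the document."""
    return set(query) <= set(document)

# Hypothetical document and queries, for illustration only.
doc = "an introduction to computer science".split()
print(satisfies_and(doc, ["computer", "science"]))   # True
print(satisfies_and(doc, ["computer", "records"]))   # False
```

The set-subset test ignores word order and repetition, which is exactly what the “set of terms” model on the slide does.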

OR Model
• Document: set of terms
• Query: set of terms
• Satisfying: a document satisfies the query if one or more terms of the query appear in the document
• Original IR model – why?

Scaling
• What attributes are changing from 1960’s searches to the online searches of today?
• How do they change the problem?

Introducing Ranking
• Order the documents that satisfy a query by how well they match the query
• How do we capture relevance to the user by an algorithmic method of ordering?
• Gerald Salton’s major idea: score documents by the frequency with which the words of the query occur
  – take into account document length and the frequency of words in the collection
  – one of many major contributions

Frequency Model example: scoring documents (vector model)
• Doc 1: “Computers have brought the world to our fingertips. We will try to understand at a basic level the science -- old and new -- underlying this new Computational Universe. Our quest takes us on a broad sweep of scientific knowledge and related technologies… Ultimately, this study makes us look anew at ourselves -- our genome; language; music; "knowledge"; and, above all, the mystery of our intelligence.” (cos 116 description)
  – Frequencies: science 1; knowledge 2; principles 0; engineering 0
• Doc 2: “An introduction to computer science in the context of scientific, engineering, and commercial applications. The goal of the course is to teach basic principles and practical issues, while at the same time preparing students to use computers effectively for applications in computer science …” (cos 126 description)
  – Frequencies: science 2; knowledge 0; principles 1; engineering 1

                       frequency         frequency-adjusted word value
                     Doc 1   Doc 2          Doc 1   Doc 2
  “science”            1       2             .51     1.02
  “engineering”        0       1             0       1.6
  “principles”         0       1             0       1.6
  “knowledge”          2       0             3.2     0
  Combined             3       4             3.71    4.22   SCORE
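The frequency-adjusted scoring above can be sketched as follows. The slide does not give the exact weighting formula behind its numbers, so this sketch uses a common tf × log-idf style stand-in (rare words in the collection count for more), and the document contents are shortened stand-ins for the two course descriptions.

```python
import math

def score(query, doc, collection):
    """Sum, over the query terms, of term frequency in the document
    weighted by the term's rarity in the collection.
    (The slide's exact 'frequency-adjusted' weights are unspecified;
    tf * log(1 + N/df) is used here as a stand-in.)"""
    n = len(collection)
    total = 0.0
    for term in query:
        tf = doc.count(term)                          # occurrences in this doc
        df = sum(1 for d in collection if term in d)  # docs containing the term
        if tf and df:
            total += tf * math.log(1 + n / df)        # rarer terms weigh more
    return total

# Shortened stand-ins for the cos 116 / cos 126 descriptions.
doc1 = ["science", "knowledge", "knowledge"]
doc2 = ["science", "science", "principles", "engineering"]
docs = [doc1, doc2]
query = ["science", "knowledge", "principles", "engineering"]
# As on the slide, Doc 2 outscores Doc 1.
print(score(query, doc1, docs) < score(query, doc2, docs))  # True
```

With these weights, “science” (appearing in both documents) contributes less per occurrence than “principles” or “engineering” (each in only one document), reproducing the slide’s ordering even though the exact numbers differ.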

Review
• Current collections
  – millions to trillions of documents, plain text, petabytes of information
• Queries of a few words
• Create an index of the collection organized by word
  – words sorted alphabetically
  – binary search look-up
• Design a ranking method
  – classic: word frequency
  – mark-up (e.g. HTML): attributes of words in the document
    • word position, appearance, …

Using word occurrence and features
• word frequency in documents
• positions of words in documents
• appearance in special parts of documents
  – title
  – abstract
  – section header
  – marked-up text
  – …
• special features of a word
  – bold font
  – larger font
  – …

Revisit the index with ranking in mind
• Retrieval systems record all the information they will use about a document in the index
• Index organized by word
  – all words in all documents = lexicon
• For each word, the index records a list of:
  – documents in which it appears (SORTED!)
    • positions at which it occurs in each document (SORTED!)
    • attributes for each occurrence
• Record summary information for documents
• Record summary information for words
• Means the index is about as big as the combination of documents!

Processing a query
• Find the index entry for each word in the query
• Each index entry gives a list of documents containing the word, usually sorted by document ID
• Scan through the lists in parallel looking for documents containing all the query words
  – Sorting makes this linear time!
• Process the positions and other attributes of the query words for the documents containing all query words
  – Allows looking at how near the different query words are to each other in a document

Along came the Web
• Major new element: links
  – hypertext
• Early Web search engines did not use links
  – Excite, WebCrawler, Lycos, Infoseek, AltaVista, Inktomi, Ask Jeeves (now Ask) and more
• How to use links?
  – anchor text
  – link analysis

Anchor text
• The words used when making a link to another document: “Assignment 1 is now available.”
• Add the words of the anchor text (“Assignment 1”) to the text of the document pointed to
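The index and query-processing slides above can be sketched as an inverted index with sorted posting lists and a linear-time merge. This is a minimal sketch: a real index would also record per-occurrence positions and attributes, which are omitted here, and the sample documents are invented for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: word -> sorted list of IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return {w: sorted(ids) for w, ids in index.items()}

def intersect(a, b):
    """Merge two sorted posting lists in linear time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(index, words):
    """IDs of docs containing all query words (AND model)."""
    lists = [index.get(w.lower(), []) for w in words]
    lists.sort(key=len)              # start from the shortest posting list
    result = lists[0]
    for lst in lists[1:]:
        result = intersect(result, lst)
    return result

# Hypothetical three-document collection.
docs = ["full text search", "text search engines", "historic goals"]
idx = build_index(docs)
print(and_query(idx, ["text", "search"]))     # [0, 1]
print(and_query(idx, ["search", "engines"]))  # [1]
```

Because both posting lists are sorted, the merge advances each pointer at most once per entry, which is the “sorting makes this linear time” point on the slide.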

Example / Example 2
• 2nd result and 14th result for a search on Google: “toxin” appears in the URL
  [slides show search-result screenshots]

Link analysis
• Intuition: when a Web page points to another Web page, it confers status/authority/popularity on that page
• Find a scoring of pages that captures this intuition

Graph model
• Capture relationships between things
• Nodes represent things (Web pages)
• Edges between nodes represent that things are related (links between pages)
  – directed or undirected (here: directed)
• Many, many applications
• Studied mathematically
• Many algorithms for graph problems

PageRank
• The algorithm that gave Google its “killer” performance
  – Larry Page, Sergey Brin, R. Motwani, T. Winograd (1998)
• Random walk model + random leap
  [slide shows a small directed graph on nodes 1-4]

More graph examples
• Social network: node = person, edge if friends
  – directed or undirected?
• Rail system: node = station, edge if non-stop train service between stations
• Tournament: node = contestant / team, edge from team A to team B if A beat B
  – directed
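The random-walk-plus-random-leap model above can be sketched by power iteration. This is a minimal sketch: the 0.85 damping value is the commonly cited choice rather than something given on the slide, and the toy graph below is hypothetical (the slide’s own graph did not survive extraction).

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank: with probability `damping` the random
    surfer follows an outgoing link of the current page; otherwise it
    leaps to a uniformly random page. `links` maps each page to the
    list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}  # leap contribution
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical 3-page graph: page 3 has two in-links, so it ranks highest.
graph = {1: [2, 3], 2: [3], 3: [1]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # 3
```

The total rank stays at 1 on every iteration (the leap term redistributes exactly what the damping factor withholds), so the result can be read directly as a probability distribution over pages.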
