20 years of Web search where to next? Mark Sanderson Who am I? - PowerPoint PPT Presentation

20 years of Web search – where to next? Mark Sanderson

Who am I?
• Professor at RMIT University, Melbourne
• Before
  – Professor at University of Sheffield
  – Researcher at UMass Amherst
  – Researcher at University of Glasgow
• Online
  – @IR_oldie
  – http://www.seg.rmit.edu.au/mark/

Overview of talk
• A bit of history

A bit of history Early IR

Before IR systems
• There were libraries
  – The search engine of the day
• Organise information using a subject catalogue
  – Sort cards by author
  – Sort cards by title
  – Sort cards by subject
  – How to do this?

Not just public libraries
• MIT Master’s thesis, Philip Bagley, 1951

At the same time…
• While librarians were coping with the information explosion
  – Could machines help?
  – Could computers help?
• Very brief history of machines and computers for search

Machines doing IR: CS&IT - ISAR

As We May Think – Vannevar Bush, 1945
  – http://www.youtube.com/watch?v=c539cK58ees

Computers doing IR
• Holmstrom, 1948

Information Retrieval
• Calvin Mooers, 1950

NRT
• See demo shown in talk at
  – http://www.seg.rmit.edu.au/mark/demos/NRT/NRT%20demo.htm
• Paper at
  – http://www.seg.rmit.edu.au/mark/cv/publications/papers/my_papers/EP-odd.pdf

The web arrived
• 1993
  – JumpStation
  – Jonathon Fletcher, University of Stirling
• Steinberg, Wired, 1996
  – “Information retrieval is really only a problem for people in library science - if some computer scientists were to put their heads together, they'd probably have it solved before lunchtime.”

Where are we now? Google/Bing

Where we are now
• Google/Bing
  – Text matching
  – Fields, anchor text
  – PageRank
  – Query logs
  – …
  – Massive machine learning
  – Evaluation
  – Continual tuning
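The slide only names PageRank as one ingredient in Google/Bing ranking. As a rough illustration of the link-analysis idea behind it, here is a minimal power-iteration sketch in Python. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the toy four-page graph are all assumptions made for illustration; this is not how Google or Bing actually compute link scores.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative power-iteration PageRank (hypothetical helper, not a production ranker).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Rank held by pages with no outlinks (dangling pages) is spread evenly,
        # so the scores keep summing to 1.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page splits its current rank equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy web graph (assumed): three pages link to "c", nothing links to "d".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

On this toy graph, page `c`, which three other pages link to, ends up with the highest score, while `d`, which nothing links to, keeps only the random-jump share (1 - 0.85) / 4. Real engines combine a link score like this with the other signals on the slide (text matching, fields, anchor text, query logs, learned rankers).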

Search is solved?
• Common perception

Favourable conditions
• Most content wants to be found
• Most content is redundant
• Huge income
• Queries often repeated
• Users can read & write

Where to next?
• Immediate problems
• Immediate opportunities
• Medium term challenges
• Longer term challenges

Immediate problems/opportunities

Problematic summaries

Less favourable?
• People struggle to search
• People miss retrieved documents
  – Fine for redundant content; what if there is just one copy?

Problem searching
• Limited redundancy
  – Little money
  – Enterprise search
  – Refinding
  – Content doesn’t want to be found
  – Patent search
  – Legal document search (e-Discovery)

Enterprise search
• Many problems in this space
• Each collection is different
  – Each search engine needs to be different
• No money
• “Why doesn’t it work like Google?”

Significant problem
• Think carefully before including search in your user interface

At RMIT
• Trying to scope the problem
  – If we find a search solution that works on one set of documents, does it work on others?
  – Not as much as was thought
  – A lot worse than was thought

Major immediate challenge
• Do search as well as Google no matter what the collection, and do it without all their money

Favourable conditions
• Most content wants to be found
• Most content is redundant
• Huge income
• Queries often repeated
• Users can read & write

Refinding
• Interviewed 45 searchers about common retrieval tasks
  – 70% relate to refinding
• Starting a funded investigation in this area

Ephemeral & archival content
• Archival
  – Traditional web search
  – Web pages, news, documents
  – Coarse grained
• Ephemeral
  – Social media
  – Blogs, social networks, micro-blogs
  – Fine grained

Interface of the two
• Summarising ephemeral content
  – Only just starting
  – Lots of opportunities to specialise
• How can ephemeral content aid search of archival content?
  – RMIT is changing the representation of archival content based on ephemeral data
  – Early days, but promising

Medium term

Diffuse information

Harder information needs
• Entertain me
• Contextual search
• SWIRL 2012
  – http://www.cs.rmit.edu.au/swirl12/

Longer term

Longer term
• Long queries
• Spoken search
• The internet for everyone

Users have complex needs
• Poorly expressed in short queries
  – Experts
    – issue multiple short queries
    – use search engine operators
• Can we build search engines to handle complex queries?

New application area?
• Speech search
  – Hands free
  – Eyes free
• Seen in the movies, but really?

Users?
• Visually impaired
  – Together they could form a country
• Other potential uses
  – In-car searching
  – Walking in a city

Internet for everyone
  – http://www.onbile.com/info/how-many-people-use-smartphones-in-the-world/

Internet users?
• 2013
  – 2 billion now
• 2015
  – 4 billion, mostly on mobiles (Baird Equity Research)

Implications?
• More languages
• More users who struggle with literacy
  – Search engines assume you can read and write

Search engines: there is a lot still to do
