Information Retrieval
Introduction
Hamid Beigy
Sharif University of Technology
October 6, 2018
Some slides have been adapted from slides of Manning, Yannakoudakis, and Schütze.

Table of contents
1. Course Information
2. Introduction
3. Course overview

Course Information
1. Course name: Modern Information Retrieval
2. Instructor: Hamid Beigy (Email: beigy@sharif.edu)
3. Course website: http://ce.sharif.edu/courses/97-98/1/ce324-2/
4. Lectures: Sat-Mon (10:30-12:00)
5. TAs: Faeze Ghorbanpour (Email: f.gorbanpor93@students.sharif.ir)

Course evaluation
Evaluation:
  Mid-term exam           20%   1398/7/28
  Mid-term exam           20%   1397/8/28
  Final exam              30%
  Practical assignments   25%
  Quiz                    10%

Main Reference
[Figure: main reference]

References
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, USA, 2nd edition, 2011.
G. Kowalski. Information Retrieval Architecture and Algorithms. Springer-Verlag, Berlin, Heidelberg, 1st edition, 2010.
C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

Definition of information retrieval
1. Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
2. Document collection: the set of units over which an IR system is built. Documents can be
   - memos
   - book chapters
   - paragraphs
   - scenes of a movie
   - turns in a conversation...
3. These days we frequently think first of web search, but there are many other cases:
   - E-mail search
   - Searching your laptop
   - Corporate knowledge bases
   - Legal information retrieval

Structured vs Unstructured Data
Unstructured data means that a formal, semantically overt, easy-for-computer structure is missing.
This contrasts with the rigidly structured data used in database-style searching (e.g. product inventories, personnel records):
    SELECT * FROM business_catalogue WHERE category = 'florist' AND city_zip = 'cb1'
This does not mean that there is no structure in the data:
   - Document structure (headings, paragraphs, lists...)
   - Explicit markup formatting (e.g. in HTML, XML...)
   - Linguistic structure (latent, hidden)

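To make the structured side of the contrast concrete, the sketch below runs the slide's query against an in-memory SQLite table. It is only an illustration: the table layout, the sample rows, and the helper name `find_businesses` are assumptions, and the identifiers are written with underscores (`business_catalogue`, `city_zip`) because unquoted hyphens are not valid in SQL names.

```python
import sqlite3


def find_businesses(rows, category, city_zip):
    """Return the names of businesses matching an exact category and zip prefix.

    `rows` is a list of (name, category, city_zip) tuples; the schema is a
    hypothetical stand-in for the slide's business catalogue.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE business_catalogue (name TEXT, category TEXT, city_zip TEXT)"
    )
    conn.executemany("INSERT INTO business_catalogue VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM business_catalogue "
        "WHERE category = ? AND city_zip = ? ORDER BY name",
        (category, city_zip),
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


# Made-up sample data, for illustration only.
ROWS = [
    ("Rose & Co", "florist", "cb1"),
    ("Petal Shop", "florist", "cb2"),
    ("Bike Barn", "bicycle repair", "cb1"),
]

matches = find_businesses(ROWS, "florist", "cb1")
print(matches)
```

The query works only because every row has explicit `category` and `city_zip` fields with exact values. Free text offers no such fields, which is the gap IR techniques are meant to fill.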
Information Needs and Relevance
1. Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
2. An information need is the topic about which the user desires to know more.
3. A query is what the user conveys to the computer in an attempt to communicate the information need.
4. Types of information needs
   1. Known-item search
   2. Precise information-seeking search
   3. Open-ended search (topical search)

[Figure: Structured vs unstructured data growth]

Relevance
1. A document is relevant if the user perceives that it contains information of value with respect to their personal information need.
2. Are the retrieved documents
   1. about the target subject?
   2. up-to-date?
   3. from a trusted source?
   4. satisfying the user's needs?
3. How should we rank documents in terms of these factors?

Information Retrieval Basics
[Figure: a query and a document collection go into the IR system, which returns a set of relevant documents]

How well has the system performed?
The effectiveness of an IR system (i.e., the quality of its search results) is determined by two key statistics about the system's returned results for a query:
   - Precision: What fraction of the returned results are relevant to the information need?
   - Recall: What fraction of the relevant documents in the collection were returned by the system?
What is the best balance between the two?
   - Easy to get perfect recall: just retrieve everything
   - Easy to get good precision: retrieve only the most relevant

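The two statistics can be computed directly from sets of document identifiers. The following is a minimal sketch: the function name `precision_recall`, the toy document IDs, and the convention of returning 0.0 when a denominator is empty are all assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs that are truly relevant
    Returns 0.0 for a statistic whose denominator is empty (an assumed
    convention; other conventions exist).
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection of ten documents, four of which are relevant.
collection = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4"}

# A selective system: three results, two of them relevant.
p, r = precision_recall({"d1", "d2", "d5"}, relevant)

# "Just retrieve everything": recall is perfect, precision collapses.
p_all, r_all = precision_recall(collection, relevant)

print(p, r)
print(p_all, r_all)
```

Retrieving the whole collection gives recall 1.0 but precision equal to the fraction of relevant documents in the collection, which is why neither statistic is informative on its own.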
A short history of IR
[Figure: timeline from 1945 through the 1950s, 1960s, 1970s, 1980s, 1990s, and 2000s. Recoverable labels: memex; term "IR" coined by Calvin Mooers; literature searching systems; evaluation by precision and recall (P&R) (Alan Kent); Cranfield experiments; Boolean IR; SMART; Salton; VSM; TREC; PageRank; multimedia; multilingual (CLEF); recommendation systems. Inset plot: precision against recall.]

[Figure: IR for non-textual media]

Unstructured data in 1650
Which plays of Shakespeare contain the words Brutus and Caesar, but not Calpurnia?
One could grep all of Shakespeare's plays for Brutus and Caesar, then strip out lines containing Calpurnia.
Why is grep not the solution?
   - Slow (for large collections)
   - grep is line-oriented, IR is document-oriented
   - "not Calpurnia" is non-trivial
   - Other operations (e.g., find the word Romans near countryman) not feasible

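A document-oriented alternative to grep can be sketched with a small term-to-documents index, which makes the "not Calpurnia" part a plain set difference. This is an illustrative sketch rather than the method the course goes on to develop: the function names `build_index` and `and_not` are hypothetical, and the texts below are made-up one-line stand-ins, not the actual plays.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index


def and_not(index, required, excluded):
    """Return documents containing every required term and no excluded term."""
    result = None
    for term in required:
        postings = index.get(term.lower(), set())
        result = postings.copy() if result is None else result & postings
    result = result or set()
    for term in excluded:
        result -= index.get(term.lower(), set())
    return result


# Made-up snippets standing in for whole plays (illustration only).
PLAYS = {
    "Julius Caesar": "Brutus and Caesar meet; Calpurnia warns Caesar.",
    "Antony and Cleopatra": "Antony speaks of Brutus and of Caesar.",
    "Hamlet": "Polonius once played Caesar and was killed by Brutus.",
    "Macbeth": "Macbeth meets the witches.",
}

index = build_index(PLAYS)
answer = sorted(and_not(index, ["Brutus", "Caesar"], ["Calpurnia"]))
print(answer)
```

The index is built once, and each query then touches only the sets for its own terms instead of rescanning every line of every play, which addresses both the speed and the line-versus-document objections above.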
Web Information Retrieval
[Figure: a query and a collection of web pages go into the IR system, which returns a set of relevant web pages]

Course overview
   - Introduction
   - Indexing and text operations
   - IR models (Boolean, vector space, probabilistic)
   - Evaluation of IR systems
   - Query operations
   - Machine learning in IR (classification, clustering, and ranking)
   - Web information retrieval
   - Some advanced topics