
CAIM: Cerca i Anàlisi d’Informació Massiva (FIB, Grau en Enginyeria Informàtica)


  1. CAIM: Cerca i Anàlisi d’Informació Massiva
     FIB, Grau en Enginyeria Informàtica
     Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldà
     Department of Computer Science, UPC
     Fall 2020
     http://www.cs.upc.edu/~caim

  2. 0. Presentation

  3. COVID-19
     ◮ Follow the instructions that FIB has sent to you.
     ◮ Always sit in the same place.
     ◮ Write down your row and column somewhere so that you can remember them.

  4. Instructors
     ◮ Ramon Ferrer-i-Cancho (lectures + exercises 10 & 20; lab 12)
        ◮ rferrericancho@cs.upc.edu
        ◮ Omega S124, 93 413 4028
     ◮ Ignasi Gómez (lab 11, 21 & 22)
        ◮ ignasi.gomez@upc.edu
     ◮ Javier Béjar (lab 13)
        ◮ bejar@cs.upc.edu
        ◮ Omega 204, 93 413 7879

  5. Class Logistics
     ◮ Fridays, 12–14 (A6E01), 15–17 (A6E02)
        ◮ Theory and exercises. Exercises will often be proposed in advance.
     ◮ Thursdays, lab sessions
        ◮ Guided lab activities; expect to complement each session with about 2 additional hours of autonomous work, on average.
        ◮ Some lab sessions will finish by handing in a short written report; these reports count towards the course evaluation.

  6. Lab work: important rules
     ◮ Lab work is done in pairs. Exceptions require prior permission.
     ◮ This semester: keep the same partner for the whole semester (see the instructions on Racó).
     ◮ Do not exchange anything beyond general ideas with other students; doing so will be considered plagiarism.

  7. Exercises
     ◮ In class, we will solve only some of the proposed exercises.
     ◮ You are strongly encouraged to try to solve the rest yourself.
     ◮ Self-study: one or more small topics will not be explained in class. They will appear in the exam.

  8. Evaluation
     ◮ Evaluation: as per the “Guia Docent”
     ◮ Partial exam 1 (P1): November 5, 16:00–17:30 (during the partial-exam week); Partial exam 2 (P2): January 11, 2021, 15:00–18:00
     ◮ On the day of P2 you may instead choose to take a final exam (F) covering the whole course
     ◮ Final grade = 40% Lab + max(30% P1 + 30% P2, 60% F)
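
     The grading formula above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the function name `course_grade`, the 0–10 grade scale, and the choice to count an exam that was not taken as 0 are assumptions, not something stated on the slide.

```python
def course_grade(lab: float, p1: float = 0.0, p2: float = 0.0, final: float = 0.0) -> float:
    """Combine marks as 40% Lab + max(30% P1 + 30% P2, 60% F).

    Assumption: all marks share one scale (e.g. 0-10), and an exam
    that was not taken is passed as 0.
    """
    partials = 0.3 * p1 + 0.3 * p2   # route 1: both partial exams
    whole_course = 0.6 * final       # route 2: final exam on the whole course
    return 0.4 * lab + max(partials, whole_course)


# A student who sat both partial exams.
both_partials = course_grade(lab=8, p1=6, p2=7)

# A student who sat P1 and then chose the final exam instead of P2.
with_final = course_grade(lab=8, p1=4, final=7)
```

     Because of the max, choosing the final exam can never lower the exam part of the grade below what the partial exams already taken would give.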

  9. Contents I
     First half (until midterm):
     ◮ Core Information Retrieval:
        ◮ Introduction: Concept. The IR process
        ◮ Information Retrieval Models
        ◮ Indexing and Searching, Implementation
        ◮ Information Retrieval Evaluation, Feedback Models
     ◮ Web Search:
        ◮ Link analysis: PageRank
        ◮ Crawling the web
        ◮ Architecture of a Web search system

  10. Contents II
     Second half:
     ◮ The “Big Data” Slogan
     ◮ Architecture of large-scale web search systems
     ◮ The Map-Reduce paradigm
     ◮ Introduction to NoSQL databases
     ◮ The Apache ecosystem for web search
     ◮ Social Network Analysis:
        ◮ Characterization of real complex networks
        ◮ Communities, influence, information diffusion
     ◮ Clustering and Locality Sensitive Hashing
     ◮ Recommender Systems

  11. Bibliography
     ◮ R. Baeza-Yates, B. Ribeiro-Neto: Modern Information Retrieval (2nd ed.). Addison Wesley, 2010.
     ◮ I.H. Witten, A. Moffat, T. Bell: Managing Gigabytes. Morgan Kaufmann, 1999.
     ◮ C.D. Manning, P. Raghavan, H. Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008.
     ◮ Z. Markov, D.T. Larose: Data Mining the Web. Wiley, 2007.
     ◮ M. Russell: Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. O’Reilly, 2011.
     ◮ ... There’s a whole web out there
