CS226 Big-Data Management Instructor: Ahmed Eldawy 09/28/2018 1
Welcome (back) to UCR!
Class information
  Classes: Monday, Wednesday, Friday 2:10 – 3:00 PM at WCH 142
  Instructor: Ahmed Eldawy
  Office hours: Monday & Wednesday 4:00 – 5:00 PM @357 WCH. Conflicts?
  Website: http://www.cs.ucr.edu/~eldawy/18FCS226/
  iLearn (Any UCRX students?)
  Email: eldawy@ucr.edu
  Subject: “[CS226] …”
Course work
  Active participation in the class (5%)
  Reading and review tasks (10%)
  Class presentation (15%)
  Assignments (20%)
  Project (50%)
Project
  Groups of 3-4 students
  Group selection
  Project proposal
  Literature survey
  Report outline
  Final report
  Project presentation
Course goals
  What are your goals?
  Understand what big data means
  Identify the internal components of big data platforms
  Recognize the differences between different big data platforms
  Explain how a distributed query runs on big data
Super Hero
Big-data Expert
  Understand how the big-data platforms really work
  Control those thousands of processors efficiently to carry out your task
Syllabus
  Overview of big data
  Big-data storage
  Big-data processing
  Big-data indexing
  Big-SQL processing
  Programming packages
Introduction
Jan 2012: World Economic Forum Report
Interest in Big Data in the US
  March 2012: Obama administration unveils BIG DATA initiative: $200 Million in R&D investment
  June 2013: Washington Post is calling Obama “The Big Data President”
Interest in Big Data in Europe
  March 2014: David Cameron and Angela Merkel talk about Big Data at a computer expo in Hannover, Germany
The Market of Big Data
Four (formerly Three) V’s of Big Data
Big Data vs. Big Computation
  Full scans (e.g., log processing)
  Range scans
  Point lookups
  Iterations
  Joins (self, binary, or multiway)
  Proximity queries
  Closures and graph traversals
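The first three access patterns on this slide can be contrasted with a small, single-machine sketch. It is only an illustration: the toy records, the sorted key list, the hash index, and the function names are all hypothetical and not taken from the course material, and a real big-data platform would distribute these structures across many machines.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy log records: (timestamp, message), kept sorted by timestamp.
records = [(1, "start"), (3, "login"), (5, "error"), (8, "logout"), (9, "stop")]
keys = [ts for ts, _ in records]           # sorted keys, used for range scans
by_key = {ts: msg for ts, msg in records}  # hash index, used for point lookups


def full_scan(predicate):
    """Touch every record and keep the matches (e.g., log processing)."""
    return [r for r in records if predicate(r)]


def range_scan(lo, hi):
    """Return records with lo <= timestamp <= hi via binary search on the sorted keys."""
    return records[bisect_left(keys, lo):bisect_right(keys, hi)]


def point_lookup(ts):
    """Fetch the message for one timestamp through the hash index, or None if absent."""
    return by_key.get(ts)
```

A full scan reads all the data regardless of the answer size, while a range scan and a point lookup rely on an ordering or an index to read only the part they need.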
Big Data Applications
  Web search
  Marketing and advertising
  Data cleaning
  Knowledge base
  Information retrieval
  Internet of Things (IoT)
  Visualization
  Behavioral studies
Publicly Available Datasets
  Data.gov
  Data.gov.uk
  Twitter Streaming API
  Yahoo! Webscope [http://webscope.sandbox.yahoo.com/]
  GDELT [http://www.gdeltproject.org/]
  Instagram API
Big Data Landscape 2012
  http://mattturck.com/2012/06/29/a-chart-of-the-big-data-ecosystem/
Big Data Landscape 2014
  http://mattturck.com/2014/05/11/the-state-of-big-data-in-2014-a-chart/
Big Data Landscape 2016
  http://mattturck.com/2016/02/01/big-data-landscape/
Big Data Landscape 2018