Massive Data Algorithmics, Lecture 1: Introduction

Massive Data
Massive datasets are being collected everywhere.
Storage management software is a billion-dollar industry.

Examples
Phone: AT&T, 20TB phone call database, wireless tracking.
Consumer: WalMart, 70TB database, buying patterns.
Web: a web crawl of 200M pages and 2000M links; Akamai stores 7 billion clicks per day.
Geography: NASA satellites generate 1.2TB per day.

Grid Terrain Data
Appalachian Mountains (800km × 800km):
100m resolution ⇒ ∼64M cells ⇒ ∼128MB raw data (∼500MB when processing)
∼1.4GB at 30m resolution; NASA's SRTM mission acquired 30m data for 80% of the Earth's land mass
∼12GB at 10m resolution (much of the US is available from USGS)
∼1.2TB at 1m resolution (selected areas, mostly military)
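The sizes on this slide follow from simple arithmetic. The sketch below is a minimal check. It assumes 2 bytes per cell, which is the ratio implied by 64M cells ⇒ 128MB, and decimal units (1GB = 10^9 bytes). The names are illustrative. Under that assumption a 30m grid works out to roughly 1.4GB, while the 10m and 1m figures agree with the slide.

```python
# Assumption: 2 bytes per cell, the ratio implied by "64M cells => 128MB".
BYTES_PER_CELL = 2
SIDE_M = 800_000  # 800 km expressed in metres


def grid_cells(resolution_m: float) -> float:
    """Number of cells in a square grid of side SIDE_M at the given resolution."""
    per_side = SIDE_M / resolution_m
    return per_side * per_side


def raw_size_bytes(resolution_m: float) -> float:
    """Raw grid size in bytes under the 2-bytes-per-cell assumption."""
    return grid_cells(resolution_m) * BYTES_PER_CELL


if __name__ == "__main__":
    for res in (100, 30, 10, 1):
        cells = grid_cells(res)
        size = raw_size_bytes(res)
        print(f"{res:>4} m: {cells / 1e6:,.0f}M cells, {size / 1e9:,.2f} GB raw")
```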

LIDAR Terrain Data
Massive (irregular) point sets at 1-10m resolution.
Appalachian Mountains: between 50GB and 5TB.

Application Example: Flooding Prediction

Random Access Machine Model
Standard theoretical model of computation: infinite memory and uniform access cost.
This simple model has been crucial to the success of the computer industry.

Hierarchical Memory
Modern machines have a complicated memory hierarchy.
Levels get larger and slower further away from the CPU.
Data is moved between levels in large blocks.

Slow I/O
Disk access is about $10^6$ times slower than main memory access.
"The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one's desk or by taking an airplane to the other side of the world and using a sharpener on someone else's desk." (D. Comer)
Disk systems try to amortize the large access time by transferring large contiguous blocks of data (8-16KB).
It is therefore important to store and access data so as to take advantage of blocks (locality).

Scalability Problems
Most programs are developed in the RAM model.
They still run on large datasets because the OS moves blocks as needed.
Modern operating systems use sophisticated paging and prefetching strategies.
But if a program makes scattered accesses, even a good OS cannot take advantage of block access.
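The contrast between block-friendly and scattered access can be reproduced with a small paging simulation. This is a sketch under stated assumptions. The OS is modeled as demand paging with LRU replacement; real kernels use more elaborate policies and prefetching. One page equals one disk block of B items, and memory holds a fixed number of frames. `count_page_faults` is a hypothetical helper, not an OS interface.

```python
import random
from collections import OrderedDict


def count_page_faults(accesses, block_size, num_frames):
    """Simulate demand paging with LRU replacement (hypothetical helper).

    accesses:   item indices touched by a RAM-model program, in order
    block_size: items per block/page (B)
    num_frames: blocks that fit in main memory (M / B)
    Returns the number of blocks read from disk (page faults).
    """
    frames = OrderedDict()  # block id -> None, least recently used first
    faults = 0
    for i in accesses:
        block = i // block_size
        if block in frames:
            frames.move_to_end(block)  # hit: mark as most recently used
        else:
            faults += 1  # miss: the block must be fetched from disk
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used block
            frames[block] = None
    return faults


if __name__ == "__main__":
    N, B, FRAMES = 100_000, 100, 10
    rng = random.Random(0)
    scattered = [rng.randrange(N) for _ in range(N)]
    print("sequential:", count_page_faults(range(N), B, FRAMES))  # N / B = 1000
    print("scattered: ", count_page_faults(scattered, B, FRAMES))
```

A sequential scan faults exactly once per block (N/B faults). Uniformly random accesses miss on most requests, because few of the touched blocks are still resident.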

External Memory Model (Cache-Aware Model)
N = number of items in the problem instance
B = number of items per disk block
M = number of items that fit in main memory
T = number of items in the output
I/O: moving one block between memory and disk
We assume (for convenience) that $M > B^2$.
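The accounting of the model can be made concrete with a toy disk that counts block transfers. This is a minimal sketch, and `BlockDisk` and `external_sum` are hypothetical names. M is not enforced: the scan only ever holds one block plus a counter, so any M ≥ B suffices. T plays no role because the output is a single number.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class BlockDisk:
    """Toy disk for the external memory model (hypothetical helper).

    Items are stored in blocks of B items; every block moved from disk
    to memory is counted as one I/O.
    """
    B: int
    blocks: List[List[int]] = field(default_factory=list)
    ios: int = 0

    @classmethod
    def from_items(cls, items: Iterable[int], B: int) -> "BlockDisk":
        data = list(items)
        disk = cls(B)
        disk.blocks = [data[i:i + B] for i in range(0, len(data), B)]
        return disk

    def read_block(self, k: int) -> List[int]:
        self.ios += 1  # one block transfer = one I/O
        return list(self.blocks[k])


def external_sum(disk: BlockDisk) -> int:
    """Scan the input: read each block once, keep only a running total."""
    total = 0
    for k in range(len(disk.blocks)):
        total += sum(disk.read_block(k))
    return total


if __name__ == "__main__":
    disk = BlockDisk.from_items(range(10_000), B=256)
    print(external_sum(disk), disk.ios)  # 49995000 40
```

Summing 10,000 items with B = 256 costs ⌈10000/256⌉ = 40 I/Os, which is the scanning bound N/B rounded up.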

Fundamental Bounds
             Internal (operations)   External (I/Os)
Scanning     $N$                     $\frac{N}{B}$
Sorting      $N \log N$              $\frac{N}{B} \log_{M/B} \frac{N}{B}$
Permuting    $N$                     $\min\left(N, \frac{N}{B} \log_{M/B} \frac{N}{B}\right)$
Searching    $\log N$                $\log_B N$
Notes:
Linear I/O means $O(N/B)$.
Permuting is not linear.
The permuting and sorting bounds are equal in all practical cases.
The B factor is VERY important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$.
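To see why the B factor matters, the bounds can be evaluated for concrete sizes. This sketch assumes the hypothetical parameters N = 10^9, M = 10^7 and B = 10^3, which satisfy M > B^2. It ignores all constant factors.

```python
import math


def io_bounds(N: float, M: float, B: float) -> dict:
    """Evaluate the external-memory bounds for concrete N, M, B.

    Constant factors are ignored, so the numbers are only indicative
    and assume N is much larger than M.
    """
    n_blocks = N / B      # N/B: blocks in the input
    mem_blocks = M / B    # M/B: blocks that fit in memory
    sort = n_blocks * math.log(n_blocks, mem_blocks)  # (N/B) log_{M/B}(N/B)
    return {
        "scan": n_blocks,            # N/B
        "sort": sort,
        "permute": min(N, sort),     # min(N, sorting bound)
        "search": math.log(N, B),    # log_B N
    }


if __name__ == "__main__":
    # Hypothetical parameters chosen to satisfy M > B^2.
    N, M, B = 10**9, 10**7, 10**3
    for name, value in io_bounds(N, M, B).items():
        print(f"{name:>8}: {value:,.1f} I/Os")
    print(f"{'naive':>8}: {N:,} I/Os (one I/O per item)")
```

With these numbers a scan costs about 10^6 I/Os. Sorting costs about 1.5 × 10^6 I/Os, because log_{M/B}(N/B) = log_{10^4} 10^6 = 1.5. An algorithm that pays one I/O per item would need 10^9. Searching costs about log_B N = 3 I/Os.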

Cache-Oblivious Model
A cache-oblivious algorithm is designed to take advantage of a CPU cache without having the size of the cache as an explicit parameter.
It therefore performs well, without modification, on multiple machines with different cache sizes, or on a memory hierarchy whose levels of cache have different sizes.
The idea of cache-oblivious algorithms was conceived by Charles E. Leiserson as early as 1996 and first published by Harald Prokop in his master's thesis at the Massachusetts Institute of Technology in 1999.
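A standard illustration of the idea, not taken from this lecture, is recursive matrix transposition. The matrix is split along its longer side until the pieces are small, and no cache or block size appears anywhere in the code. This recursion is known to need only O(1 + mn/B) block transfers for an m × n matrix under a tall-cache assumption such as M > B^2. The sketch shows only the recursive structure. In Python the interpreter overhead hides any cache effect. `base` is an arbitrary cutoff for small subproblems, not a tuned cache parameter.

```python
def transpose_co(a, b, r0, r1, c0, c1, base=16):
    """Write the transpose of the submatrix a[r0:r1][c0:c1] into b (b[j][i] = a[i][j]).

    Recursively halves the longer dimension. No cache or block size is used;
    `base` only stops the recursion on small subproblems.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        mid = (r0 + r1) // 2  # split the rows
        transpose_co(a, b, r0, mid, c0, c1, base)
        transpose_co(a, b, mid, r1, c0, c1, base)
    else:
        mid = (c0 + c1) // 2  # split the columns
        transpose_co(a, b, r0, r1, c0, mid, base)
        transpose_co(a, b, r0, r1, mid, c1, base)


def transpose(a):
    """Return the transpose of a non-empty rectangular list-of-lists matrix."""
    n, m = len(a), len(a[0])
    b = [[None] * n for _ in range(m)]
    transpose_co(a, b, 0, n, 0, m)
    return b


if __name__ == "__main__":
    print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```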

Streaming Model
In the streaming model, input data are not available for random access from disk or memory; instead they arrive as one or more continuous data streams.
The performance of an algorithm is measured by three basic factors: the number of passes the algorithm must make over the stream, the available memory, and the running time of the algorithm.
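The three performance measures can be seen in a classic one-pass algorithm. As an illustration (the algorithm is not part of the lecture), here is a sketch of the Misra-Gries frequent-items summary. It makes a single pass over the stream, stores at most k - 1 counters, and spends O(k) time on an item in the worst case. Every item that occurs more than n/k times in a stream of length n survives in the summary. Its stored count is low by at most n/k.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass over the stream with at most k - 1 counters.

    Items occurring more than n / k times are guaranteed to be keys of the
    result; each stored count underestimates the true count by at most n / k.
    """
    counters: Dict[Hashable, int] = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all counters, dropping those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "d", "a", "b"]
    print(misra_gries(stream, k=3))  # {'a': 2}
```

For this stream of length 8, "a" occurs 4 times, which is more than 8/3. It survives with a stored count of 2, an undercount within the n/k bound.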

Grading
Midterm: 6 points
Final: 6 points
2-3 projects and exercises: 3 points
Presentation: 2 points
Report: 3 points
