Big Data Storage Technologies, James Lee, The George Washington University

Introduction AFS GFS and Hadoop Amazon S3, Dynamo, and Cassandra Big Data Storage Technologies James Lee The George Washington University April 11, 2012 James Lee Big Data Storage Technologies

What is Big Data?
◮ Data becomes "big" when storing and processing it grows into as big a problem as the one you are trying to solve with the data.

Why are traditional filesystems insufficient?
◮ Upper limit on filesystem size
◮ Limited redundancy
◮ Limited bandwidth

So what are the options for scaling out?
◮ It depends on business needs.
◮ Scale within a rack, within a datacenter, or across wide-area networks.
◮ Several different technologies are available for achieving those goals.
◮ You may have to make compromises in places.

Andrew File System
◮ Distributed filesystem developed in the 1980s.
◮ Used primarily by universities.
◮ Has traditional filesystem semantics.
◮ Scales to hundreds of terabytes.

Source: http://caligari.dartmouth.edu/classes/afs/print_pages.shtml

What does Google do?
Look at Google's requirements:
◮ Hundreds of millions of huge files
◮ Files have to be read very quickly
◮ Writes are less important
◮ Data has to be redundant, but replication need not be synchronous
◮ Concurrent access to files should have low overhead
These ideas from the Google File System (GFS) have been implemented in the Apache Hadoop project.

Hadoop
◮ Written in Java
◮ Does not provide traditional filesystem semantics
◮ Stores files in large blocks (64 MB) that are lazily replicated
◮ Rack-aware replication
◮ A master "NameNode" tracks the location of blocks
◮ Writes are optimized only for appending data
◮ Scales to tens of thousands of nodes and more than 100 PB
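
The block and replica layout on the Hadoop slide can be sketched in a few lines of Python. This is a minimal illustration, not HDFS code. It assumes the 64 MB block size from the slide, a replication factor of 3, a writer that is itself a DataNode, and a hypothetical cluster map of rack names to DataNode names. The placement rule follows HDFS's documented default for three replicas: one replica on the writer's node, the other two on two different nodes of a single remote rack. The returned dictionary stands in for the block-location metadata the NameNode keeps; `place_replicas` and `write_file` are hypothetical names.

```python
import random

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted on the slide


def num_blocks(file_size, block_size=BLOCK_SIZE):
    """Fixed-size blocks needed for a file; the last block may be partial."""
    return (file_size + block_size - 1) // block_size


def place_replicas(writer, cluster, rng):
    """Pick 3 DataNodes for one block using the rack-aware rule.

    First replica on the writer's node; the second and third on two distinct
    nodes of one remote rack. `cluster` maps rack name -> list of node names
    (a hypothetical topology). Requires at least one other rack with 2+ nodes.
    """
    writer_rack = next(r for r, nodes in cluster.items() if writer in nodes)
    remote_racks = [r for r, nodes in cluster.items()
                    if r != writer_rack and len(nodes) >= 2]
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(cluster[remote_rack], 2)
    return [writer, second, third]


def write_file(path, file_size, writer, cluster, seed=0):
    """Return a NameNode-style block map: (path, block index) -> replica nodes."""
    rng = random.Random(seed)
    return {(path, i): place_replicas(writer, cluster, rng)
            for i in range(num_blocks(file_size))}


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
    # A 200 MB file needs 4 blocks: three full 64 MB blocks plus a partial one.
    block_map = write_file("/logs/day1", 200 * 1024 * 1024, "dn1", cluster)
    for block, nodes in block_map.items():
        print(block, nodes)
```

Keeping two of the three replicas in one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.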

Source: http://arst.ch/s9l

Amazon has very different requirements from a search engine:
◮ Willing to compromise on data consistency across the system in exchange for high availability (HA)
◮ Must deal with more general-purpose data access
◮ Must handle random access to smaller pieces of data
Amazon developed its own distributed storage system, a key-value store called Dynamo.

Dynamo
◮ Decentralized, peer-to-peer architecture.
◮ The system selects the node responsible for a key using an MD5 hash of the key.
◮ Nodes always query their neighbors for the latest version.
◮ Its design ideas are implemented in the Apache Cassandra project.
Source: http://arst.ch/s9l
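
The Dynamo slide's placement and read rules can be illustrated with a consistent-hashing ring: the MD5 digest of a key gives a 128-bit position, the key belongs to the first node clockwise from that position, and copies go to the next nodes along the ring. The Python below is a minimal sketch under stated assumptions, not Dynamo's implementation: it places each node once by hashing its name (Dynamo assigns many virtual-node positions per machine), uses plain integer version numbers in place of Dynamo's vector clocks, and queries every replica instead of waiting for a read quorum. `Ring`, `Cluster`, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key):
    """128-bit ring position derived from the MD5 digest of the key."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hashing ring with one position per node (no virtual nodes)."""

    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        self.tokens = sorted((ring_position(node), node) for node in nodes)
        self.positions = [pos for pos, _ in self.tokens]

    def preference_list(self, key):
        """The first n_replicas nodes clockwise from the key's position."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        count = min(self.n_replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


class Cluster:
    """Toy store; integer versions are a simplification of vector clocks."""

    def __init__(self, nodes, n_replicas=3):
        self.ring = Ring(nodes, n_replicas)
        self.store = {node: {} for node in nodes}  # node -> {key: (version, value)}

    def put(self, key, value, version, reachable=None):
        """Write to every replica in the key's preference list that is reachable."""
        for node in self.ring.preference_list(key):
            if reachable is None or node in reachable:
                self.store[node][key] = (version, value)

    def get(self, key):
        """Query all replicas of the key and return the value with the highest version."""
        copies = [self.store[node][key]
                  for node in self.ring.preference_list(key)
                  if key in self.store[node]]
        if not copies:
            return None
        return max(copies, key=lambda copy: copy[0])[1]


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b", "node-c", "node-d", "node-e"])
    replicas = cluster.ring.preference_list("cart:alice")
    print("replicas:", replicas)
    cluster.put("cart:alice", "1 book", version=1)
    # The second write reaches only one replica, leaving the other two stale.
    cluster.put("cart:alice", "2 books", version=2, reachable={replicas[0]})
    print(cluster.get("cart:alice"))  # prints "2 books"
```

Because a key's position depends only on its hash, adding or removing a node moves only the keys adjacent to that node on the ring, which is what lets a decentralized, peer-to-peer design like this grow without a central directory.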