CS535 Big Data | 2/19/2020 | Week 5-B | Sangmi Lee Pallickara
CS535 Big Data | Computer Science | Colorado State University
Sangmi Lee Pallickara, Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs535 | Spring 2020

FAQs
• Quiz #2
  • Open 2/21 ~ 2/23
  • Covers Spark and Storm
  • 10 questions, 30 minutes
  • Answers will be available at 9 PM on 2/24

PART B. GEAR SESSIONS
SESSION 1: PETA-SCALE STORAGE SYSTEMS
Google had 2.5 million servers in 2016

Topics of Today's Class
• GEAR Session I. Peta-Scale Storage Systems
• Lecture 2
  • GFS I and II
  • Cassandra

GEAR Session 1. Peta-Scale Storage Systems
Lecture 2. Google File System and Hadoop Distributed File System
3. Relaxed Consistency

Two breaks in the communication lines
[Figure: a wide-area network connecting London, Rome, Boston, Chicago, LA, Paris, Miami, and Sydney, partitioned by two broken links]

A single machine can't partition
• There is only one node, so it does not have to worry about partition tolerance
• If it's up, it's available
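The partition trade-off above can be made concrete with a toy model (my own illustration, not code from the slides): once a link between replicas breaks, a node must either refuse requests (preserving consistency, sacrificing availability) or answer from possibly stale local state (the reverse).

```python
# Toy CAP illustration: two replicas joined by a (possibly broken) link.
# During a partition a replica must choose: stay consistent (refuse
# writes) or stay available (accept writes and let replicas diverge).

class Node:
    def __init__(self):
        self.value = 0
        self.peer = None
        self.link_up = True

    def write(self, value, mode):
        if self.link_up:
            self.value = value
            self.peer.value = value          # replicate synchronously
            return "ok"
        if mode == "CP":                     # consistency over availability
            return "unavailable"
        self.value = value                   # "AP": accept; replicas diverge
        return "ok (may be inconsistent)"

a, b = Node(), Node()
a.peer, b.peer = b, a

a.write(1, "CP")
assert (a.value, b.value) == (1, 1)          # no partition: consistent

a.link_up = b.link_up = False                # partition!
assert a.write(2, "CP") == "unavailable"     # CP gives up availability
a.write(2, "AP")
assert a.value != b.value                    # AP gives up consistency
```

A single machine never reaches the partitioned branch, which is the slide's point: with one node, partition tolerance is moot.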
Eventually consistent
• At any time, nodes may have replication inconsistencies
• If there are no more updates (or the updates can be ordered), eventually all nodes will be updated to the same value

GFS has a relaxed consistency model
• Consistent: all replicas see the same data
• Defined: the region is consistent AND clients see each mutation's write in its entirety

[Figures: a file region written by concurrent Operations A and B, shown in three states — inconsistent and undefined; consistent but undefined; defined]

File region state after a mutation:

                       | Write                    | Record Append
  Serial success       | Defined                  | Defined interspersed
  Concurrent successes | Consistent but undefined |   with inconsistent
  Failure              | Inconsistent             | Inconsistent
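The "eventually consistent" behavior described above can be sketched as a small simulation (a hypothetical model, not GFS code): replicas diverge while updates are in flight, and once updates stop, an anti-entropy exchange brings every replica to the same value.

```python
# Toy illustration of eventual consistency (not GFS code).
# Updates carry totally ordered version numbers, so once updates stop
# and replicas gossip, they all converge to the same value.

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Last-writer-wins: only accept strictly newer versions.
        if version > self.version:
            self.version, self.value = version, value

replicas = [Replica() for _ in range(3)]

# Updates reach different replicas at different times.
replicas[0].apply(1, "a")
replicas[1].apply(2, "b")        # replica 2 has seen nothing yet

# At this moment the replicas are inconsistent.
assert len({r.value for r in replicas}) > 1

def sync(replicas):
    # Anti-entropy pass: everyone adopts the latest (version, value).
    latest = max(replicas, key=lambda r: r.version)
    for r in replicas:
        r.apply(latest.version, latest.value)

sync(replicas)                   # no more updates -> convergence
assert all(r.value == "b" for r in replicas)
```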
GEAR Session 1. Peta-Scale Storage Systems
Lecture 2. Google File System and Hadoop Distributed File System
4. Handling Write and Append to a File

GFS uses leases to maintain a consistent mutation order across replicas
• The master grants a lease to one of the replicas: the primary
• The primary picks a serial order for all mutations to the chunk
• The other replicas follow this order when applying mutations

The lease mechanism is designed to minimize communication with the master
• A lease has an initial timeout of 60 seconds
• As long as the chunk is being mutated, the primary can request and receive extensions
• Extension requests/grants are piggybacked on heartbeat messages

Revocation and transfer of leases
• The master may revoke a lease before it expires
• If communication with the primary is lost, the master can safely give the lease to another replica
  • But only after the old primary's lease period elapses

How a write is actually performed — client pushes data to all the replicas [1/2]
1. The client asks the master which chunkserver holds the current lease for the chunk, and for the locations of the other replicas
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas
3. The client pushes the data to all the replicas
   • Each chunkserver stores the data in an LRU buffer until the data is used or aged out
4. The client sends a write request to the primary
5. The primary forwards the write request to the secondary replicas
6. The secondaries acknowledge
7. The primary sends the final reply to the client

[Figure: the client exchanging steps 1–2 with the MASTER, pushing data (step 3) to Secondary Replica A, the Primary Replica, and Secondary Replica B, then steps 4–7 flowing through the primary]
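The seven steps above can be sketched as a simplified model (class names and the lease-assignment rule here are illustrative assumptions, not the real GFS API):

```python
# Simplified model of the GFS write path described above (illustrative only).

class Chunkserver:
    def __init__(self, name):
        self.name = name
        self.buffer = {}      # staged data; an LRU buffer in real GFS
        self.chunk = []       # applied mutations, in serial order

    def push(self, data_id, data):              # step 3: data flow
        self.buffer[data_id] = data

    def apply(self, serial, data_id):           # steps 4-5: control flow
        self.chunk.append((serial, self.buffer.pop(data_id)))
        return "ack"                            # step 6

class Master:
    def grant_lease(self, replicas):            # steps 1-2 (toy rule:
        return replicas[0], replicas[1:]        #  first replica is primary)

def write(master, replicas, data):
    primary, secondaries = master.grant_lease(replicas)       # 1-2
    for r in replicas:                                        # 3
        r.push(id(data), data)
    serial = len(primary.chunk)       # 4: primary picks the serial order
    acks = [s.apply(serial, id(data)) for s in secondaries]   # 5-6
    primary.apply(serial, id(data))
    return "success" if all(a == "ack" for a in acks) else "retry"  # 7

servers = [Chunkserver(n) for n in "ABC"]
assert write(Master(), servers, b"hello") == "success"
assert all(s.chunk == servers[0].chunk for s in servers)  # same order everywhere
```

The key property the sketch preserves is the one the slide stresses: data flow (`push`) is separate from control flow, and only the primary assigns serial numbers.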
Client pushes data to all the replicas [2/2]
• When the chunkservers acknowledge receipt of the data, the client sends a write request to the primary
• The primary assigns consecutive serial numbers to the mutations
• The primary forwards the request to the replicas

Data flow is decoupled from the control flow to use the network efficiently
• Utilize each machine's network bandwidth
• Avoid network bottlenecks and high-latency links
• Leverage the network topology
  • Estimate distances from IP addresses
• Pipeline the data transfer
  • Once a chunkserver receives some data, it starts forwarding immediately
  • For transferring B bytes to R replicas, the ideal elapsed time is ≈ B/T + RL, where:
    • T is the network throughput
    • L is the latency to transfer bytes between two machines

Append: record sizes and fragmentation
• Record size is restricted to ¼ of the chunk size (the maximum size)
• This minimizes worst-case fragmentation
  • Internal fragmentation in each chunk

Inconsistent regions
[Figure: three replicas each holding Data 1, Data 2, Data 3; the append of Data 3 fails on one replica, the user retries, and Data 3 ends up duplicated on the successful replicas while the failed one keeps an empty region]

What if a record append fails at one of the replicas?
• The client must retry the operation
• Replicas of the same chunk may then contain
  • Different data
  • Duplicates of the same record, in whole or in part

GFS only guarantees that the data will be written at least once as an atomic unit
• For an operation to return success, the data must be written at the same offset on all the replicas
• After the write, all replicas are at least as long as the end of the record
• Any future
record append will be assigned a higher offset or a different chunk

• Replicas of chunks are not bit-wise identical!
  • In most other systems, replicas are identical
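Because record append is at-least-once, a reader may encounter the same record twice. A common remedy (sketched here with hypothetical helper names; GFS leaves this to the application) is for writers to embed a unique identifier in each record so readers can filter duplicates:

```python
# Record append is at-least-once, so a retried append can leave the same
# record stored twice. Hypothetical client-side dedup (not a GFS API):
# writers tag each record with a unique ID; readers skip repeats.

import uuid

def make_record(payload: bytes) -> tuple:
    return (uuid.uuid4().hex, payload)   # (record_id, data)

def read_unique(records):
    seen = set()
    for rec_id, payload in records:
        if rec_id not in seen:           # drop duplicated appends
            seen.add(rec_id)
            yield payload

r1 = make_record(b"event-1")
r2 = make_record(b"event-2")
chunk = [r1, r1, r2]                     # a failed-then-retried append stored r1 twice
assert list(read_unique(chunk)) == [b"event-1", b"event-2"]
```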
GEAR Session 1. Peta-Scale Storage Systems
Lecture 2. Google File System and Hadoop Distributed File System
Google File System II: Colossus

Storage software: Colossus (GFS2)
• Next-generation cluster-level file system
• Automatically sharded metadata layer
  • Distributed masters
  • Block size reduced from 64 MB to 1 MB
• Data typically written using Reed-Solomon encoding (1.5x storage overhead)
• Client-driven replication, encoding, and repair
• Metadata scalability has enabled improved availability
• Why Reed-Solomon?
  • Cost, especially with cross-cluster replication
  • More flexible cost vs. availability choices
• Google File System II: Dawn of the Multiplying Master Nodes, http://www.theregister.co.uk/2009/08/12/google_file_system_part_deux/?page=1

GEAR Session 1. Peta-Scale Storage Systems
Lecture 2. Google File System and Hadoop Distributed File System
Google File System II (Colossus): Reed-Solomon Codes

Reed-Solomon codes
• Block-based error-correcting codes
• Used in digital communication and storage
  • Storage devices (tape, CD, DVD, barcodes, etc.)
  • Wireless and mobile communications
  • Satellite communications
  • Digital TV
  • High-speed modems
• SOURCE: https://en.wikiversity.org/wiki/Reed–Solomon_codes_for_coders

What does the R-S code do?
• Takes a block of digital data and adds extra "redundant" bits
• If an error occurs, the R-S decoder processes each block and recovers the original data
[Figure: data source → Reed-Solomon encoder → communication channel or storage device (where noise and errors are introduced) → Reed-Solomon decoder → data sink]

A quick example of R-S encoding: 4+2 coding
• The original file is broken into 4 data pieces
• 2 parity pieces are added
• First piece of data: "ABCD"; second piece of data: "EFGH"; …
[Figure: the original data laid out as a 4×4 grid — A B C D / E F G H / I J K L / M N O P — split into pieces for encoding]

http://www.cs.colostate.edu/~cs535 | Spring 2020 | Colorado State University
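Real Reed-Solomon arithmetic runs over GF(2^8); as a hedged illustration of the 4+2 idea (my own toy construction, not Google's code and not true RS), the sketch below uses two independent integer parity equations, so any two erased data pieces can be solved for, at the (4+2)/4 = 1.5x storage overhead cited for Colossus:

```python
# Toy 4+2 erasure code over integers (real Reed-Solomon uses GF(2^8),
# but the recovery idea is the same): two parity equations with
# distinct coefficients let us solve for any two missing data pieces.

COEFFS = [1, 2, 3, 4]          # distinct weights keep the 2x2 system solvable

def encode(data):              # data: 4 integers (e.g. byte values)
    p = sum(data)
    q = sum(c * d for c, d in zip(COEFFS, data))
    return p, q

def recover_two(data, i, j, p, q):
    """Recover data[i] and data[j] (both erased) from parities p and q,
    using only the surviving pieces of `data`."""
    known_p = sum(d for k, d in enumerate(data) if k not in (i, j))
    known_q = sum(COEFFS[k] * d for k, d in enumerate(data) if k not in (i, j))
    # Solve: di + dj = p - known_p  and  ci*di + cj*dj = q - known_q
    s, t = p - known_p, q - known_q
    dj = (t - COEFFS[i] * s) / (COEFFS[j] - COEFFS[i])
    di = s - dj
    return int(di), int(dj)

assert (4 + 2) / 4 == 1.5                # 1.5x overhead vs. 3x replication

data = [ord(c) for c in "ABCD"]          # the first 4-byte piece from the slide
p, q = encode(data)
d1, d3 = recover_two(data, 1, 3, p, q)   # pretend pieces 1 and 3 were lost
assert (d1, d3) == (data[1], data[3])    # both erased pieces recovered
```

Like RS(4+2), this tolerates any two lost pieces out of six while storing half as many raw bytes as 3x replication; production Reed-Solomon differs only in doing the same linear algebra over a finite field so values stay byte-sized.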