CSE 6350 File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services Flat Datacenter Storage Presenter: Saroj Panda
Agenda • FDS ARCHITECTURE • DATA PLACEMENT • ANSWER TO QUESTIONS • HANDLING FAILURE • Q & A • REFERENCES
FDS Architecture Data is logically stored in blobs. A blob is a byte sequence named with a 128-bit GUID. Reads and writes are done in units called tracts. Each disk is managed by a tract server.
FDS Architecture Cont… Reads and writes do not go through the metadata server. The metadata server keeps track of the active tract servers in a table called the Tract Locator Table (TLT). A client application contacts the metadata server when it starts, to get the TLT.
Q1. “Consider a centralized file server in a small computer science department. Data stored by any computer can be retrieved by any other. This conceptual simplicity makes it easy to use: computation can happen on any computer, even in parallel, without regard to first putting data in the right place.” Is GFS a centralized file system? To achieve good performance for an I/O-intensive program running on GFS, how does the data placement affect its performance? Answer: GFS can be viewed as a centralized file system in that each read/write request first goes through the Master (metadata server). In practice, it is a distributed file system. In GFS the data is distributed as chunks across chunk servers. Different chunks can be processed in parallel at different chunk servers. This distribution of load improves the performance of the system.
Q2. “The root of this cascade of consequences was the locality constraint, itself rooted in the datacenter bandwidth shortage.” Show example consequences of relying on locality in a program’s execution. Why does sufficient I/O bandwidth help remove the constraint? Answer: Locality constraints (running computation where the data is) can prevent efficient resource utilization. Example (stragglers): if data is singly replicated, a single unexpectedly slow machine can delay an entire job’s completion, because the faster machines, having finished their own work, must wait for the straggler to complete its part. Under the locality constraint, re-tasking computation to other nodes is expensive because it involves moving large amounts of data. With sufficient I/O bandwidth, all of a cluster’s disk bandwidth can be exposed to applications, and there is no distinction between local and remote disks. Re-tasking a straggler’s work to another machine then no longer requires expensive data movement.
DATA PLACEMENT The FDS metadata server maintains the list of active tract servers, called the Tract Locator Table (TLT). In a single-replicated system, each TLT entry contains the address of a single tract server. With k-way replication, each entry has k tract servers.
DATA PLACEMENT To read or write tract number i from a blob with GUID g, a client first selects an entry in the TLT by computing an index into it called the tract locator. For a write, the client sends the data to every tract server listed in that entry. The application is notified only after all replicas have acknowledged the write. A read is sent to just one of the tract servers in the entry.
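The lookup and replication steps above can be sketched in Python. The FDS paper gives the locator as (Hash(g) + i) mod TLT_Length; truncating SHA-1 to 64 bits, the server names, and the in-memory `stores` dictionary are illustrative assumptions, not the actual FDS implementation.

```python
import hashlib
import random

def tract_locator(guid: bytes, tract_number: int, tlt_length: int) -> int:
    """Index into the TLT: (hash(g) + i) mod TLT length."""
    # Truncating the digest to 8 bytes is an assumption made for brevity.
    h = int.from_bytes(hashlib.sha1(guid).digest()[:8], "big")
    return (h + tract_number) % tlt_length

# Hypothetical TLT with 3-way replication: each row lists k tract servers.
TLT = [
    ["ts0", "ts1", "ts2"],
    ["ts1", "ts2", "ts3"],
    ["ts2", "ts3", "ts0"],
    ["ts3", "ts0", "ts1"],
]

# Stand-in for the tract servers' disks: server name -> {(guid, i): data}.
stores = {name: {} for row in TLT for name in row}

def write_tract(guid: bytes, i: int, data: bytes) -> int:
    """Send the write to every replica in the row; return the ack count."""
    row = TLT[tract_locator(guid, i, len(TLT))]
    for server in row:
        stores[server][(guid, i)] = data
    return len(row)

def read_tract(guid: bytes, i: int) -> bytes:
    """Read from a single replica chosen from the row."""
    row = TLT[tract_locator(guid, i, len(TLT))]
    return stores[random.choice(row)][(guid, i)]
```

Because the tract number is added to the hash, consecutive tracts of one blob fall on consecutive TLT rows, which spreads a large sequential access across many tract servers.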
HANDLING FAILURE Tract servers periodically send heartbeat messages to the metadata server. If a heartbeat times out, the metadata server assumes the tract server is dead. The metadata server invalidates the current TLT by incrementing the version number of each row in which the failed tract server appears. It then picks replacement tract servers at random to fill the vacated slots in the TLT, sends the updated assignments to the affected tract servers, and hands out the new TLT to clients when they ask for it.
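A minimal sketch of the TLT update performed on a failure, assuming each row is stored as a small dictionary with a version number and a server list. The row layout, server names, and `handle_failure` helper are hypothetical; heartbeat detection and the re-replication of tract data are left out.

```python
import random

# Hypothetical TLT: each row holds a version number and its replica servers.
tlt = [
    {"version": 1, "servers": ["ts0", "ts1"]},
    {"version": 1, "servers": ["ts1", "ts2"]},
    {"version": 1, "servers": ["ts2", "ts3"]},
]
active = {"ts0", "ts1", "ts2", "ts3"}

def handle_failure(tlt, active, dead, rng=random):
    """Bump the version of every row naming `dead` and refill its slot at random."""
    active.discard(dead)
    changed = []
    for idx, row in enumerate(tlt):
        if dead not in row["servers"]:
            continue
        row["version"] += 1  # invalidates the old copy of this row
        # Only servers not already in the row are eligible replacements.
        candidates = sorted(active - set(row["servers"]))
        replacement = rng.choice(candidates)
        row["servers"] = [replacement if s == dead else s for s in row["servers"]]
        changed.append(idx)
    return changed  # rows whose servers must be told about the new assignment

changed_rows = handle_failure(tlt, active, "ts1", random.Random(0))
```

Only the rows that named the failed server change version, so rows that are unaffected stay valid for clients holding an older TLT.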
Q3. “In FDS, data is logically stored in blobs. ... Reads from and writes to a blob are done in units called tracts.” What are blobs and tracts? Are they of constant sizes? What are their respective equivalents in GFS? Answer: A blob is a byte sequence identified by a 128-bit globally unique identifier (GUID). The GUID can either be selected by the application or assigned randomly by the system. A tract is the unit of data read from or written to a blob. A blob can be any length up to the system’s storage capacity. Tracts are sized so that random and sequential accesses achieve nearly the same throughput, and the tract size is fixed for the storage technology used. A blob is equivalent to a file in GFS, and a tract is equivalent to a chunk.
Q4. “In our cluster, tracts are 8MB”. Why is a tract in FDS sized this large? Answer: Tracts are sized so that random and sequential accesses achieve nearly the same throughput. The tract size is set when the cluster is created, based on the cluster’s storage hardware. For example, if flash were used instead of disks, the tract size could be made far smaller (e.g., 64kB). If the tract size is small, the client needs many separate writes to store a large blob, and on disks each one pays a seek, so the process is slow. If the tract size is large, the number of writes goes down and the seek cost is amortized over more data. In the authors’ experiments, 8MB was the size at which random and sequential access reached nearly the same throughput.
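The trade-off can be illustrated with a simple model in which every random tract access pays one seek followed by a sequential transfer. The 10 ms seek time and 100 MB/s sequential bandwidth are assumed round numbers for a generic disk, not measurements from the FDS cluster.

```python
def random_access_efficiency(tract_bytes, seek_s=0.010, seq_bw=100e6):
    """Fraction of sequential bandwidth achieved when each tract costs one seek.

    seek_s and seq_bw are assumed values for a generic hard disk.
    """
    transfer_s = tract_bytes / seq_bw
    return transfer_s / (seek_s + transfer_s)

# Compare a flash-sized tract, a mid-sized tract, and the 8MB tract.
for size in (64 * 1024, 1 * 2**20, 8 * 2**20):
    print(f"{size // 1024:>5} KiB: {random_access_efficiency(size):.0%}")
```

Under these assumptions a 64kB tract spends most of its time seeking, while an 8MB tract spends most of its time transferring data, which is why random access approaches sequential throughput at that size.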
Q5. “Tract servers do not use a file system.” Explain this design choice. Answer: FDS is a storage system, not a file system. The tract servers handle raw disks directly. Blobs are divided into tracts, which are numbered sequentially. The client application communicates with FDS through an API that abstracts some of the complexity of the messaging layer. Tract servers and their network protocol are not exposed directly to FDS applications. Instead, these details are hidden in a client library with a narrow and straightforward interface. This design choice avoids the file system overhead of maintaining a hierarchical file system structure and keeping it in memory for efficiency.
Q6. “FDS uses a metadata server, but its role during normal operations is simple and limited:…” What are drawbacks of using a centralized metadata server? How does FDS address the issue? Answer: A centralized metadata server is a single point of failure. In FDS, each tract server locally stores its own positions in the Tract Locator Table (TLT), and the metadata server builds the table from all the active tract servers. When a client application starts, it contacts the metadata server to get the TLT and then caches it. If the metadata server fails, the TLT is reconstructed by collecting the table assignments from each tract server, without losing any information.
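A minimal sketch of the reconstruction step, assuming each tract server reports the TLT row indices it has stored locally. The `reports` format and the `rebuild_tlt` helper are hypothetical.

```python
def rebuild_tlt(reports, tlt_length):
    """Rebuild the TLT from per-server reports.

    reports: {server_name: [row indices that server says it holds]}
    """
    tlt = [[] for _ in range(tlt_length)]
    for server in sorted(reports):  # sorted only to make the result deterministic
        for row in reports[server]:
            tlt[row].append(server)
    return tlt

# Each tract server remembers only its own rows; together they cover the table.
reports = {"ts0": [0, 2], "ts1": [0, 1], "ts2": [1, 2]}
rebuilt = rebuild_tlt(reports, 3)
```

The table is fully determined by state held on the tract servers, so a replacement metadata server does not need any state from the failed one.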
Q7. How does FDS locate the tract server that stores a particular tract of a given blob? Why does FDS first identify a tract locator (an index to an entry of the tract locator table) and then in the entry find the tract server, rather than directly identifying a tract server using a hash function without having such a table? Answer: Tract number i of a blob with GUID g is located by applying a hash function to compute an index into the TLT, called the tract locator. The TLT row at that index lists all the tract servers holding replicas of that tract. Directly identifying a tract server with a hash function causes a problem when that tract server goes down: the hash function keeps returning the dead tract server, and the client application cannot retrieve the tract data. With the tract locator approach, the tracts of the dead tract server are copied from the other replicas to other active tract servers, and the dead tract server’s entries in the TLT are replaced with the new tract servers. This allows the client application to retrieve the tract data from any of the replicas even when a tract server fails.
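The contrast can be sketched as follows. The server names, the two-way replicated table, and the replacement policy (first live server not already in the row) are simplifying assumptions; the truncated SHA-1 hash is likewise only illustrative.

```python
import hashlib

def h(guid: bytes, i: int) -> int:
    # Illustrative hash: SHA-1 truncated to 64 bits, plus the tract number.
    return int.from_bytes(hashlib.sha1(guid).digest()[:8], "big") + i

servers = ["ts0", "ts1", "ts2", "ts3"]

def direct_lookup(guid, i):
    """No table: the hash names a tract server directly."""
    return servers[h(guid, i) % len(servers)]

# Hypothetical TLT with 2-way replication.
tlt = [["ts0", "ts1"], ["ts1", "ts2"], ["ts2", "ts3"], ["ts3", "ts0"]]

def table_lookup(guid, i):
    """With a table: the hash names a row, and the row names the servers."""
    return tlt[h(guid, i) % len(tlt)]

guid = bytes(16)
dead = direct_lookup(guid, 0)        # pretend this server has just failed
still_dead = direct_lookup(guid, 0)  # the hash cannot route around the failure

# With the table, only the rows are edited; the hash function never changes.
live = [s for s in servers if s != dead]
for row in tlt:
    for slot, s in enumerate(row):
        if s == dead:
            row[slot] = next(c for c in live if c not in row)
after = table_lookup(guid, 0)        # now lists live servers only
```

The table adds one level of indirection, so membership changes are absorbed by rewriting rows rather than by changing how clients compute locations.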