
Algorithms and Methods for Distributed Storage Networks: 8 Storage Virtualization and DHT


  1. Algorithms and Methods for Distributed Storage Networks
     8 Storage Virtualization and DHT
     Christian Schindelhauer
     Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Rechnernetze und Telematik
     Winter semester 2007/08

  2. Overview
     ‣ Concept of Virtualization
     ‣ Storage Area Networks
       • Principles
       • Optimization
     ‣ Distributed File Systems
       • Without virtualization, e.g. Network File Systems
       • With virtualization, e.g. Google File System
     ‣ Distributed Wide Area Storage Networks
       • Distributed Hash Tables
       • Peer-to-Peer Storage
     Rechnernetze und Telematik Algorithms Theory 2 Albert-Ludwigs-Universität Freiburg Winter 2008/09 Christian Schindelhauer

  3. Concept of Virtualization
     ‣ Principle
       • A virtual storage layer handles all application accesses to the file system
       • The virtual disk partitions files and stores their blocks across several (physical) hard disks
       • Control mechanisms allow redundancy and failure repair
     ‣ Control
       • The virtualization server assigns data, e.g. blocks of files, to hard disks (address space remapping)
       • Controls the replication and redundancy strategy
       • Adds and removes storage devices
     Figure: File, Virtual Disk, Hard Disks
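
The address space remapping described on this slide can be sketched as a small lookup table. This is a minimal illustration under stated assumptions, not the lecture's method: the class `VirtualDisk`, the disk names, the round-robin placement rule and the default of two replicas are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDisk:
    """Toy virtualization layer: maps logical blocks to (disk, physical block)."""

    disks: list                                    # names of physical disks (hypothetical)
    replicas: int = 2                              # copies per logical block; must be <= len(disks)
    table: dict = field(default_factory=dict)      # logical block -> [(disk, physical block)]
    next_free: dict = field(default_factory=dict)  # disk -> next unused physical block

    def write(self, logical_block: int) -> list:
        """Place a logical block on `replicas` distinct disks (round-robin, an assumption)."""
        if logical_block not in self.table:
            placements = []
            for r in range(self.replicas):
                disk = self.disks[(logical_block + r) % len(self.disks)]
                phys = self.next_free.get(disk, 0)
                self.next_free[disk] = phys + 1
                placements.append((disk, phys))
            self.table[logical_block] = placements
        return self.table[logical_block]

    def read(self, logical_block: int, failed: frozenset = frozenset()) -> tuple:
        """Return the first replica that does not live on a failed disk."""
        for disk, phys in self.table[logical_block]:
            if disk not in failed:
                return disk, phys
        raise IOError("all replicas of the block are unavailable")


vdisk = VirtualDisk(disks=["d0", "d1", "d2"])
for block in range(4):
    vdisk.write(block)

print(vdisk.read(0))                             # primary copy
print(vdisk.read(0, failed=frozenset({"d0"})))   # falls back to the replica
```

The application only ever names logical blocks; which disk serves a request, and what happens when a disk fails, is decided entirely inside the mapping layer.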

  4. Storage Virtualization
     ‣ Capabilities
       • Replication
       • Pooling
       • Disk Management
     ‣ Advantages
       • Data migration
       • Higher availability
       • Simple maintenance
       • Scalability
     ‣ Disadvantages
       • Un-installing is time consuming
       • Compatibility and interoperability
       • Complexity of the system
     ‣ Classic Implementation
       • Host-based
         - Logical Volume Management
         - File Systems, e.g. NFS
       • Storage devices based
         - RAID
       • Network based
         - Storage Area Network
     ‣ New approaches
       • Distributed Wide Area Storage Networks
       • Distributed Hash Tables
       • Peer-to-Peer Storage

  5. Storage Area Networks
     ‣ Virtual Block Devices
       • without file system
       • connects hard disks
     ‣ Advantages
       • simpler storage administration
       • more flexible
       • servers can boot from the SAN
       • effective disaster recovery
       • allows storage replication
     ‣ Compatibility problems
       • between hard disks and virtualization server

  6. SAN Networking
     ‣ Networking
       • FCP (Fibre Channel Protocol): SCSI over Fibre Channel
       • iSCSI (SCSI over TCP/IP)
       • HyperSCSI (SCSI over Ethernet)
       • ATA over Ethernet
       • Fibre Channel over Ethernet
       • iSCSI over InfiniBand
       • FCP over IP
     Source: http://en.wikipedia.org/wiki/Storage_area_network

  7. SAN File Systems
     ‣ File system for concurrent read and write operations by multiple computers
       • without conventional file locking
       • concurrent direct access to blocks by servers
     ‣ Examples
       • Veritas Cluster File System
       • Xsan
       • Global File System
       • Oracle Cluster File System
       • VMware VMFS
       • IBM General Parallel File System

  8. Distributed File Systems (without Virtualization)
     ‣ Also known as network file systems
     ‣ Support sharing of files, tapes, printers etc.
     ‣ Allow multiple client processes on multiple hosts to read and write the same files
       • concurrency control or locking mechanisms are necessary
     ‣ Examples
       • Network File System (NFS)
       • Server Message Block (SMB), Samba
       • Apple Filing Protocol (AFP)
       • Amazon Simple Storage Service (S3)

  9. Distributed File Systems with Virtualization
     ‣ Example: Google File System
     ‣ File system on top of other file systems with built-in virtualization
       • System built from cheap standard components (with high failure rates)
       • Few large files
       • Only operations: read, create, append, delete
         - concurrent appends and reads must be handled
       • High bandwidth important
     ‣ Replication strategy
       • chunk replication
       • master replication
     Figures: GFS architecture (application and GFS client, GFS master with file namespace, GFS chunkservers on Linux file systems; the client sends (file name, chunk index) and receives (chunk handle, chunk locations)) and "Figure 2: Write Control and Data Flow" (client, master, primary replica, secondary replicas A and B), both from: The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
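
The lookup shown in the GFS architecture figure, where the client turns a file name and byte offset into a (file name, chunk index) request and the master answers with (chunk handle, chunk locations), can be sketched as follows. This is a toy model, not GFS code: `ToyMaster`, `locate`, the chunkserver names and the handle `2ef1` are hypothetical. The 64 MB chunk size is the value reported in the GFS paper, and `2ef0` is the handle shown in the figure.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB, the chunk size reported in the GFS paper


def chunk_index(byte_offset: int) -> int:
    """Client-side translation of a byte offset into a chunk index."""
    return byte_offset // CHUNK_SIZE


class ToyMaster:
    """Hypothetical stand-in for the master's metadata tables."""

    def __init__(self):
        # (file name, chunk index) -> (chunk handle, [chunkserver names])
        self.chunks = {}

    def add_chunk(self, file_name, index, handle, locations):
        self.chunks[(file_name, index)] = (handle, list(locations))

    def lookup(self, file_name, index):
        return self.chunks[(file_name, index)]


def locate(master, file_name, byte_offset):
    """Ask the master where the chunk holding `byte_offset` lives."""
    index = chunk_index(byte_offset)
    handle, locations = master.lookup(file_name, index)
    offset_in_chunk = byte_offset % CHUNK_SIZE
    return handle, locations, offset_in_chunk


master = ToyMaster()
master.add_chunk("/foo/bar", 0, "2ef0", ["cs1", "cs2", "cs3"])
master.add_chunk("/foo/bar", 1, "2ef1", ["cs2", "cs3", "cs4"])

# A read at byte offset 70 MB falls into the second chunk (index 1).
handle, locations, offset = locate(master, "/foo/bar", 70 * 2**20)
print(handle, locations, offset)
```

After this single control message the client would contact one of the returned chunkservers directly with (chunk handle, byte range), so bulk data never passes through the master.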

  10. Distributed Wide Area Storage Networks
      ‣ Distributed Hash Tables
        • Relieving hot spots in the Internet
        • Caching strategies for web servers
      ‣ Peer-to-Peer Networks
        • Distributed file lookup and download in overlay networks
        • Most (or the best) of them use DHTs

  11. WWW Load Balancing
      ‣ Web surfing:
        • Web servers offer web pages
        • Web clients request web pages
      ‣ Most of the time these requests are independent
      ‣ Requests use resources of the web servers
        • bandwidth
        • computation time
      Figure: web clients (Arne, Christian, Stefan) and web servers (www.apple.de, www.uni-freiburg.de, www.google.com)

  12. Load
      ‣ Some web servers always have a high load
        • for permanently high loads, servers must be sufficiently powerful
      ‣ Some experience strong load fluctuations
        • e.g. special events:
          - jpl.nasa.gov (Mars mission)
          - cnn.com (terrorist attack)
        • Extending the servers for the worst case is not reasonable
        • Serving the requests is still desired
      Figure labels: www.google.com; Monday, Tuesday, Wednesday

  13. Load Balancing in the WWW
      ‣ Fluctuations target some servers
      ‣ (Commercial) solution
        • Service providers offer exchange servers
        • Many requests will be distributed among these servers
      ‣ But how?
      Figure labels: servers A and B; Monday, Tuesday, Wednesday

  14. Literature
      ‣ Leighton, Lewin, et al., STOC 97
        • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
      ‣ Used by Akamai (founded 1997)

  15. Start Situation
      ‣ Without load balancing
      ‣ Advantage
        • simple
      ‣ Disadvantage
        • servers must be designed for worst case situations
      Figure: Web-Clients send requests to the Web-Server, which delivers web pages

  16. Site Caching
      ‣ The whole web site is copied to different web caches
      ‣ Browsers send their requests to the web server
      ‣ The web server redirects requests to a web cache
      ‣ The web cache delivers the web pages
      ‣ Advantage:
        • good load balancing
      ‣ Disadvantage:
        • bottleneck: redirect
        • large overhead for complete web site replication
      Figure: Web-Server, redirect, Web-Cache, Web-Clients

  17. Proxy Caching
      ‣ Each web page is distributed to a few web caches
      ‣ Only the first request is sent to the web server
      ‣ Links refer to pages in the web cache
      ‣ From then on, the web client surfs within the web cache
      ‣ Advantage:
        • No bottleneck
      ‣ Disadvantages:
        • Load balancing only implicit
        • High requirements for placement
      Figure: Web-Server, redirect, Link, request, Web-Cache, Web-Client (steps 1 to 4)

  18. Requirements
      ‣ Balance
        • fair balancing of web pages
      ‣ Dynamics
        • efficient insert and delete of web cache servers and files
      ‣ Views
        • web clients "see" different sets of web caches

  19. Hash Functions
      Set of Items:
      Set of Buckets:
      Example:
      Figure: Items mapped to Buckets
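
As an illustration of a hash function from items to buckets (not the slide's own example), the sketch below assigns web pages to a fixed number of caches by reducing a SHA-256 digest modulo the bucket count. The helper names `bucket_of` and `moved_fraction` and the page names are hypothetical. It also measures how many items change bucket when one cache is added, which is the "Dynamics" requirement that plain modular hashing handles poorly.

```python
import hashlib


def bucket_of(item: str, num_buckets: int) -> int:
    """Map an item to a bucket index in 0..num_buckets-1 (SHA-256 mod n)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


def moved_fraction(items, old_n: int, new_n: int) -> float:
    """Fraction of items whose bucket changes when the bucket count changes."""
    moved = sum(1 for it in items if bucket_of(it, old_n) != bucket_of(it, new_n))
    return moved / len(items)


items = [f"page-{i}.html" for i in range(10_000)]

# Balance: each of the 4 buckets should receive roughly a quarter of the items.
counts = [0, 0, 0, 0]
for it in items:
    counts[bucket_of(it, 4)] += 1
print(counts)

# Dynamics: going from 4 to 5 buckets relocates most of the items.
fraction = moved_fraction(items, 4, 5)
print(f"{fraction:.2%} of the items moved")
```

For a well-mixing hash, an item keeps its bucket only if its hash value has the same remainder modulo 4 and modulo 5, which happens for about one value in five, so roughly four fifths of the cached pages would have to move.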

  20. Ranged Hash Functions
      ‣ Given:
        • Items , Number
        • Caches (Buckets), Bucket set:
        • Views
      ‣ Ranged hash function:
        •
        • Prerequisite: for all views
      Figure: Buckets, View, Items
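
In the consistent hashing paper cited on the Literature slide, a ranged hash function takes a view V (a subset of the bucket set) together with an item and must return a bucket that lies in V. The following sketch shows one way to obtain such a function with a hash ring. It is an illustration under assumptions, not the lecture's construction: the names `point` and `ranged_hash`, the cache names, the 32-bit ring and the 50 ring points per bucket are hypothetical choices.

```python
import bisect
import hashlib


def point(key: str) -> int:
    """Hash a string to a position on the ring [0, 2**32)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def ranged_hash(view, item: str, copies: int = 50) -> str:
    """Pick a bucket for `item` using only the buckets contained in `view`."""
    # Every bucket of the view is placed on the ring at `copies` positions.
    ring = sorted(
        (point(f"{bucket}#{c}"), bucket) for bucket in view for c in range(copies)
    )
    positions = [p for p, _ in ring]
    # The item goes to the first bucket point at or after its own position,
    # wrapping around to the start of the ring if necessary.
    idx = bisect.bisect_left(positions, point(item)) % len(ring)
    return ring[idx][1]


# Two clients with different views of the available caches.
view_a = ["cache-a", "cache-b"]
view_b = ["cache-b", "cache-c"]
print(ranged_hash(view_a, "index.html"), ranged_hash(view_b, "index.html"))

# Adding a cache to a view: items either stay put or move to the new cache.
small = ["cache-a", "cache-b", "cache-c"]
large = small + ["cache-d"]
moved = sum(
    1
    for i in range(1000)
    if ranged_hash(small, f"page-{i}.html") != ranged_hash(large, f"page-{i}.html")
)
print(f"{moved} of 1000 items moved after adding cache-d")
```

Rebuilding the ring on every call keeps the sketch short; an actual cache would keep the sorted ring per view and update it incrementally.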
