DAAD Summerschool Curitiba 2011: Aspects of Large Scale High Speed Computing. Building Blocks of Cloud Storage Networks. 3: Distributed Hash Tables - Virtualization without Index Database. Christian Schindelhauer, Technical Faculty, Computer Networks and Telematics, University of Freiburg
Concept of Virtualization ‣ Principle • A virtualization layer handles all application accesses to the file system • The virtual disk partitions files and stores the blocks over several (physical) hard disks • Control mechanisms allow redundancy and failure repair ‣ Control • The virtualization server assigns data, e.g. blocks of files, to hard disks (address space remapping) • Controls the replication and redundancy strategy • Adds and removes storage devices
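The principle above can be sketched in a few lines of code. This is a minimal illustration, not the lecture's implementation: the class name, block size, and replica placement rule are all assumptions made for the example.

```python
BLOCK_SIZE = 4  # bytes per block, tiny so the partitioning is visible

class VirtualDisk:
    """Illustrative virtualization layer: partitions files into blocks
    and spreads the blocks (with replicas) over several physical disks."""

    def __init__(self, num_disks, replicas=2):
        # each physical disk is modeled as a dict: (name, block_id) -> data
        self.disks = [dict() for _ in range(num_disks)]
        self.replicas = replicas

    def write(self, name, data):
        # partition the file into blocks and store `replicas` copies
        # of each block on consecutive disks (address space remapping)
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for bid, block in enumerate(blocks):
            for r in range(self.replicas):
                disk = (bid + r) % len(self.disks)
                self.disks[disk][(name, bid)] = block

    def read(self, name):
        # reassemble the file from any available replica
        blocks = {}
        for disk in self.disks:
            for (n, bid), block in disk.items():
                if n == name:
                    blocks[bid] = block
        return b"".join(blocks[i] for i in sorted(blocks))

vd = VirtualDisk(num_disks=4)
vd.write("file", b"hello distributed world!")
assert vd.read("file") == b"hello distributed world!"
```

Because every block exists on two disks, the read path still succeeds if one disk's copy is lost, which is the redundancy-and-repair idea the slide refers to.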
Distributed Wide Area Storage Networks ‣ Distributed Hash Tables • Relieving hot spots in the Internet • Caching strategies for web servers ‣ Peer-to-Peer Networks • Distributed file lookup and download in overlay networks • Most (or the best) of them use: DHT
WWW Load Balancing ‣ Web surfing: www.apple.de, www.uni-freiburg.de, www.google.com • Web servers offer web pages • Web clients request web pages ‣ Most of the time these requests are independent ‣ Requests use resources of the web servers • bandwidth • computation time
Load ‣ Some web servers always have a high load • for permanently high loads the servers must be sufficiently powerful ‣ Some suffer under high fluctuations • e.g. special events: - jpl.nasa.gov (Mars mission) - cnn.com (terrorist attack) • Extending the servers for the worst case is not reasonable • Yet serving all requests is desired
Load Balancing in the WWW ‣ Fluctuations target some servers ‣ (Commercial) solution • Service providers offer exchange servers • Many requests are distributed among these servers ‣ But how?
Literature ‣ Leighton, Lewin, et al., STOC 1997 • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web ‣ Used by Akamai (founded 1997)
Start Situation ‣ Without load balancing: web clients request web pages directly from the web server ‣ Advantage • simple ‣ Disadvantage • servers must be designed for worst-case situations
Site Caching ‣ The whole web site is copied to different web caches ‣ Browsers send their requests to the web server ‣ The web server redirects each request to a web cache ‣ The web cache delivers the web pages ‣ Advantage: • good load balancing ‣ Disadvantage: • bottleneck: the redirect • large overhead for complete web-site replication
Proxy Caching ‣ Each web page is distributed to a few web caches ‣ Only the first request is sent to the web server ‣ Links reference pages in the web cache ‣ From then on, the web client surfs in the web cache ‣ Advantage: • no bottleneck ‣ Disadvantages: • load balancing only implicit • high requirements for placement
Requirements ‣ Balance • fair balancing of web pages over the caches ‣ Dynamics • efficient insertion and deletion of web-cache servers and files ‣ Views • web clients may „see" different sets of web caches
Hash Functions ‣ Set of items: I ‣ Set of buckets: B = {0, ..., n−1} ‣ A hash function f : I → B assigns every item to a bucket ‣ Example: f(i) = i mod n
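The item-to-bucket picture above can be sketched as follows; the concrete numbers are illustrative, not from the lecture.

```python
# A hash function f: I -> B assigns every item to one of n buckets.
n = 4                           # number of buckets
items = [2, 5, 9, 4, 3, 6]      # items to be stored
f = lambda i: i % n             # example hash function
buckets = {b: [i for i in items if f(i) == b] for b in range(n)}
print(buckets)  # {0: [4], 1: [5, 9], 2: [2, 6], 3: [3]}
```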
Ranged Hash Functions ‣ Given: • items i ∈ I • caches (buckets) b ∈ B, the bucket set • views V ⊆ B ‣ A ranged hash function assigns to every view a bucket inside that view: • f_V : I → V, i.e. f_V(i) ∈ V ‣ Prerequisite: f_V(i) is defined for all views V ≠ ∅
First Idea: Hash Function ‣ Algorithm: • choose a hash function depending on the number n of cache servers, e.g. - f(i) = 3i + 1 mod 4 for n = 4 - f(i) = 2i + 2 mod 3 for n = 3 ‣ Balance: • very good ‣ Dynamics: • inserting or removing a single cache server changes n • this requires a new hash function and total re-hashing • very expensive!!
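A quick sketch of why the dynamics are so expensive: with plain mod-n hashing, adding one cache server re-assigns almost every page. The item count below is an arbitrary choice for the experiment.

```python
# Changing the number of cache servers from 4 to 5 under mod-n hashing:
# count how many pages end up on a different server.
items = range(10000)
old = {i: i % 4 for i in items}   # assignment with 4 cache servers
new = {i: i % 5 for i in items}   # assignment after adding one server
moved = sum(1 for i in items if old[i] != new[i])
print(moved / len(items))  # 0.8 -> 80% of all pages must be moved
```

Only pages with i mod 20 ∈ {0, 1, 2, 3} keep their server, so exactly 80% move. A ranged hash function with the monotony property, introduced next, avoids this.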
Requirements of the Ranged Hash Functions ‣ Monotony • after adding or removing caches (buckets), no pages (items) should move between the remaining caches ‣ Balance • all caches should have the same load ‣ Spread • a page should be distributed to a bounded number of caches ‣ Load • no cache should have substantially more load than the average
Monotony • After adding new caches (buckets), pages (items) may move only to the new caches, never between the old ones • Formally: for all views V1 ⊆ V2 and all pages i: if f_V2(i) ∈ V1, then f_V1(i) = f_V2(i)
Balance • For every view V the assignment f_V(i) is balanced • For a constant c and all buckets b ∈ V: Pr[f_V(i) = b] ≤ c / |V|
Spread • The spread σ(i) of a page i is the overall number of necessary copies of i over all views
Load • The load λ(b) of a cache b is the overall number of copies stored at b over all views, λ(b) = |⋃_V f_V⁻¹(b)|, where f_V⁻¹(b) := set of all pages assigned to bucket b in view V • Example: λ(b1) = 2, λ(b2) = 3
Distributed Hash Tables ‣ Notation: C = number of caches (buckets), C/t = minimum number of caches per view, V/C = constant (#views / #caches), I = C (#pages = #caches) ‣ Theorem: There exists a family F of ranged hash functions with the following properties: • each function f ∈ F is monotone • balance: for every view, Pr[f_V(i) = b] = O(1/|V|) • spread: for each page i, σ(i) = O(t log C) with high probability • load: for each cache b, λ(b) = O(t log C) with high probability
The Design ‣ Two hash functions onto the reals [0,1]: • r_B maps k log C copies of each cache b randomly to [0,1] • r_I maps each web page i randomly to [0,1] ‣ f_V(i) := the cache b ∈ V with a copy that minimizes the distance to r_I(i)
Monotony ‣ f_V(i) := the cache in V which minimizes the distance to r_I(i) ‣ For all V1 ⊆ V2 with f_V2(i) ∈ V1: the interval between r_I(i) and its closest cache copy contains no cache copy of V2, hence also none of the subset V1 ‣ Observe: the interval is empty in V2 and in V1, therefore f_V1(i) = f_V2(i)
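The design and its monotony property can be sketched together. All names below (r, points, f, the cache labels) are illustrative stand-ins for the random functions r_B and r_I of the construction; a cryptographic hash plays the role of the random mapping to [0,1).

```python
import hashlib

def r(key):
    # deterministic pseudo-random point in [0, 1), standing in for
    # the random functions r_B and r_I of the construction
    d = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(d[:8], "big") / 2**64

def points(view, k=10):
    # r_B: k copies of every cache b in the view, mapped to [0, 1)
    return [(r(("cache", b, j)), b) for b in view for j in range(k)]

def f(view_points, page):
    # f_V(i): the cache whose copy minimizes the distance to r_I(i)
    x = r(("page", page))
    return min(view_points, key=lambda p: abs(p[0] - x))[1]

v1 = ["b1", "b2", "b3"]
before = {i: f(points(v1), i) for i in range(50)}

# Monotony check: after adding cache "b4", a page either keeps its
# cache or moves to the new cache -- never between old caches.
p2 = points(v1 + ["b4"])
for i in range(50):
    after = f(p2, i)
    assert after == before[i] or after == "b4"
```

The assertion holds structurally: adding points for "b4" can only change the minimizer if one of the new points is closer, exactly the argument about the empty interval above.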
2. Balance ‣ Balance: for all views V, Pr[f_V(i) = b] = O(1/|V|) • choose a fixed view V and a web page i • apply the hash functions r_B and r_I • under the assumption that the mapping is random, every cache is chosen with the same probability
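The balance claim can be checked empirically. This is a small simulation with illustrative parameters (8 caches, 100 copies each, 50000 random pages), not part of the lecture; with many copies per cache, each cache's share of the unit interval concentrates around 1/|V|.

```python
import bisect, random

random.seed(1)
NUM_CACHES, COPIES, PAGES = 8, 100, 50000

# place COPIES random points per cache in [0, 1), sorted for lookup
pts = sorted((random.random(), b) for b in range(NUM_CACHES)
             for _ in range(COPIES))
xs = [p[0] for p in pts]

def owner(x):
    # cache whose point is closest to x (min-distance rule of the design)
    j = bisect.bisect(xs, x)
    cand = [pts[max(j - 1, 0)], pts[min(j, len(pts) - 1)]]
    return min(cand, key=lambda p: abs(p[0] - x))[1]

counts = [0] * NUM_CACHES
for _ in range(PAGES):
    counts[owner(random.random())] += 1
share = [c / PAGES for c in counts]
print(share)  # every cache serves roughly a 1/8 = 0.125 fraction
```

The `bisect` lookup only compares the two neighboring points of x, which is equivalent to scanning all points for the minimum distance but much faster.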
3. Spread ‣ σ(i) = number of all necessary copies of page i (over all views) ‣ Recall: C = number of caches (buckets), C/t = minimum number of caches per view, i.e. every view contains at least a 1/t fraction of the caches; V/C = constant (#views / #caches); I = C (#pages = #caches) ‣ For every page i: σ(i) = O(t log C) with high probability ‣ Proof sketch: • partition [0,1] into intervals of length t/C • every view has a cache copy in the interval around r_I(i) (with high probability) • the number of cache copies in that interval bounds the spread
4. Load ‣ Load: λ(b) = number of copies stored at cache b over all views, where f_V⁻¹(b) := set of pages assigned to bucket b under view V ‣ For every cache b: λ(b) = O(t log C) with high probability ‣ Proof sketch: • consider intervals of length t/C • with high probability a cache copy of every view falls into each of these intervals • the number of items in the interval bounds the load
Summary ‣ The Distributed Hash Table • is a distributed data structure for virtualization • provides fair balance • supports dynamic behavior ‣ It is the standard data structure for dynamic distributed storage