Internet Server Clusters
Jeff Chase
Duke University, Department of Computer Science
CPS 212: Distributed Information Systems
Using Clusters for Scalable Services
Clusters are a common vehicle for improving scalability and availability at a single service site in the network. Are network services the "Killer App" for clusters?
• incremental scalability: just wheel in another box...
• excellent price/performance: high-end PCs are commodities (high-volume, low margins)
• fault-tolerance: "simply a matter of software"
• high-speed cluster interconnects are on the market: SANs + Gigabit Ethernet; cluster nodes can coordinate to serve requests w/ low latency
• "shared nothing"
[Fox/Brewer]: SNS, TACC, and All That
[Fox/Brewer97] proposes a cluster-based reusable software infrastructure for scalable network services ("SNS"), such as:
• TranSend: scalable, active proxy middleware for the Web; think of it as a dial-up ISP in a box, in use at Berkeley; distills/transforms pages based on user request profiles
• Inktomi/HotBot search engine: core technology for Inktomi Inc., today with a $15B market cap; "bringing parallel computing technology to the Internet"
Potential services are based on Transformation, Aggregation, Caching, and Customization (TACC), built above SNS.
TACC
Vision: deliver "the content you want" by viewing HTML content as a dynamic, mutable medium.
1. Transform Internet content according to:
• network and client needs/limitations, e.g., on-the-fly compression/distillation [ASPLOS96], packaging Web pages for PalmPilots, encryption, etc.
• directed by a user profile database
2. Aggregate content from different back-end services or resources.
3. Cache content to reduce the cost/latency of delivery.
4. Customize (see Transform).
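To make these operations concrete, here is a minimal Python sketch of a TACC-style request path that transforms, caches, and customizes content; the function names, the quality parameter, and the fetch callback are invented for illustration, not the paper's interfaces.

    import zlib

    cache = {}                        # Cache: memoize transformed results

    def distill(html, profile):       # Transform: degrade content per client limits
        quality = profile.get("quality", 1.0)
        return html[: int(len(html) * quality)]

    def serve(url, fetch, profile):
        # Customize: the cache key includes the user profile, so different
        # profiles get differently transformed versions of the same page.
        key = (url, tuple(sorted(profile.items())))
        if key not in cache:
            cache[key] = zlib.compress(distill(fetch(url), profile).encode())
        return zlib.decompress(cache[key]).decode()

    print(serve("http://example.com/", lambda u: "<html>" + "x" * 1000 + "</html>",
                {"quality": 0.5}))

Aggregation would slot in the same way: another generic program that merges results from several back-end fetches before the transform step.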
TranSend Structure
[Figure: TranSend cluster architecture. Front Ends accept requests from the Internet; a Control Panel, a Profiles database, datatype-specific distillers (html, gif, jpg), and cache partitions ($) sit on a high-speed SAN; workers communicate over a coordination bus, with a separate Utility network (10baseT). Adapted from Armando Fox (through http://ninja.cs.berkeley.edu/pubs).]
SNS/TACC Philosophy
1. Specify services by plugging generic programs into the TACC framework, and compose them as needed: sort of like CGI with pipes, but run by long-lived worker processes that serve request queues; allows multiple languages, etc.
2. Worker processes in the TACC framework are loosely coordinated, independent, and stateless (ACID vs. BASE): they serve independent requests from multiple users. This is a narrow view of a "service": one-shot read-only requests, where stale data is OK.
3. Handle bursts with a designated overflow pool of machines.
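A rough Python sketch of points 1 and 2: a long-lived, stateless worker pulling from a shared request queue. The queue, the reply callback, and the toy transform are assumptions for illustration; the real framework dispatches over the cluster interconnect rather than an in-process queue.

    import queue, threading

    requests = queue.Queue()          # front ends enqueue (data, reply_fn) pairs

    def worker_loop(transform):
        while True:
            data, reply = requests.get()   # block on the shared request queue
            reply(transform(data))         # no state survives across requests
            requests.task_done()

    # A generic program (here, str.upper) plugged in as the "distiller".
    threading.Thread(target=worker_loop, args=(str.upper,), daemon=True).start()

    requests.put(("hello tacc", print))
    requests.join()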
TACC Examples
HotBot search engine:
• Query the crawler's DB
• Cache recent searches
• Customize UI/presentation
TranSend transformation proxy:
• On-the-fly lossy compression of inline images (GIF, JPG, etc.)
• Cache original & transformed versions
• User specifies aggressiveness, "refinement" UI, etc.
[Figure: HotBot shown as an aggregator (A) over crawler DBs with caches ($) and a customizer (C) producing html; TranSend shown as transformers (T) with caches ($) and a customizer (C). From Fox.]
(Worker) Ignorance Is Bliss
What workers don't need to know:
• Data sources/sinks
• User customization (key/value pairs)
• Access to the cache
• Communication with other workers by name
Common case: stateless workers; C, Perl, and Java are supported.
• Recompilation often unnecessary
• Useful tasks possible in <10 lines of (buggy) Perl
[Fox]
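The "<10 lines" claim was about Perl; an equivalently tiny sketch in Python (the function name and maxlen parameter are invented here) shows how little a datatype-specific worker has to know: it just maps input plus customization key/value pairs to output.

    import re

    def distill_html(data, args):
        # args carries user customization as key/value pairs, per the TACC model;
        # the worker never touches the cache, sources/sinks, or other workers.
        text = re.sub(r"<[^>]+>", "", data)
        return text[: int(args.get("maxlen", 100))]

    print(distill_html("<p>Hello <b>TACC</b> world</p>", {"maxlen": 20}))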
Questions
1. What are the research contributions of the paper? The system architecture decouples SNS concerns from content; the TACC programming model composes stateless worker modules; validation uses two real services, with measurements.
2. How is this different from clusters for parallel computing?
3. What are the barriers to scale in SNS/TACC?
4. How are requests distributed to caches, FEs, workers?
5. What can we learn from the quantitative results?
6. What about services that allow client requests to update shared data? E.g., message boards, calendars, mail, ...
SNS/TACC Functional Issues
1. What about fault-tolerance?
• Service restrictions allow simple, low-cost mechanisms. Primary/backup process replication is not necessary with the BASE model and stateless workers.
• Uses a process-peer approach to restart failed processes: processes monitor each other's health and restart peers if necessary. Workers and the manager find each other with "beacons" on well-known ports.
2. Load balancing?
• The manager gathers load info and distributes it to the front-ends.
• How are incoming requests distributed to front-ends?
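A hedged sketch of the beacon idea in Python: a worker periodically announces itself on a well-known UDP port, and a process-peer notices when the beacons stop. The port number, message format, and timeouts are invented; the paper's actual protocol differs in detail.

    import socket, threading, time

    BEACON_PORT = 9999        # hypothetical well-known port

    def announce(name, period=0.5, count=6):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for _ in range(count):                    # periodically advertise liveness
            s.sendto(name.encode(), ("127.0.0.1", BEACON_PORT))
            time.sleep(period)

    def monitor(timeout=2.0):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", BEACON_PORT))
        s.settimeout(timeout)
        try:
            while True:
                name, _ = s.recvfrom(64)
                print("beacon from", name.decode())
        except socket.timeout:
            # A real process-peer would restart the silent worker here.
            print("beacons stopped; restart the worker")

    threading.Thread(target=announce, args=("distiller-1",), daemon=True).start()
    monitor()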
[Saito] Porcupine: A Highly Available Cluster-based Mail Service
Yasushi Saito, Brian Bershad, Hank Levy
http://porcupine.cs.washington.edu/
University of Washington, Department of Computer Science and Engineering, Seattle, WA
[Saito] Why Email?
Mail is important: real demand.
Mail is hard: write intensive; low locality.
Mail is easy: well-defined API; large parallelism; weak consistency.
Side questions: How much of Porcupine is reusable for other services? Can we use the SNS/TACC framework for this?
[Saito] Goals
Use commodity hardware to build a large, scalable mail service.
Three facets of scalability:
Performance: linear increase with cluster size
Manageability: react to changes automatically
Availability: survive failures gracefully
[Saito] Conventional Mail Solution
Static partitioning over SMTP/IMAP/POP servers: each user's mailbox (Ann's, Bob's, Joe's, Suzy's mbox) lives on one fixed NFS server.
Performance problems: no dynamic load balancing.
Manageability problems: manual data partition decision.
Availability problems: limited fault tolerance.
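For contrast with what follows, a minimal sketch (server names invented) of the static scheme the slide criticizes: users are pinned to servers by a hand-maintained table.

    # Static partitioning: the mapping is fixed at configuration time.
    PARTITION = {"ann": "nfs1", "bob": "nfs1", "joe": "nfs2", "suzy": "nfs2"}

    def server_for(user):
        return PARTITION[user]    # no dynamic load balancing: hot users stay put

    # Adding capacity or working around a failed server means an administrator
    # edits this table and moves mailboxes by hand -- the manageability problem.
    print(server_for("bob"))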
[Saito] Key Techniques and Relationships
Framework: functional homogeneity ("any node can perform any task").
Techniques: replication, automatic reconfiguration, load balancing.
Goals: availability, manageability, performance.
[Saito] Porcupine Architecture
Every node runs the same set of components: SMTP server, POP server, IMAP server, load balancer, user map, membership manager, RPC, replication manager, mail map, user profile, and mailbox storage.
[Figure: the identical software stack replicated on Node A, Node B, ..., Node Z.]
[Saito] Porcupine Operations
Handling an incoming message involves four steps: protocol handling, user lookup, load balancing, and message store.
[Figure: a message arriving from the Internet flows across nodes A, B, and C as these steps execute.]
[Saito] Basic Data Structures
User map: apply a hash function to the user name (e.g., "bob") to pick a bucket in the user map (e.g., BCACABAC), which is replicated on every node; the bucket names the node that manages that user.
Mail map / user info: per-user fragment lists, e.g., bob: {A,C}, suzy: {A,C}, joe: {B}, ann: {B}.
Mailbox storage: mailbox fragments (Bob's MSGs, Suzy's MSGs, Joe's MSGs, Ann's MSGs) spread across nodes A, B, and C.
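A small Python sketch of the lookup path through these structures; the map contents mirror the slide's example, but the hash function and API are assumptions.

    USER_MAP = list("BCACABAC")        # bucket -> managing node (replicated soft state)
    MAIL_MAP = {"bob": {"A", "C"}, "suzy": {"A", "C"}, "joe": {"B"}, "ann": {"B"}}

    def manager_for(user):
        bucket = hash(user) % len(USER_MAP)    # any hash over user names will do
        return USER_MAP[bucket]

    def fragments_for(user):
        # The managing node owns the mail map entry: the set of nodes holding
        # this user's mailbox fragments.
        return MAIL_MAP.get(user, set())

    print(manager_for("bob"), fragments_for("bob"))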
[Saito] Porcupine Advantages
Advantages: optimal resource utilization; automatic reconfiguration and task re-distribution upon node failure/recovery; fine-grain load balancing.
Results: better availability, better manageability, better performance.
[Saito] Availability
Goals: maintain function after failures; react quickly to changes regardless of cluster size; graceful performance degradation / improvement.
Strategy: two complementary mechanisms.
Hard state (email messages, user profiles) → optimistic fine-grain replication.
Soft state (user map, mail map) → reconstruction after a membership change.
[Saito] Soft-state Reconstruction
1. The membership protocol determines the new set of live nodes.
2. A distributed disk scan and user map recomputation rebuild the soft state: each surviving node reports the mailbox fragments it holds, and fresh mail map entries (e.g., bob: {A,C}, suzy: {A,B}, joe: {C}, ann: {B}) are reassembled.
[Figure: timeline of user map recomputation and mail map reconstruction across nodes A, B, and C after a membership change.]
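A hedged Python sketch of that reconstruction; the node names, bucket count, and round-robin bucket assignment are invented for illustration.

    def recompute_user_map(live_nodes, buckets=8):
        # Reassign user-map buckets over the surviving nodes.
        return [live_nodes[i % len(live_nodes)] for i in range(buckets)]

    def rebuild_mail_map(disk_scan):
        # Each node scans its own disk and reports which users' fragments it
        # holds; the reports are merged into fresh mail map entries.
        mail_map = {}
        for node, users in disk_scan.items():
            for user in users:
                mail_map.setdefault(user, set()).add(node)
        return mail_map

    live = ["A", "C"]                                  # node B just failed
    print(recompute_user_map(live))
    print(rebuild_mail_map({"A": ["bob", "suzy"], "C": ["bob", "joe"]}))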
[Saito] How Does Porcupine React to Configuration Changes?
[Figure: throughput in messages/second (y-axis, roughly 300-700) versus time in seconds (x-axis, 0-800) for four runs: no failure, one node failure, three node failures, and six node failures. Annotated events along the timeline: nodes fail, new membership determined, nodes recover, new membership determined.]
[Saito] Hard-state Replication
Goals: keep serving hard state after failures; handle unusual failure modes.
Strategy: exploit Internet semantics.
• Optimistic, eventually consistent replication
• Per-message, per-user-profile replication
• Efficient during normal operation
• Small window of inconsistency
How will Porcupine behave in a partition failure?
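A rough Python sketch of optimistic per-message replication: the receiving node accepts the write immediately and pushes the update to peer replicas in the background, with timestamps resolving conflicts. The class names, update format, and timestamp-based reconciliation are assumptions for illustration, not Porcupine's exact mechanism.

    import time

    class Replica:
        def __init__(self, name):
            self.name, self.store = name, {}       # msg_id -> (timestamp, body)

        def write(self, msg_id, body):
            # Optimistic: accept the write locally, then propagate later.
            update = (time.time(), body)
            self.store[msg_id] = update
            return update

        def apply(self, msg_id, update):
            # Eventual consistency: keep the newest version, ignore stale pushes.
            if msg_id not in self.store or update[0] > self.store[msg_id][0]:
                self.store[msg_id] = update

    a, c = Replica("A"), Replica("C")
    update = a.write("msg-1", "Hello Bob")    # A answers the client right away
    c.apply("msg-1", update)                  # background push brings C up to date
    print(c.store["msg-1"][1])

Under this optimistic scheme, both sides of a partition can keep accepting writes and reconcile later, which is one way to approach the slide's closing question.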