A Scalable, Content-Addressable Network
Sylvia Ratnasamy (1,2), Paul Francis (3), Mark Handley (1), Richard Karp (1,2), Scott Shenker (1)
(1) ACIRI, (2) U.C. Berkeley, (3) Tahoe Networks
Outline
• Introduction
• Design
• Evaluation
• Strengths & Weaknesses
• Ongoing Work
Internet-scale hash tables
• Hash tables
  – essential building block in software systems
• Internet-scale distributed hash tables
  – equally valuable to large-scale distributed systems?
• peer-to-peer systems
  – Napster, Gnutella, Groove, FreeNet, MojoNation…
• large-scale storage management systems
  – Publius, OceanStore, PAST, Farsite, CFS ...
• mirroring on the Web
Content-Addressable Network (CAN)
• CAN: Internet-scale hash table
• Interface
  – insert(key,value)
  – value = retrieve(key)
• Properties
  – scalable
  – operationally simple
  – good performance (w/ improvement)
• Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...
Problem Scope
• Design a system that provides the interface
  – scalability
  – robustness
  – performance
  – security
• Application-specific, higher level primitives
  – keyword searching
  – mutable content
  – anonymity
CAN: basic idea
[Figure: a set of nodes, each holding (K,V) pairs. insert(K1,V1) is issued and (K1,V1) ends up stored at one of the nodes; retrieve(K1) is then issued against the same set of nodes.]
CAN: solution
• virtual Cartesian coordinate space
• entire space is partitioned amongst all the nodes
  – every node “owns” a zone in the overall space
• abstraction
  – can store data at “points” in the space
  – can route from one “point” to another
• point = node that owns the enclosing zone
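The "point = node that owns the enclosing zone" abstraction can be sketched as follows. This is only an illustration: the `Zone` class, the `owner_of` helper, the node names and the four-way partition of the unit square are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Zone:
    """Axis-aligned box [lo, hi) in the coordinate space (assumed representation)."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        # A point belongs to the zone if it lies inside the box in every dimension.
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def owner_of(zones: Dict[str, Zone], p: Point) -> str:
    """Return the name of the node whose zone encloses point p."""
    for name, zone in zones.items():
        if zone.contains(p):
            return name
    raise ValueError(f"no zone contains {p}")


# Hypothetical 2-d space partitioned among four nodes.
zones = {
    "n1": Zone((0.0, 0.0), (0.5, 0.5)),
    "n2": Zone((0.5, 0.0), (1.0, 0.5)),
    "n3": Zone((0.0, 0.5), (0.5, 1.0)),
    "n4": Zone((0.5, 0.5), (1.0, 1.0)),
}
```

With this partition, `owner_of(zones, (0.7, 0.2))` resolves to `"n2"`, so anything stored at that point lives on node n2.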
CAN: simple example
[Figure: a 2-d coordinate space divided among nodes 1, 2, 3 and 4 as they are added in turn.]

CAN: simple example
node I::insert(K,V)
(1) a = h_x(K)
    b = h_y(K)
(2) route(K,V) -> (a,b)
(3) (a,b) stores (K,V)

CAN: simple example
node J::retrieve(K)
(1) a = h_x(K)
    b = h_y(K)
(2) route “retrieve(K)” to (a,b)
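The insert and retrieve steps above can be sketched in a few lines. This is a toy under stated assumptions, not the authors' implementation: the two hash functions are built from SHA-256 with a per-axis prefix, the space is the unit square split into four fixed quadrants, and "routing" is replaced by a direct lookup of the owning quadrant. `ToyCAN`, `h` and `key_to_point` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def h(key: str, axis: str) -> float:
    """Hash a key to a coordinate in [0, 1) for one axis (assumed construction)."""
    digest = hashlib.sha256(f"{axis}:{key}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def key_to_point(key: str) -> Tuple[float, float]:
    # Step (1): a = h_x(K), b = h_y(K)
    return (h(key, "x"), h(key, "y"))


class ToyCAN:
    """Four nodes, each owning one quadrant of the unit square."""

    def __init__(self) -> None:
        self.stores: Dict[Tuple[int, int], Dict[str, str]] = {
            (i, j): {} for i in (0, 1) for j in (0, 1)
        }

    def owner(self, point: Tuple[float, float]) -> Tuple[int, int]:
        a, b = point
        return (int(a >= 0.5), int(b >= 0.5))

    def insert(self, key: str, value: str) -> None:
        # Steps (2) and (3): deliver (K,V) to the owner of (a,b), which stores it.
        self.stores[self.owner(key_to_point(key))][key] = value

    def retrieve(self, key: str) -> Optional[str]:
        # The same hashes lead any node to the same point, hence the same owner.
        return self.stores[self.owner(key_to_point(key))].get(key)
```

Because both operations derive the point from the key alone, a retrieve issued anywhere reaches the node that stored the pair.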
CAN
Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)
CAN: routing table

CAN: routing
[Figure: a route through the coordinate space between (x,y) and (a,b).]

CAN: routing
A node only maintains state for its immediate neighboring nodes
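Routing with neighbor-only state can be sketched as greedy forwarding: each node hands the message to the neighbor closest to the destination point. The sketch below assumes equal square zones on a k-by-k grid with no wraparound at the edges; these simplifications, and every function name, are illustrative and not taken from the slides.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]


def neighbors(cell: Cell, k: int) -> List[Cell]:
    """Zones sharing an edge with `cell` on a k-by-k grid."""
    i, j = cell
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < k and 0 <= b < k]


def center(cell: Cell, k: int) -> Point:
    return ((cell[0] + 0.5) / k, (cell[1] + 0.5) / k)


def contains(cell: Cell, point: Point, k: int) -> bool:
    return all(c / k <= x < (c + 1) / k for c, x in zip(cell, point))


def dist2(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def route(start: Cell, dest: Point, k: int) -> List[Cell]:
    """Forward greedily until the current zone encloses dest; return the path."""
    path = [start]
    while not contains(path[-1], dest, k):
        # Only the current node's own neighbor list is consulted at each hop.
        nxt = min(neighbors(path[-1], k), key=lambda c: dist2(center(c, k), dest))
        path.append(nxt)
    return path
```

For example, `route((0, 0), (0.9, 0.6), 4)` ends at cell `(3, 2)`, the zone that encloses the destination point.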
CAN: node insertion
1) discover some node “I” already in CAN (via a bootstrap node)
2) pick random point (p,q) in space
3) I routes to (p,q), discovers node J
4) split J’s zone in half… the new node owns one half

CAN: node insertion
Inserting a new node affects only a single other node and its immediate neighbors
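Step 4 of the join can be sketched as a box split. The slides only say the zone is split in half; choosing the longest side as the split axis and handing the stored entries in the new half to the new node are assumptions made for this illustration, as are the names `split_zone` and `hand_over`.

```python
from typing import Dict, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lo, hi) corners of a zone


def split_zone(zone: Box) -> Tuple[Box, Box]:
    """Halve a zone along its longest side; return (half kept by J, half for the new node)."""
    lo, hi = zone
    sides = [h - l for l, h in zip(lo, hi)]
    axis = sides.index(max(sides))  # assumed rule: split the longest dimension
    mid = (lo[axis] + hi[axis]) / 2
    kept_hi = hi[:axis] + (mid,) + hi[axis + 1:]
    new_lo = lo[:axis] + (mid,) + lo[axis + 1:]
    return (lo, kept_hi), (new_lo, hi)


def in_zone(zone: Box, p: Point) -> bool:
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def hand_over(store: Dict[Point, str], new_zone: Box) -> Dict[Point, str]:
    """Remove from J's store the entries whose point now lies in new_zone and return them."""
    moved = {p: v for p, v in store.items() if in_zone(new_zone, p)}
    for p in moved:
        del store[p]
    return moved
```

Only J's zone and store change here, which matches the claim that a join touches a single existing node plus the neighbor lists around it.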
CAN: node failures
• Need to repair the space
  – recover database (weak point)
    • soft-state updates
    • use replication, rebuild database from replicas
  – repair routing
    • takeover algorithm
CAN: takeover algorithm
• Simple failures
  – know your neighbor’s neighbors
  – when a node fails, one of its neighbors takes over its zone
• More complex failure modes
  – simultaneous failure of multiple adjacent nodes
  – scoped flooding to discover neighbors
  – hopefully, a rare event
CAN: node failures
Only the failed node’s immediate neighbors are required for recovery
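The simple-failure case can be sketched as one surviving neighbor absorbing the failed node's zone. The slides do not say which neighbor wins; picking the neighbor that currently owns the least volume is an assumption made here, and `take_over` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lo, hi) corners


def volume(zone: Box) -> float:
    lo, hi = zone
    return math.prod(h - l for l, h in zip(lo, hi))


def take_over(
    failed: str,
    owned: Dict[str, List[Box]],
    neighbors: Dict[str, List[str]],
) -> str:
    """Reassign the failed node's zones to one live neighbor and return its name."""
    alive = [n for n in neighbors[failed] if n in owned and n != failed]
    # Assumed tie-break: smallest total volume first, then name.
    winner = min(alive, key=lambda n: (sum(volume(z) for z in owned[n]), n))
    owned[winner].extend(owned.pop(failed))
    return winner
```

Only the entries in `neighbors[failed]` are consulted, in line with the statement that recovery involves just the failed node's immediate neighbors.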
Design recap
• Basic CAN
  – completely distributed
  – self-organizing
  – nodes only maintain state for their immediate neighbors
• Additional design features
  – multiple, independent spaces (realities)
  – background load balancing algorithm
  – simple heuristics to improve performance
Evaluation
• Scalability
• Low-latency
• Load balancing
• Robustness
CAN: scalability
• For a uniformly partitioned space with n nodes and d dimensions
  – per node, number of neighbors is 2d
  – average routing path is (d · n^(1/d)) / 4 hops
  – simulations show that the above results hold in practice
• Can scale the network without increasing per-node state
• Chord/Plaxton/Tapestry/Buzz
  – log(n) nbrs with log(n) hops
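The two formulas above are easy to evaluate for concrete sizes. The helper names below are hypothetical; the arithmetic is exactly the 2d and (d · n^(1/d)) / 4 expressions from the slide, with log2(n) shown as the comparison point for the related systems.

```python
import math


def can_neighbors(d: int) -> int:
    """Per-node neighbor count in a uniformly partitioned d-dimensional CAN."""
    return 2 * d


def can_avg_hops(n: int, d: int) -> float:
    """Average routing path length (d / 4) * n^(1/d), in hops."""
    return (d / 4) * n ** (1 / d)


if __name__ == "__main__":
    n = 65536
    for d in (2, 4, 10):
        print(f"d={d}: neighbors={can_neighbors(d)}, avg hops={can_avg_hops(n, d):.1f}")
    print(f"log2(n) = {math.log2(n):.0f}")
```

For n = 65,536 this gives about 128 hops at d = 2 and about 16 hops at d = 4, so raising the dimension count shortens paths sharply while per-node state grows only linearly in d and not at all in n.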
CAN: low-latency
• Problem
  – latency stretch = (CAN routing delay) / (IP routing delay)
  – application-level routing may lead to high stretch
• Solution
  – increase dimensions, realities (reduce the path length)
  – Heuristics (reduce the per-CAN-hop latency)
    • RTT-weighted routing
    • multiple nodes per zone (peer nodes)
    • deterministically replicate entries
CAN: low-latency
[Figure: # dimensions = 2. Latency stretch (axis 0 to 180) vs. # nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics.]
CAN: low-latency
[Figure: # dimensions = 10. Latency stretch (axis 0 to 10) vs. # nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics.]
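The stretch metric and the RTT-weighted routing heuristic can be sketched as below. The slides name the heuristic without defining it; scoring each neighbor by progress toward the destination divided by its measured RTT is an assumption for illustration, and the function names and neighbor-table layout are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def latency_stretch(can_delay_ms: float, ip_delay_ms: float) -> float:
    """Ratio of CAN routing delay to direct IP routing delay."""
    return can_delay_ms / ip_delay_ms


def pick_next_hop(here: Point, dest: Point, nbrs: Dict[str, Tuple[Point, float]]) -> str:
    """nbrs maps neighbor name -> (zone center, measured RTT in ms)."""
    here_d = math.dist(here, dest)
    best, best_score = None, 0.0
    for name, (center, rtt_ms) in nbrs.items():
        progress = here_d - math.dist(center, dest)
        # Assumed scoring: distance gained per millisecond of RTT.
        if progress > 0 and progress / rtt_ms > best_score:
            best, best_score = name, progress / rtt_ms
    if best is None:
        raise ValueError("no neighbor makes progress toward dest")
    return best
```

Two neighbors that make equal progress are separated by RTT alone, so the hop count stays the same while the per-hop latency drops.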
CAN: load balancing
• Two pieces
  – Dealing with hot-spots
    • popular (key,value) pairs
    • nodes cache recently requested entries
    • overloaded node replicates popular entries at neighbors
  – Uniform coordinate space partitioning
    • uniformly spread (key,value) entries
    • uniformly spread out routing load
Uniform Partitioning
• Added check
  – at join time, pick a zone
  – check neighboring zones
  – pick the largest zone and split that one
Uniform Partitioning
[Figure: 65,000 nodes, 3 dimensions. Percentage of nodes vs. zone volume (V/16, V/8, V/4, V/2, V, 2V, 4V, 8V), w/o check and w/ check. V = (total volume) / n.]
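The added check at join time amounts to one comparison over the chosen zone and its neighbors. A minimal sketch, assuming each node knows its neighbors' zone volumes; the function name and the dictionaries are hypothetical.

```python
from typing import Dict, List


def choose_zone_to_split(
    target: str,
    neighbors: Dict[str, List[str]],
    volumes: Dict[str, float],
) -> str:
    """Return the owner of the largest zone among the target and its neighbors."""
    candidates = [target] + list(neighbors[target])
    # max() keeps the first maximum, so the original target wins ties.
    return max(candidates, key=lambda n: volumes[n])
```

Steering each split toward the largest nearby zone is what pulls the volume distribution toward V in the plot described above.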
CAN: Robustness
• Completely distributed
  – no single point of failure (not applicable to pieces of database when node failure happens)
• Not exploring database recovery (in case there are multiple copies of database)
• Resilience of routing
  – can route around trouble
Strengths
• More resilient than flooding broadcast networks
• Efficient at locating information
• Fault tolerant routing
• Node & Data High Availability (w/ improvement)
• Manageable routing table size & network traffic
Weaknesses
• Impossible to perform a fuzzy search
• Susceptible to malicious activity
• Must maintain coherence of all the indexed data (network overhead, efficient distribution)
• Still relatively high routing latency
• Poor performance w/o improvement
Suggestions
• Catalog and meta indexes to perform the search function
• Extension to handle mutable content efficiently for web-hosting
• Security mechanism to defend against attacks
Ongoing Work
• Topologically-sensitive CAN construction
  – distributed binning
Distributed Binning
• Goal
  – bin nodes such that co-located nodes land in same bin
• Idea
  – well known set of landmark machines
  – each CAN node measures its RTT to each landmark
  – orders the landmarks in order of increasing RTT
• CAN construction
  – place nodes from the same bin close together on the CAN
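The binning idea can be sketched by treating each node's landmark ordering as its bin label. The landmark names, RTT values and helper names below are made up for illustration; only the "order landmarks by increasing RTT" rule comes from the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bin = Tuple[str, ...]


def bin_of(rtts_ms: Dict[str, float]) -> Bin:
    """Order landmarks by increasing RTT; the ordering itself is the bin."""
    return tuple(sorted(rtts_ms, key=lambda lm: (rtts_ms[lm], lm)))


def group_by_bin(nodes: Dict[str, Dict[str, float]]) -> Dict[Bin, List[str]]:
    """Group node names by their landmark ordering."""
    bins: Dict[Bin, List[str]] = defaultdict(list)
    for node, rtts_ms in nodes.items():
        bins[bin_of(rtts_ms)].append(node)
    return dict(bins)


# Hypothetical measurements (milliseconds) from three nodes to three landmarks.
nodes = {
    "a": {"L1": 10.0, "L2": 50.0, "L3": 90.0},
    "b": {"L1": 12.0, "L2": 45.0, "L3": 95.0},
    "c": {"L1": 80.0, "L2": 20.0, "L3": 30.0},
}
```

Here nodes a and b share the ordering (L1, L2, L3) and so fall in one bin, while c lands in (L2, L3, L1); CAN construction would then place a and b near each other in the coordinate space.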