A Digital Fountain Approach to Reliable Distribution of Bulk Data John Byers, ICSI Michael Luby, ICSI Michael Mitzenmacher, Compaq SRC Ashu Rege, ICSI
Application: Software Distribution • New release of widely used software. • Hundreds of thousands of clients or more. • Bulk data: tens or hundreds of MB • Heterogeneous clients: – Modem users: hours – Well-connected users: minutes
Primary Objectives • Scale to vast numbers of clients – No ARQs or NACKs – Minimize use of network bandwidth • Minimize overhead at receivers: – Computation time – Useless packets • Compatibility – Networks: Internet, satellite, wireless – Scheduling policies, i.e. congestion control
Impediments • Packet loss – wired networks: congestion – satellite networks, mobile receivers • Receiver heterogeneity – packet loss rates – end-to-end throughput • Receiver access patterns – asynchronous arrivals and departures – overlapping access intervals
Digital Fountain k Source Instantaneous Encoding Stream Transmission k Received Instantaneous Can recover file k Message from any set of k encoding packets.
Digital Fountain Solution Transmission 0 hours File 1 hour 2 hours 3 hours 4 hours User 1 User 2 5 hours
Is FEC Inherently Bad? • Faulty Reasoning – FEC adds redundancy – Redundancy increases congestion and losses – More losses necessitate more transmissions – FEC consumes more overall bandwidth • But… – Each and every packet can be useful to all clients – Each client consumes minimum bandwidth possible – FEC consumes less overall bandwidth by compressing bandwidth across clients
DF Solution Features • Users can initiate the download at their discretion. • Users can continue download seamlessly after temporary interruption. • Tolerates moderate packet loss. • Low server load - simple protocol. • Does scale well. • Low network load.
Approximating a Digital Fountain k Source Encoding Time Encoding Stream Received (1 + c) k Decoding Time k Message
Approximating a DF: Performance Measures • Time Overhead: – Time to decode (or encode) as a function of k . • Decoding Inefficiency: packets needed to decode k
Work on Erasure Codes • Standard Reed-Solomon Codes – Dense systems of linear equations. – Poor time overhead (quadratic in k ) – Optimal decoding inefficiency of 1 • Tornado Codes [LMSSS ‘97] – Sparse systems of equations. – Fast encoding and decoding (linear in k ) – Suboptimal decoding inefficiency
Tornado Z: Encoding Structure Irregular Irregular bipartite bipartite graph graph k stretch factor = 2 k = 16,000 nodes = source data = redundancy
Encoding/Decoding Process ⊕ ⊕ a b f a ⊕ ⊕ ⊕ ⊕ a b c d g b ⊕ ⊕ ⊕ c e g h c ⊕ ⊕ ⊕ ⊕ ⊕ b d e f g h d e f g h
Timing Comparison Encoding time, 1K packets Decoding time, 1K packets Size Reed-Solomon Tornado Z Size Reed-Solomon Tornado Z 4.6 sec. 0.11 sec. 2.06 sec. 0.18 sec. 250 K 250 K 500 K 19 sec. 0.18 sec. 500 K 8.4 sec. 0.24 sec. 93 sec. 40.5 sec. 1 MB 0.29 sec. 1 MB 0.31 sec. 2 MB 442 sec. 0.57 sec. 2 MB 199 sec. 0.44 sec. 4 MB 30 min. 1.01 sec. 4 MB 13 min. 0.74 sec. 8 MB 2 hrs. 1.99 sec. 8 MB 1 hr. 1.28 sec. 16 MB 8 hrs. 3.93 sec. 16 MB 4 hrs. 2.27 sec. Tornado Z: Average inefficiency = 1.055 Both codes: Stretch factor = 2
Cyclic Interleaving Transmission Encoded Interleaved Blocks Encoding Blocks Encoding Copy 1 File Encoding Copy 2 Tornado Encoding
Cyclic Interleaving: Drawbacks • The Coupon Collector’s Problem – Waiting for packets from the last blocks: T B blocks – More blocks: faster decoding, larger inefficiency
Scalability over File Size Decoding Inefficiency, 500 Receivers, p = 0.1 1.6 Interleaved, T = 20, Max. Interleaved, T = 20, Avg. Interleaved, T = 50, Max. 1.5 Interleaved, T = 50, Avg. Tornado Z, Max. Decoding Inefficiency Tornado Z, Avg. 1.4 1.3 1.2 1.1 1 100 1000 10000 File Size, KB
Scalability over Receivers Decoding Inefficiency on a 1MB File, p = 0.1 1.8 Interleaved, T = 20 Interleaved, T = 50 Decoding Inefficiency 1.6 Tornado Z 1.4 1.2 1 1 10 100 1000 10000 Receivers
Digital Fountain Prototype • Built on top of IP Multicast. • Tolerating heterogeneity: – Layered multicast – Congestion control [VRC ‘98] • Experimental results over MBONE.
Research Directions • Other applications for digital fountains – Dispersity routing – Accessing data from multiple mirror sites in parallel • Improving the codes • Implementation and deployment – Scale to large number of clients – Network interactions
Recommend
More recommend