Store, Forget & Check: Using Algebraic Signatures to Check Remotely Administered Storage Ethan L. Miller & Thomas J. E. Schwarz Storage Systems Research Center University of California, Santa Cruz
What’s the problem? • Systems store data on remote nodes • Remote nodes may not be trustworthy • Data owner must check to ensure that data is really stored • Two current approaches: • Read data from multiple sites and check for consistency • Generate checksum remotely and compare to checksum of local data • We developed an efficient algorithm that does not require keeping a local copy of the data
Internet storage: backup • Participants in the scheme offer limited storage on their machines in exchange for storing their own data elsewhere • Data protected using parity or redundancy • Extra blocks calculated using m/n redundancy codes • Generate n blocks • Require any m of the blocks to rebuild the data • Many known mechanisms for m/n codes • Linear interpolation • XOR and Galois field-based • Participants need to be able to verify that other nodes are doing their part...
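The simplest m/n code on this slide is single XOR parity, where n = m + 1. The sketch below is an illustration only, not the scheme's actual coding layer; the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# m = 3 data blocks plus one parity block gives n = 4 stored blocks.
data = [b"alpha___", b"bravo___", b"charlie_"]
parity = xor_blocks(data)
stored = data + [parity]

# Lose any single block (here block 1); the remaining m blocks rebuild it.
survivors = [blk for i, blk in enumerate(stored) if i != 1]
rebuilt = xor_blocks(survivors)
```

XOR parity tolerates only one lost block; the Galois field-based codes named on the slide generalize this to several parity blocks.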
Storage Service Providers • Storage utility provides remotely managed storage • Client sends data to the SSP • Client retrieves data as needed • Trust issue: how can client tell if SSP is doing its job? • Read data, check (public key-based) signature • Read data, decrypt, check secure hash and object ID • SafeStore does something like this • Other approaches that don’t use network bandwidth?
Peer-to-peer file systems • Farsite: uses free space on workstations within an organization • Freehaven: anonymity of storer • OceanStore • “Billions of users” • Byzantine fault tolerance, k-availability through erasure codes • PAST • Users can store files up to their quota • Provides k-availability through replication • CFS, Intermemory, Ivy, Starfish, …
Common challenges • Storage nodes cannot be trusted • Storage nodes might lack high uplink bandwidth • Storage nodes might have low availability • Free Rider problem • Node pretends to store data • In reality, uses replicas (or the mechanism that protects against unavailability) to fetch the requested file from elsewhere • Gains the benefits of participation without providing storage
Terribly naïve algorithm • Maintain local copy of data • Periodically request blocks of data and compare to the local copy • Problems • Very bandwidth-intensive • Can’t check much data • Need to keep the original!
Verification: existing algorithm • Periodically, verify random blocks • Compute function across the blocks (m/n coding) • Alternative: verify keyed hash stored with the block • Problems: • Need to transfer entire block • Taxes network with diagnostic data • Peers often have asymmetric Internet connections • Leaks information heavily
Verification using algebraic signatures • Solution: use checksums? • Cryptographic checksums (like SHA-1) won’t work for randomly selected ranges • Requires original data for comparison • Our scheme • Uses small challenges and responses • Allows unpredictable tests • Free rider can’t just store the answers to all possible challenges and still save any storage • Verifies that all remote chunks are consistent with each other • Requires that parity is calculated with an XOR code, a linear m/n code, or a convolutional code • Examples: X-code, EvenOdd, row-diagonal parity, linear codes over a Galois field
What is a Galois field? • Simple answer: • Calculations on a set of symbols • A field called GF(2^n) uses n-bit symbols • Two kinds of operations • Addition (done by XOR) • Multiplication (more complex, done by tables) • Complex answer: • Galois field elements are polynomials, and the math is done on their coefficients • Often, coefficients are represented in base-2 • A Galois field using polynomials of degree less than n with base-2 coefficients is called GF(2^n) • This answer explains how the addition and multiplication tables are generated
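The "addition by XOR, multiplication by tables" description can be sketched for GF(2^8) as follows. This is a minimal illustration under assumptions not stated on the slide: the reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) and the generator 2 are common choices, not necessarily the ones the authors used.

```python
POLY = 0x11D  # assumed reduction polynomial: x^8 + x^4 + x^3 + x^2 + 1

# Build antilog (EXP) and log (LOG) tables from successive powers of 2.
EXP = [0] * 510   # doubled so LOG[a] + LOG[b] never needs a modulo
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1              # multiply by the generator (the polynomial "x")
    if x & 0x100:        # degree reached 8: reduce modulo POLY
        x ^= POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_add(a, b):
    """Addition in GF(2^8) is bitwise XOR."""
    return a ^ b

def gf_mul(a, b):
    """Multiplication via tables: add the logs, then look up the antilog."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

Because every non-zero element appears exactly once in `EXP[:255]`, the generator 2 is a primitive element for this polynomial.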
What is an algebraic signature? • Digital hash with algebraic properties • Important properties: • Small changes in data result in complete change of signature • Signature of parity is parity of signatures • For data blocks D_1, D_2, D_3, …, D_m and parity blocks P_1, P_2, P_3, …, P_k: (sig(D_1), sig(D_2), sig(D_3), …, sig(D_m), sig(P_1), sig(P_2), sig(P_3), …, sig(P_k)) is a codeword!
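Algebraic signatures • Defined over same Galois field as the linear m/n code • Use “primitive” element α • All non-zero elements are powers of α • Consists of n coordinates, one per element α_1, …, α_n • Additional properties if α_i = α^i • Coordinate signature of a block D = (d_0, d_1, …, d_{l-1}) defined by $\mathrm{sig}_{\alpha}(D) = \sum_{\nu=0}^{l-1} d_\nu \alpha^\nu$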
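A minimal sketch of an n-coordinate signature over GF(2^8), assuming each coordinate is the sum of d_ν · α_i^ν over the block's symbols with α_i = α^i. The field polynomial 0x11D, the primitive element 2, and the function names are assumptions made for this example.

```python
POLY = 0x11D  # assumed reduction polynomial for GF(2^8)

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8), reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def gf_pow(a, e):
    """a raised to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def sig_coordinate(data, alpha):
    """Sum of data[v] * alpha^v over all symbols, evaluated with Horner's rule."""
    acc = 0
    for symbol in reversed(data):
        acc = gf_mul(acc, alpha) ^ symbol
    return acc

def signature(data, n=4, alpha=2):
    """n-coordinate signature using alpha^1, alpha^2, ..., alpha^n."""
    return tuple(sig_coordinate(data, gf_pow(alpha, i)) for i in range(1, n + 1))

s1 = signature(b"hello world")
s2 = signature(b"hello worle")  # one symbol changed
```

A single changed symbol alters every coordinate, since the difference between the two signatures is a non-zero symbol times a power of a non-zero element.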
Algebraic signatures • Algebraic properties • Assume that X and Y are large data objects: • sig(X ⊕ Y) = sig(X) ⊕ sig(Y) • sig(β ⋅ X) = β ⋅ sig(X) • Multiplication is in the Galois field of the signature calculation • Signatures and parity formation commute • Signatures can be updated from the old signature and the signature of the delta (XOR) between old and new data • Signature calculation is fast! • Hundreds of megabytes per second on a modern CPU • Speed limited by disk bandwidth
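Both properties follow from distributivity in the field. The derivation below is a sketch that assumes each signature coordinate has the form $\mathrm{sig}_\alpha(X) = \bigoplus_{\nu} x_\nu \alpha^\nu$, where $x_\nu$ are the symbols of X and the sum is Galois field addition (XOR).

```latex
\begin{align*}
\mathrm{sig}_\alpha(X \oplus Y)
  &= \bigoplus_{\nu=0}^{l-1} (x_\nu \oplus y_\nu)\,\alpha^\nu
   = \bigoplus_{\nu=0}^{l-1} x_\nu \alpha^\nu \;\oplus\; \bigoplus_{\nu=0}^{l-1} y_\nu \alpha^\nu
   = \mathrm{sig}_\alpha(X) \oplus \mathrm{sig}_\alpha(Y) \\
\mathrm{sig}_\alpha(\beta \cdot X)
  &= \bigoplus_{\nu=0}^{l-1} (\beta\, x_\nu)\,\alpha^\nu
   = \beta \cdot \bigoplus_{\nu=0}^{l-1} x_\nu \alpha^\nu
   = \beta \cdot \mathrm{sig}_\alpha(X) \\
\mathrm{sig}_\alpha(X_{\text{new}})
  &= \mathrm{sig}_\alpha(X_{\text{old}}) \oplus \mathrm{sig}_\alpha(\Delta),
  \qquad \Delta = X_{\text{old}} \oplus X_{\text{new}}
\end{align*}
```

The third line is the first property applied to $X_{\text{new}} = X_{\text{old}} \oplus \Delta$, which is why a signature can be updated from the delta alone.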
Our algorithm • Store data across distributed system (data blocks D_1, D_2, D_3 and parity block P) • Challenge sites to prove that they hold the data • Example challenge: calculate signature of 32-byte ranges at 4 + i × 71, i = 5, …, 20 • Sites respond with the signatures of requested data (sig_1, sig_2, sig_3, sig_P) • Sites reveal tiny amount of information: size of signature • Challenger verifies that the signatures are consistent: sig_1 ⊕ sig_2 ⊕ sig_3 ≟ sig_P
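The challenge/response round can be sketched end to end for three data blocks and one XOR parity block. This is an illustration under assumptions, not the authors' implementation: the GF(2^8) polynomial 0x11D, the primitive element 2, the 4-symbol signature, the 2048-byte random blocks, and all function names are hypothetical. The challenge parameters mirror the slide's example.

```python
import random

POLY = 0x11D  # assumed reduction polynomial for GF(2^8)

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8), reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def signature(data, n=4, alpha=2):
    """n-coordinate signature; coordinate i sums data[v] * (alpha^i)^v."""
    sig, a_i = [], 1
    for _ in range(n):
        a_i = gf_mul(a_i, alpha)          # alpha^1, alpha^2, ...
        acc = 0
        for symbol in reversed(data):     # Horner's rule
            acc = gf_mul(acc, a_i) ^ symbol
        sig.append(acc)
    return bytes(sig)

def xor_bytes(*items):
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(items[0]))
    for item in items:
        for k, v in enumerate(item):
            out[k] ^= v
    return bytes(out)

def respond(block, offset, stride, length, indices):
    """Storage site: sign the concatenation of the challenged ranges."""
    ranges = [block[offset + i * stride: offset + i * stride + length] for i in indices]
    return signature(b"".join(ranges))

def verify(data_sigs, parity_sig):
    """Challenger: the XOR of the data signatures must equal the parity signature."""
    return xor_bytes(*data_sigs) == parity_sig

rng = random.Random(2006)
d = [bytes(rng.randrange(256) for _ in range(2048)) for _ in range(3)]
parity = xor_bytes(*d)

# Slide example: 32-byte ranges at 4 + i * 71 for i = 5, ..., 20.
challenge = dict(offset=4, stride=71, length=32, indices=range(5, 21))
honest_ok = verify([respond(b, **challenge) for b in d], respond(parity, **challenge))

# A site holding a corrupted copy of D_2 fails the same challenge.
bad = bytearray(d[1])
bad[4 + 5 * 71] ^= 0xFF
sigs = [respond(d[0], **challenge), respond(bytes(bad), **challenge), respond(d[2], **challenge)]
cheat_ok = verify(sigs, respond(parity, **challenge))
```

Each site sends back only four bytes per challenge, and the challenger needs none of the original data to run `verify`.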