Proofs of Retrievability via Fountain Code Sumanta Sarkar and Reihaneh Safavi-Naini Department of Computer Science, University of Calgary, Canada Foundations and Practice of Security October 25, 2012
Outsourcing Data into Cloud Storage ◮ Suppose a user generates lots of electronic data: videos, photos, emails, text documents. ◮ He also has many devices: desktop, laptop, tablet, smartphone. But none of them are capable of storing huge data. ◮ Cloud storage comes with the solution: ◮ Outsource the data into the cloud. ◮ Access all data from all the devices and from anywhere. ◮ Cloud keeps the whole data intact as long as the client wants.
Risk of Outsourcing Data into Cloud Storage ◮ Completely rely on the cloud for the integrity of the data. ◮ No control over the infrastructure of the cloud. ◮ Device failure may erase some portions of the data. ◮ A dishonest cloud may erase some portions of the data to reduce its own storage cost.
Checking the Integrity of the Data ◮ Store a MAC of the data locally. ◮ Can download the whole file, compute the MAC and check with the previously stored one. ◮ Not a practical solution when the data is big.
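The naive check above can be sketched as follows. This is a minimal illustration, assuming HMAC-SHA256 as the MAC (the slides do not name one); the helper names `make_tag` and `check_download` are hypothetical. It shows why the approach does not scale: the whole file must be downloaded before the tag can be recomputed.

```python
import hashlib
import hmac
import os


def make_tag(key: bytes, data: bytes) -> bytes:
    """Compute a MAC over the entire file (assumption: HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def check_download(key: bytes, downloaded: bytes, stored_tag: bytes) -> bool:
    """Recompute the MAC on the downloaded copy and compare in constant time."""
    return hmac.compare_digest(make_tag(key, downloaded), stored_tag)


key = os.urandom(32)                       # kept locally by the client
original = b"example file contents" * 1000
tag = make_tag(key, original)              # stored locally before outsourcing

corrupted = original[:-1] + b"X"           # one byte changed in the cloud copy
print(check_download(key, original, tag))
print(check_download(key, corrupted, tag))
```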
Proofs of Retrievability (PoR) ◮ Juels and Kaliski (2007) introduced the Proofs of Retrievability (PoR) protocol, which verifies the integrity of the data through an audit protocol. ◮ The client applies an erasure code to the file M and stores the encoded file M′ in the cloud. ◮ M can be decoded from a fraction, say ρ, of M′. ◮ Along with M′, the client also stores some extra information ∆(M), which is used in the audit. ◮ An audit is a challenge-response protocol. In the audit the client (verifier) challenges some random locations of the file, and the cloud's (prover's) correct response proves that the file blocks at those locations are intact. ◮ The security of a PoR scheme is formalized by showing the existence of an extractor that retrieves the file with very high probability from an erasing adversary that can pass the audit protocol with some reasonable probability.
Efficiency of PoR System ◮ The computational cost of preparing a file for storage in the cloud and of calculating the response. ◮ The communication cost required during an audit. ◮ The extra storage (overhead) needed for storing the file M. ◮ A small challenge therefore improves the communication cost of the protocol, and also the computation cost of the prover, as fewer blocks are involved in computing the response.
Bounded/Unbounded-use PoR and Private/Public Verifiability ◮ A PoR that allows an "unlimited" number of challenge-response interactions is unbounded-use; otherwise it is bounded-use. ◮ In a privately verifiable PoR, only the owner who stored the file can run the challenge-response protocol, whereas in a publicly verifiable PoR anyone who knows the appropriate public key can perform the verification.
Main Contribution ◮ We present an unbounded-use private PoR scheme that improves the cost of response computation and the cost of communicating challenges in the average case. ◮ Our construction closely follows that of Shacham and Waters 2008 and uses a Fountain code.
Related Work on PoR ◮ PoR was introduced by Juels and Kaliski 2007 and has subsequently been extended and improved by Shacham and Waters 2008; Bowers, Juels and Oprea 2009; Dodis, Vadhan and Wichs 2009. ◮ The JK07 scheme has quadratic communication complexity (in terms of the security parameter) for the response. ◮ This was improved to linear complexity in SW08 by using homomorphic linear authenticators. ◮ Dodis et al. viewed the set of all correct responses corresponding to the file M′ = Enc(M) stored in the cloud as a codeword C, which is a challenge-response encoding of M. ◮ The set of all responses for the same file M′ from the prover forms a word C′, which may differ from C. The extractor decodes M from C′.
Background on PoR We follow SW08. ◮ Kg(): This randomized algorithm generates a secret key sk and a public key pk. ◮ St(sk, M): This randomized algorithm takes the secret key sk and the client file $M \in \{0,1\}^*$. It processes M and outputs M* (together with a file tag t in SW08), which is stored in the cloud. ◮ P, V: The randomized algorithms that correspond to the prover and the verifier. At the end of the prover-verifier interaction the verifier outputs a bit: $\{0,1\} \xleftarrow{R} \big(V(pk, sk, t) \rightleftharpoons P(pk, t, M^*)\big)$.
PoR properties: Correctness and Soundness ◮ Correctness means that if the prover is honest then $\big(V(pk, sk, t) \rightleftharpoons P(pk, t, M^*)\big) = 1$. ◮ A PoR is sound if any prover that convinces the verifier actually holds the file.
ε-adversary and the Extractor ◮ The adversary is assumed to erase some portion of the file with probability bounded by a fixed value. ◮ A prover is ε-admissible if it convincingly answers an ε fraction of challenges. ◮ A PoR scheme is ε-sound if there exists an extraction algorithm (Extractor) which, by interacting (challenge-response) with the ε-admissible adversary, can recover the file except with negligible probability.
Fountain Codes ◮ In a Fountain code the sender generates a potentially limitless stream of encoded symbols. The receiver can recover the message from sufficiently many encoded symbols. ◮ Examples: the LT code [Luby 2002] and the Raptor code [Shokrollahi 2006] are two well-known Fountain codes.
Raptor Code: Encoding. Precoding: ◮ The message is $(x_1, \ldots, x_k)$, where each $x_i$ is ℓ bits long. ◮ First $(x_1, \ldots, x_k)$ is encoded to $(y_1, \ldots, y_n)$ by an erasure code $C_n$ which can recover $(x_1, \ldots, x_k)$ from any $\rho n$ of the symbols. LT coding: To generate Raptor encoding symbols, the LT code is applied to $(y_1, \ldots, y_n)$. For that, a degree distribution is chosen, defined by the polynomial $w(x) = \sum_{i=1}^{n} w_i x^i$, where $w_i$ is the probability of choosing degree $i$, $i \in \{1, \ldots, n\}$. ◮ Randomly choose a degree, say $j$, using $w(x)$. ◮ Choose $j$ symbols uniformly at random from the set $\{y_1, \ldots, y_n\}$ and XOR them to produce the encoded symbol (output symbol) $r_i = y_{i_1} \oplus \cdots \oplus y_{i_j}$.
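The LT coding step above can be sketched as follows. This is only an illustration under assumptions: symbols are modelled as Python integers, the precode is omitted, and the function name `lt_encode_symbol` and the toy degree distribution are hypothetical rather than taken from the slides.

```python
import random


def lt_encode_symbol(y, weights, rng):
    """Produce one LT output symbol from the precoded symbols y.

    y       : list of ints, each standing for one l-bit symbol y_1..y_n
    weights : weights[j-1] is w_j, the probability of choosing degree j
    rng     : a random.Random instance
    Returns (sorted neighbour indices, XOR of the chosen symbols).
    """
    n = len(y)
    # Step 1: draw a degree j according to the distribution w(x).
    degree = rng.choices(range(1, n + 1), weights=weights, k=1)[0]
    # Step 2: pick j distinct positions uniformly at random and XOR them.
    neighbours = rng.sample(range(n), degree)
    value = 0
    for idx in neighbours:
        value ^= y[idx]
    return sorted(neighbours), value


rng = random.Random(2012)
y = [0b1010, 0b0111, 0b1100, 0b0001]       # toy precoded symbols
toy_weights = [0.4, 0.3, 0.2, 0.1]         # toy distribution, not w_D
print(lt_encode_symbol(y, toy_weights, rng))
```

The receiver needs the neighbour indices as well as the value; in practice they are usually re-derived from a shared seed rather than transmitted, which this sketch does not model.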
Raptor Code structure
Raptor Code: Decoding ◮ After collecting slightly more than k output symbols $r_i$, the receiver applies belief propagation (BP) decoding to obtain a ρ fraction of $\{y_1, \ldots, y_n\}$, and then applies the decoder of $C_n$ to recover $(x_1, \ldots, x_k)$.
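For erasure-type losses, BP decoding of an LT code reduces to a "peeling" process: find an output symbol with exactly one unknown neighbour, recover that neighbour, substitute it everywhere, and repeat. The sketch below illustrates only this LT stage, under the same assumptions as before (integer symbols, explicit neighbour lists); `peel_decode` is a hypothetical name and the precode decoder for $C_n$ is not shown.

```python
def peel_decode(symbols):
    """Peeling (BP over erasures) for LT output symbols.

    symbols : list of (neighbour_indices, value) pairs
    Returns a dict mapping each recovered index to its symbol value.
    """
    pending = [(set(nb), val) for nb, val in symbols]
    recovered = {}
    progress = True
    while progress:
        progress = False
        still_pending = []
        for nb, val in pending:
            # Substitute every neighbour that is already known.
            for i in list(nb):
                if i in recovered:
                    nb.discard(i)
                    val ^= recovered[i]
            if len(nb) == 1:
                # Degree-one symbol: its single neighbour is now determined.
                recovered[nb.pop()] = val
                progress = True
            elif len(nb) > 1:
                still_pending.append((nb, val))
        pending = still_pending
    return recovered


y = [5, 9, 12]
received = [([0], y[0]), ([0, 1], y[0] ^ y[1]), ([1, 2], y[1] ^ y[2])]
print(peel_decode(received))
```

If no degree-one symbol is available the process stalls and returns only a partial set, which is why the precode is needed to recover the remaining symbols.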
Raptor Code parameters ◮ The following are from the Raptor code construction given in [Shokrollahi 2006]. ◮ Let $\alpha > 0$ be a real number, set $D = \lceil 4(1+\alpha)/\alpha \rceil$ and define $w_D(x) = \frac{1}{\mu+1}\left(\mu x + \sum_{i=2}^{D} \frac{x^i}{(i-1)\,i} + \frac{x^{D+1}}{D}\right)$, (1) where $\mu = (\alpha/2) + (\alpha/2)^2$. ◮ The average degree of $w_D$ is $\ln(1/\alpha) + \beta + O(\alpha)$, (2) where $1 < \beta < 1 + \gamma + \ln(9)$ and $\gamma$ is Euler's constant.
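As a sanity check on equation (1), the coefficients of $w_D$ can be tabulated and summed. The sketch below (the function name is hypothetical) confirms numerically that they form a probability distribution, since $\mu + \sum_{i=2}^{D} \frac{1}{(i-1)i} + \frac{1}{D} = \mu + 1$, and computes the average degree directly from the coefficients.

```python
import math


def raptor_degree_distribution(alpha):
    """Return (D, mu, probs) where probs[i] is the coefficient of x^i in w_D."""
    D = math.ceil(4 * (1 + alpha) / alpha)
    mu = alpha / 2 + (alpha / 2) ** 2
    probs = {1: mu / (mu + 1)}
    for i in range(2, D + 1):
        probs[i] = 1 / ((mu + 1) * (i - 1) * i)
    probs[D + 1] = 1 / ((mu + 1) * D)
    return D, mu, probs


D, mu, probs = raptor_degree_distribution(0.5)
total = sum(probs.values())
average_degree = sum(i * p for i, p in probs.items())
print(D, mu)              # D = 12 and mu = 0.3125 for alpha = 0.5
print(total)              # should be 1 up to floating-point rounding
print(average_degree)
```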
Results on decoding Raptor Code. Lemma (Shokrollahi 2006): There exists a positive real number $c$ (depending on $\alpha$) such that, with an error probability of at most $e^{-cn}$, any set of $(1 + \alpha/2)n + 1$ output symbols of the LT code with distribution $w_D$ and $n$ input symbols $y_1, \ldots, y_n$ is sufficient to recover at least $\rho n$ input symbols from $\{y_1, \ldots, y_n\}$ via belief propagation decoding, where $\rho = 1 - \frac{\alpha/4}{1+\alpha}$. Theorem (Shokrollahi 2006): Let $\alpha > 0$ be a real number, $k$ an integer, $D = \lceil 4(1+\alpha)/\alpha \rceil$, $R = (1 + \alpha/2)/(1 + \alpha)$, $n = \lceil k/R \rceil$. Let $C_n$ be an erasure code which can decode a $(1 - R)/2$ fraction of erasures. Then the Raptor code with precode $C_n$ and the LT code with distribution $w_D(x)$, which encodes $k$ symbols, can decode from $(1 + \alpha)k$ output symbols.
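The lemma and the theorem fit together because the fraction of symbols the LT stage may fail to recover, $1 - \rho = \frac{\alpha/4}{1+\alpha}$, equals the erasure fraction $(1-R)/2$ that the precode tolerates. The sketch below checks this with exact rational arithmetic for one illustrative choice of $\alpha$ and $k$; the function name and the sample values are assumptions, not figures from the slides.

```python
import math
from fractions import Fraction


def raptor_parameters(alpha, k):
    """Compute the theorem's parameters exactly for a rational alpha."""
    R = (1 + alpha / 2) / (1 + alpha)        # rate of the precode C_n
    n = math.ceil(k / R)                     # number of precoded symbols
    rho = 1 - (alpha / 4) / (1 + alpha)      # fraction recovered by BP
    return {
        "R": R,
        "n": n,
        "rho": rho,
        "precode_erasure_fraction": (1 - R) / 2,
        "output_symbols": math.ceil((1 + alpha) * k),
    }


params = raptor_parameters(Fraction(1, 2), 1000)
print(params["R"], params["n"], params["rho"])   # 5/6 1200 11/12
print(params["precode_erasure_fraction"])        # 1/12, equal to 1 - rho
print(params["output_symbols"])                  # 1500
```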
PoR of SW08 ◮ Suppose $F' = (m_1, \ldots, m_n)$ is the erasure-encoded file of the client file $F$. Each $m_i \in \mathbb{Z}_p$. ◮ Choose $\theta \in \mathbb{Z}_p$ randomly and create authenticators $\sigma_i = PRF(i) + \theta m_i$. ◮ Challenge: $Q = \{(i_1, v_1), \ldots, (i_w, v_w)\}$, where each $i_j$ is chosen randomly from $\{1, \ldots, n\}$ and each $v_j$ is chosen randomly from $\mathbb{Z}_p$. ◮ Response: $r = \sum_{(i, v_i) \in Q} v_i m_i$ and $\sigma = \sum_{(i, v_i) \in Q} v_i \sigma_i$. ◮ Verify: $\sigma \stackrel{?}{=} \sum_{(i, v_i) \in Q} v_i \, PRF(i) + \theta r$.
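The challenge-response arithmetic above can be exercised with a toy implementation. This is a sketch under stated assumptions, not the SW08 reference construction: the PRF is instantiated with HMAC-SHA256 reduced modulo $p$, $p = 2^{127} - 1$ is chosen only for illustration, and all function names are hypothetical. Verification succeeds for an honest prover because $\sum v_i \sigma_i = \sum v_i PRF(i) + \theta \sum v_i m_i$.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # illustrative prime modulus (assumption, not from the slides)


def prf(key, i):
    """Assumed PRF: HMAC-SHA256 of the block index, reduced mod P."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def store(key, theta, blocks):
    """Authenticators sigma_i = PRF(i) + theta * m_i (indices start at 1)."""
    return [(prf(key, i) + theta * m) % P for i, m in enumerate(blocks, start=1)]


def challenge(n, w):
    """Pick w distinct random indices, each with a random coefficient v."""
    indices = secrets.SystemRandom().sample(range(1, n + 1), w)
    return [(i, secrets.randbelow(P)) for i in indices]


def respond(blocks, sigmas, Q):
    """Prover: aggregate the challenged blocks and their authenticators."""
    r = sum(v * blocks[i - 1] for i, v in Q) % P
    sigma = sum(v * sigmas[i - 1] for i, v in Q) % P
    return r, sigma


def verify(key, theta, Q, r, sigma):
    """Verifier: check sigma == sum v_i * PRF(i) + theta * r (mod P)."""
    return sigma == (sum(v * prf(key, i) for i, v in Q) + theta * r) % P


key, theta = secrets.token_bytes(32), secrets.randbelow(P)
blocks = [secrets.randbelow(P) for _ in range(16)]
sigmas = store(key, theta, blocks)
Q = challenge(len(blocks), 4)
print(verify(key, theta, Q, *respond(blocks, sigmas, Q)))
```

The response is two field elements regardless of how many blocks are challenged, which is the linear communication cost attributed to SW08 in the related-work slide.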