Security in Outsourced Databases (Query Answer Assurance)
Traditional Client-Server Architecture
[Figure: the DB client sends a query to the owner's server and receives results.]
• Client queries are satisfied by a trusted server
• Secure the server
• Secure the communication channel, e.g., use SSL
Data Publishing (Database-as-a-Service)
[Figure: the owner hands the database to a third-party server; the DB client is shown alongside.]
Data Publishing
[Figure: the DB client sends its query to the third-party server and receives the results from it.]
Data Publishing
• Pushes business logic and data processing from corporate data centers to third-party servers at the "edge" of the network
  – Distribution of (part of) the database to edge servers
  – Edge servers perform query processing
• Why?
  – Most organizations need DBMSs
  – DBMSs are extremely complex to deploy, set up, and maintain
  – They require skilled DBAs (at very high cost!)
• Advantages
  – Cuts down network latency and produces faster responses
  – Cheaper way to achieve scalability
  – Lowers dependency on the corporate data center (removes a single point of failure)
  – Reduced cost to the client
    • Get what you need and pay for what you use, not for hardware, software infrastructure, or personnel to deploy, maintain, upgrade…
  – Reduced overall cost
    • Cost amortization across users
  – Better service
    • Leveraging experts
The Challenge
[Figure: the DB client sends a query to the third-party server and receives results. The third-party server is untrusted!]
Are the results:
• The truth?
• The whole truth?
• Nothing but the truth?
The Challenge
Query: SELECT * FROM Emp WHERE Sal < 5000
Emp table at the server:
ID  Name  Sal   Dept
5   A     2000  1
2   C     3500  2
1   D     8010  1
4   B     2200  3
3   E     7000  2
The Challenge
Query: SELECT * FROM Emp WHERE Sal < 5000
Emp table at the server:
ID  Name  Sal   Dept
5   A     2000  1
2   C     3500  2
1   D     8010  1
4   B     2200  3
3   E     7000  2
Result =
5   A     2000  1
2   C     3500  2
4   B     2200  3
Security Concerns
[Figure: the DB client sends the query to the server, which holds the Emp table, and receives Result'.]
Result (the correct answer) =
5   A     2000  1
2   C     3500  2
4   B     2200  3
Security Concerns
Result' (returned by the server) =
5   A     2000  1
2   C     3500  2
4   B     2200  3
Result' matches Result: the server is trustworthy!
Security Concerns
Result' (returned by the server) =
5   A     3500  1
2   D     3500  2
4   B     2200  1
Server is malicious! Records are tampered.
Security Concerns
Result' (returned by the server) =
5   A     2000  1
2   C     3500  2
Server is malicious! Answers are dropped (incompleteness).
Security Concerns
Result' (returned by the server) =
5   A     2000  1
2   C     3500  2
4   B     2200  3
1   D     1500  2
6   E     3400  1
Server is malicious! Spurious answers are added.
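The four Result' cases above can be restated as two set conditions. The sketch below is only an illustration from the owner's point of view: it assumes access to the full Emp table, which a real client does not have (that gap is what the later slides address). The names `EMP`, `expected_result`, and `check` are hypothetical.

```python
# Emp table held by the owner: (ID, Name, Sal, Dept)
EMP = {
    (5, "A", 2000, 1),
    (2, "C", 3500, 2),
    (1, "D", 8010, 1),
    (4, "B", 2200, 3),
    (3, "E", 7000, 2),
}

def expected_result(table):
    """Rows satisfying WHERE Sal < 5000."""
    return {row for row in table if row[2] < 5000}

def check(returned):
    """Return (authentic, complete) for a returned result set."""
    authentic = returned <= EMP                  # every row originated from the owner
    complete = expected_result(EMP) <= returned  # no qualifying row was omitted
    return authentic, complete

honest = {(5, "A", 2000, 1), (2, "C", 3500, 2), (4, "B", 2200, 3)}
tampered = {(5, "A", 3500, 1), (2, "D", 3500, 2), (4, "B", 2200, 1)}
dropped = {(5, "A", 2000, 1), (2, "C", 3500, 2)}
spurious = honest | {(1, "D", 1500, 2), (6, "E", 3400, 1)}

for name, result in [("honest", honest), ("tampered", tampered),
                     ("dropped", dropped), ("spurious", spurious)]:
    print(name, check(result))
```

Tampered and spurious rows both fail the authenticity condition, while dropped rows fail completeness.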
Data Security Challenge: Design Objectives
• Authenticity: Every entry originated from the owner
• Completeness: No result entry is omitted from the answer
• Precision: Minimum information leakage
• Security: Computationally infeasible to cheat
• Efficiency: Polynomial proof
Collision-resistant (one-way) hash functions
• One-way: given x, it is easy to compute h(x); given h(x), it is difficult to determine x
• Collision-resistant: it is computationally hard to find x1 ≠ x2 such that h(x1) = h(x2)
• Computationally hard? Based on well-established assumptions such as the hardness of discrete logarithms
• E.g., SHA, MD5 (MD5 is no longer considered collision-resistant)
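A minimal sketch of how such a function is used, assuming SHA-256 from Python's standard `hashlib` as the hash h (the slides name only "SHA"). The byte strings are made-up example inputs.

```python
import hashlib

def h(x: bytes) -> bytes:
    """Digest of x under SHA-256 (an assumed choice of hash function)."""
    return hashlib.sha256(x).digest()

d1 = h(b"5|A|2000|1")
d2 = h(b"5|A|2000|1")   # same input: same digest
d3 = h(b"5|A|2001|1")   # one character changed: a different digest

print(d1.hex())
print(d1 == d2)
print(d1 == d3)
```

The digest has a fixed length (32 bytes here) regardless of the input size, which is what makes it usable as a compact stand-in for a tuple.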
Public Key Digital Signature Schemes
Cryptographic tool for authenticating the signed message as well as its origin, e.g., RSA, DSA
• KeyGen produces a key pair (SK, PK)
• Sender: computes σ = Sign(h(m), SK) and sends (m, σ) over the insecure channel
• Recipient: checks whether Ver(m, PK, σ) is valid, i.e., whether h(m) = Sign^-1(PK, σ)
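The KeyGen / Sign / Ver flow can be sketched with textbook RSA. This is a toy under loud assumptions: the primes are two small, publicly known Mersenne primes, there is no padding, and the digest is simply reduced modulo N, so it must not be used for real security. The function names `h`, `sign`, and `ver` are hypothetical, and `pow(E, -1, ...)` needs Python 3.8 or later.

```python
import hashlib

# KeyGen (toy): two known Mersenne primes, far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent
PK, SK = (N, E), (N, D)

def h(m: bytes) -> int:
    """SHA-256 digest of m as an integer, reduced mod N (toy simplification)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m: bytes, sk) -> int:
    """sigma = Sign(h(m), SK)."""
    n, d = sk
    return pow(h(m), d, n)

def ver(m: bytes, pk, sigma: int) -> bool:
    """Check h(m) == Sign^-1(PK, sigma)."""
    n, e = pk
    return pow(sigma, e, n) == h(m)

m = b"SELECT * FROM Emp WHERE Sal < 5000"
sigma = sign(m, SK)
print(ver(m, PK, sigma))
print(ver(m + b" OR 1=1", PK, sigma))
```

Only the holder of SK can produce σ, while anyone holding PK can run the check.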
Authentic Publication Scheme
[Figure: three parties and the flows between them]
• Central DBMS (trusted): certifies data, taking on (a) ownership and (b) liability
• Edge server (untrusted): does not certify data; it is (a) untrusted and (b) disclaims liability
• Central DBMS to edge server: DB + certification (verification objects)
• Central DBMS to DB client: public key, over a trusted channel
• DB client to edge server: query, over an unsecured channel
• Edge server to DB client: result + correctness proof
Naïve Scheme
• Each attribute has a signed digest
• Each tuple has a signed digest
Relation R: each tuple is stored as D_T, (A1, D1), …, (Ai, Di), …
  – D_T: signed tuple digest
  – Di: attribute digest of Ai
Naïve Scheme
Query: SELECT A3, A4, … FROM R
Result tuples: D_T, A3, A4, …, together with the digests D1, D2, D5, … of the filtered attributes
  – D_T: signed tuple digest
  – Di: attribute digest of Ai
Naïve Scheme (Example)
Attributes    Digests       Signed tuple digest
A1  B1  C1    a1  b1  c1    T1
A2  B2  C2    a2  b2  c2    T2
A3  B3  C3    a3  b3  c3    T3
Ti = sign(g(h(Ai) | h(Bi) | h(Ci))), where g and h are collision-resistant hash functions and ai = h(Ai), bi = h(Bi), ci = h(Ci).
• Retrieve the whole of the first tuple: the server returns A1, B1, C1, T1; the client computes h(A1), h(B1), and h(C1), and verifies T1 from A1, B1, and C1.
• Retrieve only attributes A1 and B1 of the first tuple: the server returns A1, B1, c1, and T1; the client has no access to C1, so c1 has to be provided.
Issues?
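The two retrieval cases can be sketched as follows. Assumptions, none of which come from the slides: both g and h are SHA-256, "|" is plain byte concatenation of the fixed-length digests, the attribute values are made-up byte strings, and sign/verify are toy textbook RSA over small known primes (insecure, for illustration only).

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

g = h  # assumption: the second hash g is also SHA-256 here

# Toy RSA key (known Mersenne primes, no padding; illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))

def sign(digest: bytes) -> int:
    return pow(int.from_bytes(digest, "big") % N, D, N)

def verify(digest: bytes, sig: int) -> bool:
    return pow(sig, E, N) == int.from_bytes(digest, "big") % N

# Owner: compute attribute digests and the signed tuple digest T1.
A1, B1, C1 = b"a-value", b"b-value", b"c-value"
a1, b1, c1 = h(A1), h(B1), h(C1)
T1 = sign(g(a1 + b1 + c1))

def check_full(A, B, C, T):
    """Client received the whole tuple: recompute every attribute digest."""
    return verify(g(h(A) + h(B) + h(C)), T)

def check_projection(A, B, c, T):
    """Client received only A and B: the server supplies the digest c of C."""
    return verify(g(h(A) + h(B) + c), T)

print(check_full(A1, B1, C1, T1))
print(check_projection(A1, B1, c1, T1))
print(check_projection(b"forged", B1, c1, T1))
```

Note that every returned tuple carries its own signature and one digest per filtered attribute, and nothing here lets the client detect a dropped tuple.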
Using Merkle Hash Tree (MHT)
• For each tuple t, a tuple hash h(t) is computed: h(t) = h(h(t.A1) | h(t.A2) | … | h(t.An))
• Assume a total order on attribute A of a relation R with |R| tuples (e.g., based on the primary key)
  – MHT(R,A) is a binary tree with |R| leaf nodes and a hash value h(i) associated with each node i
  – If i is a leaf node, then h(i) = h(ti), where ti is the ith tuple in the order
  – If i is an internal node, then h(i) = h(h(l) | h(r)), where l and r are the left and right children of node i
  – The root hash is the digest of all values in MHT(R,A)
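The definition above can be sketched bottom-up. Assumptions not taken from the slides: h is SHA-256, "|" is byte concatenation, attribute values are encoded with `str(...)`, and an unpaired node at the end of a level is promoted unchanged (the slides do not say how an odd |R| is handled). The sample rows reuse four of the Emp tuples, ordered by ID.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def tuple_hash(t):
    """h(t) = h(h(t.A1) | ... | h(t.An)), with '|' as byte concatenation."""
    return h(b"".join(h(str(v).encode()) for v in t))

def build_mht(tuples):
    """Return the list of levels, leaves first; levels[-1][0] is the root hash."""
    level = [tuple_hash(t) for t in tuples]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i] + level[i + 1]))  # internal node
            else:
                nxt.append(level[i])  # assumption: unpaired node is promoted
        level = nxt
        levels.append(level)
    return levels

# Total order on the primary key ID (first field).
emp = sorted([(5, "A", 2000, 1), (2, "C", 3500, 2),
              (1, "D", 8010, 1), (4, "B", 2200, 3)])
levels = build_mht(emp)
root = levels[-1][0]
print(root.hex())
```

The owner would sign only `root`; changing any attribute of any tuple changes that one value.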
Merkle Hash Tree
Sign(N1234, SK)
N1234 = h(N12 | N34)
N12 = h(N1 | N2)    N34 = h(N3 | N4)
N1 = h(d1)    N2 = h(d2)    N3 = h(d3)    N4 = h(d4)
k1, d1    k2, d2    k3, d3    k4, d4
• Ordering attribute: k1 < k2 < k3 < k4; the di are tuples
• The owner needs to sign the root node (N1234)
MHT: Point Search
[Figure: the same Merkle hash tree over d1, …, d4, with signed root N1234]
Query: Retrieve tuple d2
MHT: Point Search
[Figure: the same Merkle hash tree over d1, …, d4, with signed root N1234]
• The edge server returns d2, N1, N34, and the signed N1234
• The client computes N1234 = h(h(N1 | h(d2)) | N34) and verifies that the signed value is correct
MHT: Point Search
[Figure: the same Merkle hash tree over d1, …, d4, with signed root N1234]
• The edge server returns d2, N1, N34, and the signed N1234 (and the structure, so that the client knows on which side each digest is combined)
• The client computes N1234 = h(h(N1 | h(d2)) | N34) and verifies that the signed value is correct
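The point-search check can be sketched as below. Assumptions: h is SHA-256, the number of leaves is a power of two, and "the structure" is encoded as an "L"/"R" tag saying on which side each sibling digest sits. The signature check on the root is left out; `trusted_root` stands for a root the client has already verified against the owner's signature. All function names are hypothetical.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def build_levels(leaves):
    """Levels of the tree, leaves first (len(leaves) assumed a power of two)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Server side: sibling digests from leaf to root, tagged with their side."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1                       # the other child of the same parent
        proof.append((level[sib], "L" if sib < index else "R"))
        index //= 2
    return proof

def verify(d, proof, trusted_root):
    """Client side: recompute the root from the tuple and the sibling digests."""
    node = h(d)
    for sibling, side in proof:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == trusted_root

data = [b"d1", b"d2", b"d3", b"d4"]
levels = build_levels([h(d) for d in data])
root = levels[-1][0]          # N1234
proof = prove(levels, 1)      # proof for d2: [(N1, "L"), (N34, "R")]

print(verify(b"d2", proof, root))
print(verify(b"d2-tampered", proof, root))
```

The proof holds one digest per tree level, so its size grows with log |R| rather than |R|.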
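Range Queries
[Figure: Merkle hash tree for a range query q, marking GLB(q) (greatest lower bound), LUB(q) (least upper bound), their lowest common ancestor LCA(q), and the path l]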
Example: Range Queries
[Figure: Merkle hash tree with the query answer and the accompanying digests marked]
What is returned?
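One possible answer to the question, sketched under assumptions that go beyond what the slides state: the server reveals the answer tuples plus the two boundary tuples GLB(q) and LUB(q), and sends digests for every subtree outside that span. The client replays the hashes up to the root and checks that the revealed leaves are contiguous. Here h is SHA-256, the table has a power-of-two number of (key, value) tuples sorted by key, both boundaries exist inside the table, and `trusted_root` has already been checked against the owner's signature. All names are hypothetical.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def leaf(t):
    return h(repr(t).encode())

def subtree(tuples, a, b):
    """Hash of the subtree over tuples[a:b] (b - a is a power of two)."""
    if b - a == 1:
        return leaf(tuples[a])
    m = (a + b) // 2
    return h(subtree(tuples, a, m) + subtree(tuples, m, b))

def make_vo(tuples, a, b, lo, hi):
    """Server side: reveal tuples[lo:hi], send digests for everything else."""
    if b <= lo or hi <= a:
        return ("digest", subtree(tuples, a, b))
    if b - a == 1:
        return ("tuple", tuples[a])
    m = (a + b) // 2
    return ("node", make_vo(tuples, a, m, lo, hi), make_vo(tuples, m, b, lo, hi))

def replay(vo, seen):
    """Client side: recompute a node hash, logging leaves left to right."""
    if vo[0] == "digest":
        seen.append(None)
        return vo[1]
    if vo[0] == "tuple":
        seen.append(vo[1])
        return leaf(vo[1])
    return h(replay(vo[1], seen) + replay(vo[2], seen))

def verify_range(vo, trusted_root, low, high):
    """Return the verified answer for low <= key <= high, or None."""
    seen = []
    if replay(vo, seen) != trusted_root:
        return None
    while seen and seen[0] is None:      # digests to the left of the span
        seen.pop(0)
    while seen and seen[-1] is None:     # digests to the right of the span
        seen.pop()
    if len(seen) < 2 or None in seen:    # a hidden leaf inside the span
        return None
    glb, *answer, lub = seen
    ok = glb[0] < low and lub[0] > high and all(low <= t[0] <= high for t in answer)
    return answer if ok else None

table = [(k, "v%d" % k) for k in range(10, 90, 10)]   # keys 10, 20, ..., 80
root = subtree(table, 0, len(table))
vo = make_vo(table, 0, len(table), 1, 6)              # reveal keys 20..60
print(verify_range(vo, root, 30, 50))
```

Because the boundary tuples sit immediately outside the range and no digest may appear between revealed leaves, a server that drops a qualifying tuple cannot produce a verification object that passes.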