
Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee


  1. Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee
     Boxiang Dong, Ruilin Liu, Wendy Hui Wang
     Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ
     December 10, 2013

  2. Data-mining-as-a-service (DMaS)
     • Weak client
     • Computationally powerful service provider (e.g., cloud)
     • Result integrity: are the returned mining results the same as if the computation were executed locally?
     Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee. ICDM 13. Dong, Liu, Wang

  3. Outsourcing Setting
     • We focus on the result integrity of outsourced frequent itemset mining.
     • Figure: the architecture of outsourcing frequent itemset mining.

  4. Verification Goal
     Given a transaction dataset $D$ and its correct frequent itemset mining result $F$, let $F^S$ be the (possibly erroneous) mining result that the server returns.
     • Integrity concerns:
         Completeness: no frequent itemset is missing from $F^S$.
         Correctness: all itemsets in $F^S$ are frequent.
     • We propose an efficient approach that catches incorrect or incomplete mining results with 100% certainty.
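
The two integrity notions can be illustrated with a brute-force check on a toy dataset. This is only a sketch: the function names (`support`, `mine_frequent`, `check_result`) and the toy data are assumptions made for illustration, and the brute-force miner is exactly the computation a weak client wants to avoid. The approach in these slides replaces it with cryptographic proofs.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def mine_frequent(transactions, min_sup):
    """Brute-force ground truth F: all non-empty itemsets with support >= min_sup."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_sup:
                frequent.add(candidate)
    return frequent

def check_result(returned, transactions, min_sup):
    """Return (complete, correct) for a server result F^S."""
    truth = mine_frequent(transactions, min_sup)
    complete = truth <= returned   # no frequent itemset is missing
    correct = returned <= truth    # every returned itemset is frequent
    return complete, correct

# Hypothetical toy dataset with four transactions.
D = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
F = mine_frequent(D, min_sup=2)
```

Dropping an itemset from `F` makes the first flag false (incomplete), while adding an infrequent itemset such as `{b, c}` makes the second flag false (incorrect).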

  5. Verification Framework
     • The server constructs cryptographic proofs of the mining results.
     • We use the set intersection verification protocol [PTT11] to construct the proofs.
     • The proofs are used to verify the true support of a frequent/infrequent itemset.

  7. Set Intersection Verification Protocol
     Given a collection of sets $\mathcal{S} = \{S_1, \dots, S_m\}$ and an intersection result $Y = \{y_1, \dots, y_\delta\}$, $Y = S_1 \cap S_2 \cap \dots \cap S_m$ is the correct intersection of $\mathcal{S}$ if and only if:
     • $(Y \subseteq S_1) \wedge \dots \wedge (Y \subseteq S_m)$ (subset condition);
     • $(S_1 - Y) \cap \dots \cap (S_m - Y) = \emptyset$ (completeness condition).
     Following [PTT11], the server prepares the proof $\Pi(Y) = \{\mathcal{B}, \mathcal{A}, \mathcal{W}, \mathcal{C}\}$ and the client checks each component:
     • Coefficients $\mathcal{B} = \{b_\delta, b_{\delta-1}, \dots, b_0\}$ of the polynomial $(s + y_1)(s + y_2) \cdots (s + y_\delta)$. The client checks that $\mathcal{B}$ is correct.
     • Accumulation values $\mathcal{A} = \{\mathrm{acc}(S_j) \mid \forall S_j \in \mathcal{S}\}$, where $\mathrm{acc}(S_j) = g^{\prod_{x \in S_j}(s + x)}$. The client checks that $\mathcal{A}$ is correct.
     • Subset witnesses $\mathcal{W} = \{W_j \mid \forall S_j \in \mathcal{S}\}$, where $W_j = g^{P_j(s)}$ and $P_j(s) = \prod_{x \in S_j - Y}(x + s)$. The client checks $e\left(\prod_{k=0}^{|Y|} (g^{s^k})^{b_k}, W_j\right) \stackrel{?}{=} e(\mathrm{acc}(S_j), g)$ for $j = 1, \dots, m$.
     • Completeness witnesses $\mathcal{C} = \{C_j \mid \forall S_j \in \mathcal{S}\}$, where $C_j = g^{q_j(s)}$ such that $q_1(s)P_1(s) + q_2(s)P_2(s) + \dots + q_m(s)P_m(s) = 1$. The client checks $\prod_{j=1}^{m} e(W_j, C_j) \stackrel{?}{=} e(g, g)$.
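
The two conditions and the coefficient set can be tried out on plain Python sets. This is a sketch under loud assumptions: it works on cleartext sets and integers, uses no bilinear pairing or group elements, and the names `is_correct_intersection` and `poly_coefficients` are made up for illustration. The protocol of [PTT11] checks the same relations on accumulation values and witnesses instead.

```python
def is_correct_intersection(sets, Y):
    """Check the subset and completeness conditions for a claimed intersection Y."""
    Y = set(Y)
    # Subset condition: Y is contained in every S_j.
    subset_ok = all(Y <= set(S) for S in sets)
    # Completeness condition: the leftovers S_j - Y share no element.
    leftovers = [set(S) - Y for S in sets]
    completeness_ok = not leftovers[0].intersection(*leftovers[1:])
    return subset_ok and completeness_ok

def poly_coefficients(Y):
    """Coefficients [b_0, ..., b_delta] of (s + y_1)(s + y_2)...(s + y_delta)."""
    coeffs = [1]  # the constant polynomial 1
    for y in Y:
        nxt = [0] * (len(coeffs) + 1)
        for k, b in enumerate(coeffs):
            nxt[k] += b * y      # contribution of the constant term y
            nxt[k + 1] += b      # contribution of the term s
        coeffs = nxt
    return coeffs

# Hypothetical example sets.
S = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 6}]
print(is_correct_intersection(S, {3, 4}))   # both conditions hold
print(is_correct_intersection(S, {3}))      # 4 remains in every leftover set
print(poly_coefficients([3, 4]))            # (s + 3)(s + 4) = s^2 + 7s + 12
```

A claimed result that is too small fails the completeness condition, and one that is too large fails the subset condition, which is why both checks are needed.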

  10. Basic Solution
     Given a dataset $D$ that contains $n$ unique items, the client does the following:
     1 Build the item-based inverted index $E_I$ that consists of $n$ inverted lists $\{L_1, \dots, L_n\}$.
     2 Construct the Merkle hash tree $T$ of the inverted index.
         • Leaf $l_j$ is assigned $h_j = \mathrm{hash}(\mathrm{acc}(L_j)^{(s + j)})$.
         • Internal node $v$ with children $c_1, \dots, c_k$ is assigned $h_v = \mathrm{hash}(h_{c_1} \| \dots \| h_{c_k})$.
     Mapping to the set intersection verification problem: verifying whether an itemset $I$ is included in a set of transactions $T_I$ is equivalent to verifying whether $T_I$ is the correct intersection of the inverted lists of all items in $I$.
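
The client-side steps can be sketched as follows. Several parts are assumptions rather than the slides' construction: the leaf hash below hashes the sorted inverted list directly as a stand-in for $\mathrm{hash}(\mathrm{acc}(L_j)^{(s+j)})$, the tree is binary rather than $k$-ary, SHA-256 is an arbitrary choice of hash, and all function names are hypothetical.

```python
import hashlib

def build_inverted_index(transactions):
    """Map each item to the set of ids of the transactions that contain it."""
    index = {}
    for tid, items in enumerate(transactions):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index

def supporting_transactions(itemset, index):
    """T_I: the intersection of the inverted lists of all items in I."""
    lists = [index.get(item, set()) for item in itemset]
    return set.intersection(*lists) if lists else set()

def leaf_hash(tids):
    """Stand-in leaf value: hashes the sorted list instead of an accumulation value."""
    return hashlib.sha256(repr(sorted(tids)).encode()).digest()

def merkle_root(leaves):
    """Binary Merkle tree: each internal node hashes its children's concatenation."""
    level = list(leaves)
    while len(level) > 1:
        level = [hashlib.sha256(b"".join(level[i:i + 2])).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical toy dataset.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
index = build_inverted_index(D)
root = merkle_root([leaf_hash(index[item]) for item in sorted(index)])
print(supporting_transactions({"a", "b"}, index))  # ids of transactions containing a and b
```

The support of an itemset is the size of the returned set, so a verified intersection yields a verified support. The client only needs to keep `root` to detect later tampering with any inverted list.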

  11. Basic Solution
     Drawbacks
     • Total number of proofs is $2^n - 1$.
     • Too much overhead.
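
To get a feel for the count, assume one proof per non-empty itemset over $n$ items, and take $n = 49$, the number of items in the synthetic datasets used in the experiments:

```latex
\#\text{proofs} \;=\; \sum_{k=1}^{n} \binom{n}{k} \;=\; 2^{n} - 1,
\qquad
n = 49 \;\Rightarrow\; 2^{49} - 1 = 562{,}949{,}953{,}421{,}311 \approx 5.6 \times 10^{14}.
```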

  12. Verification Optimization
     • Maximal frequent itemset (MFI): a subset of $F^S$ such that for each itemset $I \in MFI$, there does not exist any itemset $I' \in F^S$ with $I \subset I'$.
     • Minimal infrequent itemset (MII): a set of itemsets that do not appear in $F^S$ such that for each itemset $I \in MII$, there does not exist any itemset $I' \notin F^S$ with $I' \subset I$.
     (Figure: itemsets in dotted rectangles are maximal frequent itemsets.)
     • Advantage: $|MFI| + |MII| \ll |F^S| + |\overline{F^S}|$, where $\overline{F^S}$ denotes the itemsets that are not in $F^S$.
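
The two border sets can be computed by brute force on a small result, which shows how much smaller they are than the full lattice. This is an illustrative sketch: the function names and the toy result `FS` are assumptions, and the enumeration over all itemsets is only feasible for a handful of items.

```python
from itertools import combinations

def maximal_frequent(frequent):
    """Itemsets in F^S that have no proper superset in F^S."""
    return {I for I in frequent if not any(I < J for J in frequent)}

def minimal_infrequent(frequent, items):
    """Itemsets outside F^S whose proper non-empty subsets are all in F^S."""
    result = set()
    ordered = sorted(items)
    for k in range(1, len(ordered) + 1):
        for combo in combinations(ordered, k):
            I = frozenset(combo)
            if I in frequent:
                continue
            proper_subsets = (frozenset(c) for r in range(1, k)
                              for c in combinations(combo, r))
            if all(sub in frequent for sub in proper_subsets):
                result.add(I)
    return result

# Hypothetical server result over the items a, b, c.
FS = {frozenset("a"), frozenset("b"), frozenset("c"),
      frozenset("ab"), frozenset("ac")}
MFI = maximal_frequent(FS)
MII = minimal_infrequent(FS, "abc")
print(len(MFI) + len(MII), "itemsets to prove instead of", 2 ** 3 - 1)
```

Here only `{a, b}`, `{a, c}`, and `{b, c}` need proofs, compared with seven non-empty itemsets in total.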

  13. Optimized Solution
     Security Analysis: our optimized solution provides the same security guarantee as the basic solution.

  14. Complexity
     Proof construction at server side: $O(M \log^3 M + n^{\epsilon} \log n)$
     • $M = \sum_{I \in MFI \cup MII} \sum_{i \in I} |L_i|$
     • $n$ is the number of unique items of $D$.
     • $\epsilon \in (0, 1)$
     Verification at client side: $O(N + F)$
     • $N = \sum_{I \in MFI \cup MII} |I|$
     • $F = \sum_{I \in MFI \cup MII} sup(I)$
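
The quantities $M$, $N$, and $F$ are simple sums over the itemsets that need proofs, and can be computed from an inverted index. The sketch below assumes a dictionary `index` from item to its inverted list (a set of transaction ids) and non-empty itemsets. The function name and the toy values are hypothetical.

```python
def proof_cost_parameters(itemsets, index):
    """Return (M, N, F) for the itemsets in MFI ∪ MII."""
    # M: total length of the inverted lists touched by all proofs.
    M = sum(len(index[i]) for I in itemsets for i in I)
    # N: total number of items over all itemsets.
    N = sum(len(I) for I in itemsets)
    # F: total support, where sup(I) is the size of the intersected lists.
    F = sum(len(set.intersection(*(index[i] for i in I))) for I in itemsets)
    return M, N, F

# Hypothetical inverted index and border itemsets.
index = {"a": {0, 1, 2}, "b": {0, 1, 3}, "c": {0, 2}}
border = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
print(proof_cost_parameters(border, index))
```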

  15. Experiments
     • Environment
         Language: C++
         Testbed: MacBook Pro, 2.4 GHz CPU, 4 GB memory
     • Datasets
         Dataset  # of trans.  # of items  Avg. trans. length  min sup  # of freq. itemsets
         $S_1$    $10^3$       49          10                  250      36
         $S_2$    $10^4$       49          10                  250      3854
         $S_3$    $10^5$       49          10                  250      149744
         $S_4$    $10^6$       49          10                  250      3074610
         $R$      500          100         2.4                 5        97
     • Simulation of malicious actions
         Error ratio: $r$ = 1%, 2%, 5%, 10%, 20%
         Incomplete: randomly delete $r$ percent of the mining result.
         Incorrect: randomly insert $r$ percent infrequent itemsets.
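
The two simulated attacks can be sketched as below. The slides do not say what the inserted percentage is relative to, so this sketch assumes that the number of inserted infrequent itemsets is $r$ times the size of the correct result. The function names, the explicit `rng` argument, and the toy itemsets are also assumptions.

```python
import random

def make_incomplete(frequent, r, rng):
    """Randomly delete a fraction r of the mining result."""
    pool = sorted(frequent, key=sorted)  # fixed order for reproducibility
    dropped = set(rng.sample(pool, round(r * len(pool))))
    return set(frequent) - dropped

def make_incorrect(frequent, infrequent, r, rng):
    """Randomly insert infrequent itemsets, r times the result size (assumed)."""
    pool = sorted(infrequent, key=sorted)
    k = min(len(pool), round(r * len(frequent)))
    return set(frequent) | set(rng.sample(pool, k))

# Hypothetical correct result (10 itemsets) and pool of infrequent itemsets.
frequent = {frozenset([i]) for i in "abcdefghij"}
infrequent = {frozenset(p) for p in ("ab", "cd", "ef", "gh")}
rng = random.Random(0)
incomplete = make_incomplete(frequent, 0.2, rng)
incorrect = make_incorrect(frequent, infrequent, 0.2, rng)
print(len(incomplete), len(incorrect))
```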

  16. Proof Optimization Ratio & Verification Time ($R$ dataset)
     [Figure: (a) Proof optimization ratio (%) vs. error ratio (%); (b) Client verification time (seconds) vs. error ratio (%). Each panel plots completeness verification and correctness verification at error ratios 1%, 2%, 5%, 10%, 20%.]

  17. Scalability (error ratio = 1%)
     [Figure: (a) Construction time of one proof (itemset length = 3), in seconds, vs. dataset size ($10^3$ to $10^6$); (b) Client verification time, in seconds, vs. dataset size ($10^3$ to $10^6$).]

  18. References I
     [Bab85] László Babai. Trading group theory for randomness. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing, pages 421–429. ACM, 1985.
     [DLW13] Boxiang Dong, Ruilin Liu, and Hui Wendy Wang. Result integrity verification of outsourced frequent itemset mining. In Data and Applications Security and Privacy XXVII, pages 258–265. Springer, 2013.
     [GGP10] Rosario Gennaro, Craig Gentry, and Bryan Parno. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Advances in Cryptology–CRYPTO 2010, pages 465–482. Springer, 2010.
     [GMR89] Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, 1989.
     [LWM+12] Ruilin Liu, Hui Wendy Wang, Anna Monreale, Dino Pedreschi, Fosca Giannotti, and Wenge Guo. AUDIO: an integrity auditing framework of outlier-mining-as-a-service systems. In Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part II, pages 1–18. Springer-Verlag, 2012.
     [PJRT05] HweeHwa Pang, Arpit Jain, Krithi Ramamritham, and Kian-Lee Tan. Verifying completeness of relational query results in data publishing. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 407–418. ACM, 2005.
