Kerim Y. Oktay, Vaibhav Khadilkar, Bijit Hore, Murat Kantarcioglu, - PowerPoint PPT Presentation

Kerim Y. Oktay, Vaibhav Khadilkar, Bijit Hore, Murat Kantarcioglu, Sharad Mehrotra, Bhavani Thuraisingham

Cloud Computing
[Figure: cloud computing offering app code, server, email, database, and multimedia services]
• Like the Software as a Service and DAS models, cloud computing offers many advantages
  • Better availability
  • Reduced costs
  • Unlimited scalability and elasticity

Hybrid Cloud
• Integrates local infrastructure with public cloud resources
[Figure: Private/Internal cloud and Public/External cloud combined into a Hybrid Cloud]
• Extra advantages
  • The flexibility of shifting workload to the public cloud when the private cloud is overwhelmed (cloud bursting)
  • Utilizing in-house resources along with public resources
• Cons
  • Sensitive data exposure
  • Public cloud resource allocation cost (both storage and computing)

Data & Computation Partitioning Challenge
Student table (labeled Sensitive on the slide):

  s_id  name     ssn   dept
  1     James    1234  CS
  2     Charlie  4321  EE
  3     John     5645  CS
  4     Matt     8743  ECON

Q1: SELECT name, ssn FROM Student
Q2: SELECT dept, count(*) FROM Student GROUP BY dept

How to split computation? How to partition the table?
Constraints
• Q1 contains sensitive information
• Q2 execution is more expensive
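The partitioning question on this slide can be made concrete with a small, hedged sketch. It uses Python's built-in sqlite3 as a stand-in for Hive, and it assumes (this is not stated on the slide) that ssn is the only sensitive attribute, so a projection without ssn, here given the hypothetical name Student_pub, could be placed on the public side. Q2 can then be answered from the projection alone, while Q1 needs the full table. ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, name TEXT, ssn TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (1, "James", "1234", "CS"),
        (2, "Charlie", "4321", "EE"),
        (3, "John", "5645", "CS"),
        (4, "Matt", "8743", "ECON"),
    ],
)

# Assumption: ssn is the only sensitive column, so this projection
# (hypothetical name Student_pub) is a candidate public-side partition.
conn.execute("CREATE TABLE Student_pub AS SELECT s_id, name, dept FROM Student")

# Q1 touches ssn, so it must read the full (private) table.
q1_rows = conn.execute(
    "SELECT name, ssn FROM Student ORDER BY s_id"
).fetchall()

# Q2 only needs dept, so it can be answered from the public projection.
q2_rows = conn.execute(
    "SELECT dept, count(*) FROM Student_pub GROUP BY dept ORDER BY dept"
).fetchall()

print(q1_rows)
print(q2_rows)
```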

Our Hybrid Cloud Architecture
[Figure: architecture diagram with the following labels]
• Inputs: Queries Q, Constraints C, Relations R
• Outputs: Results for Q_pub, Results for Q_priv
• Layers: User Interface Layer, Statistics Gathering Layer, Data and Query Management Layer
• Public side: R_pub, Q_pub on Hive over Hadoop HDFS
• Private side: R, Q_priv on Hive over Hadoop HDFS

Design Spectrum
• Data Model
  • Relational, Semi-structured, Key-Value Stores, Text
• Sensitivity Model
  • Attribute Level, Privacy Associations, View-Based
• Partitioning Models
  • Workload Partitioning, Intra-query Parallelism, Dynamic Workload
• Minimization Priority
  • Running Time, Sensitive Data Disclosure, Monetary Cost

Outline of Solution
• Notation
• Formulate the Computation Partitioning Problem (CPP)
• Solution to CPP
• Experimental Results

Notation
• sens(R′): the estimated number of sensitive cells in dataset R′
• baseTables(q): the estimated minimum set of data items necessary to answer query q ∈ Q
• runT_x(q): the estimated running time of query q ∈ Q at site x (either public or private)
• ORunT(Q′, Q″): overall execution time of the queries in Q′, given that the queries in Q″ are executed on the public cloud

$$
\mathrm{ORunT}(Q', Q'') = \max\left\{ \sum_{q \in Q''} \mathit{freq}(q) \times \mathit{runT}_{pub}(q),\;\; \sum_{q \in Q' \setminus Q''} \mathit{freq}(q) \times \mathit{runT}_{priv}(q) \right\}
$$
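As a minimal sketch of the ORunT definition, the function below takes the maximum of the summed public-side and private-side times, on the reading that the two sites run in parallel. The dictionaries and their values are hypothetical illustration data, and freq(q) is assumed to be the number of times q occurs in the workload.

```python
def orun_t(workload, public, freq, run_t_pub, run_t_priv):
    """Overall running time of `workload` when the queries in `public`
    run on the public cloud and the rest run on the private cloud."""
    pub_time = sum(freq[q] * run_t_pub[q] for q in workload if q in public)
    priv_time = sum(freq[q] * run_t_priv[q] for q in workload if q not in public)
    # The slower of the two sides determines the overall time.
    return max(pub_time, priv_time)


# Hypothetical workload: frequencies and per-site running times in seconds.
freq = {"q1": 2, "q2": 1, "q3": 1}
run_t_pub = {"q1": 10, "q2": 30, "q3": 20}
run_t_priv = {"q1": 50, "q2": 150, "q3": 100}
workload = ["q1", "q2", "q3"]

print(orun_t(workload, set(), freq, run_t_pub, run_t_priv))         # everything private
print(orun_t(workload, {"q2"}, freq, run_t_pub, run_t_priv))        # q2 public
print(orun_t(workload, {"q1", "q2"}, freq, run_t_pub, run_t_priv))  # q1 and q2 public
```

In this toy data, moving more work to the public side lowers the overall time only as long as the private side remains the bottleneck.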

Detailed Hybrid Cloud Architecture
[Figure: detailed architecture diagram with the following labels]
• Inputs: Queries Q, Constraints C, Relations R, SR
• Statistics Gathering Layer: produces runT_x(q), baseTables(q)
• Data and Query Management Layer: Monetary Cost Estimator, Computation Partitioning Module, Disclosure Risk Estimator
• Public side: R_pub, Q_pub on Hive over Hadoop HDFS
• Private side: R, Q_priv on Hive over Hadoop HDFS

Computation Partitioning Problem (CPP)
• Find a subset of the given query workload, Q_pub ⊆ Q, and a subset of the given dataset, R_pub ⊆ R, that

$$
\begin{aligned}
\text{minimize}\quad & \mathrm{ORunT}(Q, Q_{pub}) \\
\text{subject to}\quad & (1)\;\; \mathit{store}(R_{pub}) + \sum_{q \in Q_{pub}} \mathit{freq}(q) \times \mathit{proc}(q) \le MC \\
& (2)\;\; \mathit{sens}(R_{pub}) \le DC \\
& (3)\;\; \forall q \in Q_{pub}:\; \mathit{baseTables}(q) \subseteq R_{pub}
\end{aligned}
$$

• MC and DC are user-defined constraints
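To make the three constraints concrete, here is a small exhaustive-search sketch. It is not the authors' method (the deck proposes dynamic programming); it simply enumerates every candidate Q_pub, takes R_pub as the union of the base tables of those queries, and keeps the feasible choice with the lowest overall running time. All table names, costs, and times are hypothetical.

```python
from itertools import combinations


def solve_cpp_brute_force(queries, tables, mc, dc):
    """Enumerate every Q_pub and return (Q_pub, overall time) for the best
    feasible choice. Exponential in the number of queries."""
    names = sorted(queries)
    best_time, best_q_pub = None, None
    for size in range(len(names) + 1):
        for q_pub in combinations(names, size):
            # Constraint (3): R_pub must contain the base tables of Q_pub.
            r_pub = set().union(*(queries[q]["base_tables"] for q in q_pub))
            # Constraint (1): storage plus processing cost within MC.
            cost = sum(tables[t]["store"] for t in r_pub) + sum(
                queries[q]["freq"] * queries[q]["proc"] for q in q_pub
            )
            # Constraint (2): sensitive cells exposed within DC.
            risk = sum(tables[t]["sens"] for t in r_pub)
            if cost > mc or risk > dc:
                continue
            pub = sum(queries[q]["freq"] * queries[q]["run_pub"] for q in q_pub)
            priv = sum(
                queries[q]["freq"] * queries[q]["run_priv"]
                for q in names
                if q not in q_pub
            )
            time = max(pub, priv)
            if best_time is None or time < best_time:
                best_time, best_q_pub = time, set(q_pub)
    return best_q_pub, best_time


# Hypothetical inputs for illustration only.
tables = {
    "Customer": {"store": 5, "sens": 30},
    "Lineitem": {"store": 10, "sens": 5},
    "Orders": {"store": 4, "sens": 0},
}
queries = {
    "q1": {"base_tables": {"Lineitem"}, "freq": 1, "proc": 3, "run_pub": 10, "run_priv": 50},
    "q2": {"base_tables": {"Customer"}, "freq": 1, "proc": 2, "run_pub": 8, "run_priv": 40},
    "q3": {"base_tables": {"Orders"}, "freq": 2, "proc": 1, "run_pub": 5, "run_priv": 25},
}

print(solve_cpp_brute_force(queries, tables, mc=20, dc=10))
```

With these numbers q2 can never go public, because Customer alone exceeds the disclosure budget, so the search settles on q1 and q3.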

Metrics in CPP
• Query execution time (runT_x(q))

$$
\mathit{runT}_x(q) = \frac{\sum_{op \in q} \left( \mathit{inpSize}(op) + \mathit{outSize}(op) \right)}{w_x}
$$

• Monetary costs
  • store(R_pub): storage monetary cost of the public cloud partition
  • proc(q): processing monetary cost of a public-side query q
• Sensitive data disclosure risk (sens(R_pub))
  • Estimated number of sensitive cells within R_pub
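A rough sketch of the running-time metric follows. It reads the slide's formula as the total input and output size over all operators of a query, divided by the site weight w_x (a throughput in MB/sec). The operator sizes are hypothetical; the two weights are the approximate values the deck reports in its experimental setting.

```python
def run_t(operators, w_x):
    """Estimated running time in seconds.

    operators: list of (input_size_mb, output_size_mb) pairs, one per operator.
    w_x: site weight in MB/sec.
    """
    total_mb = sum(inp + out for inp, out in operators)
    return total_mb / w_x


# Hypothetical query with two operators (sizes in MB).
query_ops = [(8000, 2000), (2000, 400)]

W_PUB = 40.0   # MB/sec, approximate public-side weight from the deck
W_PRIV = 8.0   # MB/sec, approximate private-side weight from the deck

print(run_t(query_ops, W_PUB))   # estimated seconds on the public side
print(run_t(query_ops, W_PRIV))  # estimated seconds on the private side
```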

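Solution to CPP
• CPP can be simplified to finding only Q_pub, since R_pub can then be taken as the union of baseTables(q) over the queries in Q_pub
• Dynamic programming approach
• CPP(Q, MC, DC) = Q_pub
  • Input: query set Q, monetary constraint MC, disclosure constraint DC
  • Output: Q_pub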

Example
Q = {q_1, q_2, q_3}
• If MC < 25 or DC < 20, q_3 can only run on the private side
  • CPP({q_1, q_2, q_3}, MC, DC) = CPP({q_1, q_2}, MC, DC)

Example
Q = {q_1, q_2, q_3}
• If q_3 can run on both sides
  • Case 1: What if q_3 runs on the private side?
    • CPP({q_1, q_2, q_3}, MC, DC) = CPP({q_1, q_2}, MC, DC)

Example
Q = {q_1, q_2, q_3}, Q^2 = {q_1, q_2}
• Case 2: What if q_3 runs on the public side?
  • CPP(Q, MC, DC) = MIN_TIME(CPP(Q^2, j, k) + q_3), where MC − 25 ≤ j ≤ MC − 15 and DC − 20 ≤ k ≤ DC − 0
  • The range of j comes from the maximum and minimum possible monetary cost of q_3; the range of k comes from the maximum and minimum possible disclosure risk of q_3
• Choose the minimum overall running time between Case 1 and Case 2
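The two-case recurrence from these example slides can be sketched as a memoized recursion. This is a simplified illustration, not the authors' implementation. It assumes each query has a single fixed monetary cost and disclosure risk, whereas the slides use a range for j and k, presumably because the real values depend on what is already placed on the public side. Because it keeps only one candidate per subproblem, it is not guaranteed to be optimal in general. All numbers are hypothetical.

```python
from functools import lru_cache

# Hypothetical workload: (name, freq, run_pub, run_priv, cost, risk).
QUERIES = [
    ("q1", 1, 10, 50, 10, 5),
    ("q2", 1, 8, 40, 5, 15),
    ("q3", 1, 20, 100, 20, 10),
]


def overall_time(n, q_pub):
    """ORunT restricted to the first n queries for a given public set."""
    pub = sum(f * rp for name, f, rp, _, _, _ in QUERIES[:n] if name in q_pub)
    priv = sum(f * rv for name, f, _, rv, _, _ in QUERIES[:n] if name not in q_pub)
    return max(pub, priv)


@lru_cache(maxsize=None)
def cpp(n, mc, dc):
    """Return Q_pub (a frozenset) for the first n queries under budgets mc, dc."""
    if n == 0:
        return frozenset()
    name, _, _, _, cost, risk = QUERIES[n - 1]
    # Case 1: the n-th query stays private, budgets are unchanged.
    case1 = cpp(n - 1, mc, dc)
    if cost > mc or risk > dc:
        return case1  # the query cannot be afforded on the public side
    # Case 2: the n-th query goes public, budgets shrink by its cost and risk.
    case2 = cpp(n - 1, mc - cost, dc - risk) | {name}
    # Keep whichever case gives the lower overall running time.
    return min(case1, case2, key=lambda s: overall_time(n, s))


q_pub = cpp(len(QUERIES), 30, 20)
print(sorted(q_pub), overall_time(len(QUERIES), q_pub))
```

With a monetary budget of 30, q3 fits on the public side and displaces q2; with a budget of 15 it does not fit, and the result falls back to the solution for the first two queries, mirroring Case 1.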

Experimental Setting
• Experimental setting
  • Private cloud: 14 nodes, located at UTD, Pentium IV, 4 GB RAM, 290-320 GB disk space
  • Public cloud: 38 nodes, located at UCI, AMD dual core, 8 GB RAM, 631 GB disk space
  • Hadoop 0.20.2 and Hive 0.7.1
• Dataset and statistics collection
  • 100 GB TPC-H data
• Query workload
  • 40 queries containing modified versions of Q1, Q3, Q6, Q11

Experimental Setting
• Estimation of weight (w_x)
  • Running all 22 TPC-H queries for a 300 GB dataset
  • w_pub ≈ 40 MB/sec, w_priv ≈ 8 MB/sec
• Resource allocation cost
  • Amazon S3 pricing for storage and communication
    • Storage = $0.140/GB + PUT, Communication = $0.120/GB + GET
    • PUT = $0.01/1,000 requests, GET = $0.01/10,000 requests
  • Amazon EC2 and EMR pricing for processing
    • $0.085 + $0.015 = $0.1/hour
• Sensitivity
  • Customer: c_name, c_phone, c_address attributes
  • Lineitem: all attributes in 1%, 5%, and 10% of tuples
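The pricing figures above translate into simple arithmetic. The sketch below uses the rates exactly as listed on the slide; the function names, the example volumes, and the assumption that the $0.1/hour rate applies per instance-hour are illustrative and do not come from the deck.

```python
STORAGE_PER_GB = 0.140         # S3 storage rate from the slide
COMM_PER_GB = 0.120            # S3 communication rate from the slide
PUT_PER_REQUEST = 0.01 / 1000  # $0.01 per 1,000 PUT requests
GET_PER_REQUEST = 0.01 / 10000 # $0.01 per 10,000 GET requests
PROC_PER_HOUR = 0.085 + 0.015  # EC2 plus EMR rate from the slide


def storage_cost(gb, put_requests):
    """Storage cost in dollars: per-GB charge plus PUT request charges."""
    return gb * STORAGE_PER_GB + put_requests * PUT_PER_REQUEST


def communication_cost(gb, get_requests):
    """Communication cost in dollars: per-GB charge plus GET request charges."""
    return gb * COMM_PER_GB + get_requests * GET_PER_REQUEST


def processing_cost(instance_hours):
    """Processing cost in dollars, assuming the rate is per instance-hour."""
    return instance_hours * PROC_PER_HOUR


# Hypothetical volumes for illustration.
print(round(storage_cost(100, 1000), 2))
print(round(communication_cost(10, 10000), 2))
print(round(processing_cost(38), 2))
```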

Experimental Results
[Result figures not preserved in the extracted text]

Future Work
• Extend the work to enable intra-query parallelism
• Support dynamically changing (or arriving) workloads
• Extend this work to other cloud computing technologies
• Support different sensitivity models
