8271 discussion of cloud computing security (combined)
Stephen McCamant
University of Minnesota

Outline
- Get Off of My Cloud
- Administrative discussions
- Multi-Cloud Oblivious Storage

Old and new topics in security
- Paper type 1: new idea, never been done before
  - Main contribution is novelty
  - Incentive to be first, maybe even a race
- Paper type 2: improvement in an already-busy area
  - Contributions judged differentially
  - Incentive to optimize

Cloud threats, old and new
- Old: your system’s regular vulnerabilities
- New but understood: need to trust cloud provider
- Focus here: attacks from cloud neighbors

Case study: Amazon EC2
- Largest, highest-profile infrastructure cloud provider
- World-spanning data centers, instance sizes $0.02-$6.82 per hour
- Many instance types use Xen to multiplex one physical machine

Ethical/legal sidebar
- Important for academic researchers to do things “by the book”
- Ethical obligations may be greater or less than legal ones
- Here: CFAA, EC2 user agreement

Placement and extraction
- Placement: get an instance on the same physical machine as the victim
- Extraction: given placement, get confidential info

Network probing
- TCP traceroutes, port 80 and 443 scans, DNS resolution
- Instances have one name, but separate public and internal IP addresses

Network mapping
- Internal addresses reflect topology
- Disjoint by availability region, clustered by instance type
- Dom0s in an adjacent block

Network-based co-residence checks
- Dom0 in traceroute (easiest)
- Close IP addresses
- Smallest packet round-trip times
- All found to have “effectively zero” false positives

Covert channels and side channels
- “Covert channel”: generally sender and receiver cooperate
  - One classification: storage channels, timing channels
- “Side channel”: “sender” is passive victim
  - Can again include timing, also error messages, power usage, etc.

Hard disk usage channel
- Measure contention for hard disk (e.g., seek times) between VMs
- “No attempt to optimize” bandwidth: 0.0005 bits/sec (33 mins per bit)
- Why so slow?
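
To put the bandwidth figure on the hard disk usage channel slide in perspective, here is a minimal arithmetic sketch. The 0.0005 bits/sec rate is from the slide; the 128-bit secret is only an illustrative assumption, not something the slides or the paper measure.

```python
# Rate quoted on the slide for the unoptimized hard-disk covert channel.
RATE_BITS_PER_SEC = 0.0005


def seconds_per_bit(bits_per_second: float) -> float:
    """Time needed to send one bit at the given rate."""
    return 1.0 / bits_per_second


def transfer_time_hours(num_bits: int, bits_per_second: float) -> float:
    """Time in hours to send num_bits at the given rate."""
    return num_bits / bits_per_second / 3600.0


minutes_per_bit = seconds_per_bit(RATE_BITS_PER_SEC) / 60.0
# Hypothetical payload: a 128-bit secret (assumption for illustration only).
hours_for_128_bits = transfer_time_hours(128, RATE_BITS_PER_SEC)

print(f"{minutes_per_bit:.1f} minutes per bit")
print(f"{hours_for_128_bits:.1f} hours for 128 bits")
```

At this rate the channel shows that cross-VM communication is possible, but it would need to be optimized considerably before it could move useful amounts of data.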

Observed placement locality
- Sequential locality: new instance likely to use same machine as old dead one
- Parallel locality: instances started close in time more likely to share
- Non-locality: one account never given two instances on same machine

Evaluating brute-force placement
- Chose 1686 victims
  - Small instances in zone 3 with public web servers
- Launched probe instances and checked co-residence
- 510 probes: hit 127 victims
- 1785 probes: hit 141 victims, 8.4%

Using locality
- Idea: use parallel locality, try to start probes soon after victim
- Perhaps can trigger victim start, such as if it’s based on demand
- About 40% coverage for 20 victims and 20 probes
- Also demonstrated against demos of commercial services

Cache: Prime+Trigger+Probe
1. (Prime) Fill cache with my data
2. Busy loop until preempted (recognize with TSC)
3. Measure time to re-read my data
- Must play tricks to defeat CPU pre-fetch
- Differential coding to resist noise

Load and traffic estimation
- Check for co-residence using system load as a covert channel
- Estimate traffic load on co-resident web server

Keystroke timing attack (classic)
- Fine-grained keystroke timing can reveal information about text typed
- Especially given per-user training
- Demonstrated in lab against passwords typed over SSH, without breaking crypto
- 50× speedup over exhaustive search
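
The brute-force placement numbers above can be checked with a short calculation. This sketch uses only the counts from the slide (1686 victims; 127 hits with 510 probes; 141 hits with 1785 probes); the function names are made up for illustration.

```python
# Counts taken from the "Evaluating brute-force placement" slide.
VICTIMS = 1686
CAMPAIGNS = [(510, 127), (1785, 141)]  # (probes launched, victims hit)


def coverage(victims_hit: int, total_victims: int = VICTIMS) -> float:
    """Fraction of the victim set that had a co-resident probe."""
    return victims_hit / total_victims


def hits_per_probe(victims_hit: int, probes: int) -> float:
    """Average number of victims hit per probe instance launched."""
    return victims_hit / probes


for probes, hit in CAMPAIGNS:
    print(f"{probes} probes: {coverage(hit):.1%} coverage, "
          f"{hits_per_probe(hit, probes):.3f} victims per probe")
```

Launching 3.5 times as many probes adds only 14 victims, so brute force shows sharply diminishing returns. That is the motivation for exploiting placement locality instead.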

Keystrokes in Xen
- Lab installation with CPU pinning, otherwise idle; not real EC2
- Threshold cache activity level
  - More than idle, less than otherwise busy
- 5% false negatives, 0.3 false positives per second
- Timing resolution 13ms, enough for prior attacks

Countermeasures: limited
- Randomize and isolate network structure
  - Timing measurements still possible
- Block or add noise to covert channels
  - Hard, and how to know you have them all?
- Avoid locality in placement algorithm
  - Reduces but does not eliminate attacks

Countermeasure: pay for isolation
- Pay extra to have machines all to yourself
- Argument: fair cost upper-bounded by cost of one physical machine
- Not implemented
  - Though compare: GovCloud

Outline
- Get Off of My Cloud
- Administrative discussions
- Multi-Cloud Oblivious Storage

Next week: Bitcoin
- I still need to post more papers
- For Monday: double-spending attacks
- For Wednesday: real anonymity with Zerocoin

Choosing presentation topics
- Is volunteering viable?
- Possible alternative: lottery plus trading

Choosing project topics
- Start looking for groups and topics now
- Meet with me next week or week after
- Proposals due February 28th (less than one month)

Outline
- Get Off of My Cloud
- Administrative discussions
- Multi-Cloud Oblivious Storage

Motivation: hide access patterns
- Information is leaked by what you access when
- Consider encrypted email, medical info, etc.
- Goal here: conceal location, read vs. write

What’s revealed by plain encryption?
- Imagine we encrypt every disk block with function $E$
- Adversary can still see patterns of locations
- If $b_1 = b_2$, $E(b_1) = E(b_2)$

Using probabilistic encryption
- Probabilistic encryption: randomized, returns different ciphertext each time
- Standard in public key, theory, and with modes of operation
- To conceal read vs. write, always replace block with new encryption

Straw man 1: access every block
- For each virtual access (read or write), access (read and write) every physical block
- Secure, but impractical
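
The difference between deterministic and probabilistic encryption can be illustrated with a toy sketch. Everything here is an assumption for illustration: the "cipher" is just an HMAC-SHA256 keystream XORed into a 32-byte block. It stands in for the function E on the slide and is not a recommended construction or anything taken from the paper.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 32  # bytes; matches the SHA-256 output so one PRF call covers a block


def _keystream(key: bytes, nonce: bytes) -> bytes:
    """Toy PRF: HMAC-SHA256 of the nonce under the key."""
    return hmac.new(key, nonce, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_deterministic(key: bytes, block: bytes) -> bytes:
    """Same block always gives the same ciphertext (toy only, insecure)."""
    assert len(block) == BLOCK_SIZE
    return _xor(block, _keystream(key, b"fixed-nonce"))


def encrypt_probabilistic(key: bytes, block: bytes) -> bytes:
    """A fresh random nonce makes every ciphertext different."""
    assert len(block) == BLOCK_SIZE
    nonce = os.urandom(16)
    return nonce + _xor(block, _keystream(key, nonce))


def decrypt_probabilistic(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, nonce))


key = os.urandom(32)
block = b"A" * BLOCK_SIZE

# Deterministic: equal blocks are visible as equal ciphertexts.
same = encrypt_deterministic(key, block) == encrypt_deterministic(key, block)
# Probabilistic: re-encrypting the same block looks like a fresh write.
c1 = encrypt_probabilistic(key, block)
c2 = encrypt_probabilistic(key, block)
print(same, c1 != c2, decrypt_probabilistic(key, c1) == block)
```

Because a re-encrypted block is indistinguishable from a changed one, always writing back a fresh encryption hides whether an access was a read or a write. It does not hide which location was touched, which is the problem the straw men try to address.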

Straw man 2: shuffle all blocks
- Use pseudo-random permutation to shuffle all block locations
- Secure if you never access a block more than once
- But leaks on any repeated operations
  - Can’t have, e.g., read after write

Goldreich square-root construction
- First semi-practical idea (STOC 1987)
- Cache of $\sqrt{n}$ locations accessed each time, plus shuffled copy
- Dummy accesses for consistency
- Reshuffle after $\sqrt{n}$ operations

G&O hierarchical idea
- Split into levels of exponentially increasing size
- Write back in smallest level, then reshuffle into larger
- Various kinds of hashing can be used
- Polylog amortized cost for $O(1)$ client storage
- But still pretty impractical

The client bandwidth constraint
- In many storage outsourcing applications, major constraint is client’s network bandwidth
- Client has significant local storage
  - Not enough for all data
  - But enough for an index (order of one word per block)

Multi-cloud approach
- Cloud-to-cloud bandwidth more than client-to-cloud
- Use multiple (e.g. 2) clouds
- Require: not all clouds are malicious
- Major savings, especially on client bandwidth

Threat models in protocols
- (Fully) honest: follows the protocol exactly
- Malicious: can do anything (worst case)
- Semi-honest, AKA honest-but-curious: follows protocol, but may try to learn secrets from seen data
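
The square-root construction can be sketched as a toy simulation. This is a simplified illustration under stated assumptions, not Goldreich's actual protocol: the class name `SqrtORAM` is hypothetical, the "server" is a Python list, blocks are left unencrypted, and the reshuffle is done in the clear, whereas the real construction must shuffle obliviously. The sketch only shows why the cache plus dummy accesses stop any shuffled location from being touched twice between reshuffles.

```python
import math
import random


class SqrtORAM:
    """Toy square-root ORAM; the 'server' is just Python lists here."""

    def __init__(self, values, seed=0):
        self.n = len(values)
        self.s = max(1, math.isqrt(self.n))      # cache size = epoch length
        self.rng = random.Random(seed)
        self.perm = list(range(self.n + self.s))  # virtual index -> physical slot
        self.store = list(values) + [None] * self.s  # real blocks plus s dummies
        self.shelter = []                         # the cache, scanned on every access
        self.epochs = []                          # physical slots touched, per epoch
        self._reshuffle()

    def _reshuffle(self):
        # Fold cached updates back in; later entries override earlier ones.
        for addr, val in self.shelter:
            self.store[self.perm[addr]] = val
        new_perm = list(range(self.n + self.s))
        self.rng.shuffle(new_perm)
        new_store = [None] * len(self.store)
        for virt, old_slot in enumerate(self.perm):
            new_store[new_perm[virt]] = self.store[old_slot]
        self.perm, self.store = new_perm, new_store
        self.shelter, self.dummies_used = [], 0
        self.epochs.append([])

    def access(self, op, addr, value=None):
        found, current = False, None
        for a, v in self.shelter:                 # always scan the whole cache
            if a == addr:
                found, current = True, v
        if found:
            # Already cached: touch a fresh dummy so the server sees a new slot.
            slot = self.perm[self.n + self.dummies_used]
            self.dummies_used += 1
        else:
            slot = self.perm[addr]
            current = self.store[slot]
        self.epochs[-1].append(slot)
        if op == "write":
            current = value
        self.shelter.append((addr, current))      # cache grows by one per access
        if len(self.shelter) == self.s:
            self._reshuffle()
        return current


oram = SqrtORAM(list(range(100, 116)))            # n = 16, so s = 4
oram.access("write", 3, 999)
print(oram.access("read", 3), oram.epochs)
```

Repeated accesses to the same block show up in `epochs` as distinct physical slots within each epoch, which is the property straw man 2 lacked. The cost is the full cache scan on every access and the periodic reshuffle.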

SSS partitioning
- Divide data into $\sqrt{n}$ partitions of size $\sqrt{n}$
- Client keeps location index and $\sqrt{n}$ blocks of cache
- Improves worst-case and constant factors, but still needs $\log \sqrt{n}$ (e.g., 10×) accesses to read

Splitting between clouds
- Make expensive operations cloud-to-cloud
- Do operation in one cloud to hide from the other
- “Non-colluding” confidentiality assumption

Write: oblivious shuffling
[Figure: oblivious shuffling. Figure source: taken from the paper]

Read: oblivious selection
[Figure: oblivious selection. Figure source: taken from the paper]

Homomorphic checksum
- Linear checksum allows computation on encrypted blocks
- Note: not secure after an adversary has seen examples!
- Combined with PRF (imagine: MAC) and authenticated encryption

Experimental deployment
- Amazon EC2 (AWS) with SSDs
- Microsoft Azure, lacking SSDs
- Up to 5 servers (max out client bandwidth)
- $3.10 per hour plus $2.50 per GB for one server
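
The idea behind the homomorphic checksum slide can be illustrated with a generic linear checksum. This is a sketch under assumptions, not the paper's scheme: the modulus `P`, the secret coefficient vector, and the additive one-time-pad "encryption" are all chosen here for illustration. It shows only the linearity property that lets a checksum computed over an encrypted block be related to the checksum of the plaintext.

```python
import random

P = 2**61 - 1  # a Mersenne prime, chosen here as an illustrative modulus


def checksum(block, coeffs):
    """Linear checksum: inner product of block and secret coefficients mod P."""
    return sum(c * x for c, x in zip(coeffs, block)) % P


def add_blocks(a, b):
    """Element-wise addition mod P (doubles as toy additive encryption)."""
    return [(x + y) % P for x, y in zip(a, b)]


rng = random.Random(0)
WORDS = 8
coeffs = [rng.randrange(1, P) for _ in range(WORDS)]    # secret checksum key
plain = [rng.randrange(P) for _ in range(WORDS)]        # plaintext block
pad = [rng.randrange(P) for _ in range(WORDS)]          # one-time pad (toy)

cipher = add_blocks(plain, pad)

# Linearity: checksum(plain + pad) = checksum(plain) + checksum(pad) (mod P),
# so the checksum of the ciphertext determines the checksum of the plaintext
# for anyone who knows the pad's checksum.
lhs = checksum(cipher, coeffs)
rhs = (checksum(plain, coeffs) + checksum(pad, coeffs)) % P
print(lhs == rhs)
```

The same linearity is plausibly why the slide warns that the checksum alone is not secure once an adversary has seen examples: enough known block and checksum pairs give a linear system in the secret coefficients. That would explain why it is combined with a PRF and authenticated encryption.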

Bottleneck analysis
- Client bandwidth 2.6×
  - Compare 2× for read and write
- In practice: Azure’s non-SSD disk speeds
- Assuming SSDs, double throughput
  - up to 6MB/s
- Based on cloud-to-cloud bandwidth bottleneck, 30-60MB/s