Understanding Data Lifetime

  1. Understanding Data Lifetime
     Presented by: William Enck
     CSE544: Spring 2007
     Based on “Understanding Data Lifetime via Whole System Simulation” by Chow et al.

  2. Why Look at Data Lifetime?
     • Have you ever programmed in Java?
       ‣ Which do you prefer to use?
         • String
         • char[]
       ‣ What about for password input?
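
The String vs. char[] question comes down to mutability: an immutable string cannot be overwritten once the password has been checked, while a mutable array can. A minimal Python sketch of the same idea, assuming `str` stands in for Java's `String` and `bytearray` for `char[]` (the `wipe` helper and the sample password are hypothetical, and wiping one buffer says nothing about copies made elsewhere):

```python
def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place so the secret no longer sits in it."""
    for i in range(len(buf)):
        buf[i] = 0


# Immutable: the program cannot overwrite the characters of a str.
password_str = "hunter2"  # placeholder secret, for illustration only
try:
    password_str[0] = "x"  # item assignment is not supported on str
    str_is_mutable = True
except TypeError:
    str_is_mutable = False

# Mutable: the same secret held in a bytearray can be zeroed after use.
password_buf = bytearray(b"hunter2")
wipe(password_buf)
```

The immutable copy stays in memory until the runtime reclaims and reuses that storage, which the program does not control.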

  3. The Circle of Life
     • Who allocates memory?
       ‣ Kernel (device drivers, etc.)
       ‣ System apps (login, su, etc.)
       ‣ User apps (Firefox, ssh, etc.)
     • Where does it end up?
       ‣ Kernel
       ‣ Any application
       ‣ System swap
       ‣ Hibernation storage

  4. Minimizing Lifetime is Hard
     • Coarse-grained propagation control
       ‣ How can you determine what gets swapped out?
         • Encrypt swap if you can (OS X 10.4 security option)
         • Applications can use mmap and mlock to pin memory
       ‣ Program failure (core dump written to a file!)
         • scrash (secure crash reporting)
     • Fine-grained control is better (e.g., memset)
       ‣ Relies on smart programmers
       ‣ Not always possible
         • Compiler optimizations (can remove the memset!)
         • Language optimizations

  5. Whole System Simulation Approach
     • General idea:
       ‣ Run all code in a virtual machine and watch what happens
       ‣ Sensitive information is marked (“tainted”) and followed
     • What is taint analysis?
       ‣ Typically refers to tracing low-integrity input
       ‣ Here, tainting is used as a way to mark data of interest and follow its propagation
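
The "mark and follow" idea can be illustrated with a shadow map that holds one taint bit per byte of simulated memory. This is a toy sketch under stated assumptions, not the simulator from the paper: the `TaintedMemory` class and its method names are hypothetical, and it models only device input, copies, and constant overwrites.

```python
class TaintedMemory:
    """Toy byte-addressable memory with one taint bit per byte (a shadow map)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.taint = [False] * size

    def write_input(self, addr: int, payload: bytes, sensitive: bool) -> None:
        """Model a device (e.g. a keyboard) writing bytes; mark them if sensitive."""
        for i, b in enumerate(payload):
            self.data[addr + i] = b
            self.taint[addr + i] = sensitive

    def copy(self, dst: int, src: int, n: int) -> None:
        """Model a memory copy: taint travels with the bytes."""
        for i in range(n):
            self.data[dst + i] = self.data[src + i]
            self.taint[dst + i] = self.taint[src + i]

    def clear(self, addr: int, n: int) -> None:
        """Model overwriting with zeros: a constant overwrite removes taint."""
        for i in range(n):
            self.data[addr + i] = 0
            self.taint[addr + i] = False

    def tainted_addresses(self) -> list:
        return [a for a, t in enumerate(self.taint) if t]


mem = TaintedMemory(32)
mem.write_input(0, b"secret", sensitive=True)  # keystrokes land in a buffer
mem.copy(16, 0, 6)                             # an application copies them
mem.clear(0, 6)                                # the original buffer is scrubbed
```

After these three steps the original buffer is clean, but `mem.tainted_addresses()` still reports the six bytes of the copy, which is the kind of lingering data the analysis is meant to expose.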

  6. Tradeoffs
     • What are the tradeoffs of performing analysis at the binary level?
       ‣ Pros
         • No need to worry about compiler optimizations
         • Does not need source code
         • Crosses application boundaries
       ‣ Cons
         • Difficult to determine what is sensitive
         • Loss of code semantics - Info Flow analysis (JiF)
     • Remember, memory pages pass between applications as well as the kernel

  7. Marking Sensitive Data
     • Automate as much as possible
       ‣ Device drivers, e.g., NIC, keyboard
       ‣ Difficult to generalize
     • Add to specific places in applications
       ‣ Manually change source code or binary image
       ‣ Difficult to find locations

  8. Recreating Semantics
     • Taint analysis must follow implicit and explicit flows
       ‣ Variable assignments/arithmetic
       ‣ Conditionals
       ‣ Lookup tables
       ‣ One-way functions
         • Is data still sensitive after a cryptographic hash?
     • At the assembly level, some code may appear to have flows but really does not
       ‣ Constant functions
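
Three of the cases above can be sketched as propagation rules on a value that carries a taint bit. The names (`Value`, `add`, `lookup`, `xor`, and the `same_source` flag) are hypothetical and chosen for illustration; conditionals and one-way functions are left out because they need a policy decision rather than a mechanical rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """An 8-bit value plus a single taint bit."""
    v: int
    tainted: bool = False


def add(a: Value, b: Value) -> Value:
    # Explicit flow: the result depends on both operands.
    return Value((a.v + b.v) & 0xFF, a.tainted or b.tainted)


def lookup(table: list, index: Value) -> Value:
    # Lookup-table flow: a tainted index taints the loaded value,
    # even though the table entries themselves are untainted.
    return Value(table[index.v], index.tainted)


def xor(a: Value, b: Value, same_source: bool = False) -> Value:
    # Constant function: x XOR x is always 0 and reveals nothing about x.
    # A naive rule (taint = a.tainted or b.tainted) would over-taint here.
    if same_source:
        return Value(0, False)
    return Value(a.v ^ b.v, a.tainted or b.tainted)


key = Value(0x41, tainted=True)  # e.g. the keystroke 'A'
upper_to_lower = [c + 32 if 65 <= c <= 90 else c for c in range(256)]

summed = add(key, Value(1))
translated = lookup(upper_to_lower, key)
zeroed = xor(key, key, same_source=True)
```

Here `summed` and `translated` stay tainted, while `zeroed` is clean even though its only operand was tainted.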

  9. Data Mining
     • What do we do with all this information?
       ‣ Who has tainted data?
       ‣ How did they get it?
       ‣ When did that happen?
     • Log traces capture flow of tainted memory
     • Common places for tainted memory
       ‣ Kernel memory (RNG, I/O buffers)
       ‣ Application I/O buffers
       ‣ “String” datatype
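
The "who" and "when" questions can be answered by replaying a trace of taint events. The sketch below assumes a made-up trace format of `(time, owner, address, event)` tuples and invented sample entries; the log format used in the paper is not reproduced here.

```python
from collections import defaultdict

# Hypothetical trace: (time, owner, address, event)
trace = [
    (1, "kernel:tty", 0x1000, "taint"),
    (2, "app:login", 0x2000, "taint"),
    (3, "kernel:tty", 0x1000, "clear"),
    (5, "app:login", 0x2000, "clear"),
    (6, "kernel:swap", 0x3000, "taint"),
]


def lifetimes(events, end_time):
    """Return (ticks tainted per (owner, addr), set of regions never cleared)."""
    start = {}
    total = defaultdict(int)
    for t, owner, addr, event in sorted(events):
        key = (owner, addr)
        if event == "taint" and key not in start:
            start[key] = t
        elif event == "clear" and key in start:
            total[key] += t - start.pop(key)
    # Regions still tainted when the trace ends are charged up to end_time.
    for key, t0 in start.items():
        total[key] += end_time - t0
    return dict(total), set(start)


report, never_cleared = lifetimes(trace, end_time=10)
```

With this sample trace, `never_cleared` contains only the swap region, the one place where the data outlives the run.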

  10. Take-away
     • Applications, libraries, and the kernel do not consider data lifetime.
     • Secure deallocation methods are needed.
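
One way to read "secure deallocation" is an allocator that scrubs memory when it is freed instead of relying on each caller to do so. The sketch below is a toy model under that assumption; `BufferPool`, `leaked_after_reuse`, and the sample secret are hypothetical names and values, not an existing API.

```python
class BufferPool:
    """Toy allocator that reuses freed buffers, optionally zeroing them on free."""

    def __init__(self, zero_on_free: bool) -> None:
        self.zero_on_free = zero_on_free
        self._free = []

    def alloc(self, size: int) -> bytearray:
        # Reuse a freed buffer of the right size if one exists.
        for i, buf in enumerate(self._free):
            if len(buf) == size:
                return self._free.pop(i)
        return bytearray(size)

    def free(self, buf: bytearray) -> None:
        if self.zero_on_free:
            buf[:] = bytes(len(buf))  # scrub before the memory can be reused
        self._free.append(buf)


def leaked_after_reuse(pool: BufferPool) -> bytes:
    secret = pool.alloc(8)
    secret[:] = b"p@ssw0rd"   # placeholder secret, for illustration only
    pool.free(secret)
    reused = pool.alloc(8)    # a later, unrelated allocation
    return bytes(reused)


leaky = leaked_after_reuse(BufferPool(zero_on_free=False))
clean = leaked_after_reuse(BufferPool(zero_on_free=True))
```

Without zeroing, the second allocation hands the old password to whoever asks next; with zeroing on free, the lifetime of the secret ends at deallocation.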
