
Random Sampling applied to Rapid Disk Analysis


  1. Random Sampling applied to Rapid Disk Analysis. System & Network Engineering, Research Project. Nicolas Canceill, July 4, 2013. (Navigation bar: Rapid Disk Analysis, The Math, The Aftermath, Conclusions.)

  2. Outline: (1) Rapid Disk Analysis, (2) The Math, (3) The Aftermath, (4) Conclusions

  3. Introduction
     Background: Assoc. Prof. S. Garfinkel (Naval Postgraduate School); Advanced Forensics Format; The Sleuth Kit; better analysis for digital evidence; “Searching a 1TB hard drive in 10 minutes” (ACM 2013).
     Research: E. van Eijk, Z. Geradts (Nederlands Forensisch Instituut). Stability? Scalability? Precision?

  4. Outline, section 1: Rapid Disk Analysis

  5. Rapid Analysis: Why?
     Traditionally, investigation was “leisurely”. Reading a 1TB hard drive takes about 3.5h. The cost of “seek”: 1 × 36GB ≈ 100,000 × 64KiB, i.e. one sequential 36GB read costs roughly as much time as 100,000 scattered 64KiB reads.
     New challenges: large installations (computer rooms, datacenters...); forensics control at checkpoints (border crossings, airports...); “The bomb will go off in the next hour!”

  6. Rapid Analysis: What for?
     Profit: indications; data analysis.
     Determine free/wiped space; characterize data based on signatures; hash sectors to look for specific data.

  7. Rapid Analysis: How?
     Data characteristics: described (header/trailer); encoded/formatted; sectorized and distributed.
     Analysis strategies: simplify (hashing); tolerate (extract signature); reduce (random sampling).

  8. Research scope
     Research question: How can random sampling help forensically investigate hard disk drives?
     What kind of indications may be provided? Which parameters are in play? Which degree of certainty may be achieved?

  9. Outline, section 2: The Math

  10. Analysis process
      Built on top of S. Garfinkel’s frag_find tool.
      Input: image file to search; data set or signature set to look for; parameters (hashing, sampling, tolerance).
      Process: build Bloom filter (hashing); select sample; for each block in the sample, filter (and compare).
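The process on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not frag_find's actual code: an exact set of SHA-256 digests stands in for the Bloom filter, data is assumed to be block-aligned, the tolerance parameter is not modeled, and the names `block_hashes` and `sampled_search` are hypothetical.

```python
import hashlib
import random


def block_hashes(data: bytes, block_size: int) -> set:
    """Hash every full block of the target data.

    Assumption: an exact set of digests replaces the Bloom filter here.
    """
    return {
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data) - block_size + 1, block_size)
    }


def sampled_search(image: bytes, target: bytes, block_size: int,
                   sample_size: int, rng: random.Random) -> list:
    """Return the indices of sampled image blocks that match a target block."""
    known = block_hashes(target, block_size)
    total_blocks = len(image) // block_size
    # Select the sample: block indices drawn without replacement.
    picks = rng.sample(range(total_blocks), min(sample_size, total_blocks))
    hits = []
    for index in sorted(picks):  # sorted so the reads move forward through the image
        block = image[index * block_size:(index + 1) * block_size]
        if hashlib.sha256(block).digest() in known:  # the "filter" step
            hits.append(index)
    return hits
```

With a real Bloom filter, a hit can be a false positive, which is presumably why the slide adds "(and compare)"; with an exact digest set that second step is not needed.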

  11. Random sampling: Basic model
      Using a random sample of a statistical population to estimate/predict characteristics.
      Simple scenario: “Is this hard drive empty/wiped?” M non-empty blocks out of N; n sampled blocks out of N.
      Error rate, the probability to sample only empty blocks: $E = \prod_{i=1}^{n} \frac{N - (i-1) - M}{N - (i-1)}$
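The error rate of the basic model is a product of n fractions and is easy to evaluate numerically. The sketch below assumes that M counts the blocks the sample must not miss (the non-empty ones in the wiped-disk scenario), and the function name `miss_probability` is hypothetical.

```python
def miss_probability(N: int, M: int, n: int) -> float:
    """Probability that n blocks drawn without replacement from N
    all fall outside the M blocks of interest."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("require 0 <= M <= N and 0 <= n <= N")
    p = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)           # blocks not yet sampled
        p *= (remaining - M) / remaining  # chance that draw i misses as well
        if p == 0.0:                      # more draws than blocks outside M
            break
    return p
```

For example, `miss_probability(10, 1, 5)` is 0.5, since the product telescopes to 5/10. With N = 2,000,000,000 blocks, M = 20,000 and n = 10,000, the result is about 0.905: ten thousand samples would still miss 20,000 non-empty blocks roughly nine times out of ten.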

  12. Random sampling: Data layout
      Data is sectorized: [figure]
      Data is not always aligned: [figure]

  13. Random sampling: Advanced model
      A more realistic scenario: “Does this hard drive contain the target block?”
      All possible offsets: overlap transactions by $B - F$.
      All possible transactions: $N = \left\lfloor \frac{C}{T - (B - F)} \right\rfloor$
      All target transactions: $M = \left\lfloor \frac{D}{T} \right\rfloor$
      Error rate, the probability to miss all target blocks: $E = \prod_{i=1}^{n} \frac{\left\lfloor \frac{C}{T - (B - F)} \right\rfloor - (i-1) - \left\lfloor \frac{D}{T} \right\rfloor}{\left\lfloor \frac{C}{T - (B - F)} \right\rfloor - (i-1)}$
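The advanced model only changes how N and M are obtained before the same product is taken. The sketch below rests on several assumptions. The slide does not define C, T, B, F and D, so the code treats them as plain integers in a common unit. Reading C as the image size, T as the transaction size and D as the target data length is a guess, and B and F are used only through the overlap B − F. The brackets in the formulas are read as floors. Both function names are hypothetical.

```python
def transaction_counts(C: int, T: int, B: int, F: int, D: int) -> tuple:
    """Return (N, M): all possible transactions and all target transactions."""
    step = T - (B - F)  # consecutive transactions overlap by B - F
    if T <= 0 or step <= 0:
        raise ValueError("require T > 0 and T - (B - F) > 0")
    N = C // step  # assumption: the brackets on the slide are floors
    M = D // T
    return N, M


def miss_probability_transactions(C: int, T: int, B: int, F: int,
                                  D: int, n: int) -> float:
    """Probability that n sampled transactions all miss the target ones."""
    N, M = transaction_counts(C, T, B, F, D)
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("require 0 <= M <= N and 0 <= n <= N")
    p = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)
        p *= (remaining - M) / remaining
        if p == 0.0:
            break
    return p
```

With the made-up values C = 1000, T = 8, B = 4, F = 2 and D = 64, this gives N = 166 and M = 8, so a single sampled transaction misses the target with probability 158/166.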

  14. Experimental protocol
      Experimental image set. Parameters: image size, sector size, % of empty sectors, length of target data, offset size. Input: random files and the NSRL Reference Data Set.
      Experimental process. Parameters: image size, sector size, transaction size, sampling fraction. Steps: randomly select a master file signature; generate several images (length of target data, % of empty sectors); successively run several timed searches.

  15. Outline, section 3: The Aftermath

  16. Results: statistical distribution
      [Figure: presence of target data (0 to 0.6) against number of transactions (log scale, 10^0 to 10^4)]

  17. Results: block-to-transaction scaling
      [Figure: average error variance (0 to 1) against sample size in blocks (log scale, 10^0 to 10^4), one curve per transaction size: 2, 4 and 8 blocks]

  18. Results: precision scaling
      [Figure: average error variance (0 to 0.14) against number of transactions (log scale, 10^0 to 10^4), one curve per image size: 2MB, 4MB, 10MB, 20MB]

  19. Results: time scaling
      [Figure: average search time in seconds (log scale, 10^-3 to 10^-1) against number of sampled blocks (log scale, 10^0 to 10^5), one curve per image size: 200kB, 400kB, 1MB, 2MB, 4MB, 10MB, 20MB, 40MB, 100MB]

  20. Results: time overhead
      [Figure: average search time in seconds (log scale, 10^-3.8 to 10^-3.2) against number of transactions (log scale, 10^0 to 10^2), one curve per image size: 2MB, 4MB, 10MB, 20MB]

  21. Outline, section 4: Conclusions

  22. Contributions
      Main findings. Parameters analyzed: image characteristics (image size, sector size, data alignment, size of target data); sampling settings (sample size, transaction size, tolerance). Scalability: sample size scales with time, $S \sim t$; error rate scales with time, $E \sim \frac{1}{\sqrt{t}}$.
      Public material: fork of S. Garfinkel’s tools on GitHub; most of the experimental scripts on Gist.

  23. Research answers
      What kind of indications may be provided? Presence/absence of target data or signature.
      Which parameters are in play? Disk and data characteristics; sampling parameters.
      Which degree of certainty may be achieved? Certainty scales well with time; insight about the target disk will improve certainty.
      Random sampling is a powerful, scalable, adaptive technique for fast HDD analysis. Efficiency relies on suitable sampling settings and limited insight on the target HDD.

  24. Further research
      Improving insight of target: pre-determine sector size and data alignment; look for the optimal block-to-transaction ratio; one step further: pre-sampling.
      Automate decision process: optimal time spending; automatic settings balance; simple user side (time or certainty).

  25. Appendix 1: Bloom Filter (a)
      Hash-based filtering technique.
      Initialize: an array of n bits set to zero; k different hash functions uniformly mapping to [0, n − 1].
      Add an element: apply the functions to compute k integers in [0, n − 1]; set the k corresponding bits to 1.
      Query an element: apply the functions to compute k integers in [0, n − 1]; check whether the k corresponding bits are all 1.
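The three operations on this slide map directly onto a small class. The sketch below is a minimal illustration, not the filter used by frag_find. Deriving the k hash functions by prefixing SHA-256 with a counter is an assumption made here for brevity, bit positions run from 0 to n − 1, and the class name `BloomFilter` and its methods are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits: int, k: int):
        if n_bits <= 0 or k <= 0:
            raise ValueError("n_bits and k must be positive")
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)  # all bits start at zero

    def _positions(self, item: bytes):
        # Assumption: k hash functions obtained by salting SHA-256 with 0..k-1.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        """Set the k bits selected by the hash functions."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        """True if all k bits are set: 'possibly present', never a false negative."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

An added element always tests as present, while an element that was never added may occasionally do so too. That false-positive rate falls as n grows relative to the number of stored elements.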

  26. Appendix 1: Bloom Filter (b)
      [Figure: average error variance (×10^-2, 0 to 6) against number of transactions (log scale, 10^0 to 10^4), one curve per Bloom filter size: 8 bits, 32 bits]

  27. Appendix 1: Bloom Filter (c)
      [Figure: average building and search time in seconds (0 to 1.2) against number of transactions (log scale, 10^0 to 10^4), one curve per Bloom filter size: 8, 16, 24, 30, 31 and 32 bits]

  28. Appendix 2: Data layout (a)
      Optimal transaction size depends on sector size.
      Best case: [figure]
      Worst case: [figure]

  29. Appendix 2: Data layout (b)
      Optimal transaction size depends on data layout. [figure]
