How risky is the software you use?
Cyber Independent Testing Lab
{ Sarah Zatko, Tim Carstens, Patrick Stach, Parker Thompson, mudge } @ CITL
https://shmoo18.cyber-itl.org
We are CITL
• A non-profit organization based in the USA
• Founded by Sarah Zatko & mudge
• Mission: to improve the state of software security by providing the public with accurate reporting on the security of popular software
• Funding from the Ford Foundation
• Partners with Consumer Reports https://www.consumerreports.org & The Digital Standard https://thedigitalstandard.org
Something like this, but for software security.
How do you do this for software security?
Scores & Histograms
[Figure: score histograms for Hardened Gentoo, Ubuntu 16 LTS, Samsung UN55KS9000, and LG 55UH8500]
Security Today: You can lead the pack by mastering the fundamentals.

| | Vizio P55-E1 | LG 49UJ7700 | Samsung UN55KS9000 | Ubuntu 16.04 |
|---|---|---|---|---|
| # binaries | 504 | 1740 | 4243 | 4991 |
| aslr | 98% | 67% | 80% | 100% |
| stack DEP | 99% | 99%* | 99%* | 99% |
| 64 bit | 0% | 0% | 0% | 98% |
| RELRO | 100% | 4% | 9% | 96% |
| stack guards | 68% | 1% | 57% | 79% |
| fully fortified | 7% | 0% | 6% | 11% |
| partial fort | 43% | 1% | 37% | 42% |
| has good | 3% | 3% | 25% | 4% |
| has risky | 68% | 66% | 67% | 67% |
| has bad | 28% | 34% | 23% | 28% |
| has ick | 3% | 5% | 5% | 3% |
Our goals
1. Remain independent of vendor influence
2. Automated, comparable, quantitative analysis
3. Act as a user watchdog
• Non-goal: find and disclose vulnerabilities
• Non-goal: tell software vendors what to do
• Non-goal: perform free security testing for vendors
Three big questions
1. What works?
2. How do you recognize when it’s being done?
3. Who’s doing it?
The basic idea
Information Theory Perspective
• Given a piece of software, we can ask
  1. Overall, how secure is it?
  2. What are all of its vulnerabilities?
• (1) appears to ask for less information than (2)
• Our question: develop a heuristic which can efficiently answer (1) but not necessarily (2)
Step 1: Static Measurements
• Complexity
• Functions called
• Safety features
Years in the field give us a good starting point: look for the same things we’d look at when trying to pick a soft target to exploit. But this field doesn’t know enough about the impact and effectiveness of best practices.
Early Promise

| Browser | “Underground” exploit price |
|---|---|
| Microsoft Edge | $80,000 |
| Google Chrome | $80,000 |
| Apple Safari | $50,000 |
| Mozilla Firefox | $30,000 |
Step 2: Fuzzing! Lots of it.
• Fuzzing provides a testable, recognized way to roughly measure software’s “security”
• The more robust software is when fuzzed, the less likely it is to be exploitable
• If we could fuzz everything, we wouldn’t even necessarily need the heuristics
• But we can’t, so…
Step 3: Profit! Bayes! (1/3)
• For some software s, we know that we can’t compute $P(s \text{ is secure})$
• As a surrogate, we can compute probabilities of different fuzzing outcomes, like:
  $P_{h,k} = P(h \text{ units of fuzzing against } s \text{ yields} < k \text{ unique crashes})$
Step 3: Profit! Bayes! (2/3)
• Fuzzing is expensive, so we “go Bayesian”
• Let M be an observable property of software
  • Examples: is compatible with RELRO, has “low complexity,” etc.
• For random s in S, consider the conditional probabilities
  $P_{h,k}(M) = P(h \text{ units of fuzzing against } s \text{ yields} < k \text{ unique crashes} \mid M \text{ is true of } s)$
• What we want: which M have $P_{h,k}(M) > 0.5$ for large $\log(h)/k$?
• Which indicators (M) can be used to predict fuzzing performance?
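The conditional probability above can be estimated as a plain frequency once some fuzzing results exist. The following is a minimal sketch, not CITL's model: the `FuzzRecord` schema, the function names, and the sample numbers are all hypothetical, and it assumes every record was fuzzed with the same budget h.

```python
from dataclasses import dataclass


@dataclass
class FuzzRecord:
    """One fuzzing campaign against one piece of software (hypothetical schema)."""
    has_indicator: bool   # whether the observable property M holds
    unique_crashes: int   # unique crashes seen after the fixed budget h


def fraction_robust(records, k):
    """Fraction of records with fewer than k unique crashes, or None if empty."""
    if not records:
        return None
    return sum(1 for r in records if r.unique_crashes < k) / len(records)


def estimate_p_given_m(records, k):
    """Frequency estimate of P_{h,k}(M): robustness among software where M holds."""
    return fraction_robust([r for r in records if r.has_indicator], k)


# Made-up sample: four programs with the indicator, four without.
sample = (
    [FuzzRecord(True, c) for c in (0, 1, 5, 0)]
    + [FuzzRecord(False, c) for c in (3, 7, 0, 9)]
)
print("P_{h,k}(M):", estimate_p_given_m(sample, k=2))
print("baseline  :", fraction_robust(sample, k=2))
```

Comparing the conditional estimate against the baseline over all records is one simple way to see whether an indicator carries any predictive signal.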
Step 3: Profit! Bayes! (3/3)
Indicators might not be causal, and that’s OK:
• It could be that M’s presence literally prevents crashes
• But it could also be that M is mostly only found in software written by teams who ship reliable software
• If you’re looking for security, what difference does it make?
Want to find:
• Diamond (US Geological Survey)
Look for: Indicator Minerals
• Garnet (Moha112100 @ Wikipedia)
• Diopside (Rob Lavinsky)
• Chromite (Weinrich Minerals, Inc.)
Step 4: Reports
While we work on gathering data and developing our model, we’re also
• Developing reports
• Building relationships with partner organizations like Consumer Reports
• Looking for security orgs to share data with
The Progression of CITL Tech
[Timeline figure. Static analysis: Static (Prototype), Static (Extensible). Fuzzing: AFL, CITL-fuzz, NEW FUZZER. Milestones: First reports, First Data, Final Model & Reports. Marker: Today.]
Applied Static Analysis
• Lots of architectures: x86-*, ARM-*, MIPS-*
• Lots of operating systems: Windows, Linux, OS X
• Lots of binary formats: PE, ELF, MachO
• Each with their own app-armoring features
• Lots of versions of each of the above!
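To make "app-armoring features" concrete for one format, here is a minimal sketch that reads three indicators out of a little-endian ELF64 image using only the standard library. It is an illustration, not CITL's tooling: the function name and the returned keys are hypothetical, and the checks are deliberately coarse (an `ET_DYN` file may be a shared library as well as a PIE, and the presence of `PT_GNU_RELRO` does not distinguish partial from full RELRO).

```python
import struct

ET_DYN = 3                  # shared object or position-independent executable
PT_GNU_STACK = 0x6474E551   # program header describing stack permissions
PT_GNU_RELRO = 0x6474E552   # region remapped read-only after relocation
PF_X = 0x1                  # "executable" segment flag

# ELF64 file header: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def elf64_hardening(data: bytes) -> dict:
    """Report a few coarse hardening indicators for a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    fields = ELF64_HEADER.unpack_from(data, 0)
    e_type, e_phoff, e_phentsize, e_phnum = fields[1], fields[5], fields[9], fields[10]

    # In ELF64 program headers, p_type and p_flags are the first two 32-bit fields.
    segments = {}
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        segments[p_type] = p_flags

    stack_flags = segments.get(PT_GNU_STACK)
    return {
        "aslr_compatible": e_type == ET_DYN,
        "nonexec_stack": stack_flags is not None and not (stack_flags & PF_X),
        "relro": PT_GNU_RELRO in segments,
    }
```

A call such as `elf64_hardening(open(path, "rb").read())` would produce one row of raw data per binary; PE and Mach-O need their own parsers and their own feature lists.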
OS Comparisons
• Windows lags in stack guards, but has good usage of CFI
• Linux does more source fortification than OSX
• Windows has the best function hygiene
• Linux’s function hygiene is slightly worse than OSX’s

| | Ubuntu 16.04 | Windows 10 | OSX 10.13.1 |
|---|---|---|---|
| 64 bit | 97% | 66% | 77% |
| aslr | 100% | 99% | 100% |
| dep | 99% | 98% | 100% |
| stack_guards | 79% | 40% | 73% |
| fully fortified | 11% | - | 2% |
| partial fort | 42% | - | 33% |
| cfi | - | 92% | - |
| good | 4% | 19% | 29% |
| risky | 67% | 30% | 60% |
| bad | 28% | 3% | 24% |
| ick | 3% | 0% | 2% |
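The good/risky/bad/ick rows can be read as "percentage of binaries that import at least one function from that category". The sketch below computes such percentages under that reading, which is an assumption. The category lists are purely illustrative placeholders and are not CITL's actual lists; the function and variable names are hypothetical too.

```python
# Illustrative placeholder lists only; the real categorization is not given here.
CATEGORIES = {
    "good": {"strlcpy", "strlcat"},
    "risky": {"memcpy", "strncpy"},
    "bad": {"strcpy", "sprintf"},
    "ick": {"gets", "system"},
}


def hygiene_percentages(imports_by_binary):
    """Percent of binaries importing at least one function from each category."""
    total = len(imports_by_binary)
    result = {}
    for category, names in CATEGORIES.items():
        hits = sum(
            1 for imported in imports_by_binary.values() if names & set(imported)
        )
        result[category] = 100.0 * hits / total if total else 0.0
    return result


# Made-up import tables for four binaries.
example = {
    "a": ["strcpy", "memcpy"],
    "b": ["strlcpy", "memcpy"],
    "c": ["gets"],
    "d": ["puts"],
}
print(hygiene_percentages(example))
```

Because a binary can import from several categories at once, the percentages in a column do not need to sum to 100.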
Linux Browsers – Ubuntu 16.04
• Scores are all very close, Firefox wins by a nose in static analysis
• Chrome’s sandbox isn’t factored into score yet
• All have inconsistent function hygiene
• Opera takes a hit for lack of RELRO
• Chrome lags behind in fortification use

| | Chrome | Firefox | Opera |
|---|---|---|---|
| version | 63.0.3239.13 | 57.0.4 | 50.0.2762.4 |
| 64bit | 100% | 100% | 100% |
| aslr | 100% | 100% | 100% |
| dep | 100% | 100% | 100% |
| relro | 86% | 100% | 11% |
| stack_guards | 86% | 87% | 100% |
| partial fortification | 29% | 70% | 56% |
| good functions | 12% | 4% | 22% |
| risky functions | 86% | 91% | 100% |
| bad functions | 62% | 61% | 89% |
| score, 5th percentile | 35 | 64 | 43 |
| score, 50th percentile | 58 | 78 | 48 |
| score, 95th percentile | 71 | 86 | 65 |
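The three score rows summarize a distribution of per-binary scores by its 5th, 50th, and 95th percentiles. How the scores themselves are computed, and which percentile definition was used, is not stated here; the sketch below simply assumes the nearest-rank definition and uses made-up scores. The function names are hypothetical.

```python
import math


def nearest_rank_percentile(scores, pct):
    """Nearest-rank percentile: the smallest score with at least pct% at or below it."""
    if not scores:
        raise ValueError("no scores")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(scores):
    """Return the 5th, 50th, and 95th percentile of a list of per-binary scores."""
    return {pct: nearest_rank_percentile(scores, pct) for pct in (5, 50, 95)}


# Made-up per-binary scores for one product.
print(summarize([7, 23, 44, 44, 64]))
```

With only a handful of binaries per product, the 5th and 95th percentiles are effectively the worst and best binary, which is worth keeping in mind when comparing close scores.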
OSX Browsers
• Firefox and Opera had all binaries 64 bit with ASLR, Stack DEP
• Firefox also made most use of stack guards and fortification
• Chrome is the only one to enable Heap protection flag
• Safari isn’t using source fortification much
• Scores are very close, all near 95th percentile for High Sierra (71)
• Same general outcome as in Linux

| | Chrome | Firefox | Opera | Safari |
|---|---|---|---|---|
| version | 63.0.3239.13 | 57.0.4 | 50.0.2762.45 | 11.0.1 |
| count | 9 | 19 | 8 | 25 |
| 64bit | 89% | 100% | 100% | 88% |
| aslr | 89% | 100% | 100% | 100% |
| dep | 100% | 100% | 100% | 100% |
| heap | 11% | 0% | 0% | 0% |
| stack_guards | 78% | 95% | 88% | 68% |
| partial fortification | 33% | 47% | 38% | 4% |
| good | 33% | 37% | 25% | 8% |
| risky | 89% | 95% | 100% | 44% |
| bad | 44% | 68% | 38% | 8% |
| score, 5th percentile | 33 | 43 | 38 | 24 |
| score, 50th percentile | 51 | 56 | 51 | 51 |
| score, 95th percentile | 63 | 71 | 63 | 64 |
Windows 10 Browsers
• Scores are very close, but Edge wins by a hair
• 95th percentile is 64 for Win 10
• Chrome has more 32 bit binaries than the others
• Edge is the only one with 100% CFI
• Chrome and Opera do better on stack guards
• Firefox takes a hit because it excels in neither, has more risky functions

| | Chrome | Edge | Firefox | Opera |
|---|---|---|---|---|
| version | 63.0.3239 | 41.16299 | 57.0.4 | 50.0.2762 |
| count | 31 | 7 | 31 | 16 |
| 64bit | 62% | 100% | 94% | 100% |
| dep | 100% | 100% | 100% | 100% |
| aslr | 100% | 100% | 100% | 100% |
| cfi | 13% | 100% | 13% | 38% |
| stack guards | 94% | 57% | 61% | 94% |
| good functions | 0% | 0% | 3% | 0% |
| risky functions | 9% | 0% | 16% | 0% |
| bad functions | 9% | 0% | 0% | 0% |
| score, 5th percentile | 23 | 44 | 7.5 | 44 |
| score, 50th percentile | 44 | 64 | 44 | 44 |
| score, 95th percentile | 64 | 64 | 44 | 64 |
OSX Time Progression
• Looked at four versions from 10.10.5 through 10.13.1
• 7.7% increase in percent of binaries that are 64 bit
• 2% increase in use of stack guards, good functions
• Heap protection decrease correlates with ASLR increase?
• High Sierra shows significant decrease in # of binaries (~400 fewer)

| | OSX 10.10.5 | OSX 10.11.6 | OSX 10.12.6 | OSX 10.13.1 | total change |
|---|---|---|---|---|---|
| # binaries | 6449 | 6456 | 7017 | 6622 | |
| 64bit | 69% | 71% | 73% | 77% | +8 |
| aslr | 99% | 99% | 100%* | 100%* | +1 |
| heap | 5% | 5% | 4% | 4% | -1 |
| stack_guards | 71% | 71% | 72% | 73% | +2 |
| good functions | 27% | 27% | 27% | 29% | +2 |
| risky functions | 62% | 62% | 60% | 60% | -2 |
| bad functions | 25% | 25% | 24% | 24% | -1 |
Safari Time Progression
• New binaries introduced in High Sierra generally decreased performance
• Overall increases in 64bit and stack guards, but not consistently
• Function hygiene got a bit worse, especially in High Sierra
• Partial source fortification introduced in HS

| Safari in OSX | 10.10.5 | 10.11.6 | 10.12.6 | 10.13.1 | total change |
|---|---|---|---|---|---|
| # binaries | 9 | 13 | 22 | 25 | |
| 64bit | 83% | 92% | 86% | 88% | +5* |
| stack_guards | 67% | 69% | 73% | 68% | +1 |
| partial fortification | 0% | 0% | 0% | 4% | +4 |
| good functions | 17% | 15% | 9% | 8% | -9 |
| risky | 50% | 36% | 38% | 44% | -6* |
| bad | 0% | 8% | 5% | 8% | +8 |
Mining Useful Spectre Gadgets
• Focus on BTB poisoning aka Variant 2 widgets
• Use DFA to locate this pattern:
  • Op reg1,[base (+index)]
    • Base or index either attacker controlled or useful data
  • … (anything that doesn’t destroy data in reg1)
  • Op [base (+index)],reg2 or Op reg2,[base (+index)]
    • Where base or index are reg1
• TL;DR: a load, followed by a load or store that uses the loaded value as an address
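The pattern above can be illustrated on a toy instruction model. This is a simplified sketch, not the analysis used for the talk: it scans straight-line code with a fixed look-ahead window instead of doing real data-flow analysis (reading "DFA" as data-flow analysis is itself an assumption), it only models attacker-controlled registers and not "useful data", and the `Insn` type, `find_gadgets`, and the `window` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Toy instruction: just enough detail to express the load/use pattern."""
    kind: str                    # "load", "store", or "other"
    dest: Optional[str] = None   # register written, if any
    base: Optional[str] = None   # base register of the memory operand, if any
    index: Optional[str] = None  # index register of the memory operand, if any


def find_gadgets(insns, controlled, window=8):
    """Return (load_pos, use_pos) pairs: a load through a controlled register
    whose result is later used as a base or index before being overwritten."""
    gadgets = []
    for i, first in enumerate(insns):
        if first.kind != "load" or first.dest is None:
            continue
        if first.base not in controlled and first.index not in controlled:
            continue
        reg1 = first.dest
        for j in range(i + 1, min(len(insns), i + 1 + window)):
            later = insns[j]
            if later.kind in ("load", "store") and reg1 in (later.base, later.index):
                gadgets.append((i, j))
                break
            if later.dest == reg1:
                break  # reg1 was destroyed before it could be used as an address
    return gadgets


code = [
    Insn("load", dest="rax", base="rdi", index="rsi"),  # first load, controlled address
    Insn("other", dest="rcx"),                          # leaves rax intact
    Insn("load", dest="rdx", base="rbx", index="rax"),  # second load indexed by rax
    Insn("load", dest="r8", base="rdi"),                # candidate first load
    Insn("other", dest="r8"),                           # destroys r8
    Insn("store", base="r8"),                           # no longer depends on the load
]
print(find_gadgets(code, controlled={"rdi", "rsi"}))
```

A real scanner would work on disassembled binaries and track register aliases and control flow; the toy version only shows the shape of the match.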
CITL: Impact
• We’ve been reporting bugs
  • Firefox on OSX was missing ASLR (they fixed it quickly!)
  • Several patches & bugs submitted to LLVM & Qemu
• We’ve inspired others
  • Big shout-out to the Fedora Red Team
• We’ve partnered to cover broader domains
  • Consumer Reports https://www.consumerreports.org
  • The Digital Standard https://thedigitalstandard.org