System Security: From Discovery to Innovation XiaoFeng Wang James H. Rudy Professor Indiana University at Bloomington xw7@Indiana.edu http://www.informatics.indiana.edu/xw7/
System Security Research Inherently Interdisciplinary and Multi-dimensional
Follow the Tech Trends
System Security Research: Inherently Interdisciplinary and Multi-dimensional; Discovery-driven, utility-centric
Sources for Security Innovations
  Software security: e.g., memory attacks, jump to libraries => ASLR
  Mobile security: malware infection => app sandbox + app store vetting
  OS security: e.g., OS-level attacks => TEE (such as Intel’s SGX)
  Network security: e.g., DDoS attacks => SYN cookies, combined detection and blocking (e.g., AWS Shield)
  Browser security: e.g., cross-origin attacks (such as XSS) => Chrome’s site isolation
  Data privacy: inference attacks => differential privacy, as integrated in iOS
  Others:
    Side channels on mobile systems => closing procfs on Android; data perturbation on iOS
    Credential attacks => multi-factor authentication
    …
Destructive Research: Security research needs wreckers! Wreck “secure” systems => Find the cracks => Understand the cracks and their fundamental causes => Fix the cracks => Build a better system
How to Innovate in Security Research
  Follow the technical tide
    Current: mobile security
    Emerging: IoT/CPS security
    Future: ML security, genome privacy
  Understanding new technologies
    Finding weaknesses
    Finding utilities and constraints
  Asking big questions
    Fundamental causes of the problem?
    How to do better (under the constraints)?
Examples: Destructive Research on Mobile and IoT Security (CCS’13, Oakland’15, NDSS’14, NDSS’15, CCS’17)
NO bugs in apps. NO implementation flaws in the system. What can a zero-permission app still learn?
Android Public Resources [Figure labels: Usability; Adversary Goals; Model] Application Framework: public APIs (audio usage, CPU usage, running application list). Linux Kernel: public files (procfs, sysfs).
Finding Your Location: a zero-permission app monitors /proc/net/arp to obtain the BSSID, then delivers the BSSID through the browser to an adversary-controlled web server.
Why Is BSSID Sensitive? A BSSID-to-GPS dataset maps each BSSID to a GPS location.
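A minimal sketch of the first step, assuming the standard Linux /proc/net/arp column layout. The sample table, the addresses, and the function name gateway_mac are illustrative assumptions, not material from the talk; the slide treats the access point's hardware address found this way as the BSSID.

```python
# Hypothetical ARP table text in the /proc/net/arp layout (addresses invented).
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        wlan0
192.168.1.23     0x1         0x0         00:00:00:00:00:00     *        wlan0
"""

def gateway_mac(arp_text, gateway_ip, device="wlan0"):
    """Return the MAC address the ARP table lists for gateway_ip, or None."""
    for line in arp_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        # Flags 0x0 marks an incomplete entry with no resolved MAC.
        if ip == gateway_ip and dev == device and int(flags, 16) != 0:
            return mac.lower()
    return None

bssid = gateway_mac(SAMPLE_ARP, "192.168.1.1")
```

On a real device the text would come from reading the file itself; the point of the slide is that doing so required no permission at the time.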
Coverage
Evaluation
Another Example: Identity Inference
  Per-app mobile data usage: yet another piece of public data
    Tweet: 580-720B
    Download: 541-544B
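The byte ranges above suggest how a background app could turn a public usage counter into a list of tweet times. The sketch below is only an illustration: the ranges are the ones on the slide, while the sampling format, the sample values, and the function names are assumptions.

```python
TWEET_RANGE = (580, 720)     # bytes, from the slide: posting a tweet
DOWNLOAD_RANGE = (541, 544)  # bytes, from the slide: a download event

def classify_increment(delta_bytes):
    """Label one increase of the per-app usage counter."""
    if DOWNLOAD_RANGE[0] <= delta_bytes <= DOWNLOAD_RANGE[1]:
        return "download"
    if TWEET_RANGE[0] <= delta_bytes <= TWEET_RANGE[1]:
        return "tweet"
    return "other"

def tweet_timestamps(samples):
    """samples: list of (time_seconds, cumulative_bytes) counter readings."""
    events = []
    for (_, b_prev), (t_cur, b_cur) in zip(samples, samples[1:]):
        if classify_increment(b_cur - b_prev) == "tweet":
            events.append(t_cur)
    return events

# Hypothetical readings: increments of 650, 0, 543 and 607 bytes.
readings = [(0, 10000), (30, 10650), (60, 10650), (90, 11193), (120, 11800)]
observed = tweet_timestamps(readings)
```

The recovered timestamps are what the next step of the attack consumes.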
Attack [Figure: sets of people who tweeted at Timestamp1±60s, Timestamp2±60s, and Timestamp3±60s; timeline of Timestamp1 through Timestamp5]
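One plausible reading of this figure is a set intersection: for each observed tweet time, collect the accounts that tweeted within ±60 seconds, then keep only the accounts present in every set. The sketch below assumes that reading; the account names, timestamps, and function names are invented for illustration.

```python
def tweeters_near(timeline, t, window=60):
    """Accounts with at least one public tweet within +/- window seconds of t."""
    return {user for user, ts in timeline if abs(ts - t) <= window}

def candidates(timeline, observed_times, window=60):
    """Intersect the per-timestamp account sets to narrow down the victim."""
    result = None
    for t in observed_times:
        near = tweeters_near(timeline, t, window)
        result = near if result is None else result & near
    return result if result is not None else set()

# Hypothetical public timeline of (account, tweet_time_seconds) pairs.
timeline = [
    ("alice", 1000), ("bob", 1010), ("carol", 990),
    ("alice", 5000), ("bob", 5030),
    ("alice", 9000), ("dave", 9050),
]
suspects = candidates(timeline, [1000, 5000, 9000])
```

Each additional observed timestamp can only shrink the candidate set, which is why a handful of tweets may be enough to single out one account.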
Identity Recovery
  Manual analysis of approx. 4000 Twitter accounts
    First and last name: 79%
    Location: 32%
    Bio: 21%
Why Identity is Important
Other Findings
  Your health/financial information
    Mobile data usage of Yahoo! Finance and WebMD
  Your driving routes
    Monitor the speaker status (on or off) when running Navigator
  Stealthiness
    Monitor running apps
    Send data through the browser when the LCD is off
Our Solution
  A new policy enforcement framework
    Each app can specify the permissions for disclosing its mobile data usage
    Four settings: NO_Access, Rounding, Aggregation and NO_Protection
    Enforced by the Android framework
  Rounding: round the usage to a multiple of a fixed size (e.g., 256B)
  Aggregation: release the total usage every hour, day or week
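The four settings can be sketched as a small disclosure function. This is an illustration under stated assumptions, not the framework's actual code: rounding is assumed to round up, aggregation is assumed to expose the counter as of the last completed period, and all function names and sample values are hypothetical.

```python
def round_usage(usage_bytes, unit=256):
    """Round up to a multiple of unit (rounding direction is an assumption)."""
    return ((usage_bytes + unit - 1) // unit) * unit

def aggregated_view(samples, now, period=3600):
    """Counter value as of the last completed period boundary."""
    boundary = (now // period) * period
    visible = [b for t, b in samples if t <= boundary]
    return visible[-1] if visible else 0

def disclose(setting, samples, now):
    """samples: time-ordered (time_seconds, cumulative_bytes) readings."""
    current = samples[-1][1] if samples else 0
    if setting == "NO_Access":
        return None
    if setting == "Rounding":
        return round_usage(current)
    if setting == "Aggregation":
        return aggregated_view(samples, now)
    if setting == "NO_Protection":
        return current
    raise ValueError(f"unknown setting: {setting}")

# Hypothetical counter readings over 90 minutes.
samples = [(0, 10000), (1800, 10650), (3600, 11200), (5400, 11800)]
```

With a 256B unit, increments of 607 and 650 bytes both round to 768, so the fine-grained sizes that distinguish one event from another are no longer visible to other apps.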
App Guardian
  Demo: http://sit.soic.indiana.edu/en/2015/09/11/app-guardian-oarland/
  App: https://play.google.com/store/apps/details?id=edu.iub.seclab.appguardian
IoT Devices: What you know; What is new
Sensitive Data
  These medical devices are in FDA-approved Category II, the same category as X-ray machines, infusion pumps, …
  The data they collect are highly sensitive
  But can Android protect them?
What Goes Wrong Here?
  Android is not designed to protect its external devices
  No device-app authentication ⇒ misbinding threat
Our Solution: SEACAT [Architecture figure; labels: Policy; DAC; Policy Manager; Service Manager; BT stack; Fast Resource-Type Cache; AVC; DAC; MAC; Policy Module]
Security by Construction: What is the problem and How to make it work
What We Learned
What Needs to Be Done
  Communication
    Find out whether the expected protection has been provided by the system
    Challenges: limited documentation, default assumptions, etc.
  Evolution
    Individualize policy settings for apps with different protection demands
    How to make this happen is a million-dollar question
A Step Further: Automate Security Analysis Security requirements, utility constraints? Attacker’s resources, information? Vulnerability discovery in complicated systems?
Towards Data-Driven, Intelligent Security
  Automatic understanding of the system
    Knowledge discovery from documents
    Automatic building of system model
  Automatic determination of security requirements
  Automatic analysis of the adversary
    Cyber threat intelligence gathering and analysis
  Intelligent vulnerability discovery
    Knowledge-driven system analysis
A Baby Step: Semantics-based Fuzzing
Toward Automated Vulnerability Discovery
  First, an easier problem: can we recover a known vulnerability automatically?
  Why important?
    Patching delay => attack window
    Security implications of public bug information
Why Hard?
  Complicated bugs cannot be patched by adding a check
    A whole chunk of code is replaced
    Difficult to formalize how the patch works
  Limitations of symbolic execution and constraint solving
    Path explosion
    Limited formula-solving capability
How About Auxiliary Information?
  Various sources of vulnerability information
  How do experienced attackers benefit from auxiliary information?
  Question: is it possible to automate this process?
Semantics-Driven Fuzzing
  Basic idea: retrieve vulnerability information and use it to guide SemFuzz toward exploits
  Target program: Linux kernel 4.0+
  Information sources: CVE reports, Linux git logs
  Results: 16 vulnerability types beyond input validation; 18 successful exploits, 2 previously unknown vulnerabilities
Guidance for CVE-2017-6347
Workflow Stage 1 Stage 2
Retrieving Critical Variables
  Symbol Table
    Type             Name
    struct sock      sk
    struct sk_buff   skb
    int              offset
    unsigned int     len
    …...
  Parse Tree
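A rough sketch of how a symbol table like the one on this slide might be matched against vulnerability text to pick out candidate critical variables. The matching rule (exact identifier match), the description string, and the function name are assumptions for illustration; the slide does not spell out SemFuzz's actual procedure, which also involves a parse tree.

```python
import re

# (type, name) entries as listed on the slide.
SYMBOL_TABLE = [
    ("struct sock", "sk"),
    ("int", "offset"),
    ("struct sk_buff", "skb"),
    ("unsigned int", "len"),
]

def critical_variables(text, symbols):
    """Return the symbols whose names appear as whole identifiers in text."""
    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    return [(typ, name) for typ, name in symbols if name in words]

# Hypothetical patch description, invented for this example.
description = "fix skb len accounting when offset is nonzero"
found = critical_variables(description, SYMBOL_TABLE)
```

Matching whole identifiers rather than substrings keeps `sk` from being reported just because `skb` occurs in the text.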
Retrieving System Calls
  Identifying system call names is insufficient
    matching syscall names: MSG_MORE, UDP, loopback ==========> syscall: socket, sendto
  Building a knowledge base
    goal: keywords in descriptions ==> system calls and parameter values
    source: Linux Programmer's Manual (LPM)
    result: 1082 LPM pages, 373 system calls, 2000+ keywords
  UDP ==> socket(socket_type = SOCK_DGRAM)
  MSG_MORE ==> sendto(flags = MSG_MORE)
  loopback ==> sendto(dest_addr = {INADDR_LOOPBACK})
  Resulting call sequence:
    r0 = socket(AF_INET, SOCK_DGRAM, 0)
    sendto(r0, ..., MSG_MORE, {INADDR_LOOPBACK}, …)
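The keyword mapping above can be pictured as a lookup table from description keywords to a system call, a parameter, and a value. The sketch below hard-codes only the three entries shown on the slide; the data layout, the naive substring matching, the function name, and the sample description are assumptions, and the real knowledge base is built from the manual pages rather than written by hand.

```python
# keyword -> (system call, parameter, value), taken from the slide's example.
KNOWLEDGE_BASE = {
    "UDP": ("socket", "socket_type", "SOCK_DGRAM"),
    "MSG_MORE": ("sendto", "flags", "MSG_MORE"),
    "loopback": ("sendto", "dest_addr", "{INADDR_LOOPBACK}"),
}

def guidance(description):
    """Collect per-syscall parameter hints for keywords found in description."""
    calls = {}
    lowered = description.lower()
    for keyword, (syscall, param, value) in KNOWLEDGE_BASE.items():
        if keyword.lower() in lowered:  # naive substring match, for illustration
            calls.setdefault(syscall, {})[param] = value
    return calls

# Hypothetical vulnerability description, invented for this example.
hints = guidance("crash when sending UDP packets with MSG_MORE over loopback")
```

The resulting hints name both the calls to issue (socket, sendto) and the argument values to seed them with, which is the extra information that syscall names alone do not provide.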
Effectiveness of Semantics-based Fuzzing
  Result
    16% (18/112) trigger the target vulnerability
    49% (46/94) reach the vulnerable functions
    20% (19/94) reach the patched basic blocks
  Zero-day vulnerability found when fuzzing CVE-2016-4794
    A new vulnerability appears around the known flaws
    Reported and confirmed
  Undisclosed vulnerability found when fuzzing CVE-2016-3841
    Similar problems inside equivalent components
    Patched before we reported, but no reports disclosed
Performance
  Trigger vulnerability
    count: 18 (SemFuzz) vs. 7 (Syzkaller)
    time: 13.2h (SemFuzz) vs. 33.9h (Syzkaller)
  Reach vulnerable functions
    count: 18 (SemFuzz) vs. 14 (Syzkaller)
    time: 1.8h (SemFuzz) vs. 5.2h (Syzkaller)
Future of System Security Research
Where Technologies Go, Opportunities Follow
  Machine Learning and Security
    Adversarial learning => secure ML
    Inference attacks on ML models => privacy-preserving ML
  Security in Smart Things and CPS
    Smart-home/smart-city security
    Industrial control security
    Smart grid security
  Biomedical Data Privacy
    Genomic data privacy (www.humangenomeprivacy.org)
    Other omics privacy
  Others (e.g., blockchain)
Riding the New Tech Wave
  Data-centric, Intelligent Security
    NLP-enhanced protection (e.g., CTI gathering, analysis)
    AI (ML/reasoning) based protection (e.g., Intelligent CTF)
  Hardware-enhanced protection
    Scalable TEE-based protection
Moving Forward
  Learn: understand it, analyze it, and crack it
  Think: ask BIG questions, seek deep insight
  Do: protect what needs to be protected; build what will be used
Data-Centric Intelligent Security