5/18/2010

Shuo Chen
Rui Wang, XiaoFeng Wang and Kehuan Zhang
IEEE Symposium on Security and Privacy
Oakland, California
May 17th, 2010

Traditional PC application
Web application
(1) split between client and server
(2) state transitions driven by network traffic

Worry about privacy? Let’s do encryption.
• The eavesdropper cannot see the contents, but can observe:
  • number of packets, timing/size of each packet
• Previous research showed privacy issues in various domains:
  • SSH, voice-over-IP, video-streaming, anonymity channels (e.g., Tor)
• Our motivation and target domain:
  • target: today’s web applications
  • motivation: Software-as-a-Service (SaaS) becomes mainstream, and the web is the platform to deliver SaaS apps.

• Surprisingly detailed user information is being leaked out from several high-profile web applications
  • personal health data, family income, investment details, search queries
  • (Anonymized app names per requests from related companies)
• The root causes are some fundamental characteristics in today’s web apps
  • stateful communication, low entropy input and significant traffic distinctions.
• Defense is non-trivial
  • effective defense needs to be application specific.
  • calls for a disciplined web programming methodology.

Scenario: search using encrypted Wi-Fi WPA/WPA2.
Example: user types “list” on a WPA2 laptop.
[Figure: packet sizes observed for the successive keystrokes: 821 → 910, 822 → 931, 823 → 995, 824 → 1007]
Attacker’s effort: linear, not exponential.
Consequence: Anybody on the street knows our search queries.

(“A” denoting a pseudonym)
• A web application by one of the most reputable companies of online services
• Illness/medication/surgery information is leaked out, as well as the type of doctor being queried.
• Vulnerable designs
  • Entering health records
    • By typing – auto suggestion
    • By mouse selecting – a tree-structure organization of elements
  • Finding a doctor
    • Using a dropdown list item as the search input
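The “linear, not exponential” point in the search scenario can be sketched as follows. Because each keystroke triggers its own auto-suggestion response, an eavesdropper can test one letter at a time against the observed size instead of enumerating whole queries. This is a minimal sketch under stated assumptions: `recover_query`, `toy_suggestion_size` and `TOY_SIZES` are hypothetical names, the toy table reuses the larger number of each pair from the figure, and treating that number as the suggestion-response size is an assumption, not something the slides state.

```python
import string
from typing import Callable, List

# Hypothetical size table: the larger number of each pair in the slide's
# figure, assumed here to be the suggestion-response size per prefix.
TOY_SIZES = {"l": 910, "li": 931, "lis": 995, "list": 1007}


def toy_suggestion_size(prefix: str) -> int:
    """Toy stand-in for 'query the search site and measure the response'."""
    # Unknown prefixes get a size that never matches the observed trace.
    return TOY_SIZES.get(prefix, 2000 + len(prefix))


def recover_query(observed_sizes: List[int],
                  suggestion_size: Callable[[str], int],
                  alphabet: str = string.ascii_lowercase) -> List[str]:
    """Return every query whose per-keystroke sizes match the trace."""
    candidates = [""]
    for size in observed_sizes:
        # Extend each surviving prefix by one letter and keep only the
        # extensions whose response size equals the observed size.
        candidates = [prefix + ch
                      for prefix in candidates
                      for ch in alphabet
                      if suggestion_size(prefix + ch) == size]
    return candidates


if __name__ == "__main__":
    print(recover_query([910, 931, 995, 1007], toy_suggestion_size))
```

While a single prefix survives each step, the attacker probes 26 letters per keystroke (104 probes for four keystrokes) rather than all 26^4 four-letter strings.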
Entering health records: no matter keyboard typing or mouse selection, attacker has a 2000× ambiguity reduction power.
Find-A-Doctor: attacker can uniquely identify the specialty.

• It is the online version of one of the most widely used applications for U.S. tax preparation.
• Design: a wizard-style questionnaire
  • Tailor the conversation based on user’s previous input.
• The forms that you work on tell a lot about your family
  • Filing status
  • Number of children
  • Paid big medical bill
  • The adjusted gross income (AGI)

All transitions have unique traffic patterns.
[Figure: state diagram. Entry page of Deductions & Credits → Not eligible / Partial credit / Full credit → Summary of Deductions & Credits]
Consult the IRS instruction: $1000 for each child. Phase-out starting from $110,000. For every $1000 income, lose $50 credit.
AGI axis (two children scenario): $0 to $110000 full credit; $110000 to $150000 partial credit; above $150000 not eligible.

Even worse, most decision procedures for credits/deductions have asymmetric paths.
• Eligible – more questions
• Not eligible – no more questions
[Figure: state diagram. Entry page of Deductions & Credits → Enter your paid interest (Full credit / Partial credit) → Summary of Deductions & Credits; Not eligible goes to the summary directly]
AGI axis: $0 to $115000 full credit; $115000 to $145000 partial credit; above $145000 not eligible.

AGI thresholds exposed by each credit/deduction (axis starts at $0):
  Disabled Credit: $24999
  Earned Income Credit: $41646
  Retirement Savings: $53000
  College Expense: $116000
  IRA Contribution: $85000, $105000
  Student Loan Interest: $115000, $145000
  Child credit *: $110000, then $130000 or $150000 or $170000 …
  First-time Homebuyer credit: $150000, $170000
  Adoption expense: $174730, $214780
We are not tax experts. OnlineTax A can find more than 350 credits/deductions.

A major financial institution in the U.S.
Which funds do you invest in?
• No secret.
• Each price history curve is a GIF image from MarketWatch.
• Everybody in the world can obtain the images from MarketWatch.
• Just compare the image sizes!
Your investment allocation
• Given only the size of the pie chart, can we recover it?
• Challenge: hundreds of pie-charts collide on the same size.
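Returning to the child-credit slide for OnlineTax A: the rule quoted there ($1000 per child, phase-out from $110,000, $50 lost per $1000 of income) can be worked through to show why the observed state (full, partial, not eligible) pins the AGI to a range. The sketch below is illustrative only. The function names are hypothetical, and counting only whole $1000 steps above the threshold is a simplifying assumption, not the IRS rule.

```python
PER_CHILD_CREDIT = 1000    # dollars per child, from the slide
PHASE_OUT_START = 110_000  # AGI at which the phase-out begins, from the slide
LOSS_PER_1000 = 50         # dollars of credit lost per $1000 above the start


def child_credit(agi: int, children: int) -> int:
    """Credit in dollars under the slide's simplified rule."""
    if children < 1:
        raise ValueError("at least one child is assumed")
    full = PER_CHILD_CREDIT * children
    excess = max(0, agi - PHASE_OUT_START)
    # Assumption: only whole $1000 steps of excess income reduce the credit.
    reduction = LOSS_PER_1000 * (excess // 1000)
    return max(0, full - reduction)


def credit_state(agi: int, children: int) -> str:
    """The state an eavesdropper would see from the page transition."""
    credit = child_credit(agi, children)
    if credit == PER_CHILD_CREDIT * children:
        return "full credit"
    if credit > 0:
        return "partial credit"
    return "not eligible"


def phase_out_end(children: int) -> int:
    """AGI at which the credit reaches zero."""
    steps = PER_CHILD_CREDIT * children // LOSS_PER_1000
    return PHASE_OUT_START + steps * 1000


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "children: credit reaches zero at", phase_out_end(n))
```

For two children the full credit is $2000, which takes 40 steps of $1000 to exhaust, so the phase-out ends at $150000. One and three children give $130000 and $170000, matching the thresholds listed for the child credit.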
Inference based on the evolution of the pie-chart size in 4-or-5 days
The financial institution updates the pie chart every day after the market is closed.
The mutual fund prices are public knowledge.
[Figure: candidate set shrinking day by day. ≅ 80000 charts → (size of day 1) ≅ 800 charts → (prices of the day; size of day 2) ≅ 80 charts → (prices of the day; size of day 3) ≅ 8 charts → (prices of the day; size of day 4) 1 chart]

Root causes: some fundamental characteristics of today’s web applications

Fundamental characteristics of web apps
• Significant traffic distinctions
  – The chance of two different user actions having the same traffic pattern is really small.
  – Distinctions are everywhere in web app traffic. It’s the norm.
• Low entropy input
  – Eavesdropper can obtain a non-negligible amount of information
• Stateful communication
  – Many pieces of non-negligible information can be correlated to infer more substantial information
  – Often, multiplicative ambiguity reduction power!

Challenging to Mitigate the Vulnerabilities

Traffic differences are everywhere. Which ones result in serious data leaks?
Need to analyze the application semantics, the availability of domain knowledge, etc. Hard.
Is there a vulnerability-agnostic defense to fix the vulnerabilities without finding them?
Obviously, padding is a must-do strategy.
Packet size rounding: pad to the next multiple of Δ
Random-padding: pad x bytes, and x ∈ [0, Δ)
We found that even for the discussed apps, the defense policies have to be case-by-case.

OK to use rounding or random-padding
32.3% network overhead (i.e., 1/3 bandwidth on side-channel info hiding)
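The two padding policies defined above can be sketched in a few lines. This is a minimal illustration, not the authors’ implementation: the function names are hypothetical, sizes that are already a multiple of Δ are assumed to stay unchanged under rounding, and the sample sizes are the four numbers from the search figure, used only as example inputs.

```python
import random
from typing import Sequence


def round_up(size: int, delta: int) -> int:
    """Packet size rounding: pad to the next multiple of delta."""
    # Ceiling division; a size already on a multiple is left unchanged.
    return -(-size // delta) * delta


def random_pad(size: int, delta: int, rng: random.Random) -> int:
    """Random padding: add x bytes, x drawn uniformly from [0, delta)."""
    return size + rng.randrange(delta)


def overhead(original: Sequence[int], padded: Sequence[int]) -> float:
    """Extra bytes sent, as a fraction of the original traffic."""
    return (sum(padded) - sum(original)) / sum(original)


if __name__ == "__main__":
    sizes = [910, 931, 995, 1007]
    rounded = [round_up(s, 128) for s in sizes]
    print(rounded, f"overhead = {overhead(sizes, rounded):.1%}")
    rng = random.Random()
    print([random_pad(s, 128, rng) for s in sizes])
```

With Δ = 128 all four example sizes round to 1024 and become indistinguishable by size alone. Larger Δ hides more distinctions at the cost of more overhead.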
Rounding is not appropriate, because Google’s responses are compressed.
The destination networks may or may not uncompress the responses.
E.g., Microsoft gateways uncompress and inspect web traffic, but Indiana University does not.
• rounding before the compression → Indiana Univ. still sees distinguishable sizes;
• rounding after the compression → Microsoft still sees distinguishable sizes

Neither rounding nor random-padding can solve the problem.
Because of the asymmetric path situation
[Figure: attack power (left axis, 0 to 15) and overhead (right axis, 0% to 40%); x-axis values 1, 16, 64, 128, 256, 512, 1024, 2048]

Random padding is not appropriate, because
Repeatedly applying a random padding policy to the same responses will quickly degrade the effectiveness.
Suppose the user checks the mutual fund page 7 times, then 96% probability that the randomness shrinks to Δ/2.
OnlineInvest A cannot do the padding by itself
Because the browser loads the images from MarketWatch.

Need to develop a disciplined methodology for side-channel-info hiding

• Side-channel-leaks are a serious threat to user privacy in the era of SaaS.
• Defense must be vulnerability-specific, and thus non-trivial.
• Call for future research on the programming practice for protecting online privacy.

Acknowledgements
Ranveer Chandra – guidance on Wi-Fi experiments
Cormac Herley – suggestion about using the pie-chart evolution in multiple days
Emre Kiciman – Insights about the HTTP protocol
Johnson Apacible, Rob Oikawa, Jim Oker and Yi-Min Wang
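As a supplementary note on the random-padding slide for OnlineInvest A: the degradation can be made concrete. If the same response of true size s is padded independently on each visit, every observed size lies in [s, s + Δ), so s is bounded below by the largest observation minus (Δ − 1) and above by the smallest observation. The sketch below assumes integer padding and, for the probability, a continuous uniform approximation; the function names are hypothetical. Under that approximation the chance that the observed spread is at least Δ/2 is 93.75% for seven observations and about 96.5% for eight. The slide’s 96% figure is in the same range, but how it counts visits is not stated, so an exact match is not claimed.

```python
from typing import List, Tuple


def true_size_bounds(observed: List[int], delta: int) -> Tuple[int, int]:
    """Bounds on the unpadded size, given sizes s + x with 0 <= x < delta."""
    low = max(observed) - (delta - 1)   # the largest x is at most delta - 1
    high = min(observed)                # the smallest x is at least 0
    return low, high


def remaining_candidates(observed: List[int], delta: int) -> int:
    """How many true sizes are still consistent with the observations."""
    low, high = true_size_bounds(observed, delta)
    return high - low + 1


def prob_spread_at_least_half(n: int) -> float:
    """P(max - min >= 1/2) for n independent draws uniform on [0, 1)."""
    r = 0.5
    # The range of n uniform draws has CDF n*r**(n-1) - (n-1)*r**n.
    return 1.0 - (n * r ** (n - 1) - (n - 1) * r ** n)


if __name__ == "__main__":
    print(true_size_bounds([1005, 1100, 1064], 128))
    print(remaining_candidates([1005, 1100, 1064], 128))
    print(prob_spread_at_least_half(7), prob_spread_at_least_half(8))
```

A single observation leaves Δ candidate sizes. Each extra observation can only shrink that set, which is why repetition erodes random padding while leaving rounding unaffected.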