Fault Tolerance and Security Heechul Yun 1
Safety Failures in CPS
Therac-25
• Computer-controlled medical X-ray treatments
• Six people died/injured due to massive overdoses (1985-1987)
• Caused by synchronization mistakes
Ariane 5
• $7 billion rocket was destroyed after 40 secs (6/4/1996)
• “caused by the complete loss of guidance and attitude information”
• Caused by a 64-bit floating point to 16-bit integer conversion (sketched below) 2
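The Ariane 5 failure mode can be illustrated in a few lines of C. This is a minimal sketch, not the actual flight code (which was written in Ada); the variable name and values are invented to show how an unchecked narrowing conversion silently corrupts a value that no longer fits in 16 bits.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Hypothetical horizontal-velocity term, larger than any value the
     * 16-bit field was designed to hold (INT16_MAX is 32767). */
    double horizontal_bias = 40000.0;

    int32_t wide   = (int32_t)horizontal_bias;  /* fits in 32 bits */
    int16_t narrow = (int16_t)wide;             /* silently wraps, e.g. to -25536 */

    /* A defensive conversion saturates (or signals an error) instead. */
    int16_t safe = (wide > INT16_MAX) ? INT16_MAX :
                   (wide < INT16_MIN) ? INT16_MIN : (int16_t)wide;

    printf("unchecked: %d, saturated: %d\n", narrow, safe);
    return 0;
}
```

In the actual incident the out-of-range conversion raised an exception that shut down both inertial reference units; the point of the sketch is simply that narrowing conversions need explicit range checks.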
Safety Failures in CPS
http://www.nytimes.com/2015/01/28/us/white-house-drone.html
http://petapixel.com/2015/12/23/crashing-camera-drone-narrowly-misses-top-skiier/
http://rochester.nydatabases.com/map/domestic-drone-accidents
http://www.nytimes.com/interactive/2016/07/01/business/inside-tesla-accident.html
Failures in CPS have consequences 3
Air France 447 (2009)
• Airbus A330 crashed into the Atlantic Ocean in 2009
• Caused in part by the computer’s misguidance
  – Pitot tube (speed sensor) failure → Flight Director (FD) malfunction (shows “head up”) → pilots follow the faulty FD → enter stall
[Figure: normal vs. stall]
http://www.slate.com/blogs/the_eye/2015/06/25/air_france_flight_447_and_the_safety_paradox_of_airline_automation_on_99.html
http://www.spiegel.de/international/world/experts-say-focus-on-manual-flying-skills-needed-after-air-france-crash-a-843421.html 4
Lion Air Flight 610 (2018)
• Boeing 737 MAX crashed into the Java Sea in 2018
• Caused by the stall prevention system (MCAS)
  – sensor error (plane is in a “stall”) → nose down (to the ocean) 5
Ethiopian Air 302 (2019)
https://www.seattletimes.com/business/boeing-aerospace/failed-certification-faa-missed-safety-issues-in-the-737-max-system-implicated-in-the-lion-air-crash 6
Boeing 737 MAX
• Controversial design of the MCAS
  – Designed to use a single AoA sensor
    • Even though there are two AoA sensors
    • A single point of failure
  – More powerful than the pilots
    • Overrode the pilots’ pitch-up commands
    • Yet, classified as “hazardous” (Level B), not critical (Level A)
• Planned software update
  – Use both sensors, limit the power
https://www.seattletimes.com/business/boeing-aerospace/failed-certification-faa-missed-safety-issues-in-the-737-max-system-implicated-in-the-lion-air-crash/ 7
Lufthansa A321 (2014)
• Similar prior incidents that didn’t kill people
• Faulty AoA sensor readings (ice) triggered an automated stall prevention system called ‘Alpha Prot’, resulting in a 4,000 ft loss of altitude
• Triple-redundant sensors with a voting mechanism, but two sensors were iced up simultaneously, so the only working sensor’s value was discarded (see the voting sketch below)
• “When Alpha Prot is activated due to blocked AOA probes, the flight control laws order a continuous nose down pitch rate that, in a worst case scenario, cannot be stopped with backward sidestick inputs, even in the full backward position.”
https://avherald.com/h?article=47d74074 8
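The A321 incident shows how redundancy can fail as a group. Below is a minimal C sketch of a voter over three AoA readings (here, a simple median); the sensor values are invented. With one faulty probe the median out-votes it, but when two probes freeze at the same wrong angle, the voter follows the faulty pair and the single healthy reading is the one that gets discarded.

```c
#include <stdio.h>

/* Median-of-three voter for triple-redundant sensor readings. */
static double vote3(double a, double b, double c) {
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

int main(void) {
    /* Two AoA probes iced up and stuck at the same (wrong) angle,
     * one healthy probe reporting the true angle of attack. */
    double stuck1 = 4.5, stuck2 = 4.5, healthy = 12.0;

    printf("voted AoA: %.1f deg\n", vote3(stuck1, stuck2, healthy)); /* 4.5 */
    return 0;
}
```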
Tesla Autopilot (2016)
• Tesla Autopilot failed to recognize a trailer, resulting in the death of the driver
http://www.nytimes.com/interactive/2016/07/01/business/inside-tesla-accident.html 9
NHTSA Report
• Both the radar and camera sub-systems are designed for front-to-rear collision prediction, mitigation, or avoidance
• The system requires agreement from both sensor systems to initiate automatic braking
• The camera system uses Mobileye’s EyeQ3 processing chip, which uses a large dataset of rear images of vehicles to make its target classification decisions
• Complex or unusual vehicle shapes may delay or prevent the system from classifying certain vehicles as targets/threats
https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF 10
NHTSA Report
• Object classification algorithms in the Tesla and peer vehicles with AEB technologies are designed to avoid false positive brake activations
• The Florida crash involved a target image (the side of a tractor trailer) that would not be a “true” target in the EyeQ3 vision system dataset, and
• The tractor trailer was not moving in the same longitudinal direction as the Tesla, which is the vehicle kinematic scenario the radar system is designed to detect
https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF 11
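A hedged sketch of the decision logic the NHTSA report describes: automatic emergency braking only fires when both the camera and the radar agree that there is a threat. The function and inputs below are hypothetical, not Tesla’s code; they only illustrate how a fusion rule tuned against false positives also suppresses true positives that either sensor misses.

```c
#include <stdbool.h>
#include <stdio.h>

/* AND-gated fusion: brake only if both sensor pipelines flag a threat. */
static bool should_auto_brake(bool camera_threat, bool radar_threat) {
    return camera_threat && radar_threat;
}

int main(void) {
    /* Florida crash scenario as described in the report: the camera did not
     * classify the trailer's side as a target, and the radar was looking for
     * same-direction (longitudinal) traffic, not a crossing vehicle. */
    bool camera_threat = false;
    bool radar_threat  = false;

    printf("automatic braking: %s\n",
           should_auto_brake(camera_threat, radar_threat) ? "yes" : "no");
    return 0;
}
```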
Uber Self-Driving Car (2018)
• Killed a pedestrian crossing a road in Arizona
https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html 12
NTSB Report
• The system first registered radar and LIDAR observations of the pedestrian about 6 seconds before impact
• The software classified the pedestrian as an unknown object, as a vehicle, and then as a bicycle, with varying expectations of future travel path
• At 1.3 seconds before impact, the system determined that an emergency braking maneuver was needed
• Emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior
Failures in CPS have consequences
https://www.ntsb.gov/investigations/AccidentReports/Reports/HWY18MH010-prelim.pdf 13
Challenges for Safe CPS
• Time Predictability
• Complexity
• Reliability
• Security 14
Real-Time Predictability
[Figure: a victim task and attacker tasks on Core1-Core4 contending for the shared LLC]
• Observed worst case: >300X (times) slowdown
  – On simple in-order multicores (Raspberry Pi 3, Odroid C2)
Difficult to guarantee predictable timing
Michael G. Bechtel and Heechul Yun. “Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention.” In RTAS, 2019 (to appear, Outstanding Paper Award)
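The kind of attacker used in such denial-of-service experiments can be sketched in a few lines of C. This is not the benchmark code from the RTAS 2019 paper, only an illustration: each attacker core streams over a buffer assumed to be larger than the shared LLC, continuously evicting the victim’s cache lines and saturating the shared memory hierarchy; the sizes below are placeholders.

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

#define BUF_SIZE  (64UL * 1024 * 1024)  /* assumed to exceed the shared LLC */
#define LINE_SIZE 64                    /* typical cache-line size in bytes */

int main(void) {
    volatile uint8_t *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;

    /* Endless write stream, touching one byte per cache line, so nearly
     * every access misses in the LLC and goes to DRAM. */
    for (;;) {
        for (size_t i = 0; i < BUF_SIZE; i += LINE_SIZE)
            buf[i]++;
    }
    return 0;   /* never reached */
}
```

Running one such loop per attacker core against a victim pinned to the remaining core reproduces the contention pattern in the figure; the extreme slowdowns reported in the paper stem in part from contention on internal shared-cache resources rather than raw bandwidth alone.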
Complexity
• Software complexity increases
[Charts: Growth in Flight Software Size (K SLOC, log scale; Apollo 1968, Space Shuttle, Orion (est.)) and Lines of Code in a Typical GM Car (KLOC; model years 1970-2010)]
More bugs, unintended side-effects
Figures are from NASA JPL, “Flight Software Complexity,” 2008 16
Hardware Reliability
• Transient hardware faults (soft errors)
  – Due to environmental factors (e.g., alpha particles, cosmic radiation)
  – Manifested as software failures
  – A bigger problem in advanced CPUs
    • Increased density → higher soft error rate (SER) per chip
More susceptible to environmental factors
Ibe et al., “Scaling Effects on Neutron-Induced Soft Error in SRAMs Down to 22nm Process” (Hitachi)
http://www.cotsjournalonline.com/articles/view/102279 17
Hardware Reliability
[Figure: DRAM rows of cells; an aggressor row’s wordline is toggled between V_LOW (row closed) and V_HIGH (row opened), with victim rows adjacent to it]
Repeatedly opening and closing a row induces disturbance errors in adjacent rows
Hardware can be exploited by attackers
This slide is from Dr. Yoongu Kim’s ISCA 2014 presentation 18
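The access pattern behind this disturbance effect (popularly known as “rowhammer”) can be sketched in C for x86. This is only an illustrative fragment, assuming the caller has already found two addresses that map to different rows of the same DRAM bank; the cache-line flushes force every read to reach DRAM so the two rows are activated over and over within one refresh interval.

```c
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, x86 SSE2 */

/* Repeatedly activate two rows in the same DRAM bank. addr_a and addr_b
 * are placeholders; a real experiment must choose physical addresses that
 * map to different rows of one bank. */
static void hammer(volatile uint64_t *addr_a, volatile uint64_t *addr_b,
                   long iterations)
{
    for (long i = 0; i < iterations; i++) {
        (void)*addr_a;                        /* open row A */
        (void)*addr_b;                        /* close A, open row B */
        _mm_clflush((const void *)addr_a);    /* evict so the next read misses */
        _mm_clflush((const void *)addr_b);
    }
    /* After enough activations, bits in neighboring (victim) rows may flip. */
}
```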
Software Security
• Insecure software in CPS → safety hazards
• Stuxnet: first reported cyber-warfare attack, targeting Iranian nuclear plants (destroying centrifuges)
• Vermont power grid hack by Russia
• Remote hack into cars (Jeep)
• Police drone hacking
CPS software can be attacked by hackers 19
Hardware Security
https://meltdownattack.com/
Hardware can leak secrets to attackers 20
How to Improve Safety of CPS?
• Correct by design
  – Formal-method-based software development
    • Difficult for a complex system
  – Use reliable hardware
    • e.g., radiation-hardened processors
    • Expensive and low performance
• Deal with failures
  – Run-time monitoring and redundancy (e.g., a Simplex-style architecture, sketched below) 21
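The “deal with failures” option is the topic of the Simplex paper listed on the next slide. Below is a minimal, hypothetical C sketch of the core idea: a complex, possibly unverified controller drives the plant as long as the state stays inside a verified safety envelope, and a simple, verified safety controller takes over otherwise. The state fields, bounds, and control laws are invented placeholders.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { double pitch, speed, altitude; } state_t;

/* Hypothetical complex controller (e.g., a learning-based autopilot). */
static double complex_controller(const state_t *s) { return 0.8 * s->pitch; }

/* Simple, verified safety controller: conservative level-off command. */
static double safety_controller(const state_t *s)  { (void)s; return 0.0; }

/* Verified safety envelope: crude placeholder bounds on the state. */
static bool inside_safety_envelope(const state_t *s) {
    return s->pitch > -10.0 && s->pitch < 15.0 && s->speed > 60.0;
}

/* Simplex decision module: use the complex controller while the state is
 * provably recoverable, otherwise fall back to the safety controller. */
static double decision_module(const state_t *s) {
    return inside_safety_envelope(s) ? complex_controller(s)
                                     : safety_controller(s);
}

int main(void) {
    state_t s = { .pitch = 18.0, .speed = 120.0, .altitude = 900.0 };
    printf("actuator command: %.2f\n", decision_module(&s));
    return 0;
}
```

The decision module and safety controller are kept small enough to verify; the complex controller can then fail, or be attacked, without being able to drive the system out of the recoverable region.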
This Week: Fault Tolerance/Security
• A Simplex Architecture for Intelligent and Safe Unmanned Aerial Vehicles, RTCSA 2016
• Comprehensive Experimental Analyses of Automotive Attack Surfaces, USENIX Security, 2011 (Dalton) 22
arXiv: https://arxiv.org/abs/1811.12555 Video: https://www.youtube.com/watch?v=poRbH__kB2M 23