Surviving Sensor Network Software Faults Presented by Jacek Migdal - PowerPoint PPT Presentation

  1. Surviving Sensor Network Software Faults Presented by Jacek Migdal

  2. Software crashes. Software bugs are common: • tests may not reveal rare problems • bugs are hard to identify and fix ... but a sensor network should be able to work for years. Example: Ariane 5 Flight 501.

  3. Common approach: "Have you tried rebooting?"

  4. Rebooting on failure: • works in most cases (e.g., memory faults) • recent data is lost • time-consuming, which reduces availability • causes additional cost for routing protocols

  5. Proposed solution: Neutron. Divide software into recovery units and reboot only the faulty unit.

  6. Hardware: • 1-8 MHz clock • 4-10 KB SRAM • 40-128 KB flash memory • no hardware memory isolation. A low-overhead solution is needed.

  7. Architecture (diagram): • compiler: Deputy compiler with Neutron extensions • TinyOS: Safe TinyOS, TOSThreads, and Neutron recovery code

  8. Recovery unit. Types: • application recovery unit • kernel recovery unit. Definition: • a recovery unit may not call directly into a different recovery unit • each recovery unit instantiates at least one thread (the kernel has exactly one) • every component belongs to at most one application recovery unit or to the kernel recovery unit
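
The three properties in the definition can be written as a check over a simple program model. This is a minimal Python sketch with hypothetical names (`Program`, `check_recovery_units`, the `"kernel"` unit label), not Neutron's actual compiler pass; it assumes "may not call directly" means a cross-unit call is allowed only when it goes through a system call.

```python
from __future__ import annotations

from dataclasses import dataclass, field

KERNEL = "kernel"  # hypothetical name of the kernel recovery unit


@dataclass
class Program:
    units: dict[str, set[str]]    # unit name -> components in that unit
    threads: dict[str, int]       # unit name -> number of threads it instantiates
    # (caller component, callee component, does the call go through a system call?)
    calls: list[tuple[str, str, bool]] = field(default_factory=list)


def check_recovery_units(p: Program) -> list[str]:
    """Return the violated properties; an empty list means the partition is valid."""
    errors: list[str] = []
    owner: dict[str, str] = {}
    # Every component belongs to at most one recovery unit.
    for unit, comps in p.units.items():
        for comp in sorted(comps):
            if comp in owner:
                errors.append(f"{comp} belongs to both {owner[comp]} and {unit}")
            else:
                owner[comp] = unit
    # Application units need at least one thread; the kernel has exactly one.
    for unit in p.units:
        n = p.threads.get(unit, 0)
        if unit == KERNEL and n != 1:
            errors.append(f"kernel must have exactly one thread, has {n}")
        elif unit != KERNEL and n < 1:
            errors.append(f"{unit} must instantiate at least one thread")
    # No direct (non-system-call) call may cross a unit boundary.
    for caller, callee, via_syscall in p.calls:
        src, dst = owner.get(caller), owner.get(callee)
        if src != dst and not via_syscall:
            errors.append(f"direct call {caller} -> {callee} crosses {src} -> {dst}")
    return errors
```

For example, a component listed in two application units, a kernel with two threads, or a plain call from an application component into a kernel component would each be reported as a separate violation.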

  9. How to divide a program into recovery units: • use annotations to define kernel boundaries (@syscall_base, @syscall_ext) • use the Deputy compiler to divide the program into recovery units and isolate them
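
One way to picture the division step: start from each application thread's entry component, follow the wiring, and stop at edges annotated as system-call boundaries; whatever no application thread reaches stays in the kernel. The Python sketch below assumes this flood-fill strategy and hypothetical names (`partition`, `wiring`, `boundary`, `thread_roots`); Deputy and Neutron may partition the program differently.

```python
from collections import deque


def partition(wiring, boundary, thread_roots, kernel="kernel"):
    """Assign every component to a recovery unit (hypothetical sketch).

    wiring:       dict component -> set of components it is wired to
    boundary:     set of frozenset({a, b}) edges annotated as system calls
    thread_roots: dict application unit name -> component that starts its thread
    """
    owner = {}
    for unit, root in thread_roots.items():
        queue = deque([root])
        while queue:
            comp = queue.popleft()
            if comp in owner:
                if owner[comp] != unit:  # shared component: violates the definition
                    raise ValueError(f"{comp} reachable from {owner[comp]} and {unit}")
                continue
            owner[comp] = unit
            for nxt in wiring.get(comp, ()):
                # Stop at annotated kernel boundaries (@syscall_base / @syscall_ext).
                if frozenset((comp, nxt)) not in boundary:
                    queue.append(nxt)
    # Components that no application thread reaches stay in the kernel unit.
    all_comps = set(wiring).union(*wiring.values())
    for comp in all_comps:
        owner.setdefault(comp, kernel)
    return owner
```

Raising an error on a shared component mirrors the rule that every component belongs to at most one application recovery unit.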

  10. How to recover an application unit: 1. Cancel system calls and halt threads (pending flag). 2. Reclaim allocated memory. 3. Re-initialize application unit RAM. 4. Restart the application unit thread.
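
The four application-unit recovery steps can be simulated as follows. This is a hypothetical Python sketch: `Kernel`, `AppUnit`, and `recover_app_unit` are illustrative names, and the "pending flag" is modeled, as an assumption, as a per-call entry that the kernel checks before delivering a completion, so completions of canceled calls are dropped.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Kernel:
    pending: dict = field(default_factory=dict)  # syscall id -> owning unit (assumed "pending flag")
    heap: dict = field(default_factory=dict)     # allocation id -> owning unit

    def complete_syscall(self, call_id):
        """Deliver a completion; returns False if the call was canceled."""
        return self.pending.pop(call_id, None) is not None


@dataclass
class AppUnit:
    name: str
    initial_ram: dict                  # variable -> boot-time value
    ram: dict = field(init=False)      # current values, set from initial_ram
    thread_running: bool = True

    def __post_init__(self):
        self.ram = copy.deepcopy(self.initial_ram)


def recover_app_unit(kernel, unit):
    # 1. Halt the unit's thread and cancel its system calls by clearing their
    #    pending entries, so completions that arrive later are dropped.
    unit.thread_running = False
    for call_id, owner in list(kernel.pending.items()):
        if owner == unit.name:
            del kernel.pending[call_id]
    # 2. Reclaim heap memory the unit allocated.
    for alloc_id, owner in list(kernel.heap.items()):
        if owner == unit.name:
            del kernel.heap[alloc_id]
    # 3. Re-initialize the unit's RAM to its boot-time values.
    unit.ram = copy.deepcopy(unit.initial_ram)
    # 4. Restart the unit's thread.
    unit.thread_running = True
```

Only the faulty unit's calls, allocations, and RAM are touched; entries owned by other units survive, which is the point of rebooting a unit instead of the whole node.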

  11. How to recover the kernel unit: 1. Cancel outstanding system calls. 2. Save application-dependent state. 3. Reboot TinyOS (skipping thread state initialization). 4. Restart from the saved state.
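
A similar simulation of kernel recovery is sketched below, under stated assumptions: memory is a dict with a `"kernel"` part and a `"threads"` part, a canceled system call returns a hypothetical `"CANCELED"` result to the blocked thread, and "application-dependent state" is taken to be the thread table. The names `reboot_tinyos` and `recover_kernel` are illustrative, not Neutron's.

```python
import copy


def reboot_tinyos(boot_image, saved_threads):
    """Simulated TinyOS boot that skips thread state initialization."""
    memory = copy.deepcopy(boot_image)   # fresh kernel state from the boot image
    memory["threads"] = saved_threads    # thread state is not re-initialized
    return memory


def recover_kernel(memory, boot_image):
    # 1. Cancel outstanding system calls; each blocked thread gets an error result.
    for thread in memory["threads"].values():
        if thread.get("blocked_on") is not None:
            thread["blocked_on"] = None
            thread["syscall_result"] = "CANCELED"
    # 2. Save the application-dependent state (here: the thread table).
    saved = copy.deepcopy(memory["threads"])
    # 3. Reboot TinyOS, skipping thread state initialization.
    new_memory = reboot_tinyos(boot_image, saved)
    # 4. Restart the threads from the saved state.
    for thread in new_memory["threads"].values():
        thread["running"] = True
    return new_memory
```

The corrupted kernel variables are discarded, while each application thread resumes with its own state intact.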

  12. Precious state: • losing application state is too costly • maintain variable values across application unit restarts (mark the variables with the @precious annotation)

  13. Precious state. Recovery: 1. Check for corruption. 2. Push to stack. 3. Re-initialize the recovery unit. 4. Pop from stack and copy. Features: 1. Groups. 2. Atomic operations. 3. (Optional) Check integrity at the application level.
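
The recovery steps and features of slides 12 and 13 can be combined in one sketch. Assumptions not taken from the slides: a precious group counts as corrupted if a fault interrupted an atomic update to it, or if an optional application-level check rejects its values. `PreciousGroup`, `atomic`, and `recover_with_precious` are hypothetical names.

```python
from __future__ import annotations

import copy
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreciousGroup:
    names: list[str]                                # variables marked @precious
    in_update: bool = False                         # True while an atomic update runs
    check: Optional[Callable[[dict], bool]] = None  # optional application-level check


@contextmanager
def atomic(group):
    # No try/finally on purpose: if a fault (exception) interrupts the update,
    # in_update stays True and the group is later treated as corrupted.
    group.in_update = True
    yield
    group.in_update = False


def recover_with_precious(ram, initial_ram, groups):
    stack = []
    # 1. Check each group for corruption; 2. push intact groups onto a stack.
    for g in groups:
        values = {n: copy.deepcopy(ram[n]) for n in g.names}
        corrupted = g.in_update or (g.check is not None and not g.check(values))
        if not corrupted:
            stack.append(values)
    # 3. Re-initialize the recovery unit's RAM.
    ram.clear()
    ram.update(copy.deepcopy(initial_ram))
    # 4. Pop the saved groups from the stack and copy them back.
    while stack:
        ram.update(stack.pop())
    for g in groups:
        g.in_update = False
    return ram
```

For instance, if a simulated memory-safety fault interrupts an update to a sequence-number group, that group is reset with the rest of the unit, while an intact routing group (such as a parent pointer) is copied back after re-initialization.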

  14. Evaluation: availability

  15. Evaluation: routing protocol cost

  16. Evaluation: overhead. Low programmer overhead (mostly the cost of adding annotations).

  17. Related work: • kernel-level safety (most operating systems, using virtual address spaces) • language-level safety • microreboots (Java Enterprise Edition)

  18. Conclusion. Neutron: • recovers from memory safety bugs • divides a program into recovery units • re-initializes the faulty unit on error • is implemented as part of the compiler and TinyOS • is designed for resource-limited hardware • reduces time to synchronization by 94% and routing protocol cost by 99.5%

  19. References: • Y. Chen, O. Gnawali, M. Kazandjieva, P. Levis, J. Regehr: “Surviving Sensor Network Software Faults,” in Proceedings of ACM SOSP 2009, Big Sky, MT, USA, October 2009. • Image sources: http://top10latest.com/top-10-costliest-software-bugs, http://www.personal.kent.edu/~rmuhamma, http://store.fungizmos.com, http://omrumfuneraltransport.com, http://www.moddergamer.com
