
SKI: Exposing Kernel Concurrency Bugs through Systematic Schedule Exploration

Pedro Fonseca (MPI-SWS), Rodrigo Rodrigues (NOVA University of Lisbon), Björn Brandenburg (MPI-SWS). OSDI 2014.


  1. SKI: Exposing Kernel Concurrency Bugs through Systematic Schedule Exploration
     Pedro Fonseca (MPI-SWS), Rodrigo Rodrigues (NOVA University of Lisbon), Björn Brandenburg (MPI-SWS)
     OSDI 2014
     Slide footer: SKI: Exposing Kernel Concurrency Bugs, Pedro Fonseca

  2. Kernel concurrency bugs
     ● Bugs that depend on the instruction interleavings
       – Triggered only by a subset of the interleavings

  3. Kernel concurrency bugs
     ● Bugs that depend on the instruction interleavings
       – Triggered only by a subset of the interleavings
     ● Plenty of concurrency bugs in kernels!

  4. Kernel concurrency bugs
     ● Bugs that depend on the instruction interleavings
       – Triggered only by a subset of the interleavings
     ● Plenty of concurrency bugs in kernels!
     "The bug is a race and not always easy to reproduce. [...] On my particular machine, [the test case] usually triggers [the bug] within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted." (Linux 3.0.41 change log)

  5. Kernel concurrency bugs
     ● Bugs that depend on the instruction interleavings
       – Triggered only by a subset of the interleavings
     ● Plenty of concurrency bugs in kernels!
     "The bug is a race and not always easy to reproduce. [...] On my particular machine, [the test case] usually triggers [the bug] within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted." (Linux 3.0.41 change log)
     "[The bug] was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely [...]" (Linux 3.4.41 change log)

  6. Kernel concurrency bugs
     ● Bugs that depend on the instruction interleavings
       – Triggered only by a subset of the interleavings
     ● Plenty of concurrency bugs in kernels!
     "The bug is a race and not always easy to reproduce. [...] On my particular machine, [the test case] usually triggers [the bug] within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted." (Linux 3.0.41 change log)
     "[The bug] was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely [...]" (Linux 3.4.41 change log)
     "Three of the five 3.4.9 machines [...] locked up. I've tried reproducing the issue, but so far I've been unsuccessful [...]" (Linux kernel mailing list, 5/1/2013)
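
The claim that a bug is triggered only by a subset of the interleavings can be made concrete with a toy model. The sketch below is not from the talk: it assumes two hypothetical threads, A and B, that each increment a shared counter with a non-atomic load followed by a store, and it enumerates every order-preserving interleaving to see which ones lose an update.

```python
from itertools import combinations

# Assumption for illustration: each thread does a non-atomic increment,
# modelled as two separate steps on a shared counter.
THREAD_OPS = ["load", "store"]

def interleavings(n_a, n_b):
    """Yield every order-preserving interleaving of two threads as a string of 'A'/'B'."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        yield "".join("A" if i in a_slots else "B" for i in range(total))

def run(schedule):
    """Execute one interleaving and return the final value of the shared counter."""
    shared = 0
    local = {"A": 0, "B": 0}   # per-thread register holding the loaded value
    pc = {"A": 0, "B": 0}      # per-thread index of the next step
    for tid in schedule:
        op = THREAD_OPS[pc[tid]]
        if op == "load":
            local[tid] = shared
        else:  # store
            shared = local[tid] + 1
        pc[tid] += 1
    return shared

results = {s: run(s) for s in interleavings(2, 2)}
# Two increments should yield 2; anything else is a lost update.
buggy = sorted(s for s, value in results.items() if value != 2)
print(f"{len(buggy)} of {len(results)} interleavings lose an update: {buggy}")
```

In this tiny model the two fully serialized schedules are correct and the others are not. Real kernel code paths are far longer, so the buggy interleavings can be a much smaller fraction of the total, which is consistent with the reproduction times quoted above.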

  7. Approaches to explore interleavings
     ● Stress testing approach
       – Hope to find the interleaving

  8. Approaches to explore interleavings
     ● Stress testing approach
       – Hope to find the interleaving
     ● Systematic approach
       – Take full control of the interleavings
       – Existing tools focus on user-mode applications

  9. Approaches to explore interleavings
     ● Stress testing approach
       – Hope to find the interleaving
     ● Systematic approach
       – Take full control of the interleavings
       – Existing tools focus on user-mode applications
     [Callout: This talk]

  10. Approaches to explore interleavings
      ● Stress testing approach
        – Hope to find the interleaving
      ● Systematic approach
        – Take full control of the interleavings
        – Existing tools focus on user-mode applications
      [Callout: This talk: focus on operating system kernels]
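
The difference between hoping to find an interleaving and taking full control of the interleavings can be illustrated with a small coverage comparison. This is a sketch under simplifying assumptions, not a model of real stress testing: two hypothetical threads of four steps each, a "stress" run that picks schedules uniformly at random (real timing is not uniform), and a systematic run that enumerates every schedule exactly once.

```python
import random
from itertools import combinations
from math import comb

def all_schedules(n_a, n_b):
    """Systematic approach: enumerate every interleaving exactly once."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield "".join("A" if i in slots else "B" for i in range(total))

def random_schedule(n_a, n_b, rng):
    """Stress-style approach (simplified): whatever interleaving chance produces."""
    ops = ["A"] * n_a + ["B"] * n_b
    rng.shuffle(ops)
    return "".join(ops)

def stress_coverage(n_a, n_b, trials, seed=0):
    """Return the set of distinct schedules seen in `trials` random runs."""
    rng = random.Random(seed)
    return {random_schedule(n_a, n_b, rng) for _ in range(trials)}

N = 4  # steps per thread (hypothetical)
systematic = set(all_schedules(N, N))
# Give the random approach the same budget of runs as the systematic one.
stressed = stress_coverage(N, N, trials=len(systematic))

print(f"total schedules:      {comb(2 * N, N)}")
print(f"systematic coverage:  {len(systematic)}")
print(f"random-run coverage:  {len(stressed)} distinct schedules in {len(systematic)} runs")
```

With the same number of runs, the random approach repeats schedules and leaves others unexplored, whereas the enumeration reaches each one by construction. The price of the systematic approach is that the tool must be able to dictate the schedule.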

  11. SKI: Finding kernel concurrency bugs
      ● Testing applications versus kernels
      ● Our approach
      ● Implementation
      ● Evaluation

  12. Existing user-mode tools
      [Diagram: an App above the Kernel. Existing user-mode systematic tools (LD_PRELOAD, ptrace) sit between them and work with kernel-level abstractions: threads and sync. objects.]

  13. Existing user-mode tools
      [Diagram: an App above the Kernel. Existing user-mode systematic tools (LD_PRELOAD, ptrace) sit between them and work with kernel-level abstractions: threads and sync. objects. Added labels: User-mode testing tool, Scheduler.]

  14. Kernel-mode challenges
      ● Kernel doesn't have a good instrumentation interface
      [Diagram labels: Testing tool, Kernel, Scheduler]

  15. Kernel-mode challenges
      ● Kernel doesn't have a good instrumentation interface
      ● An alternative would be to modify the kernel
        – But kernel modifications:
      [Diagram labels: Kernel, Scheduler]

  16. Kernel-mode challenges
      ● Kernel doesn't have a good instrumentation interface
      ● An alternative would be to modify the kernel
        – But kernel modifications:
          ● Change the tested software
          ● Are non-trivial
          ● Hinder portability
      [Diagram labels: Kernel, Scheduler]

  17. Kernel-mode challenges
      ● Kernel doesn't have a good instrumentation interface
      ● An alternative would be to modify the kernel
        – But kernel modifications:
          ● Change the tested software
          ● Are non-trivial
          ● Hinder portability
      [Callout: Avoid kernel modifications]
      [Diagram labels: Kernel, Scheduler]

  18. User-mode versus kernel-mode
      [Diagram: three layers, App above Kernel (with Scheduler) above Hardware.
       Between App and Kernel: existing user-mode systematic tools (LD_PRELOAD, ptrace), working with kernel-level abstractions: threads and sync. objects.
       Between Kernel and Hardware: our tool (modified VMM), working with HW-level abstractions: mov, add, jmp, registers, APIC.]

  19. User-mode versus kernel-mode
      [Diagram: three layers, App above Kernel (with Scheduler) above Hardware.
       Between App and Kernel: existing user-mode systematic tools (LD_PRELOAD, ptrace), working with kernel-level abstractions: threads and sync. objects.
       Between Kernel and Hardware: our tool (modified VMM), working with HW-level abstractions: mov, add, jmp, registers, APIC. Added label: Kernel testing tool.]

  20. SKI: Finding kernel concurrency bugs

  21. SKI: Finding kernel concurrency bugs
      ● Systematic: full control of the kernel interleavings

  22. SKI: Finding kernel concurrency bugs
      ● Systematic: full control of the kernel interleavings
      ● Practical: no modifications to the kernel; fast

  23. SKI: Finding kernel concurrency bugs
      ● Challenges testing the kernel code
      ● SKI's approach
      ● Implementation
      ● Evaluation

  24. SKI's approach
      Challenges:
        1. How to control the schedules?
        2. Which contexts are schedulable?
        3. Which schedules to choose?
      [Diagram: a VM containing the App and the Kernel, running on the SKI VMM, which works with HW-level abstractions: mov, add, jmp, registers, APIC.]

  25. 1. How to control the kernel schedules?
      [Diagram: instructions of Thread 1 and Thread 2 (MOV, ADD, PUSH, MOV, MOV, SUB, JMP) interleaved over time t on a single CPU.]

  26. 1. How to control the kernel schedules?
      ● Pin each tested thread to a different CPU (thread affinity)
      [Diagram: "Pin" step. The instructions of Thread 1 and Thread 2 (MOV, ADD, PUSH, MOV, MOV, SUB, JMP), first interleaved over time t on a single CPU, are separated so that Thread 1 runs on CPU 1 and Thread 2 on CPU 2.]

  27. 1. How to control the kernel schedules?
      ● Pin each tested thread to a different CPU (thread affinity)
      ● Pause and resume CPUs to control schedules
      [Diagram: "Pin" step followed by "Control" step. After Thread 1 and Thread 2 are pinned to CPU 1 and CPU 2, the CPUs are paused and resumed so that their instructions execute in a chosen order over time t.]
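
The pin-then-control idea on this slide can be sketched in a few lines. This is a toy model rather than SKI's implementation: each virtual CPU is a Python generator that executes one instruction every time it is resumed, and a controller replays a chosen schedule by resuming exactly one CPU per step. The function names, the split of the slide's instructions between the two threads, and the example schedule are all assumptions made for illustration.

```python
def thread(name, instructions, trace):
    """A pinned thread: executes one instruction each time its virtual CPU is resumed."""
    for ins in instructions:
        trace.append((name, ins))
        yield  # the CPU stays paused here until the controller resumes it

def run_schedule(programs, schedule):
    """Resume exactly one virtual CPU per step, in the order given by `schedule`."""
    trace = []
    # "Pin": every tested thread gets its own virtual CPU (one generator per CPU id).
    cpus = {cpu: thread(name, ins, trace) for cpu, (name, ins) in programs.items()}
    for cpu in schedule:
        # "Control": resume the chosen CPU for one instruction; all others stay paused.
        next(cpus[cpu], None)
    return trace

# Hypothetical assignment of instructions to threads (the diagram does not survive extraction).
programs = {
    1: ("Thread 1", ["MOV", "ADD", "PUSH"]),
    2: ("Thread 2", ["MOV", "SUB", "JMP"]),
}
trace = run_schedule(programs, schedule=[1, 2, 2, 1, 1, 2])
for name, ins in trace:
    print(f"{name}: {ins}")
```

Because each thread owns a CPU, choosing which CPU to resume is equivalent to choosing which thread runs next, so the controller decides the interleaving without asking the kernel's own scheduler for cooperation.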
