
Run-DMA. Michael Rushanan, Stephen Checkoway. Johns Hopkins University, University of Illinois at Chicago.



  1. Run-DMA
     Michael Rushanan, Stephen Checkoway
     Johns Hopkins University, University of Illinois at Chicago

  2. Introduction
     • Arbitrary computation using the Direct Memory Access (DMA) engine
     • Access all resources of the device
     • Implement the following as examples:
       • Brainfuck
       • Rootkit
     [Annotation: "Glorified memcpy"]

  3. Direct Memory Access
     [Diagram: CPU, auxiliary processor, main memory, and DMA]
     • Offload the task of copying memory to/from auxiliary processors (e.g., NIC, GPU)
     • Free the CPU to do more interesting work

  4. DMA Engine
     • CPU configures a DMA transfer by setting control registers
     • Control registers specify the transfer operation
     Control block structure: src, dest, length, next_cb
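
As a minimal sketch of the control block structure above, the following Python simulation models memory as a `bytearray` and a control block as four little-endian 32-bit words. The 16-byte layout, the helper names `write_cb` and `do_transfer`, and all addresses are illustrative assumptions; control blocks on real hardware may carry additional fields.

```python
import struct

# Assumed layout: src, dest, length, next_cb as little-endian 32-bit words.
CB_FORMAT = "<IIII"
CB_SIZE = struct.calcsize(CB_FORMAT)

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store a control block at addr in simulated memory."""
    struct.pack_into(CB_FORMAT, mem, addr, src, dest, length, next_cb)

def do_transfer(mem, cb_addr):
    """Perform the single copy described by the control block at cb_addr."""
    src, dest, length, next_cb = struct.unpack_from(CB_FORMAT, mem, cb_addr)
    mem[dest:dest + length] = mem[src:src + length]
    return next_cb

mem = bytearray(256)
mem[0x80:0x85] = b"hello"
write_cb(mem, 0x10, src=0x80, dest=0xC0, length=5)
next_cb = do_transfer(mem, 0x10)
print(bytes(mem[0xC0:0xC5]))
```

The CPU's only role in this model is to fill in the control block and hand its address to the engine.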

  5. Control Block Chaining
     • Scatter/gather DMA can transfer to/from multiple memory areas in a single transaction
     • Configure a sequence of control blocks
     [Diagram: three control blocks, each with src, dest, length, and next_cb members]
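
Chaining can be sketched by following `next_cb` links until a null link is reached. This is a simulation under assumed conventions (16-byte control blocks, address 0 as the end-of-chain marker, hypothetical helpers `write_cb` and `run_chain`), not the behavior of any particular engine. It gathers three separate regions into one contiguous buffer.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Follow next_cb links until a null (0) link; return the transfer count."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

mem = bytearray(512)
mem[0x100:0x103] = b"Run"
mem[0x140:0x141] = b"-"
mem[0x180:0x183] = b"DMA"

# Three chained blocks gather the scattered pieces at 0x1C0.
write_cb(mem, 0x10, 0x100, 0x1C0, 3, next_cb=0x20)
write_cb(mem, 0x20, 0x140, 0x1C3, 1, next_cb=0x30)
write_cb(mem, 0x30, 0x180, 0x1C4, 3, next_cb=0)

count = run_chain(mem, 0x10)
print(count, bytes(mem[0x1C0:0x1C7]))
```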

  6. Required DMA Properties
     • Perform memory-to-memory copies
     • Programmed by loading the address of control blocks
     • Supports scatter/gather mode

  7. Target Device
     • Raspberry Pi 2 single-board computer (BCM2836)
     • Other potential DMA engines:
       • Intel 8237 (e.g., legacy IBM PC/ATs)
       • Cell multi-core microprocessor (e.g., PS3)

  8. DMA Gadgets
     • DMA “programs” require self-modifying constructs
     • Overwrite members of later control blocks
     [Figure: control blocks cb 0 and cb 1, each with length 01 00 00 00; the src member is labeled]
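
The self-modifying idea can be illustrated with a two-block chain in which the first block overwrites a member of the second before the engine loads it. The sketch below is a simulation with assumed addresses and an assumed 16-byte control block; here cb 0 patches the `dest` member of cb 1, which differs from the 1-byte transfers in the slide's figure.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks; each block is read only when reached."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

CB0, CB1 = 0x10, 0x20
TARGET_PTR, PAYLOAD, TARGET = 0x100, 0x180, 0x200

mem = bytearray(0x400)
struct.pack_into("<I", mem, TARGET_PTR, TARGET)  # 4-byte address to patch in
mem[PAYLOAD] = 0x2A

# cb 0 copies the 4-byte address over cb 1's dest member (offset 4).
write_cb(mem, CB0, src=TARGET_PTR, dest=CB1 + 4, length=4, next_cb=CB1)
# cb 1 starts with a placeholder dest of 0; it is rewritten before it runs.
write_cb(mem, CB1, src=PAYLOAD, dest=0, length=1)

transfers = run_chain(mem, CB0)
patched_dest = struct.unpack_from("<I", mem, CB1 + 4)[0]
```

Because the engine reads each control block only when it reaches it, the patch made by cb 0 takes effect for cb 1.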

  9. Table Lookups
     [Figure: control blocks cb 0 and cb 1 (src, dest, next_cb; length 01 00 00 00 each) and the lookup table sqr_tbl with entries 00, 01, 04, …]
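
A table lookup of this kind can be sketched as follows, assuming the table sits at a 256-byte-aligned address so that overwriting the low byte of `src` selects an entry. The names `square`, `write_cb`, and `run_chain`, the addresses, and the truncation of squares to 8 bits are assumptions made for the illustration.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

def square(x):
    """Compute sqr_tbl[x] using only DMA-style copies."""
    X, Y, SQR_TBL = 0x100, 0x101, 0x300   # SQR_TBL is 256-byte aligned
    CB0, CB1 = 0x10, 0x20
    mem = bytearray(0x400)
    for i in range(256):
        mem[SQR_TBL + i] = (i * i) & 0xFF  # 8-bit squares (assumption)
    mem[X] = x
    # cb 0: copy x into the low byte of cb 1's src (little-endian, offset 0).
    write_cb(mem, CB0, src=X, dest=CB1, length=1, next_cb=CB1)
    # cb 1: src now equals SQR_TBL + x; copy that entry to y.
    write_cb(mem, CB1, src=SQR_TBL, dest=Y, length=1)
    run_chain(mem, CB0)
    return mem[Y]

print(square(2), square(3))
```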

  10. Basic Building Blocks
      • Unary functions, y = f(x): look up value in table and store to memory
      • Variable dereferencing, *x: copy value pointed to into src/dest of subsequent control block
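
Variable dereferencing can be sketched as a three-block chain that performs `*q = *p` for two pointer variables. The layout below (4-byte pointers, a 16-byte control block, the chosen addresses) is an assumption for the simulation rather than the authors' exact construction.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

P, Q = 0x100, 0x104            # pointer variables (4 bytes each)
FROM, TO = 0x180, 0x1C0        # the locations they point to
CB0, CB1, CB2 = 0x10, 0x20, 0x30

mem = bytearray(0x200)
struct.pack_into("<I", mem, P, FROM)
struct.pack_into("<I", mem, Q, TO)
mem[FROM] = 0x5A

write_cb(mem, CB0, src=P, dest=CB2, length=4, next_cb=CB1)      # cb2.src  = p
write_cb(mem, CB1, src=Q, dest=CB2 + 4, length=4, next_cb=CB2)  # cb2.dest = q
write_cb(mem, CB2, src=0, dest=0, length=1)                     # *q = *p

transfers = run_chain(mem, CB0)
```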

  11. Basic Building Blocks
      • Conditional goto: address of a control block written to the next_cb member of a trampoline
      • Switch: offset table with entries that are offsets into an address table
      • Memory-mapped I/O registers: loop over memory-mapped flag or status register
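
A conditional goto through a trampoline might look like the sketch below. It assumes both branch targets live in the first 256 bytes of memory, so a one-byte table lookup is enough to patch the low byte of the trampoline's `next_cb`; that simplification, the zero-length trampoline, and every address are assumptions of this simulation.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

def branch(flag):
    """Return ord('T') if flag is nonzero, else ord('E'), using only copies."""
    FLAG, RESULT, T_BYTE, E_BYTE = 0x100, 0x120, 0x110, 0x111
    JMP_TBL = 0x200                       # 256-byte aligned
    CB0, CB1, TRAMP, THEN, ELSE = 0x10, 0x20, 0x30, 0x40, 0x50
    mem = bytearray(0x300)
    mem[FLAG], mem[T_BYTE], mem[E_BYTE] = flag, ord("T"), ord("E")
    mem[JMP_TBL] = ELSE                   # flag == 0 selects the else target
    for i in range(1, 256):
        mem[JMP_TBL + i] = THEN           # any nonzero flag selects then
    write_cb(mem, CB0, src=FLAG, dest=CB1, length=1, next_cb=CB1)
    # Look up the target's low address byte; store it in trampoline.next_cb.
    write_cb(mem, CB1, src=JMP_TBL, dest=TRAMP + 12, length=1, next_cb=TRAMP)
    write_cb(mem, TRAMP, src=0, dest=0, length=0, next_cb=0)  # copies nothing
    write_cb(mem, THEN, src=T_BYTE, dest=RESULT, length=1)
    write_cb(mem, ELSE, src=E_BYTE, dest=RESULT, length=1)
    run_chain(mem, CB0)
    return mem[RESULT]

print(chr(branch(0)), chr(branch(1)))
```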

  12. BrainFuck

  13. BrainFuck
      +   ++*ptr;   increment the cell pointed to by head
      -   --*ptr;   decrement the cell pointed to by head
      >   ++ptr;    increment head to point to the next cell
      <   --ptr;    decrement head to point to the previous cell

  14. BrainFuck
      [   while (*ptr) {     if the cell pointed to by head is nonzero, execute next instruction; otherwise, jump to the instruction following ]
      ]   }                  if the cell pointed to by head is zero, execute next instruction; otherwise, jump to the instruction following [
      ,   *ptr=getchar();    store the input to the cell pointed to by head
      .   putchar(*ptr);     output the cell pointed to by head
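
For reference, the eight instructions in the two tables above can be expressed as a small conventional interpreter. This is an ordinary CPU-side sketch, not the DMA implementation; the function name `run_bf`, the 8-bit wrapping cells, the tape length, and returning 0 at end of input are assumptions.

```python
def run_bf(program, input_bytes=b"", tape_len=30000):
    """Interpret a BrainFuck program and return its output as bytes."""
    tape = bytearray(tape_len)
    out = bytearray()
    inp = iter(input_bytes)

    # Precompute matching bracket positions.
    stack, match = [], {}
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i

    pc = head = 0
    while pc < len(program):
        c = program[pc]
        if c == "+":
            tape[head] = (tape[head] + 1) & 0xFF
        elif c == "-":
            tape[head] = (tape[head] - 1) & 0xFF
        elif c == ">":
            head += 1
        elif c == "<":
            head -= 1
        elif c == "[" and tape[head] == 0:
            pc = match[pc]          # continue after the matching ]
        elif c == "]" and tape[head] != 0:
            pc = match[pc]          # continue after the matching [
        elif c == ",":
            tape[head] = next(inp, 0)
        elif c == ".":
            out.append(tape[head])
        pc += 1
    return bytes(out)

print(run_bf("++++++++[>++++++++<-]>+."))
```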

  15. Interpreter Implementation
      • 8 gadgets corresponding to BrainFuck instructions
      • Dispatch
      • Increment word and decrement word
      • Fetch next instruction (i.e., increment PC and dispatch)

  16. Increment
      [Figure: control blocks cb 0 to cb 3 (lengths 4, 4, 1, 1 bytes) and the lookup table inc_tbl]

  17. Increment
      [Same figure, with the variable dereference highlighted]

  18. Increment
      [Same figure, with the unary function highlighted]

  19. Increment
      [Same figure, with updated member values]
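
One way to read the four-block increment figure is sketched below: two 4-byte copies dereference the head pointer, a 1-byte copy feeds the cell value into a table lookup, and a final 1-byte copy stores the incremented value back. The assignment of roles to cb 0 through cb 3 is inferred from the transfer lengths and is an assumption, as are the addresses and helper names.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

def increment_demo(value):
    """Run ++*head on one cell; return (new cell value, transfer count)."""
    HEAD, CELL, INC_TBL = 0x100, 0x180, 0x300   # INC_TBL is 256-byte aligned
    CB0, CB1, CB2, CB3 = 0x10, 0x20, 0x30, 0x40
    mem = bytearray(0x400)
    for i in range(256):
        mem[INC_TBL + i] = (i + 1) & 0xFF       # wraps ff -> 00
    struct.pack_into("<I", mem, HEAD, CELL)
    mem[CELL] = value
    write_cb(mem, CB0, src=HEAD, dest=CB2, length=4, next_cb=CB1)      # cb2.src  = head
    write_cb(mem, CB1, src=HEAD, dest=CB3 + 4, length=4, next_cb=CB2)  # cb3.dest = head
    write_cb(mem, CB2, src=0, dest=CB3, length=1, next_cb=CB3)         # cb3.src low byte = *head
    write_cb(mem, CB3, src=INC_TBL, dest=0, length=1)                  # *head = inc_tbl[*head]
    transfers = run_chain(mem, CB0)
    return mem[CELL], transfers

print(increment_demo(0x41))
```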

  20. Dispatch
      [Figure: control blocks cb 0 to cb 3 and a trampoline, the tables insn_tbl and dispatch_tbl (offsets 00, 04, 08, 0c, 10, …), and the targets quit, nop, inc, dec]

  21. Dispatch
      [Same figure, with the variable dereference highlighted]

  22. Dispatch
      [Same figure, with the switch highlighted]

  23. Dispatch
      [Same figure, with updated member values]
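
The dispatch figure can be read as a dereference of the program counter followed by a switch through two tables and a trampoline. The sketch below simulates that reading; the offsets assigned to quit, nop, inc, and dec, the marker bytes written by the stand-in handlers, and all addresses are assumptions made for the illustration.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

def dispatch(insn):
    """Return the marker byte of the handler selected for one instruction."""
    PC, RESULT, MARKERS, CODE = 0x100, 0x110, 0x120, 0x180
    INSN_TBL, DISPATCH_TBL = 0x200, 0x300       # both 256-byte aligned
    CB0, CB1, CB2, CB3, TRAMP = 0x10, 0x20, 0x30, 0x40, 0x50
    handlers = {0: 0x400, 4: 0x410, 8: 0x420, 12: 0x430}  # quit, nop, inc, dec
    mem = bytearray(0x500)
    mem[CODE] = insn
    struct.pack_into("<I", mem, PC, CODE)
    mem[MARKERS:MARKERS + 4] = b"qnid"
    for i in range(256):
        mem[INSN_TBL + i] = 4                   # unknown bytes map to nop
    mem[INSN_TBL + 0] = 0                       # NUL ends the program: quit
    mem[INSN_TBL + ord("+")] = 8
    mem[INSN_TBL + ord("-")] = 12
    for k, (off, addr) in enumerate(handlers.items()):
        struct.pack_into("<I", mem, DISPATCH_TBL + off, addr)
        write_cb(mem, addr, src=MARKERS + k, dest=RESULT, length=1)
    write_cb(mem, CB0, src=PC, dest=CB1, length=4, next_cb=CB1)        # cb1.src = pc
    write_cb(mem, CB1, src=0, dest=CB2, length=1, next_cb=CB2)         # index insn_tbl by *pc
    write_cb(mem, CB2, src=INSN_TBL, dest=CB3, length=1, next_cb=CB3)  # offset into dispatch_tbl
    write_cb(mem, CB3, src=DISPATCH_TBL, dest=TRAMP + 12, length=4, next_cb=TRAMP)
    write_cb(mem, TRAMP, src=0, dest=0, length=0)                      # next_cb patched by cb 3
    run_chain(mem, CB0)
    return mem[RESULT]

print(chr(dispatch(ord("+"))), chr(dispatch(ord("-"))))
```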

  24. Turing-Complete
      Simulate any other computational device/language
      • BrainFuck is Turing-complete
      • We implemented BrainFuck with DMA gadgets
      • Thus DMA gadgets are Turing-complete

  25. Resource-Complete
      Access all resources of the system from within the language
      • DMA has access to memory-mapped I/O registers
      • Thus DMA gadgets are resource-complete

  26. Hello World
      https://github.com/stevecheckoway/rundma

  27. More Gadgets
      • Binary functions
        • $f : \{0,1\}^8 \times \{0,1\}^8 \to \{0,1\}^8$
      • Relational operators
        • Equality (e.g., =)
        • Inequality (e.g., <)
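
A binary function on two bytes can be sketched as a lookup in a 64 KiB table: one operand patches the low byte of `src` and the other patches the next byte. The slide does not say how the authors build these gadgets, so the table layout, the 64 KiB alignment, and the helper `binary_gadget` are assumptions of this simulation.

```python
import struct

def write_cb(mem, addr, src, dest, length, next_cb=0):
    """Store an assumed 4-word control block (src, dest, length, next_cb)."""
    struct.pack_into("<IIII", mem, addr, src, dest, length, next_cb)

def run_chain(mem, cb_addr, limit=10_000):
    """Execute chained control blocks until a null next_cb."""
    transfers = 0
    while cb_addr != 0 and transfers < limit:
        src, dest, length, cb_addr = struct.unpack_from("<IIII", mem, cb_addr)
        mem[dest:dest + length] = mem[src:src + length]
        transfers += 1
    return transfers

def binary_gadget(f, a, b):
    """Evaluate an 8-bit binary function f(a, b) with three copies."""
    A, B, RESULT = 0x100, 0x101, 0x102
    TABLE = 0x10000                      # 64 KiB aligned: low two bytes are 0
    CB0, CB1, CB2 = 0x10, 0x20, 0x30
    mem = bytearray(0x20000)
    for x in range(256):
        for y in range(256):
            mem[TABLE + (x << 8) + y] = f(x, y) & 0xFF
    mem[A], mem[B] = a, b
    write_cb(mem, CB0, src=A, dest=CB2 + 1, length=1, next_cb=CB1)  # src byte 1 = a
    write_cb(mem, CB1, src=B, dest=CB2, length=1, next_cb=CB2)      # src byte 0 = b
    write_cb(mem, CB2, src=TABLE, dest=RESULT, length=1)            # table[a][b]
    run_chain(mem, CB0)
    return mem[RESULT]

def equal(x, y):
    return int(x == y)

def less(x, y):
    return int(x < y)

print(binary_gadget(equal, 7, 7), binary_gadget(less, 5, 3))
```

Relational operators fit the same shape because their tables simply hold 0 or 1.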

  28. Raspbian Rootkit
      • Raspbian Linux
      • task_structs hold information about a process
        • pointer to cred structure (e.g., UID of process)
        • pointer to next structure
      [Diagram: linked list init_task → task 1 → … → task n]

  29. DMA Performance
      Gadget                 Control Blocks
      inc/dec                4
      inc/dec word           4 + 2 trampolines
      dispatch               33
      right/left             26
      left/right condition   2
      I/O                    5

  30. Total DMA Transfers
      Program        Control Blocks
      Interpreter    148
      Hello World    36356
      Rootkit        20

  31. DMA Malware
      • DMA malware
        • Code running on an auxiliary processor or external device with DMA access
        • Examples: FireWire, Thunderbolt, NIC, GPU
      • Main difference of our work:
        • DMA gadgets run entirely on the DMA engine
        • No additional processors

  32. Countermeasures
      • Input/output memory management (Duflot, 2011)
      • Peripheral firmware load-time integrity (Stewin, 2012)
      • Anomaly detection systems (Duflot, 2011)
      • Bus agent runtime monitors (Stewin, 2013)

  33. Conclusion
      • Everything non-trivial ends up being Turing-complete
        • Parsing file formats
        • Page tables
      • DMA engine is yet another example
      • We need to consider specialized hardware

  34. Questions?
