
Optimizing Binary Translation of Dynamically Generated Code

Byron Hawkins, Brian Demsky (University of California, Irvine); Derek Bruening, Qin Zhao (Google, Inc.)


  1. Optimizing Binary Translation of Dynamically Generated Code Byron Hawkins Brian Demsky University of California, Irvine Derek Bruening Qin Zhao Google, Inc.

  2. ● Profiling ● Bug detection ● Program analysis ● Security

  3. SPEC CPU 2006 ● 12% overhead* ● 21% overhead* *geometric mean


  5. Octane JavaScript Benchmark ● 15x overhead on Chrome V8, 4.4x overhead on Mozilla Ion ● 18x overhead on Chrome V8, 8x overhead on Mozilla Ion


  7. New Era of Dynamic Code ● Back in 2003... – Browsers: one single-phase JIT engine – Microsoft Office: negligible dynamic code ● A decade later... – Browsers: at least 2 multi-phase JIT engines – Microsoft Office: one multi-phase JIT ● Active at startup of all applications


  9. Goals ● Optimize binary translation of dynamic code ● Maintain performance for static code Evaluation Platform ● DynamoRIO on 64-bit Linux for x86


  11. Outline ● Background on binary translation – Current optimizations for statically compiled code – Dynamic code → wasting translation overhead ● Coarse-grained detection of code changes ● New optimizations – Manual annotations – Automated inference ● Performance results ● Related Work


  14. [Diagram: a SPEC benchmark application (functions foo() and bar(), basic blocks A through F) beside the DynamoRIO code cache, which holds a BB cache and a trace cache. Over the build, blocks A, C, D, E and F appear in the BB cache one at a time.] Translate application into code cache as it runs

  19. [Same diagram, with an indirect branch lookup ahead of block F.] Correlate indirect branch targets via hashtable

  20. [Same diagram, with blocks also placed in the trace cache.] Hot paths are compiled into traces (10% speedup)
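
The translate-once, reuse-many-times behavior on these slides can be sketched with a toy model. This is only an illustration under stated assumptions: `CodeCache`, `translate`, `lookup` and `run` are hypothetical names, not DynamoRIO APIs, and "translation" here is just tagging a string. The dictionary stands in for the hashtable that maps application addresses to cached translations.

```python
from dataclasses import dataclass, field


@dataclass
class CodeCache:
    """Hypothetical stand-in for a translator's basic block cache."""
    # Application address -> translated block (the lookup hashtable).
    table: dict = field(default_factory=dict)
    # Number of times the translation cost was paid.
    translations: int = 0

    def translate(self, app_addr, app_code):
        # Stand-in for real binary translation: tag the original code.
        self.translations += 1
        return ("translated", app_code[app_addr])

    def lookup(self, app_addr, app_code):
        # Translate on a miss, reuse the cached block on a hit.
        block = self.table.get(app_addr)
        if block is None:
            block = self.translate(app_addr, app_code)
            self.table[app_addr] = block
        return block


def run(cache, app_code, path):
    # Execute a sequence of application addresses through the cache.
    return [cache.lookup(addr, app_code) for addr in path]


# Blocks on the executed path from the diagram; contents are made up.
app_code = {"A": "a-insns", "C": "c-insns", "D": "d-insns",
            "E": "e-insns", "F": "f-insns"}
cache = CodeCache()
for _ in range(1000):
    run(cache, app_code, ["A", "C", "D", "E", "F"])

print(cache.translations)  # 5
```

Each block is translated once and then served from the table for the remaining 999 iterations, which is the repeated execution that pays back the translation cost.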

  21. Cost ● Translate code ● Build traces Benefit ● Repeated execution of translated code ● Optimized traces – Can beat native performance on SPEC benchmarks


  23. [Diagram: the same layout as before, with a JIT-compiled function in place of the SPEC benchmark.] What if the target code is dynamically generated?

  24. [Same diagram, with blocks C and D rewritten as C' and D'.] The code may be changed frequently at runtime

  25. Corresponding translations become invalid

  26. Stale translations must be deleted for retranslation → “cache consistency” → How to detect code changes?

  36. Detecting Code Changes on x86 ● Monitor all memory writes – Too much overhead! ● Instrument traces to check freshness – DynamoRIO supports standalone basic blocks → too much overhead! ● Leverage page permissions and faults – Make code pages artificially read-only – Intercept page faults and invalidate translations → Acceptable overhead (for rare occurrence)

  37. Detecting Code Changes on x86 ● Leverage page permissions and faults → How does this work?
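
The page-permission approach can be illustrated with a small simulation. This is a sketch under stated assumptions, not DynamoRIO's implementation: `Monitor`, `WriteFault` and the method names are hypothetical, and a Python exception stands in for the hardware page fault that a real system would receive after marking a page read-only.

```python
PAGE_SIZE = 4096


class WriteFault(Exception):
    """Stand-in for the page fault raised by a write to a read-only page."""


class Monitor:
    """Hypothetical model of page-granular code change detection."""

    def __init__(self):
        self.memory = {}        # address -> value (toy application memory)
        self.read_only = set()  # pages artificially marked read-only
        self.cache = {}         # block start address -> translation

    @staticmethod
    def page_of(addr):
        return addr // PAGE_SIZE

    def translate(self, addr):
        # Translating code from a page makes that page read-only.
        self.cache[addr] = ("translated", addr)
        self.read_only.add(self.page_of(addr))

    def raw_write(self, addr, value):
        # What the hardware would do: fault on a read-only page.
        if self.page_of(addr) in self.read_only:
            raise WriteFault(addr)
        self.memory[addr] = value

    def app_write(self, addr, value):
        # Intercept the fault, invalidate, then let the write go through.
        try:
            self.raw_write(addr, value)
        except WriteFault:
            page = self.page_of(addr)
            for stale in [a for a in self.cache if self.page_of(a) == page]:
                del self.cache[stale]
            self.read_only.discard(page)
            self.raw_write(addr, value)


m = Monitor()
m.translate(0x1000)       # two blocks on page 1
m.translate(0x1040)
m.translate(0x2000)       # one block on page 2
m.app_write(0x1800, 0x90) # write lands on page 1, away from both blocks

print(sorted(hex(a) for a in m.cache))  # ['0x2000']
```

Writes to pages that hold no translated code never fault in this model, which is why the scheme is cheap when code changes are rare. The cost is that a single write drops every translation from the affected page.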

  38. [Diagram: a Chrome V8 process beside the DynamoRIO code cache. The page holding the JIT-generated foo() and bar() is rwx for the application but kept r-x by DynamoRIO; the page holding compile_js() is r-x. The code cache holds translations of foo(), bar() and compile_js().]

  39. [Same diagram: a write to the page holding foo() and bar() is blocked and raises a page fault.]

  41. [Same diagram: the page is now rwx and the write is allowed. foo() has become foo_2(); the code cache still holds bar() and compile_js().]

  42. [Same diagram: while thread A runs compile_js(), thread B runs compile_more_js() and writes bar_2() to the now-writable page. That write is also allowed, although the code cache still holds bar().]

  43. Concurrent Writer Problem: all translations from the modified page must be removed
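
The concurrent writer problem can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not the authors' code: `handle_fault`, `stale_blocks` and `scenario` are hypothetical names, the two threads are replayed sequentially, and a translation is modeled as a snapshot of the code it was built from. It compares removing only the block at the faulting address against removing every translation from the page.

```python
PAGE_SIZE = 4096


def page_of(addr):
    return addr // PAGE_SIZE


def handle_fault(cache, writable_pages, fault_addr, flush_whole_page):
    """Invalidate translations, then open the page for writing."""
    page = page_of(fault_addr)
    if flush_whole_page:
        doomed = [a for a in cache if page_of(a) == page]
    else:
        doomed = [a for a in cache if a == fault_addr]
    for addr in doomed:
        del cache[addr]
    writable_pages.add(page)


def stale_blocks(cache, memory):
    # A translation is stale if the code it was built from has changed.
    return sorted(a for a, snapshot in cache.items() if memory[a] != snapshot)


def scenario(flush_whole_page):
    foo, bar = 0x1000, 0x1100         # both functions share one page
    memory = {foo: "foo", bar: "bar"}
    cache = dict(memory)              # translation = snapshot of the code
    writable = set()

    # Thread A overwrites foo(): the write faults and is handled.
    handle_fault(cache, writable, foo, flush_whole_page)
    memory[foo] = "foo_2"

    # Thread B overwrites bar() while the page is writable: no fault occurs.
    assert page_of(bar) in writable
    memory[bar] = "bar_2"

    return stale_blocks(cache, memory)


print(scenario(flush_whole_page=False))  # [4352]
print(scenario(flush_whole_page=True))   # []
```

With block-only removal, the translation of bar() at 0x1100 (4352) survives thread B's unobserved write and would execute stale code. Removing every translation from the page closes that window.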

  44. Cache Consistency Overhead ● For non-JIT modules: – System call hooks (program startup only) – Self-modifying code (very rare) ● For JIT engines: – Code generation – Code optimization – Code adjustment for reuse


  46. Cache Consistency Overhead [Diagram: Chrome V8 beside the DynamoRIO code cache; the JIT page already holds foo(), which is translated in the code cache.] JIT writes a second function to unused space in the page

  47. DynamoRIO must invalidate all translations from the page

  48. Trivial code changes require flushing all translations
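
A rough way to see how this overhead grows is to count translations in a toy workload. Everything here is an assumption made for illustration: the function name `count_translations` is hypothetical, the JIT is assumed to emit functions one at a time into a single page, and the program is assumed to call every function emitted so far after each emit. The per-function alternative is a hypothetical baseline for comparison, not a description of the optimizations in this talk.

```python
def count_translations(num_functions, page_granular):
    """Count translations when a JIT appends functions to one shared page."""
    cache = set()
    translations = 0
    for newest in range(num_functions):
        # The JIT writes function `newest` into unused space on the page.
        if page_granular:
            cache.clear()          # flush every translation from the page
        else:
            cache.discard(newest)  # hypothetical: drop only the written code
        # The program then calls every function emitted so far.
        for fn in range(newest + 1):
            if fn not in cache:
                cache.add(fn)
                translations += 1
    return translations


print(count_translations(10, page_granular=True))   # 55
print(count_translations(10, page_granular=False))  # 10
```

Under these assumptions, page-granular flushing retranslates everything after each emit, so the translation count grows quadratically with the number of functions sharing the page, while the baseline grows linearly.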
