Immediate Multi-Threaded Dynamic Software Updates Using Stack Reconstruction
  Kristis Makris, Rida A. Bazzi
  {makristis,bazzi}@asu.edu
Motivation
  Software update problem: replace old version with new version
  Traditional approach is static:
  - stop, update, restart
  - Impairs high-availability
  Dynamic software update (DSU) can help minimize downtime
Execution trace and stack
  [Figure: execution trace of main, f, g, and h next to the call stack at the moment of an update request; a "?" marks the open question of where the new version's trace should resume]
Type-safety: no old code executed on new data, and vice versa
  old version:
    typedef struct {
        char name[64];
        float balance;
    } customer_record_v1_t;
  new version:
    typedef struct {
        char name[64];
        int number_of_accesses;
        float balance;
    } customer_record_v2_t;
  Events:
  - g() is called
  - data are transformed
  - g() returns; f() executes
  Memory alignment (record for "John"):
  - f() writes balance = 145.0
  - f() reads balance = 0 (incorrect)
  - g() writes balance = 160.0
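The layout problem on this slide can be sketched in C as follows. This is a minimal illustration, not UpStare output: `transform_customer_record` and `read_balance_with_v1_layout` are hypothetical names, and the sketch assumes `int` and `float` have the same size and that an all-zero bit pattern is 0.0f.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record layouts from the slide. */
typedef struct {
    char name[64];
    float balance;
} customer_record_v1_t;

typedef struct {
    char name[64];
    int number_of_accesses;
    float balance;
} customer_record_v2_t;

/* The new field pushes balance to a larger offset in v2. */
static_assert(offsetof(customer_record_v2_t, balance) >
              offsetof(customer_record_v1_t, balance),
              "balance moves in v2");

/* Hypothetical state transformer: copies field by field, never by offset. */
static void transform_customer_record(const customer_record_v1_t *old_rec,
                                      customer_record_v2_t *new_rec)
{
    memcpy(new_rec->name, old_rec->name, sizeof new_rec->name);
    new_rec->number_of_accesses = 0;      /* new field gets a default value */
    new_rec->balance = old_rec->balance;
}

/* What old code effectively does if it reads balance through the v1 layout
 * from memory that already holds a v2 record (assumes sizeof(int) ==
 * sizeof(float)). */
static float read_balance_with_v1_layout(const customer_record_v2_t *rec)
{
    float value;
    memcpy(&value,
           (const char *)rec + offsetof(customer_record_v1_t, balance),
           sizeof value);
    return value;
}
```

After the transformer runs, new code sees the expected balance, while a read through the old offset lands on `number_of_accesses` instead. This is the kind of mismatch that type-safety rules out.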
Execution trace and stack: resuming in the new version
  [Figure: old-version trace and stack at the update request, followed by the new-version trace up to the point labeled "update finished"]
  Is old version in valid state?
  Is there a valid mapping?
  Undecidable problem
  - Need user input
  Should provide useful safety guarantees
Useful DSU Safety Guarantees
  Atomic update (subsumes type-safety)
  Transaction-safety
  Thread-safety
Atomic Update
  At no time does the executing application expect different representations of state
  After the update, only new code executes over the new state; no old code ever executes again
  [Figure labels: old version, pause, map state, resume, new version, hybrid execution]
Transaction-safety
  Some code executes only in old or only in new version
  Requires user annotations
    f() {
        ...
        while(condition) {
            i();
            j();        <- do not update inside region
            k();
        }
        ...
    }
Thread-safety
  [Figure: clients and server shown in three stages (1., 2., 3.); labels: active connections, new connections, old code, new code]
    while (condition) {
        recv(&data);
        process(&data);
    }
Providing thread-safety requires immediate updates
  - Atomic update
  - Bounded delay
Existing DSU mechanisms do not provide support for immediate updates
Our Results
  First general DSU mechanism that supports
  - Immediate updates
      Atomic update
      Bounded delay
      Multi-threaded
  - Update active code and data
  - Low data-access overhead
Our Approach: UpStare
  Compiler, patch-generator, runtime
  - Insert update points
  - Source-to-source transformations of C programs
  - Architecture and OS independent
  Immediate multi-threaded updates
  - Atomic update: using stack reconstruction
  - Bounded delay: converting blocking calls to non-blocking
  - Multithreaded: safely blocking all threads
Stack Reconstruction: unrolling
  [Figure: the stacks of Thread 1, Thread 2, and Thread 3 are unrolled frame by frame into per-thread saved frames]
  - block all threads
  - block until unroll finishes
  - map global variables
Stack Reconstruction: restoring
  [Figure: the saved frames of Thread 1, Thread 2, and Thread 3 are copied or mapped onto new stacks; for example, _i() is restored as _i_new()]
  apply stack transformers
  - split functions
  - map continuation
  - merge functions together
  Can update:
  • local variables
  • formal parameters
  • return addresses
  • Program Counter
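The unroll-then-restore idea can be sketched for a single function as follows. This is an illustrative sketch, not the code UpStare generates: `saved_frame_t`, `sum_below`, and `request_at` are hypothetical, and `request_at` merely simulates an update request arriving at a given iteration. For brevity the same function is resumed; in a real update the new version of the function would be entered instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical saved frame: the continuation point plus the live locals. */
typedef struct {
    int cp;    /* continuation point at which execution stopped */
    int i;     /* saved local */
    int sum;   /* saved local */
} saved_frame_t;

/* Simulation only: iteration at which an update request "arrives". */
static int request_at = -1;

/* Sums 0..n-1. Returns true when finished (result in *result).
 * Returns false when it unrolled: its locals were saved into *out.
 * If restore is non-NULL, the locals are reloaded and execution jumps
 * straight to the saved continuation point. */
static bool sum_below(int n, const saved_frame_t *restore,
                      saved_frame_t *out, int *result)
{
    int i = 0, sum = 0;

    if (restore != NULL) {
        i = restore->i;
        sum = restore->sum;
        assert(restore->cp == 1);   /* this sketch has one continuation point */
        goto cp_1;
    }

    while (i < n) {
        if (i == request_at) {      /* stands in for UPDATE_POINT() */
            out->cp = 1;
            out->i = i;
            out->sum = sum;
            request_at = -1;        /* request consumed */
            return false;           /* unroll: pop this frame */
        }
cp_1:
        sum += i;
        i++;
    }
    *result = sum;
    return true;
}
```

A caller would invoke `sum_below(n, NULL, &frame, &result)`; if it returns false, the caller applies any stack transformers to `frame` and calls the function again with `restore` set to `&frame`.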
Continuation Points (old version)
    main() {
        UPDATE_POINT();
        a();
        c();
        g();
    }
    a() {
        UPDATE_POINT();
        b();
    }
    b() {
        UPDATE_POINT();
        d();
        while(condition) {
            UPDATE_POINT();
            e();
        }
    }
  Stack: main(), _main()
Continuation Points (old version, stack growth)
  [Animation over the same code: the stack grows from main(), _main() to main(), _main(), a(), then to a(), b(), then to b(), d(); d() returns, leaving main(), _main(), a(), b()]
Continuation Points: saving (old version)
  Initiate an update at the update point inside the loop of b(); the frames are saved from the top of the stack down
    b() {
        UPDATE_POINT();         // CP 1
        d();                    // CP 2
        while(condition) {
            UPDATE_POINT();     // CP 3
            e();                // CP 4
        }
    }
    a() {
        UPDATE_POINT();         // CP 1
        b();                    // CP 2
    }
    main() {
        UPDATE_POINT();         // CP 1
        a();                    // CP 2
        c();                    // CP 3
        g();                    // CP 4
    }
  Saved continuation points: b_CP_3, a_CP_2, _main_CP_2
Continuation Points: restoring (new version)
  The frames are restored from the bottom of the stack up
    main() {
        UPDATE_POINT();
        a();                    // CP 2
        c();
        g();
    }
    a() {
        UPDATE_POINT();
        b();                    // CP 2
    }
    b() {
        UPDATE_POINT();         // CP 1
        d();                    // CP 2
        f();                    // CP 3
        while(condition) {
            UPDATE_POINT();     // CP 4
            e();                // CP 5
        }
    }
  Saved continuation points: b_CP_3, a_CP_2, _main_CP_2
  Restored continuation points: _main_CP_2, a_CP_2, b_CP_4
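The mapping from saved to restored continuation points in this example can be sketched as a small lookup table. The type `cp_mapping_t` and the function `map_continuation_point` are hypothetical names used for illustration; the entries follow from the slide, where inserting f() into b() shifts every later continuation point in b() by one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry: a continuation point that moved in the patch. */
typedef struct {
    const char *function;  /* function containing the continuation point */
    int old_cp;            /* number in the old version */
    int new_cp;            /* number in the new version */
} cp_mapping_t;

/* Only b() changed: f() was inserted before the loop. */
static const cp_mapping_t cp_mappings[] = {
    { "b", 3, 4 },   /* UPDATE_POINT() inside the loop */
    { "b", 4, 5 },   /* e() */
};

/* Returns the new continuation point for (function, old_cp).
 * Points without an entry are assumed unchanged. */
static int map_continuation_point(const char *function, int old_cp)
{
    assert(function != NULL);
    for (size_t k = 0; k < sizeof cp_mappings / sizeof cp_mappings[0]; k++) {
        if (strcmp(cp_mappings[k].function, function) == 0 &&
            cp_mappings[k].old_cp == old_cp)
            return cp_mappings[k].new_cp;
    }
    return old_cp;
}
```

With this table, the saved points b_CP_3, a_CP_2, and _main_CP_2 map to b_CP_4, a_CP_2, and _main_CP_2, matching the restored list on the slide.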
Multi-Threaded Updates
  [Figure: Thread 1 and Thread 2 both execute LOCK(L); one thread HAS lock L and is voluntarily blocked at UPDATE_POINT(); the other WANTS lock L and is blocked]
  • Detect if all threads are blocked
  • Treat locks as update points
Multi-Process Updates
  • wrap fork(), wait(), waitpid()
  • coordinate atomic reconstruction
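One simple way to detect that all threads are blocked is to keep a count of threads parked at update points or lock wrappers. The sketch below is an assumption about how such bookkeeping could look, not UpStare's runtime; `thread_census_t` and the `census_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bookkeeping shared by all threads of the process. */
typedef struct {
    atomic_int registered;  /* threads known to the runtime */
    atomic_int blocked;     /* threads parked at an update point or lock */
} thread_census_t;

static void census_init(thread_census_t *c, int thread_count)
{
    atomic_init(&c->registered, thread_count);
    atomic_init(&c->blocked, 0);
}

/* Called by a thread just before it parks (update point or lock wrapper). */
static void census_block(thread_census_t *c)
{
    atomic_fetch_add(&c->blocked, 1);
}

/* Called by a thread right after it resumes. */
static void census_unblock(thread_census_t *c)
{
    int previous = atomic_fetch_sub(&c->blocked, 1);
    assert(previous > 0);
    (void)previous;
}

/* The update may start only once every registered thread is parked. */
static bool census_all_blocked(thread_census_t *c)
{
    return atomic_load(&c->blocked) == atomic_load(&c->registered);
}
```

Wrapping LOCK(L) so that a waiting thread calls `census_block` before it waits is what lets a thread that WANTS a lock count as blocked, in line with "treat locks as update points".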
Bounded Delay
    functionA() {
        char data[SIZE];
        RECV(FD,&data);
        ...
    }
  RECV(FD,&data) is converted to:
  - issue non-blocking recv(FD, &data);
  - voluntarily block: SELECT(FD);
  - UPDATE_POINT();
  - RECV_FINISH(FD,&data);
  [Figure: stack and heap contents before and after the update; labels: data buffer, other data, save, restore]
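The conversion of a blocking receive into a non-blocking one can be sketched with standard POSIX calls. This is a rough approximation of the idea, not the RECV/SELECT/RECV_FINISH wrappers named on the slide: `recv_with_update_points`, `update_point`, and the 10 ms timeout are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag set by the runtime when a patch is ready. */
static volatile bool update_requested = false;

/* Hypothetical hook: a real runtime would unroll the stack here. */
static void update_point(void)
{
    if (update_requested)
        update_requested = false;
}

/* Receives like recv(), but never blocks for more than one timeout
 * interval without passing through an update point.
 * Note: leaves O_NONBLOCK set on fd. */
static ssize_t recv_with_update_points(int fd, void *buf, size_t len)
{
    assert(fd >= 0 && fd < FD_SETSIZE);   /* required by select() */

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);          /* non-blocking attempt */
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;

        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        struct timeval timeout = { 0, 10000 };      /* wait at most 10 ms */
        if (select(fd + 1, &readable, NULL, NULL, &timeout) == -1 &&
            errno != EINTR)
            return -1;

        update_point();     /* bounded delay: reached on every wake-up */
    }
}
```

Because the thread wakes from `select()` at least once per timeout interval, the time until it reaches an update point is bounded even when no data arrives.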
Evaluation
  KissFFT
  - Small, data-intensive application (2,000 LoC)
  Very Secure FTP Daemon
  - Medium-sized application (12,000 LoC)
  - Forks non-communicating connection handlers
  PostgreSQL DBMS
  - Large application (postmaster: 225,000 LoC)
  - Forks communicating connection handlers
      Shared Memory
KissFFT v1.2.0
  Overhead
  - Execution time: 38%; lower than Ginseng's overhead (150%)
  [Figure: code layout of functionA and functionB in the uninstrumented and instrumented versions; labels: code for restoring, code for unrolling, code for update points, desired optimization, outside .text segment]
Very Secure FTP Daemon
  Updates
  - Under two use cases: idle client, file transfer
  - 13 updates (5.5 years' worth)
  - 11 manual continuation mappings
  - Latency 60ms (50ms to block all threads)
  Overhead
  - Latency: retrieve a 32-byte file 1000 times
  - Throughput: retrieve a 300MB file
  - In-memory and on-disk, over cross-over cable
vsFTPd Overhead
  Latency: 16-17% (1.6ms); throughput: 0%
PostgreSQL DBMS
  Updates
  - 1 update: v7.4.16 to v7.4.17
  - No manual continuation points; latency 60ms
  Overhead
  - Latency: run 1 transaction 1000 times
  - Throughput: "TPC-B like" pgbench; 100,000 txs
  - In-memory and on-disk, over cross-over cable
PostgreSQL Latency
  Latency: 89-97% (22.5ms)
PostgreSQL Throughput
  Throughput: 41% in-memory; 26% on-disk