A Multi-Level Meta-Object Protocol for Fault-Tolerance in Complex Architectures
François Taïani (*), Jean-Charles Fabre, Marc-Olivier Killijian
LAAS-CNRS ((*): now at Lancaster University)
DSN'2005, The International Conference on Dependable Systems and Networks, Yokohama, Japan, June 28 - July 1, 2005
Motivating Example: Replication & Multithreading
Goal: transparent replication of a CORBA server
- multi-layer: POSIX (OS) + CORBA (middleware)
- multithreaded: concurrent processing of requests
- thread pool: upper limit on concurrency
Problem 1: state capture / restoration
- application state
- CORBA middleware + OS state
Problem 2: control of non-determinism
- assumption: multi-threading is the only source of non-determinism
- how to replicate non-deterministic mutex decisions?
[Figure labels: replication; application, middleware, OS]
Enforcing Determinism: OS Only
- The same lock allocation can be enforced on all replicas, so that all replicas reach the same state.
- Only a small subset of the lock allocations impacts determinism.
- Replicating every non-deterministic decision in the middleware is highly inefficient: up to 203 synch. operations per request (ORBacus) [TAO: 52, omniORB: 64].
[Figure labels: two replicas (application, middleware, OS) connected by the network, with FT attached at the OS level]
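The idea of enforcing the same lock allocation on all replicas can be sketched as a record/replay pair. This is a minimal illustration under stated assumptions, not the mechanism used in the presented work: `ThreadId`, `MutexId`, `GrantRecorder` and `GrantReplayer` are hypothetical names, and the transfer of the log between replicas is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical identifiers, not taken from the slides.
using ThreadId = int;
using MutexId = int;

// One lock allocation decision: which thread obtained which mutex.
struct LockGrant {
    MutexId mutex;
    ThreadId thread;
    bool operator==(const LockGrant& o) const {
        return mutex == o.mutex && thread == o.thread;
    }
};

// On one replica: record the order in which locks were granted.
class GrantRecorder {
public:
    void onGranted(MutexId m, ThreadId t) { log_.push_back({m, t}); }
    const std::vector<LockGrant>& log() const { return log_; }
private:
    std::vector<LockGrant> log_;
};

// On another replica: a request is granted only if it matches the next
// recorded decision; otherwise the requesting thread has to wait.
class GrantReplayer {
public:
    explicit GrantReplayer(std::vector<LockGrant> log) : log_(std::move(log)) {}

    bool tryGrant(MutexId m, ThreadId t) {
        if (next_ >= log_.size()) return false;
        if (!(log_[next_] == LockGrant{m, t})) return false;
        ++next_;
        return true;
    }

    bool done() const { return next_ == log_.size(); }
private:
    std::vector<LockGrant> log_;
    std::size_t next_ = 0;
};
```

Every recorded grant costs a log entry and a coordination step, which is why replaying all synchronisation operations of the middleware is expensive.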
Smart Multi-Level Reflection
With middleware and application semantics:
- OS-level actions can be given higher-level semantics.
- These semantics allow optimal use of OS-level reflection.
- Reification of application & middleware activity: only 3 of the synch. operations made by the middleware need to be replicated (ORBacus).
Combining information obtained at different levels greatly increases the efficiency of crosscutting mechanisms. (Here: only about 1.5% of MD synch. activity, 3 of up to 203 operations, actually needs to be replicated.)
[Figure labels: two replicas (application, middleware, OS), with FT attached at all levels]
The Vision
[Figure labels: meta-level, base-level, reflection; application, middleware, OS with meta-interfaces; generic MOP (Meta-Object Protocol) as "glue"; family of fault-tolerance mechanisms]
The Problem
How to design & implement such a meta-object protocol?
[Figure labels: application, middleware, OS; fault-tolerance; "?" in place of the MOP]
Outline
- Motivating Example: Reflection and Replication
- A New Multi-Level MOP: Concepts & Design
- Practical Application: CORBA & Linux
Implementing Multi-Level Reflection
Goal: to provide a multi-reflective framework for the fault-tolerance of complex, non-reflective industrial platforms
Challenges:
- Requirements: What kind of information is needed for fault-tolerant mechanisms? Where should this information be found?
- Design: How to design a multi-level meta-object protocol that supports multi-level reflection?
- Instrumentation: How to instrument an industrial, non-reflective platform in a non-invasive, transparent way?
Requirements
Meta-interface for non-determinism [DSN-2003]:

    interface MetaRequestLifecycle {
      /** Communication **/
      requestHasBeenReceived (RequestID);
      replyHasBeenSent (RequestID);
      /** Control Path **/
      requestBeforeApplication (RequestID);
      requestAfterApplication (RequestID);
      /** Synchronisation **/
      requestBeforeContentionPoint (RequestID, RequestContentionPoint);
      requestAfterContentionPoint (RequestID, RequestContentionPoint);
    };

[Figure labels: fault-tolerance, MOP]
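As an illustration only, the meta-interface could be rendered in C++ as an abstract class that fault-tolerance mechanisms implement. The types `RequestID` and `RequestContentionPoint` are not defined on the slide, so the definitions below are placeholders, and `TracingMetaObject` is a hypothetical meta-object that merely records the notifications it receives.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed placeholder types: the slides do not define them.
using RequestID = unsigned long;
struct RequestContentionPoint { int id; };

// C++ rendering of the meta-interface shown on the slide.
class MetaRequestLifecycle {
public:
    virtual ~MetaRequestLifecycle() = default;
    // Communication
    virtual void requestHasBeenReceived(RequestID) = 0;
    virtual void replyHasBeenSent(RequestID) = 0;
    // Control path
    virtual void requestBeforeApplication(RequestID) = 0;
    virtual void requestAfterApplication(RequestID) = 0;
    // Synchronisation
    virtual void requestBeforeContentionPoint(RequestID, RequestContentionPoint) = 0;
    virtual void requestAfterContentionPoint(RequestID, RequestContentionPoint) = 0;
};

// Hypothetical meta-object that only records the events it is notified of.
class TracingMetaObject : public MetaRequestLifecycle {
public:
    void requestHasBeenReceived(RequestID r) override { note("received", r); }
    void replyHasBeenSent(RequestID r) override { note("replied", r); }
    void requestBeforeApplication(RequestID r) override { note("beforeApp", r); }
    void requestAfterApplication(RequestID r) override { note("afterApp", r); }
    void requestBeforeContentionPoint(RequestID r, RequestContentionPoint p) override {
        note("beforeCP" + std::to_string(p.id), r);
    }
    void requestAfterContentionPoint(RequestID r, RequestContentionPoint p) override {
        note("afterCP" + std::to_string(p.id), r);
    }

    std::vector<std::string> trace;  // events in notification order
private:
    void note(const std::string& what, RequestID r) {
        trace.push_back(what + ":" + std::to_string(r));
    }
};
```

A replication mechanism would replace the tracing bodies with its own logic, for example forwarding contention-point decisions to the other replicas.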
Requirements
Multi-level nature of the meta-interface
[Figure: fault-tolerance meta-model of the request lifecycle across levels
 - Application: Request before Application, Request after Application
 - Middleware: pre-processing request, request in application, post-processing request; Request Contention Point (locks)
 - OS: Reception Start and Reception End (request reception), Reply Start and Reply End (sending of reply)]
Semantics and Architecture
Motivating example: middleware non-determinism
- request contention points (mutex operations) must be intercepted at OS level
- but not all mutex operations (otherwise highly inefficient)
- question: how to distinguish between mutexes that are relevant and those that are not?
Proposal: use of semantic context
- We need to understand the purpose of OS-level mutex operations in the more general context of the whole system activity.
- Approach: trace the computation process that results in a low-level OS operation being called.
Meta-markers
- To trace semantic contexts, a mechanism is needed to transport information between different abstraction levels (software layers).
- A mechanism encountered in plants: in periods of drought, the root system communicates with the foliage using dedicated chemical substances called phytohormones.
- Phytohormones travel through the sap.
- The design is based on this metaphor: sap = threads, phytohormones = meta-markers.
[Figure labels: plant, no water]
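Inter-Level Communication with Meta-Markers
Along the thread execution path, from the higher level to the lower level:
- interception at the higher level: a dormant meta-marker is attached to the thread
- the meta-marker remains transparent
- at the lower level, the meta-marker gets activated and modifies low-level system behaviour
[Figure labels: thread execution path; higher level, lower level; meta-level, base level]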
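One way to picture this transport is a per-thread slot that a higher layer fills and an intercepted lower-level operation reads. The sketch below assumes a `thread_local` pointer as the carrier; `MetaMarker`, `attachToThread`, `detachFromThread`, `interceptedLowLevelOp` and `middleLayer` are hypothetical names chosen for the example, not the API of the presented framework.

```cpp
#include <cassert>
#include <string>

// Hypothetical marker type: carries a piece of higher-level semantic context.
struct MetaMarker {
    std::string context;
};

// One marker slot per thread: the thread plays the role of the sap.
static thread_local const MetaMarker* g_attached = nullptr;

// Called at the higher level, where the semantic context is known.
void attachToThread(const MetaMarker& m) { g_attached = &m; }
void detachFromThread() { g_attached = nullptr; }

// Stand-in for an intercepted low-level operation: the marker is
// activated here and changes the behaviour of the operation.
std::string interceptedLowLevelOp() {
    if (g_attached == nullptr) return "default";
    return "instrumented:" + g_attached->context;
}

// Intermediate layer: unaware of markers, it simply calls downwards,
// so the marker stays transparent to it.
std::string middleLayer() { return interceptedLowLevelOp(); }
```

Because the slot is thread-local, a marker attached by one thread does not affect low-level operations issued by other threads.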
Using Meta-Markers for MOP Design
Meta-markers can be used to design a multi-level MOP.
Example: synchronisation facet for middleware determinism

    interface MetaRequestLifecycle {
      ...
      /** Synchronisation **/
      requestBeforeContentionPoint (RequestID, RequestContentionPoint);
      requestAfterContentionPoint (RequestID, RequestContentionPoint);
    };

Two issues to be solved by meta-markers:
- P1: the global semantic context of mutex creation must be captured by meta-markers
- P2: meta-markers must ensure a correct instrumentation of the selected mutexes
Capturing Semantics
Problem P1 is solved by source code annotation of semantic join points.

Original code:

    init_and_run_middleware (..) {
      init_request_queue (..) ;         // mutexes created here are relevant for determinism
      init_some_refcount_object (..) ;  // mutexes created here are not
      ...
      run_ORB ();
    }

Annotated code:

    init_and_run_middleware (..) {
      MutexesAreRelevant metaMarker ;
      metaMarker.attachToThread() ;
      init_request_queue (..) ;         // mutexes created here are relevant for determinism
      metaMarker.detachFromThread() ;
      init_some_refcount_object (..) ;  // mutexes created here are not
      ...
      run_ORB ();
    }
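The attach/detach pair in the annotation can be made exception-safe with a small scope guard. This is a possible refinement rather than something shown in the slides: `ScopedMarker` is a hypothetical helper, the `MutexesAreRelevant` class below is a stand-in with only the two methods named on the slide, and `init_request_queue` is reduced to a stub.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for the marker class named on the slide; only the two
// methods used there are modelled, plus a query used by the stub below.
class MutexesAreRelevant {
public:
    void attachToThread() { ++depth_; }
    void detachFromThread() { --depth_; }
    static bool attached() { return depth_ > 0; }
private:
    static thread_local int depth_;  // per-thread attachment count
};
thread_local int MutexesAreRelevant::depth_ = 0;

// Hypothetical scope guard: attaches on construction, detaches on
// destruction, including when an exception unwinds the stack.
class ScopedMarker {
public:
    explicit ScopedMarker(MutexesAreRelevant& m) : m_(m) { m_.attachToThread(); }
    ~ScopedMarker() { m_.detachFromThread(); }
    ScopedMarker(const ScopedMarker&) = delete;
    ScopedMarker& operator=(const ScopedMarker&) = delete;
private:
    MutexesAreRelevant& m_;
};

// Stub for the annotated initialisation step: reports whether the
// marker was attached while it ran, or fails on demand.
bool init_request_queue(bool fail) {
    if (fail) throw std::runtime_error("init failed");
    return MutexesAreRelevant::attached();
}
```

With the guard, the annotated region becomes a block such as `{ ScopedMarker s(metaMarker); init_request_queue(..); }`, and the marker is detached even if initialisation throws.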
Meta-Markers as Meta-Mutex Factories
- the middleware thread execution path carries the meta-marker MutexesAreRelevant
- a new mutex creation is intercepted
- the meta-marker creates a new meta-mutex and attaches it to the mutex
- newly created mutexes are released into the OS among other non-instrumented mutexes
[Figure labels: middleware, OS; meta-level, base level; meta-marker, meta-mutex, mutex]
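The factory role of the marker can be sketched as an intercepted creation function that returns an instrumented mutex only while the marker is attached. All names below (`g_mutexesAreRelevant`, `Mutex`, `MetaMutex`, `createMutex`) are hypothetical, the marker is reduced to a thread-local flag, and the event log is not thread-safe, so this is a single-threaded illustration only.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical marker state: true while MutexesAreRelevant is attached.
static thread_local bool g_mutexesAreRelevant = false;

// Base-level mutex as seen by the middleware.
class Mutex {
public:
    virtual ~Mutex() = default;
    virtual void lock() { m_.lock(); }
    virtual void unlock() { m_.unlock(); }
private:
    std::mutex m_;
};

// Instrumented mutex: reports contention points to the meta-level.
// The shared event log is not synchronised (illustration only).
class MetaMutex : public Mutex {
public:
    explicit MetaMutex(std::vector<std::string>& events) : events_(events) {}
    void lock() override {
        events_.push_back("beforeContentionPoint");
        Mutex::lock();
        events_.push_back("afterContentionPoint");
    }
private:
    std::vector<std::string>& events_;
};

// Intercepted mutex creation: the marker decides which kind is built.
std::unique_ptr<Mutex> createMutex(std::vector<std::string>& events) {
    if (g_mutexesAreRelevant) return std::make_unique<MetaMutex>(events);
    return std::make_unique<Mutex>();
}
```

Once created, both kinds of mutex are used through the same interface, so the middleware code that locks them does not need to know which ones are instrumented.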
Back to the Meta-Interface

    interface MetaRequestLifecycle {
      /** Communication **/        // meta-markers to instrument appropriate sockets
      requestHasBeenReceived (RequestID);
      replyHasBeenSent (RequestID);
      /** Control Path **/         // meta-markers to transport request IDs
      requestBeforeApplication (RequestID);
      requestAfterApplication (RequestID);
      /** Synchronisation **/      // meta-markers to instrument appropriate mutexes
      requestBeforeContentionPoint (RequestID, RequestContentionPoint);
      requestAfterContentionPoint (RequestID, RequestContentionPoint);
    };
Implementation
- Multilevel interception framework to control non-determinism
- 8000 LoC C++
- based on CORBA and POSIX only; platform independent
[Figure labels: use dependencies; ML-coordination; application; replication; request, contention point; ORBacus; interception; mutex, thread, socket; Linux]