Fault Diagnosis and Virtual Entities in the Rainbow System Manager

Tony White, Niall Ross
System and Software Engineering, HK00
HALO2, T4E 742-3848

The Rainbow System Manager Alarm Correlation Engine

• What is the RSM Alarm Correlation Engine (RACE)?
• The Virtual Entity (VE)
• RACE design objectives
• RACE design description including:
  – application architecture
  – knowledge structures
  – inferencing mechanisms
• Example scenario walk-through
• Summary

The Alarm Correlation Engine

[Figure labels: Network event notifications; Alarm Correlation Engine; Problem stream; Black Board; VE correlation communities; Knowledge base(s)]

• The Alarm Correlation Engine takes network event notifications, e.g., alarms, and generates a problem stream from them by inferencing over VE correlation communities in the RSM through the use of one or more knowledge bases

The Virtual Entity

[Figure labels: M-Protocol (full); MA : PMO; RSM; M-Protocol "Light"; Compute Platform; RDSC; RTC; Compute Platform MO; Logical VE]
Design Objectives

• Provide an architecture that is flexible, i.e., alternate reasoning paradigms can be easily integrated
• Generate a rule-based framework capable of having rules encoded directly in the OO language of implementation
• The design should provide an easily-extensible framework in order that other NT products' event correlation needs can be met
• Provide strongly hierarchical knowledge structuring mechanisms in order that the scalability and performance issues can be addressed
• Allow for knowledge reuse between product knowledge bases and within the elements of a single product knowledge base

Design Philosophy

[Figure labels: Problem A (ruleA1, ruleA2, ...); Problem B (ruleB1, ruleB2, ...); Problem C (ruleC1, ...); messages msgAB, msgAC, msgCB]

• A Problem-based approach is adopted, with a problem mapping to a fault on a managed object in the network
• Problem objects communicate with each other with messages, in well-defined communities
• Problem objects process messages received from other problem objects using rules

The Alarm Correlation Engine is a Hybrid Rule and Message Passing System

Application Architecture

[Figure labels: Symbolic Debugger; Alarm Correlation Engine Controller; Problem Browser (x2); Event Generator; Black Board; Event Notifications; Alarm Correlation Engine (x2); MO (x2); Production System; Verification System]

AC Engine Description: VE class

• The AC engine requires the extension of the VE to include a specification of fault behavior
• Each VE class now has a set of problem classes associated with it. Only the class names are added to the VE definition and (might) be placed in the 'F' area of the FCAPS specification of the VE. An example of the structure used in the AC engine prototype is shown below
• e.g. ve_class(lc, [lc_problem_class])
• NOTE: multiple problem classes can be defined for a VE class and VE classes can share problem classes

Reuse of problem class information is supported by the design
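The VE class to problem class association described above can be pictured as a simple lookup table. The sketch below is in Python purely for illustration (the engine itself is described as Smalltalk, and the prototype fact is written as ve_class(lc, [lc_problem_class])). Only the "lc" entry comes from the slides; the other VE class names, the problem class names attached to them, and both helper functions are hypothetical.

```python
# Hypothetical registry mirroring the prototype fact
#   ve_class(lc, [lc_problem_class])
# Only problem class *names* are stored against each VE class.
VE_CLASSES = {
    "lc": ["lc_problem_class"],  # taken from the slide
    # The two entries below are invented for illustration only.
    "ax": ["ax_problem_class", "card_problem_class"],  # several problem classes
    "spare_lc": ["lc_problem_class"],  # shares a problem class with "lc"
}


def problem_classes_for(ve_class):
    """Return the problem class names associated with a VE class."""
    return list(VE_CLASSES.get(ve_class, []))


def ve_classes_sharing(problem_class):
    """Return, sorted by name, the VE classes that reference a problem class."""
    return sorted(
        ve for ve, names in VE_CLASSES.items() if problem_class in names
    )
```

Calling problem_classes_for("ax") returns both names registered for that VE class, and ve_classes_sharing("lc_problem_class") lists every VE class that reuses the same problem class, which is the reuse the NOTE bullet refers to.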
AC Engine Description: Problem Class

• Problem classes comprise:
  – a name
  – an ordered collection of RuleSets
• RuleSets are used to match rules with messages:
  – AlarmNotification(rule_name)
  – ProblemStateNotification(rule_name)
  – ProblemNotification(rule_name)
  – PropositionNotification(rule_name)
  – DeletedProblemNotification(rule_name)
• RuleSets are defined for Problem classes and instances

AC Engine Description: Rule Types

• Multiple rules can be defined of a given type; the AC engine evaluates these rules in order when determining the effects of a network event notification or capability change notification
• Rules can appear as arguments of multiple types

The advantages of this design are:

• Only rules appropriate to a given class of notification are evaluated, implying improved real-time performance
• The knowledge base designer has control over the order in which rules are evaluated; the order in which rules are defined is unimportant, which implies easier maintenance
• Rules can be reused between types and even between problem classes

Problem Description: Rule Definition

• Rules are methods, but with an enhanced Smalltalk syntax
• Rules consist of three distinct elements:
  – a name
  – a conjunction of a set of conditions, i.e., boolean expressions
  – a set of actions
• Any piece of Smalltalk code can be embedded in a rule; rule actions are not limited in any way
• The complete power and wealth of the Smalltalk class library and encoded BNR applications is thus available to the knowledge base designer

RuleBases and CompiledRuleBases

• RuleBases are classes containing rules, with the rules being coded as instance and class methods
• CompiledRuleBases are classes containing compiled rules, with the compiled rules coded as instance and class methods
• RuleBase classes form a hierarchy such that rules in one class can be overloaded in a subclass

[Figure labels: RuleBase A (rule 2, rule 3, ...); RuleBase B (rule 1, rule 2, ...); "is subclass of"]
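To make the RuleSet and RuleBase ideas above concrete, here is a minimal sketch in Python (not the engine's enhanced Smalltalk). It assumes a problem class is a name plus an ordered list of (notification type, rule name) pairs, and that rules are methods on a rule base class which a subclass may redefine. The class names BaseRules and LineCardRules, the dictionary layout, the message fields and the dispatch function are all hypothetical. Evaluating every matching rule rather than stopping at the first one that fires is also an assumption; the slides only say the rules are evaluated in order.

```python
class BaseRules:
    """Hypothetical rule base: each rule is a method named after the rule.

    A rule checks its conditions and, if they all hold, performs its
    actions and returns True.
    """

    def rule1(self, problem, msg):
        # Condition: the notification reports a framing alarm.
        if msg.get("alarm") == "framing":
            problem["log"].append("Base.rule1")  # action
            return True
        return False

    def rule2(self, problem, msg):
        problem["log"].append("Base.rule2")  # unconditional action
        return True


class LineCardRules(BaseRules):
    """Subclass rule base: redefines rule2 and inherits rule1."""

    def rule2(self, problem, msg):
        problem["log"].append("LineCard.rule2")
        return True


# A problem class: a name and an ordered collection of RuleSets.
# The same rule name may appear under more than one notification type.
LC_PROBLEM_CLASS = {
    "name": "lc_problem_class",
    "rule_sets": [
        ("AlarmNotification", "rule1"),
        ("AlarmNotification", "rule2"),
        ("ProblemNotification", "rule2"),
    ],
}


def dispatch(problem_class, rule_base, problem, kind, msg):
    """Evaluate, in RuleSet order, only the rules registered for `kind`.

    Returns the names of the rules that fired.
    """
    fired = []
    for rule_kind, rule_name in problem_class["rule_sets"]:
        if rule_kind != kind:
            continue  # rules for other notification types are never looked at
        if getattr(rule_base, rule_name)(problem, msg):
            fired.append(rule_name)
    return fired
```

In this sketch the evaluation order comes from the rule_sets list, not from the order in which the methods are written in the class, which matches the maintenance advantage claimed on the Rule Types slide.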
The Problem - RuleBase Relationship

[Figure labels: Problem A, Problem B and Problem C, each listing rules drawn from rule1 to rule4; RuleBase X (rule1, rule2, rule3, rule4); indirect relationship via ProblemRuleBaseMapper]

Correlation Communities

Correlation communities are sets of components that interact in order to provide some service or services.

Members of a correlation community can communicate in one of two ways. Firstly, components within the community may post to and read from the community notepad. Secondly, they may communicate with other community members by such interaction paths as are defined. For example, in a capability-managed system, these interaction paths would be the capability chain links.

[Figure labels: ATM link1; Sonet1; LC1; LC2; AX1; AX2; Community NotePad. Legend: Interactor chain links; Community NotePad link; Virtual Managed Entity]

Inferencing: Direct Communication

[Figure labels: ATM link1; Sonet1; LC1; LC2; AX1; AX2; Community NotePad; Event]

• Event notification arrives for processing at a VE
• Problem class or instance rules cause changes in capability or problem state
• Changes are propagated to consumers of VE capabilities which, in turn, report these changes to their immediate capability suppliers

Inferencing: Broadcasting

[Figure labels: ATM link1; Sonet1; LC1; LC2; AX1; AX2; Community NotePad; Event]

• Event arrives at one VE which is indicative of a problem elsewhere in the capability chain
• Information is broadcast to all VEs in the community via the community notepad
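The two communication styles above, posting to the community notepad and sending along defined interaction paths, can be sketched as follows. This is an illustrative Python model, not the engine's Smalltalk implementation. The NotePad and Member classes, their method names, and the idea of keeping interaction paths in a plain list called links are all assumptions made for the example.

```python
class NotePad:
    """Hypothetical community notepad: a shared board every member can read."""

    def __init__(self):
        self.members = []
        self.entries = []  # (sender name, note) in posting order

    def join(self, member):
        self.members.append(member)

    def post(self, sender, note):
        # Record the note, then let every other member of the community see it.
        self.entries.append((sender.name, note))
        for member in self.members:
            if member is not sender:
                member.on_note(sender.name, note)


class Member:
    """Hypothetical community member, standing in for a VE such as LC1 or AX1."""

    def __init__(self, name, notepad):
        self.name = name
        self.notepad = notepad
        self.links = []  # defined interaction paths to other members
        self.seen = []   # (sender name, note) pairs this member has received
        notepad.join(self)

    def on_note(self, sender_name, note):
        self.seen.append((sender_name, note))

    def broadcast(self, note):
        """Broadcasting: reach the whole community via the notepad."""
        self.notepad.post(self, note)

    def send_direct(self, note):
        """Direct communication: reach only members on defined paths."""
        for peer in self.links:
            peer.on_note(self.name, note)
```

With three members LC1, LC2 and AX1 on one notepad, a broadcast from LC2 is seen by both LC1 and AX1, while LC1.send_direct(...) reaches only the members listed in LC1.links.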
Fault Scenario I

[Figure labels: ATM link; ATM module (x2); SONET; Prot. (x2); LC 1 and LC 2 (both sides); AX 1 and AX 2 (both sides); Community NotePad; step numbers 1 to 4; notepad connections omitted]

A SONET framing alarm arrives on the rhs LC 1 (1), which posts the event on the note pad (2). The SONET and lhs LC 1 members fire rules based upon the event (3), both creating problem instances and causing problem notifications to be posted on the note pad (4). Upon seeing event (3), the lhs AX 1 card stores the fact that the rhs LC 1 has seen a SONET failure.

Fault Scenario II

[Figure labels: ATM link; ATM module (x2); SONET; Prot. (x2); LC 1 and LC 2 (both sides); AX 1 and AX 2 (both sides); Community NotePad; step numbers 1 to 5; notepad connections omitted]

A SONET framing alarm arrives at the rhs LC 2 (1). It posts the event (2). The lhs LC 1 fires rules for the event (3), creating a problem and causing a problem notification to be posted on the note pad (4). Upon seeing (3) and "knowing of the previous SONET alarm", the lhs AX 1 generates a problem instance, causing a problem notification to be posted on the note pad (4). Also, LC 2 generates a problem instance, causing a problem notification (4). The AX 1 problem notification (5) is seen by LC 1 and LC 2, which fire rules that cause the deletion of problems on those two components.

Summary

• An implementation of an alarm correlation engine capable of complex alarm correlation has been achieved
• A hybrid rule and message passing system has been built
• Complete separation of problem and rule knowledge has been achieved
• Problem class and rule reuse are supported