Event Analysis and Trends


  1. Event Analysis and Trends
     James Merlo, Associate Director of Human Performance
     Reliability Issues Steering Committee
     January 24, 2013

  2. Topics
     • Events Analysis (EA) Process
     • Update on Reported Events to Date
     • Cause Coding Process
     • Initial Trends and Clusters
     • Preliminary Analysis of Energy Management System (EMS) failures
     • Questions
     RELIABILITY | ACCOUNTABILITY

  3. Event Analysis Program
     • Events in 2012
       - (Cat 1 - Cat 5) Events = 111
         o Cat 1 = 73 events
         o Cat 2 = 33 events
         o Cat 3 = 3 events
         o Cat 4 and 5 = 1 each
     • Total Events (October 2010 – December 2012)
       - (Cat 1 - Cat 5) Events = 255
         o Events closed = 236 (92.5 percent)
         o Closed events Cause Coded = 202 (85.6 percent)
     • 2013 cause coding collaboratively with Regions
         o EA report quality improving
         o Providing analysis to NERC Committees
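The percentages on this slide can be re-derived from the counts it gives. The following is a minimal Python sketch of that arithmetic; the variable and function names are illustrative assumptions, not part of the presentation.

```python
# Counts as stated on the slide; all names here are illustrative only.
events_2012_by_category = {"Cat 1": 73, "Cat 2": 33, "Cat 3": 3, "Cat 4": 1, "Cat 5": 1}
total_events = 255        # October 2010 - December 2012, Cat 1 - Cat 5
closed_events = 236
cause_coded_events = 202


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_2012 = sum(events_2012_by_category.values())
closed_pct = percent(closed_events, total_events)        # share of all events that are closed
coded_pct = percent(cause_coded_events, closed_events)   # share of closed events that are cause coded

print(total_2012, closed_pct, coded_pct)
```

Note that the cause-coded percentage is taken relative to closed events (236), not to all 255 events.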

  4. Event Characteristics
     • There are 80 different Characteristics of Events that NERC tracks in 9 Major Categories:
       - Natural Events (lightning, hurricanes, etc.)
       - Entity Operations (Switching, Maintenance, etc.)
       - Controls/Communication (EMS, SCADA, ICCP Data, etc.)
       - Industry Alerts (public appeals, EEA, etc.)
       - System Tools (Load Management Tools, etc.)
       - Infrastructure Security (Vandalism, Sabotage, Theft, etc.)
       - Failed Equipment (Relays, Splice, Transformer, etc.)
       - System Conditions (Transmission, Generation, Loss of load, etc.)
       - Miscellaneous (Software, Vendors, Mis-operations, etc.)

  5. Generation and Transmission Events
     [Venn diagram: Transmission only, Generation only, Both T & G]
     • Circle totals: 221 Events and 165 Events
     • 133 Events involve Generation and Transmission
     • 88 Events just Transmission
     • 32 Events just Generation
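The Venn diagram figures are internally consistent, which can be checked with a short Python sketch. It assumes that 221 is the total for the Transmission circle and 165 the total for the Generation circle; the slide text does not say this outright, but it is the only assignment that matches the sums. The variable names are illustrative.

```python
# Region counts from the slide; names are illustrative only.
transmission_only = 88
generation_only = 32
both_t_and_g = 133

# Assumption: each circle total is its "only" region plus the overlap.
transmission_total = transmission_only + both_t_and_g
generation_total = generation_only + both_t_and_g

# Events that involve transmission, generation, or both.
events_with_t_or_g = transmission_only + generation_only + both_t_and_g

print(transmission_total, generation_total, events_with_t_or_g)
```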

  6. Mis-Operations and Transmission Events
     [Chart; recoverable labels: 88, 133]
     • Transmission Loss with Mis-Ops: 40%
     • Transmission Loss without Mis-Op: 60%

  7. Root Cause Determinations
     [Chart; recoverable label: A2 - Equipment/Material Problem]

  8. Deeper Dive into Organizational Issues (Based on Root Cause)
     A4 – Management Challenges
     • B3C08 - job scoping did not identify special circumstances or conditions
     • B5C04 - risks/consequences associated with change not adequately reviewed
     • B1C03 - direction created insufficient awareness of impact of actions on safety/reliability
     • B1C04 - follow-up did not identify problems
     • B1C05 - assessment did not determine cause of previous event or known problem

  9. All Causes for Management Issues/Challenges
     [Bar chart; vertical axis 0 to 16]
     A4 – Management Challenges
     • B1C05 - assessment did not determine cause of previous event or known problem
     • B3C08 - job scoping did not identify special circumstances or conditions
     • B5C03 - inadequate vendor support of change
     • B5C04 - risks/consequences associated with change not adequately reviewed
     • B1C08 - corrective action responses to a known or repetitive problem were untimely
     • B5C05 - system interactions not considered
     • B1C04 - follow-up did not identify problems

  10. All Causes for Management Issues/Challenges
     [Bar chart; vertical axis 0 to 16; horizontal axis: A4B1C05, A4B3C08, A4B5C03, A4B5C04, A4B1C08, A4B5C05, A4B1C04]
     A4 – Management Challenges
     • B1C05 - assessment did not determine cause of previous event or known problem
     • B3C08 - job scoping did not identify special circumstances or conditions
     • B5C03 - inadequate vendor support of change
     • B5C04 - risks/consequences associated with change not adequately reviewed
     • B1C08 - corrective action responses to a known or repetitive problem were untimely
     • B5C05 - system interactions not considered
     • B1C04 - follow-up did not identify problems

  11. Summary
     • Less than adequate Job Scoping is a threat to reliability
     • Less than adequate EA can result in real threats not being remediated
     • Event Analysis process has potential to provide high quality reliability information
     • QA/QC of submitted EA reports

  12. Preliminary Analysis of EMS Outages
     General:
     • Category 2b Event
     • 46 events (October 26, 2010 – June 27, 2012)
     • 35 entities reporting with nine (9) entities experiencing multiple outages
     • Complete outages range: 32 to 253 minutes
     • Partial outages range: 23 to 242 minutes

  13. Analysis of Outage Times
     [Bar chart; vertical axis: Outage Times in Minutes, 0 to 300; horizontal axis: events 1 to 43; series: Full EMS Outage, Partial EMS Outage, Mean Full Outage Time, Mean Partial Outage Time, Mean Total Outage Time]
     • Mean Full EMS Outage: 59 Minutes
     • Mean Partial EMS Outage: 36 Minutes
     • Mean Total EMS Outage: 95 Minutes

  14. Outage Times by Date
     October 2010 – June 2012
     [Bar chart; vertical axis: Outage Time in Minutes, 0 to 300; horizontal axis: events 1 to 43, grouped by year (2010, 2011, 2012); series: Full EMS Outage, Partial EMS Outage]

  15. Root Cause
     [Bar chart; vertical axis 0 to 8]
     Top Root Causes
     • A2B6C07 – Software Failure
     • A1B4C02 – Testing of Design/Installation LTA
     • AZ – Information to determine cause LTA
     • A4B5C04 – Inadequate risk assessment of change
     • A2B3C03 – Post modification testing LTA
     • A4B3C08 – Insufficient Job scoping

  16. Contributing Causes
     [Bar chart; vertical axis 0 to 25; horizontal axis: individual cause codes]
     Top Contributing Causes
     • A2B6C07 – Software Failure
     • A1B2C01 – Design output scope LTA
     • A4B5C03 – Inadequate vendor support of change
     • A1B4C02 – Testing of Design/Installation LTA
     • A2B6C01 – Defective or failed part
     • A4B5C05 – System Interactions not considered
     • A4B5C04 – Inadequate risk assessment of change
     • A2B3C02 – Inspection/Testing LTA
     • A2B3C03 – Post Modification Testing LTA
     • A3B3C01 – Attention given to wrong issues
     • A4B1C08 – Untimely corrective actions to known issue
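The cause codes on these slides appear to follow a three-level pattern (an A level, a B level, and a two-digit C level, as in A4B5C04). As a minimal sketch under that assumption, the Python below splits the top contributing cause codes listed on this slide into their levels and counts how many of the listed codes fall under each A level. The pattern, the function name, and the variable names are illustrative assumptions; the counts are numbers of distinct listed codes, not numbers of events.

```python
import re
from collections import Counter

# Assumed code layout, read off the codes shown in the deck: A<digit>B<digit>C<two digits>.
CODE_PATTERN = re.compile(r"^(A\d)(B\d)(C\d{2})$")

# Top contributing cause codes as listed on the slide.
top_contributing_codes = [
    "A2B6C07", "A1B2C01", "A4B5C03", "A1B4C02", "A2B6C01", "A4B5C05",
    "A4B5C04", "A2B3C02", "A2B3C03", "A3B3C01", "A4B1C08",
]


def split_code(code: str) -> tuple[str, str, str]:
    """Split a cause code such as 'A4B5C04' into ('A4', 'B5', 'C04')."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized cause code: {code!r}")
    a_level, b_level, c_level = match.groups()
    return a_level, b_level, c_level


# Count how many of the listed codes sit under each A level.
codes_per_a_level = Counter(split_code(code)[0] for code in top_contributing_codes)

for a_level, count in sorted(codes_per_a_level.items()):
    print(a_level, count)
```

Short codes such as AZ on the Root Cause slide do not fit this pattern and would raise a ValueError here; a real tally would need to handle them separately.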

  17. Common Themes
     Common themes:
     1. Software Failures
     2. Software Configuration/Installation/Maintenance
     3. Hardware Failures
     4. Hardware Configuration/Installation/Maintenance
     5. Failover Testing Weaknesses
     6. Testing Inadequacies
     7. Less than Adequate Situational Awareness

  18. Going Forward
     Actions to date:
     1. Preventable EMS and SCADA Events Alert – April 10, 2012
     2. Brief Event Analysis Subcommittee Leadership - December 3, 2012
     3. Brief EAS at Operating Committee (OC) Meeting - December 11, 2012
     4. Collaboration with EAS EMS Task Force – January 10, 2013
     Next steps:
     1. More focused analysis
     2. Quantifying Risk of EMS outages
     3. Update to EAS, OC and Reliability Issues Steering Committee
     4. Develop interventions or remediation strategies


