
World 201 1 Help! Problem Solving and Troubleshooting
Daniel Rodwell, Australian National University, XW11

Intro: Outline
Today's Session: Two Parts
• Problem Solving: Concepts and Theory, Methods, Group Solve
• Troubleshooting


  1. Finding the Pieces: Order in chaos
     Ways the ‘pieces’ of the problem become obvious (things to look for):
     • Natural grouping
     • Functional or procedural grouping
     • Modular
     • Derived from first principles or architecture

  2. Funnel Method: Loosely Defined Problem
     Recall:
     • Broad, non-specific goals
     • Ideal-based
     • Experimental / trial / future projects
     • The problem may not be fully understood, and solution options are completely unknown.

  3. Funnel Method: Loosely Defined Problem
     Inputs:
     • New or unproven ideas
     • Parallel prototyping (project bake-off)
     • Experimentation and discovery
     Output:
     – Evolutionary goal
     – The best solution (progressive)

  4. Funnel Method
     [Figure: funnel diagram. Stages: Lots of Ideas (concept generation), Gate, Modular Grouping, Bake-off, Solution. Candidates A, B, C, D narrow to A and B.]

  5. Group Solve

  6. Group Solve: Solve for X
     – Likely to encounter this scenario in your organisation
     – Problems are progressively revealed as you traverse the scenario
     – Individually or in pairs, think about the problem and how you might start to solve it
     – Modules / categories / attributes

  7. Scenario < scenario removed >

  8. Why Problem Solving Hurts: Ouch
     • If it was easy, you’d have solved it already
     • It typically involves learning new stuff while simultaneously developing a solution
     • Chances are you will not immediately know the answer
     • You’re under pressure

  9. Constraints: Fixed vs. imposed
     • Some constraints are fixed and physically determined.
       – e.g. a cable breaking strain of 1200 kg
     • Other constraints are imposed, or we unintentionally limit ourselves with prior convention.
     Think outside of the problem as well.
     • Is the problem part of a bigger picture?

  10. Consider this: Imposed Constraint
      [Figure: marked “You are here”.]

  11. Consider this
      Down under (& NZ too) is on top.

  12. No! It’s all wrong. Why?
      [Figure: compass arrow marked N.]
      Someone decided North goes at the top.

  13. No Problems
      I’m awesome. No problems here ... yet.
      Discover weaknesses in your systems:
      • Use the same approaches
      • Module-by-module analysis
      • Understand what ‘normal’ is for your system
      • Understand utilisation and capacity
      • If you do have a problem, you’ll know how each module normally behaves

  14. Part 2. Troubleshooting

  15. Troubleshooting Concepts

  16. What is Troubleshooting? Dictionary says...
      troubleshoot |ˈtrəbəlˌshoōt| verb [intrans.] [usu. as n.] (troubleshooting)
      – solve serious problems for a company or other organization.
      – trace and correct faults in a mechanical or electronic system.

  17. What is troubleshooting? Applied Problem Solving

  18. Inherit: Problem Solving methods (It’s reusable)
      Core points retained:
      • Define what the issue is
      • Understand what you are trying to fix
      • Break the issue down into smaller parts

  19. Types of Failure: 3 Common Types
      Technical failures usually fall into three top-level categories:
      – Bogus (there is no failure)
      – Outright (it’s dead)
      – Intermittent (the most problematic)

  20. Influences
      Influences on troubleshooting accuracy:
      • Quality of the symptom description
      • Symptoms often do not have a 1:1 correlation with failure mode
      • Data may be incorrect

  21. How not to fail: The most important part
      Symptom Description
      • An accurate and concise Symptom Description is critical to your troubleshooting success
      • Without an accurate Symptom Description:
        – You’ll be chasing the wrong thing
        – It’ll be unclear where to start

  22. Symptom Description: It’s easy to spot a bad one
      • “It’s dead.”
      • “It doesn’t work.”
      • “There’s something wrong with my computer.”
      • “I can’t download the internet.”

  23. A System and its parts
      Any ‘System’ is a collection of modules
      • It’s normally a module that breaks, not the entire system
      • A web server is a system: I/O, network, authentication, db, content, config
      • A washing machine is a system: pump, motor, controller, valves, sensor

  24. Accurate Troubleshooting
      • Report of system failure
      • Verification or replication of the fault
      • Where there is an actual, verifiable fault, locate the faulty module within the system
      • Fix only the faulty module or part
      • Return the correctly functioning system to operational status

  25. What is Troubleshooting: Sequential Fact Building
      Progress through the troubleshooting process should:
      – reduce the uncertainty
      – progressively isolate the modules
      – increase the number of known states
      [Figure: Loosely Defined Symptoms, Fault Verified, Module isolation, Cause.]

  26. Fact Building
      [Figure: fact-building funnel. Symptom gathering (user reports of problems and description; administrator asks probing questions) and priming data (normal statistics, log files, error reports) feed the loosely defined symptoms. Symptom verification leads to either Bogus or Fault Verified, then module identification, module isolation, cause, and solution. Uncertainty decreases and facts increase at each stage.]

  27. Feedback Concept: We like to know what’s going on
      Humans like feedback in the form of progress. We like to know that our interactions are changing the environment we are attempting to influence. It gives us the sense of “getting somewhere”.

  28. Feedback Concept: Managers are human too (!)
      Uninformed managers can become a larger problem than the technical issue you are trying to resolve.

  29. Feedback Concept: Keep it in mind
      When determining the steps you are going to take in your troubleshooting task:
      • Keep in mind the result you are looking for at each step
      • and what result a normal, correctly operating module would return.
      • If you have progressive results, you can keep others informed.
        – e.g. we’ve ruled X out, established Y is working, need to test Z.

  30. Why Feedback Matters: Consider this
      A theoretical moving car
      • Input: steering angle
      • Process: wheels turn
      • Output: change in direction
      • Feedback: visual recognition, sensory feedback (g-force)

  31. Feedback Delayed: Feedback altered
      A theoretical moving car
      • Input: steering angle
      • Process: wheels turn
      • Output: change in direction
      • Feedback: delayed by 30 sec (visual recognition, sensory feedback (g-force))

  32. Feedback Removed: Feedback altered
      A theoretical moving car
      • Input: steering angle
      • Process: wheels turn
      • Output: change in direction
      • Feedback: removed (no visual recognition, no sensory feedback (g-force))

  33. Oh no! You crashed and burned. Why?
      • Multiple wrong inputs
      • Situation becomes progressively worse
      • Progress is unknown
      Each troubleshooting stage should result in usable information.
      • Even if that is “this part works as expected”.
      • You now have one less module to isolate.
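
The moving-car slides can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `steer` function, the gain of 0.5, the delay of 5 steps, and the unit initial error are all invented for illustration and do not come from the slides. The driver corrects a heading error using what was visible `delay` steps ago.

```python
def steer(delay, gain=0.5, steps=30, initial_error=1.0):
    """Simulate a driver correcting a heading error with stale feedback.

    At each step the driver sees the error as it was `delay` steps ago
    and steers by `gain` times that perceived error. All parameter
    values are illustrative assumptions, not measurements.
    """
    history = [initial_error]
    for t in range(steps):
        # What the driver can see right now (stale when delay > 0).
        perceived = history[max(0, t - delay)]
        history.append(history[t] - gain * perceived)
    return history


immediate = steer(delay=0)
delayed = steer(delay=5)

print("final error, immediate feedback:", abs(immediate[-1]))
print("peak error, delayed feedback:   ", max(abs(e) for e in delayed))
```

With immediate feedback the error shrinks at every step. With stale feedback the same corrections overshoot and the error grows past its starting value, which is the "multiple wrong inputs" effect in miniature.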

  34. Troubleshooting Methodologies

  35. Gather info and verify: First Steps
      • Gather info
      • Verify situation against information
      • Establish a baseline of a correctly operating system
      • Rule out really obvious factors
        – Storage full, no IP address, no AC input, etc.
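
One way to make the baseline step concrete is to record the normal range of a few metrics and flag anything outside it. The sketch below is a minimal example, not a monitoring tool: the metric names, the ranges in `BASELINE`, and the `deviations` helper are hypothetical.

```python
def deviations(baseline, observed):
    """Return metrics that are missing or outside their baseline range."""
    flagged = {}
    for name, (low, high) in baseline.items():
        value = observed.get(name)
        if value is None or not (low <= value <= high):
            flagged[name] = value
    return flagged


# Hypothetical baseline recorded while the system was known to be healthy.
BASELINE = {
    "disk_free_percent": (10, 100),
    "load_average": (0.0, 4.0),
    "ip_addresses": (1, 8),
}

# Hypothetical readings taken when the problem was reported.
observed = {"disk_free_percent": 2, "load_average": 1.3, "ip_addresses": 1}

print(deviations(BASELINE, observed))
```

Here only `disk_free_percent` is flagged, so "storage full" is the obvious factor to rule out first.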

  36. Troubleshooting Methodologies: Brute-Force Guesswork
      – Belief based
      – Evidence poor
      – Procedurally inadequate
      – Highly uncertain whether the correct cause has been identified
      – Occasionally works for some experienced techs
      Common cause of “it must be this part”.

  37. Brute-Force Guesswork Methodology
      [Figure: brute-force guesswork diagram. Labels: MLB, Battery, Display, Housing, HDD; variable certain / uncertain state; Unfixable.]

  38. Troubleshooting Methodologies: Split-Half
      – Eliminate half of the probable causes at each level
      – Requires understanding of common issues
      – Requires understanding of core functions of each function area or differentiating behaviour
      – Highly structured and complete, but can be time consuming and indirect if the starting point is vague
      – Works best for isolating / verifying function areas where there is no obvious likely cause

  39. Split-Half Methodology
      [Figure: split-half tree. System splits into Software and Hardware; function isolation splits further into Graphics and Memory; Graphics splits into GPU and Display. An X marks the eliminated half at each level.]
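
Split-half can be sketched as a bisection over an ordered list of modules. The example below assumes a fault has already been verified, that exactly one module is faulty, and that there is some test telling you whether the first n modules work together. The module order, the `FAULTY` choice, and the `prefix_ok` test are hypothetical stand-ins.

```python
def split_half(modules, prefix_ok):
    """Return (faulty_module, tests_run) by halving the search range.

    Assumes `modules` is non-empty, exactly one module is faulty, and
    prefix_ok(n) is True when the first n modules all work.
    """
    low, high = 0, len(modules)  # the fault index lies in [low, high)
    tests = 0
    while high - low > 1:
        mid = (low + high) // 2
        tests += 1
        if prefix_ok(mid):
            low = mid   # first `mid` modules are fine; fault is later
        else:
            high = mid  # fault is among the first `mid` modules
    return modules[low], tests


# Hypothetical ordering of the web server modules, for illustration only.
MODULES = ["network", "I/O", "config", "authentication", "db", "content"]
FAULTY = "db"  # pretend fault used to simulate test results


def prefix_ok(n):
    """Simulated test: do the first n modules work together?"""
    return FAULTY not in MODULES[:n]


culprit, tests_run = split_half(MODULES, prefix_ok)
print(culprit, tests_run)
```

Six modules are narrowed to one in three tests, against up to five when checking them one at a time.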

  40. Troubleshooting Methodologies: Power / Signal Flow
      – Follow the signal sequence through the system
      – Highly sequential, must be performed in order
      – Effective for “no X” or “dead” symptoms
      – Often places core modules early in the troubleshooting, even if they may be a less likely cause
      – Requires understanding of signal flow in the system architecture

  41. Power / Signal Flow Methodology
      [Figure: signal flow chain. Labels: AC-IN, PSU, loom, PWR BTN, MLB / SMC, PROC, RAM, Audio Controller, Speaker, SATA, PCI.]
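
A signal-flow check amounts to walking the chain in order and stopping at the first stage with no output. The sketch below assumes a linear path built from the first few labels in the diagram, in the order they appear. The `measurements` readings and the `first_dead_stage` helper are hypothetical.

```python
# Assumed linear ordering, taken from the reading order of the diagram labels.
SIGNAL_PATH = ["AC-IN", "PSU", "loom", "PWR BTN", "MLB / SMC", "PROC", "RAM"]


def first_dead_stage(path, has_output):
    """Walk the path in order; return the first stage without output, or None."""
    for stage in path:
        if not has_output(stage):
            return stage
    return None


# Hypothetical bench readings: True means the expected signal is present
# at that stage's output. Everything downstream of a dead stage is dead too.
measurements = {
    "AC-IN": True,
    "PSU": True,
    "loom": True,
    "PWR BTN": False,
    "MLB / SMC": False,
    "PROC": False,
    "RAM": False,
}

print(first_dead_stage(SIGNAL_PATH, measurements.__getitem__))
```

The walk stops at `PWR BTN`: the earlier stages are now known good, and the later ones cannot be judged until the signal reaches them.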

  42. Troubleshooting Methodologies: Likely Cause
      Likely Cause Identification
      – Use known likely causes as the starting point
      – Can often be reordered to promote more likely causes and demote less likely causes
      – Works best where:
        • it is possible to identify all sources of possible cause
        • there are few causes
        • or the causes are well known
      – Less suitable for cases where there is no obvious cause

  43. Likely Cause Methodology
      [Figure: causes in order of decreasing likelihood: Bogus, Config, Software, Fan, Sensor, MLB.]

  44. Troubleshooting Methodologies: Likely Cause + Weighted Matrix
      Weighted Matrix
      – Use to assist prioritising the Likely Cause isolation order
      – Promotes more likely / relevant isolation tests for the scenario
      – Demotes less likely causes
      – Use to correctly “weight” troubleshooting priority.

  45. Likely Cause + Weighted Matrix Methodology
      Empty matrix. Columns: Possible Cause, Likelihood, Possibly Bogus, Isolation Priority. Rows: Possible Cause A, B, C.

  46. Likely Cause + Weighted Matrix Methodology
      Possible Cause   | Likelihood | Possibly Bogus | Isolation Priority
      Possible Cause A | High       | Yes            | High, Dependencies
      Possible Cause B | Low        | Yes            | High, Dependencies
      Possible Cause C | Low        | No             | Low

  47. Likely Cause + Weighted Matrix Methodology
      Derived Order | Possible Cause   | Likelihood | Possibly Bogus | Isolation Priority | Rank
      1             | Possible Cause A | High       | Yes            | High, Dependencies | HIGH RANK
      2             | Possible Cause B | Low        | Yes            | High, Dependencies | MID RANK
      3             | Possible Cause C | Low        | No             | Low                | LOW RANK
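
The derived order in the matrix can be reproduced with a simple sort. The slides do not say how the columns are weighted, so the key used here (isolation priority first, then likelihood, then possibly bogus) is an assumption that happens to give the same ranking. The `Candidate` class and `isolation_order` function are hypothetical, and "High, Dependencies" is simplified to "High".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cause: str
    likelihood: str          # "High" or "Low"
    possibly_bogus: bool
    isolation_priority: str  # "High" or "Low"


LEVEL = {"High": 1, "Low": 0}


def isolation_order(candidates):
    """Sort candidates so the ones to isolate first come first.

    Assumed weighting: isolation priority, then likelihood, then
    whether the cause might be bogus (cheap to rule out early).
    """
    def key(c):
        return (LEVEL[c.isolation_priority], LEVEL[c.likelihood], c.possibly_bogus)

    return sorted(candidates, key=key, reverse=True)


# Rows from the matrix, deliberately listed out of order.
candidates = [
    Candidate("Possible Cause C", "Low", False, "Low"),
    Candidate("Possible Cause A", "High", True, "High"),
    Candidate("Possible Cause B", "Low", True, "High"),
]

for rank, c in enumerate(isolation_order(candidates), start=1):
    print(rank, c.cause)
```

A different weighting would be a one-line change to `key`, which is the practical value of writing the matrix down explicitly.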

  48. Troubleshooting Methodologies: Minimal Config
      – The Final Frontier
      – Saviour when all else fails
      – Highly time consuming, but high accuracy
      – Must know what components are the absolute minimum for the system to start

  49. Minimal Config Methodology
      [Figure: system build-up. Core components (Module A, Module B, Module C) + next component (Module D). Test ok? Re-test.]

  50. Minimal Config Methodology
      [Figure: system build-up continued. Core components (Module A, Module B, Module C) + next component (Module D) + next component (Module E), with a “Test ok?” check and re-test after each addition.]
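
The build-up procedure can be sketched as a loop that adds one component at a time and re-tests. This is a minimal sketch that assumes a single faulty component and a repeatable test. The `build_up` function and the simulated `system_ok` check are hypothetical, and the module names follow the diagram.

```python
def build_up(core, extras, system_ok):
    """Return the suspect components found by a minimal-config build-up.

    Starts from the core set, then adds one extra component at a time,
    re-testing after each addition. An empty list means every test passed.
    """
    config = list(core)
    if not system_ok(config):
        return config  # the fault lies within the minimal set itself
    for component in extras:
        config.append(component)
        if not system_ok(config):
            return [component]  # the last addition broke the system
    return []


CORE = ["Module A", "Module B", "Module C"]
EXTRAS = ["Module D", "Module E"]


def system_ok(config):
    """Simulated test: the system fails whenever Module E is installed."""
    return "Module E" not in config


print(build_up(CORE, EXTRAS, system_ok))
```

Each passing test adds one more component to the known-good set, which is where the method's accuracy comes from. The cost is one full test cycle per component.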
