
Analysis of Survival Times Using Bayesian Networks Helge Langseth - PowerPoint PPT Presentation

Analysis of Survival Times Using Bayesian Networks. Helge Langseth, NTNU. Presented at ESREL 98, Trondheim, Norway, 16-19 June 1998.


  1. Analysis of Survival Times Using Bayesian Networks
     Helge Langseth, NTNU
     Presented at ESREL ’98, Trondheim, Norway, 16-19 June 1998

  2. Two types of statistical models
     Neyman categorises statistical models into two groups:
     • Interpolating models: used merely to capture rough effects in the data
     • Explorative models: used to explore the underlying process that generates the data we have observed

  3. Scope
     With a database as a starting point, we want to build an explorative model that pinpoints how to reduce the rate of critical failures in a system’s components. Our main goal is a model that helps us understand how the covariates contribute to the system’s survival times.

  4. The History of Graphical Models
     • Graphical models in statistics can be dated back to Wright’s notation in 1921.
     • Computational complexity, however, left Bayesian Networks neglected for 60 years.
     • In the 1980s, effective algorithms for exact calculations on graphs, and later computer-intensive methods like Markov chain Monte Carlo, brought Bayesian Networks back into the light and onto the “Top 5 Statistical Buzz-Words of the Week”.

  5. Bayesian Networks
     [Figure: example Bayesian network with the nodes Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour]

  6. Conditional Independence
     Cancer is independent of Age and Gender given Exposure To Toxic and Smoking.
     [Figure: example network with the nodes Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour]
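
The independence statement on this slide can be checked numerically on a small network. The sketch below is only illustrative: it assumes the edges Age → Exposure, Age → Smoking and (Exposure, Smoking) → Cancer, leaves Gender out for brevity, and uses invented probabilities. None of the numbers or names come from the presentation.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_age = {True: 0.6, False: 0.4}      # P(Age = a)
p_exp = {True: 0.05, False: 0.01}    # P(Exposure = True | Age = a)
p_smk = {True: 0.30, False: 0.20}    # P(Smoking = True | Age = a)
p_can = {(True, True): 0.10, (True, False): 0.04,
         (False, True): 0.03, (False, False): 0.01}  # P(Cancer = True | Exposure, Smoking)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(a, e, s, c):
    """Joint probability under the assumed network structure."""
    return (p_age[a] * bern(p_exp[a], e) * bern(p_smk[a], s)
            * bern(p_can[(e, s)], c))


def p_cancer_given(e, s, a=None):
    """P(Cancer = True | Exposure, Smoking), optionally also given Age."""
    ages = [True, False] if a is None else [a]
    num = sum(joint(x, e, s, True) for x in ages)
    den = sum(joint(x, e, s, c) for x in ages for c in (True, False))
    return num / den


# Largest difference between conditioning with and without Age.
max_gap = max(abs(p_cancer_given(e, s, a) - p_cancer_given(e, s))
              for a, e, s in product([True, False], repeat=3))
print(max_gap)
```

Under these assumptions `max_gap` is zero up to rounding: once Exposure and Smoking are known, Age carries no further information about Cancer.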

  7. “Fundamental Theorem”
     Every multidimensional statistical distribution function can be represented by a Bayesian Network.
     $$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid x_1, x_2, \ldots, x_{i-1}) = \prod_{i=1}^{n} f(x_i \mid \text{all predecessors of } x_i)$$
     [Figure: graph over the nodes 1, 2, 3, 4, ..., n]
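
The chain-rule factorisation on this slide can be verified numerically. The following is a minimal sketch on an arbitrary joint distribution over three binary variables; the distribution is randomly generated and the helper names are hypothetical.

```python
import random
from itertools import product

random.seed(0)

# Arbitrary joint distribution over (x1, x2, x3), normalised to sum to one.
states = list(product([0, 1], repeat=3))
weights = [random.random() for _ in states]
total = sum(weights)
joint = {x: w / total for x, w in zip(states, weights)}


def marginal(prefix):
    """P(x_1..x_k = prefix), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for x, p in joint.items() if x[:k] == prefix)


def chain_rule(x):
    """Product over i of f(x_i | x_1, ..., x_{i-1})."""
    prob = 1.0
    for i in range(len(x)):
        prob *= marginal(x[:i + 1]) / marginal(x[:i])
    return prob


# Largest deviation between the factorised and the original joint.
max_error = max(abs(chain_rule(x) - joint[x]) for x in states)
print(max_error)
```

The factorisation holds for any ordering of the variables. A Bayesian Network becomes compact when most of the predecessors can be dropped from each conditional.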

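  8. Nodes are Probability Tables
     Probability table for the node Exposure To Toxic, conditioned on Age:

     Exposed To Toxic Material | Age in (25,65) | Age not in (25,65)
     True                      | 5%             | 1%
     False                     | 95%            | 99%

     [Figure: example network with the nodes Age, Gender, Exposure To Toxic, Smoking, Cancer, Serum Calcium and Lung Tumour]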
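
A node table like the one on this slide maps directly onto a nested dictionary. The sketch below uses the 5%/1% figures from the slide. The prior over the age groups is a hypothetical value, since the slide does not give one.

```python
# Conditional probability table from the slide: P(Exposed | Age group).
cpt_exposed = {
    "in_25_65": {True: 0.05, False: 0.95},
    "not_in_25_65": {True: 0.01, False: 0.99},
}

# Hypothetical prior over the parent node (not given on the slide).
p_age = {"in_25_65": 0.6, "not_in_25_65": 0.4}

# Each column of a node table must sum to one.
for column in cpt_exposed.values():
    assert abs(sum(column.values()) - 1.0) < 1e-12

# Marginal probability of exposure: sum over the parent states.
p_exposed = sum(p_age[a] * cpt_exposed[a][True] for a in p_age)

# Bayes' rule: age group given that the person was exposed.
posterior_age = {a: p_age[a] * cpt_exposed[a][True] / p_exposed for a in p_age}

print(p_exposed)
print(posterior_age)
```

With the assumed prior, the marginal exposure probability is 0.6 × 0.05 + 0.4 × 0.01 = 0.034.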

  9. Where do the Networks come from?
     Situation: We want to build a model to analyse a multidimensional vector X.
     Aid: To do so, we have N i.i.d. realisations of X, x_1, …, x_N, AND / OR a domain expert.
     Unknowns:
     • The network structure
     • The parameters in the local node tables
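
When the structure is known, the parameters in a local node table can be estimated from the realisations by counting. The sketch below shows one common way to do this (relative frequencies with optional additive smoothing). The function name `fit_cpt`, the smoothing parameter `alpha` and the toy records are hypothetical and are not taken from the presentation or from OREDA.

```python
from collections import Counter


def fit_cpt(records, child, parents, alpha=1.0):
    """Estimate P(child | parents) from a list of dicts by smoothed counting."""
    child_states = sorted({r[child] for r in records})
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    parent_configs = sorted({cfg for cfg, _ in counts})
    cpt = {}
    for cfg in parent_configs:
        total = (sum(counts[(cfg, s)] for s in child_states)
                 + alpha * len(child_states))
        cpt[cfg] = {s: (counts[(cfg, s)] + alpha) / total for s in child_states}
    return cpt


# Toy records, invented for illustration only.
records = [
    {"pm": "planned", "severity": "degraded"},
    {"pm": "planned", "severity": "degraded"},
    {"pm": "planned", "severity": "critical"},
    {"pm": "none", "severity": "critical"},
    {"pm": "none", "severity": "critical"},
    {"pm": "none", "severity": "degraded"},
]

# alpha=0.0 gives the plain maximum-likelihood estimate.
cpt = fit_cpt(records, "severity", ["pm"], alpha=0.0)
print(cpt)
```

Setting `alpha` above zero avoids zero probabilities for parent configurations that are rarely observed, at the cost of a small bias.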

  10. Generating Networks
      Initialize network
      repeat
        • Propose some change to the structure
        • Fit parameters to the new structure
        • Evaluate the new network according to some measure (like BIC, AIC, MDL)
        • If the new network is better than the previous one, keep the change
      until finished
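
The loop on this slide can be sketched as a greedy search that toggles one edge at a time and scores each candidate with BIC. This is a minimal illustration under several assumptions: discrete variables, maximum-likelihood parameters obtained by counting inside the score, first-improvement acceptance, and synthetic toy data. It is not the procedure used for the OREDA analysis, and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations


def bic(data, parents):
    """BIC of a discrete network: log-likelihood minus 0.5 * log(N) * #parameters."""
    n = len(data)
    score = 0.0
    for child, pa in parents.items():
        pa = sorted(pa)
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in data)
        marg = Counter(tuple(r[p] for p in pa) for r in data)
        # Maximum-likelihood fit: each table entry is a relative frequency.
        score += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        n_states = len({r[child] for r in data})
        n_configs = math.prod(len({r[p] for r in data}) for p in pa)
        score -= 0.5 * math.log(n) * (n_states - 1) * n_configs
    return score


def is_acyclic(parents):
    """Repeatedly remove nodes that have no remaining parents."""
    remaining = {v: set(pa) for v, pa in parents.items()}
    while remaining:
        roots = [v for v, pa in remaining.items() if not pa]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True


def hill_climb(data, variables):
    parents = {v: set() for v in variables}        # initialise: empty network
    best = bic(data, parents)
    improved = True
    while improved:                                # repeat ... until finished
        improved = False
        for a, b in permutations(variables, 2):    # propose: toggle edge a -> b
            candidate = {v: set(pa) for v, pa in parents.items()}
            candidate[b].symmetric_difference_update({a})
            if not is_acyclic(candidate):
                continue
            score = bic(data, candidate)           # fit parameters and evaluate
            if score > best + 1e-9:                # keep the change only if better
                parents, best, improved = candidate, score, True
    return parents, best


# Synthetic data: B mostly copies A, C is independent of both.
random.seed(1)
data = []
for _ in range(500):
    a = random.random() < 0.5
    b = a if random.random() < 0.9 else not a
    c = random.random() < 0.3
    data.append({"A": a, "B": b, "C": c})

structure, score = hill_climb(data, ["A", "B", "C"])
print(structure, score)
```

Because A and B are strongly dependent in the toy data, the search links them. The direction of that edge is not identifiable from the score alone, so it depends on the order in which changes are proposed.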

  11. Bayesian Networks are used in:
      • “Expert systems”, mostly in medical domains (e.g. the MUNIN system)
      • Decision support systems (e.g. for NASA)
      • Analysis of dynamic systems (e.g. speech recognition, the BAT-Mobile)
      • …

  12. Bayesian Networks, Summary:
      • An estimate of the multidimensional density
      • Easy to understand for non-statisticians (e.g. a domain expert)
      • The representation is optimized for tasks like
        – Prediction
        – Classification
        – Decision support
      • Can incorporate prior domain knowledge:
        – “Top down” analysis: Expert knowledge
        – “Bottom up” analysis: Data driven system verification

  13. Reliability Analysis
      • Data-set: 219 gas turbines with 2921 failures and 300 censored survival times from the OREDA-IV database
      • Each failure is described by ten covariates, e.g., System Type, Manufacturer, Actual/Planned PM, ...
      • We have special interest in Time To Fail and Failure Severity (Critical or Degraded)
      • Problem to solve: “How can we reduce the frequency of critical failures?”

  14. Generated Network
      [Figure: generated Bayesian network over the nodes System Code, Installation Code, Location, Design Class, Environment, Operating Mode, Severity Class, Manufacturer, Planned PM, Sub unit, Actual PM and Time to Fail]

  15. “Clique” Graph
      [Figure: clique graph over the grouped nodes Environment, Location, System, Severity, Subunit, TTF and PM. Legend: Location = Installation Code, Location; PM = Planned PM, Actual PM; the System and Environment entries list System Code, Installation Code, Operating Mode, Manufacturer, Environment and Design Class.]

  16. Model Verification
      [Figure: scatter plot of Cox regression results (vertical axis, 0 to 5000) against Bayesian network results (horizontal axis, 0 to 5000)]

  17. Conclusions
      • We have generated a Bayesian Network to analyse a data-set from the OREDA-IV database.
      • The Bayesian Network enabled both qualitative and quantitative analysis of the data-set.
      • To verify the calculations, the numerical results were compared to those found by Cox regression. The results of the two methods were at the same level.
