
Patients and Doctors Together to Discover Medical Knowledge with Statistics and Classification: of Patients, by Patients and for Patients


  1. Patients and Doctors Together to Discover Medical Knowledge with Statistics and Classification: of Patients, by Patients and for Patients. Ye-In Chang (張玉盈), Department of Computer Science and Engineering, National Sun Yat-sen University

  2. Outline
     - Introduction
     - A Survey of Knowledge Discovery
     - Proposed Methods
       - Chronic Kidney Disease as an Important Risk Factor for Tumor Recurrences, Progression and Overall Survival in Primary Non-Muscle-Invasive Bladder Cancer
       - Applying the Chi-Square Test to Improve the Performance of the Decision Tree for Classification by Taking Baseball Database as an Example
     - Conclusion

  3. Introduction: Knowledge discovery in databases focuses on methodologies for extracting useful information from collections of data.

  4. Introduction: One approach to knowledge discovery is data mining. Data classification is a well-known and useful data mining technique that assigns categories to collected data so that new cases can be predicted accurately. One model for data classification is the decision tree. A key point of a good decision tree is the choice of deciding factors in its internal nodes.

  5. Introduction: In statistics, the chi-square test is a good way to analyze whether categorical variable A is a significant factor for categorical variable B. From our reading of research papers in medicine, we consider that a risk factor (i.e., a significant factor in the chi-square test) is strongly related to an important deciding factor in the decision tree.

  6. Introduction: In this research, we first study chronic kidney disease as an important risk factor for bladder cancer, in cooperation with the Department of Urology, Chang Gung Memorial Hospital, Kaohsiung, Taiwan, and we propose a statistical approach to check the relation.

  7. Introduction: Second, we use the significant factors to improve the performance of the decision tree. We propose an approach that aims to reduce the number of deciding factors and to decide their order in the decision tree.

  8. Survey: Statistical tests and data mining techniques have been widely studied, developed and applied in many fields.

  9. Statistics: In statistics, there are two types of attributes: continuous numbers and categorical variables. Examples of continuous numbers are age and weight: I am 30 years old and my weight is 60.5 kg. Examples of categorical variables are gender and location: I am a girl and I live in Kaohsiung.

  10. Statistics: The chi-square test is a statistical test designed to analyze whether a significant relationship exists between two categorical variables. It is used in many fields, including medical studies, finance and markets, education and sports.

  11. Chi-Square Test: There are four steps in the chi-square test.
     - Step 1. State the null hypothesis (H0) and the alternative hypothesis (Ha)
     - Step 2. Determine the significance level
     - Step 3. Analyze the database
     - Step 4. Explain the results in terms of the hypotheses
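
For reference, the quantity computed in Step 3 is usually the standard Pearson chi-square statistic. The notation below is not from the slides and is introduced only for illustration: $O_{ij}$ is the observed count in row $i$, column $j$ of an $r \times c$ contingency table, $R_i$ and $C_j$ are the row and column totals, $N$ is the grand total, and $\alpha$ is the significance level chosen in Step 2.

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
$$

In Step 4, H0 (no relationship between the two variables) is rejected when the p-value of $\chi^2$ with $\mathrm{df}$ degrees of freedom is below $\alpha$. A value such as $\alpha = 0.05$ is common, but the slides do not state which level was used.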

  12. Data Mining: Preprocessing for data mining includes
     - data selection
     - data cleaning
     - data transformation

  13. Data Mining: Classification is one of the most common learning models in data mining. One approach to classification is the decision tree, which makes the analysis of very large datasets effective.

  14. Decision Tree

     | Player | Outlook | Humidity | Temperature | Action |
     |---|---|---|---|---|
     | 001 | sunny | low | hot | stay at home |
     | 002 | overcast | medium | low | go to play |
     | 003 | rainy | medium | hot | stay at home |
     | 004 | sunny | low | low | go to play |
     | 005 | overcast | low | hot | go to play |
     | 006 | rainy | high | low | stay at home |
     | 007 | overcast | medium | low | go to play |
     | 008 | rainy | high | low | stay at home |
     | 009 | rainy | high | hot | stay at home |
     | 010 | sunny | medium | low | go to play |
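
As a rough illustration of the idea stated in the Introduction (using the chi-square test to decide the order of deciding factors), the sketch below scores each attribute of the table above against the action column and sorts the attributes by that score. This is only one possible reading: the slides shown here do not give the actual procedure, and the function names `chi_square_statistic` and `rank_attributes` are made up for this example. With only ten rows the expected counts are far too small for a reliable test, so the scores serve purely as a ranking device.

```python
from collections import Counter

# Rows copied from the slide: (outlook, humidity, temperature, action)
ROWS = [
    ("sunny", "low", "hot", "stay at home"),
    ("overcast", "medium", "low", "go to play"),
    ("rainy", "medium", "hot", "stay at home"),
    ("sunny", "low", "low", "go to play"),
    ("overcast", "low", "hot", "go to play"),
    ("rainy", "high", "low", "stay at home"),
    ("overcast", "medium", "low", "go to play"),
    ("rainy", "high", "low", "stay at home"),
    ("rainy", "high", "hot", "stay at home"),
    ("sunny", "medium", "low", "go to play"),
]
ATTRIBUTES = ["outlook", "humidity", "temperature"]


def chi_square_statistic(values, labels):
    """Pearson chi-square statistic for two equally long categorical lists."""
    n = len(values)
    value_totals = Counter(values)
    label_totals = Counter(labels)
    observed = Counter(zip(values, labels))  # missing pairs count as 0
    stat = 0.0
    for value, value_total in value_totals.items():
        for label, label_total in label_totals.items():
            expected = value_total * label_total / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def rank_attributes(rows):
    """Return (attribute, statistic) pairs, strongest association first."""
    labels = [row[-1] for row in rows]
    scores = {}
    for index, name in enumerate(ATTRIBUTES):
        values = [row[index] for row in rows]
        scores[name] = chi_square_statistic(values, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_attributes(ROWS)
for name, score in ranking:
    print(f"{name}: chi-square = {score:.2f}")
```

On these ten rows the ranking is outlook first, then humidity, then temperature, which would suggest testing outlook nearest the root under this reading.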

  15. Bladder Cancer

  16. Bladder Cancer: We consider ten putative risk factors. We follow up five observations.
     - Bladder tumor recurrence
     - Upper urinary tract (UUT) tumor recurrence
     - Cancer progression
     - Cancer-specific survival
     - Overall survival

  17. Chronic Kidney Disease as an Important Risk Factor for Tumor Recurrences, Progression and Overall Survival in Primary Non-Muscle-Invasive Bladder Cancer. International Urology and Nephrology, Vol. 48, No. 6, pp. 993-999, June 2016. (SCI)

  18. Method: This retrospective study was approved by the hospital review boards of Kaohsiung Chang Gung Memorial Hospital and was performed in accordance with the ethical standards of the Declaration of Helsinki.

  19. Method: In the data selection step, we choose the medical records from the Cancer Center, Chang Gung Memorial Hospital, Kaohsiung, Taiwan. The medical records cover 2140 bladder cancer patients, with 119 medical record fields for each patient. They were reviewed for the 10 putative variables: patient age, gender, white blood cell (WBC) count, NL ratio, tumor count, size, grade, stage, eGFR and squamous differentiation of histology.

  20. Method: A total of 158 patients with a primary diagnosis of TaT1-NMIBC (i.e., attribute stage = 'Ta' or stage = 'T1') from January 2008 to December 2010 were treated by transurethral resection of bladder tumors (TURBT) at the urology department. All of the patients were followed up for more than four years, until December 2014.

  21. Method: In the data cleaning step, we fill in unclear data by checking the related data in the separate surgery database. For example, if a patient's tumor count record is empty in the original database, we check his or her related data in the surgery database to obtain a clear record.

  22. Method: In the data enrichment step, we convert medical results from continuous numbers into categorical variables. For example, we convert the putative factor eGFR into three categorical values: ≥ 60, 30-59 and < 30.
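
A minimal sketch of this kind of conversion, assuming eGFR is stored as a number in ml/min. The function name `egfr_category` and the sample values are hypothetical, and treating values between 59 and 60 as part of the 30-59 group is an assumption, since the slide lists only the three ranges.

```python
def egfr_category(egfr):
    """Map a continuous eGFR value (ml/min) to one of the three groups on the slide."""
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    if egfr >= 60:
        return ">=60"
    if egfr >= 30:
        # Assumption: anything from 30 up to (but not including) 60 goes here.
        return "30-59"
    return "<30"


# Hypothetical sample values, not patient data.
samples = [75.0, 59.5, 30.0, 12.3]
categories = [egfr_category(value) for value in samples]
print(categories)
```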

  23. Method: In the statistical test step, we analyze the P-value of each putative risk factor using the chi-square test. We use the Number Cruncher Statistical System (NCSS) to perform the chi-square test.

  24. Bladder Cancer: We consider ten putative risk factors.
     - Age (年齡)
     - Tumor size (腫瘤尺寸)
     - Gender (性別)
     - Grade (程度)
     - Stage (期別)
     - White blood cell count (WBC) (白血球數)
     - Estimated glomerular filtration rate (eGFR) (腎絲球過濾率)
     - Neutrophil to lymphocyte ratio (NL ratio) (嗜中性白血球與淋巴球比例)
     - Squamous differentiation (鱗狀分化)
     - Tumor count (腫瘤數目)

  25. Result

     | Characteristic | Category | No. (%) |
     |---|---|---|
     | Age | < 40 | 3 (2) |
     | | 40-69 | 85 (54) |
     | | ≥ 70 | 70 (44) |
     | Tumor size (cm) | < 3 | 153 (97) |
     | | ≥ 3 | 5 (3) |
     | Grade | Low | 49 (31) |
     | | High | 109 (69) |
     | Gender | Male | 112 (71) |
     | | Female | 46 (29) |
     | Stage | Ta | 104 (66) |
     | | T1 | 54 (34) |
     | WBC (k/ul) | < 10 | 143 (91) |
     | | ≥ 10 | 15 (9) |
     | eGFR (ml/min) | ≥ 60 | 86 (54) |
     | | 30-59, CKD stage 3 | 35 (22) |
     | | < 30, CKD stage 4, 5 | 37 (24) |
     | NL ratio | < 4 | 79 (77) |
     | | ≥ 4 | 23 (23) |
     | Tumor count | 1 | 90 (57) |
     | | 2-7 | 63 (40) |
     | | ≥ 8 | 5 (3) |
     | Squamous differentiation | No | 148 (94) |
     | | Yes | 10 (6) |

  26. Result: Tumor recurrences, progression and overall survival of NMIBC patients with 4-year follow-up

     | Observation | With CKD, % (no.) | Without CKD, % (no.) | Total, % (no.) |
     |---|---|---|---|
     | Bladder tumor recurrence | 40 (29/72) | 26 (22/86) | 32 (51/158) |
     | UUT tumor recurrence | 7 (5/72) | 0 (0/86) | 3 (5/158) |
     | Progression | 7 (5/72) | 5 (4/86) | 6 (9/158) |
     | Overall survival | 63 (45/72) | 91 (78/86) | 78 (123/158) |
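
To show how a chi-square test can be run on counts like these, the sketch below builds a 2x2 table (CKD vs. no CKD, event vs. no event) for each row above and computes the Pearson statistic and its p-value. It assumes a plain Pearson test with one degree of freedom and no continuity correction. The slides say NCSS was used but do not give its settings, so these numbers are not expected to reproduce the published P-values. Several cells are very small (for example, 0 of 86 for UUT recurrence), where the chi-square approximation is weak. The function name `chi_square_2x2` is hypothetical.

```python
import math

# (events, group size) for patients with CKD and without CKD, copied from the slide.
OUTCOMES = {
    "Bladder tumor recurrence": ((29, 72), (22, 86)),
    "UUT tumor recurrence": ((5, 72), (0, 86)),
    "Progression": ((5, 72), (4, 86)),
    "Overall survival": ((45, 72), (78, 86)),
}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


results = {}
for outcome, ((events_ckd, n_ckd), (events_no, n_no)) in OUTCOMES.items():
    results[outcome] = chi_square_2x2(
        events_ckd, n_ckd - events_ckd, events_no, n_no - events_no
    )
    stat, p_value = results[outcome]
    print(f"{outcome}: chi-square = {stat:.2f}, p = {p_value:.4f}")
```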

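  27. Result: Putative risk factors and the check marks (✔) each receives across the five observations (bladder tumor recurrence, overall survival, UUT tumor recurrence, cancer-specific survival, progression)
     - 1 eGFR (CKD): ✔ ✔ ✔ ✔
     - 2 Grade: ✔ ✔ ✔ ✔
     - 3 Stage: ✔ ✔ ✔ ✔
     - 4 Tumor count: ✔ ✔
     - 5 Squamous differentiation: ✔
     - 6 Tumor size: ✔ ✔
     - 7 Age: none
     - 8 Gender: ✔
     - 9 WBC: none
     - 10 NL ratio: none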

  28. Summary: Chronic kidney disease (CKD) is an important risk factor for tumor recurrences and progression. Our study indicates that NMIBC patients with CKD should be intensively monitored at the UUT and bladder.

  29. Summary: We thank the Department of Urology, Chang Gung Memorial Hospital, Kaohsiung, Taiwan, for providing data for this study. The medical records cover 2140 patients, with 119 medical record fields for each patient.

  30. Applying the Chi-Square Test to Improve the Performance of the Decision Tree for Classification by Taking Baseball Database as an Example. Journal of Computers, Dec. 2018.

  31. Baseball Database: In the real world, a large amount of baseball batting data has been collected in digital databases. There are 659 baseball players with at-bats identified through the Chinese Professional Baseball League (CPBL) team websites from 2009 to 2015. Each team has played 120 games a year since 2009.
