Using Grid to Facilitate Diseasome Analysis from Taiwan National Health Insurance Research Database


  1. Using Grid to Facilitate Diseasome Analysis from Taiwan National Health Insurance Research Database
     - Yu-Chuan (Jack) Li and Ming-Chin Lin
     - Graduate Institute of Biomedical Informatics, Taipei Medical University, Taiwan

  2. Outline
     - Introduction of NHIRD
     - Frequency Distribution of Diseasome
     - Comorbidity Analysis
     - Conclusion

  3. The National Health Insurance Research Database (NHIRD)
     - 10 years of data
     - Coverage: about 99% of residents in Taiwan (23 million people from 530 hospitals and 17,000 clinics)
     - 360 million outpatient visits / year
     - 25 million inpatient-days / year

  4. NHIRD
     - The NHIRD is open for research by application
     - The NHIRD consists of claim records with numbers and text
     - Demographics, Diagnoses (ICD-9-CM 2001 version), Medications, Procedures, Exams and Costs data
     - Raw data size: 200 GB / year

  5. Frequency of Visits
     - Analyze database by patient visits
     - Frequency data over time (X-axis) and age (Y-axis)
     - Heatmap visualization
     - Example shown: Dermatophytosis of foot
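
The time-by-age frequency table behind such a heatmap can be sketched as below. This is a minimal illustration, not the presenters' code: the `frequency_grid` function, the `(month, age)` tuple layout, and the sample visits are all assumptions made for the example.

```python
from collections import Counter

def frequency_grid(visits, months, ages):
    """Count visits per cell; rows are ages (Y-axis), columns are months (X-axis)."""
    counts = Counter((age, month) for month, age in visits)
    # Counter returns 0 for cells with no visits.
    return [[counts[(age, month)] for month in months] for age in ages]

# Hypothetical visits for one diagnosis: (month of visit, patient age in years).
visits = [(1, 30), (1, 30), (7, 30), (7, 45), (7, 45), (7, 45)]
grid = frequency_grid(visits, months=[1, 7], ages=[30, 45])
print(grid)  # [[2, 1], [0, 3]]
```

Each inner list is one age row; a plotting library could then map the counts to colors.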

  6. Frequency of Visits (cont.)
     - Analyze database by patient visits
     - Bottleneck --> disk I/O speed
     - Using 12 Apple Mac minis with external FireWire hard drives (400 Mbps each)
     - Collective I/O bandwidth: 4.8 Gbps

  7. Frequency of Visits (cont.)
     - [Diagram labels: WWW; Send grid command; Grid (Globus); Jan, Feb, Mar, Apr, May, June, Jul, Aug, Sep, Oct, Nov, Dec; Result DB]

  8. Frequency of Visits (cont.): Big vs. mini
     - Big
       - Pros: strong CPU, strong I/O speed
       - Cons: expensive, hard to upgrade
     - mini
       - Pros: cheap, low maintenance fee
       - Cons: mild CPU, low I/O speed

  9. Frequency of Visits (cont.)
     - Difficulty of doing the job on a single machine
     - Limitation of database size
     - Takes a very long time to generate the index table
     - Limitation of scaling up
     - Hard to improve the performance
     - Performance vs. price curve --> not linear

  10. Disease Frequency HeatMap (NHIRD 2000)

  11. Taiwan NHIRD 2000-2002
      - [Figure panels: Influenza; Erythema multiforme; Lung Cancer]

  12. 3-year seasonal change of “Cough”
      - [Figure labels: male; female; Hepatitis B with coma]

  13. Influenza

  14. Hand foot and mouth disease

  15. GIS distribution of “Cough”

  16. Cough

  17. Cough

  18. Retrospective study - Comorbidity analysis
      - The limitation
        - Grouping all visit records by unique ID
        - Software memory limitation - 2 GB memory
      - Essential hypertension record numbers
        - 2000: Jan 571,099; Feb 525,646
        - 2001: Jan 644,650; Feb 645,846
        - 2002: Jan 752,353; Feb 655,867
      - Total transaction record number (2000-2002): 25,015,172

  19. Disease Comorbidity analysis
      - For comorbidity analysis
        - ID1{dis1,dis2,dis3,dis4….}
      - For example
        - 192305,M,HS10710973,01340,2001-04-11,4919|4659|4019|3534|4011|38022|4640|3804|4785|3004|7291|78059|01340|460|4660|
        - 192505,F,KT71864585,01340,2002-07-10,01100|01340|29532|0113|0119|
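
A record in the format shown on this slide could be parsed roughly as follows. This is a sketch under assumptions: the slide does not label the fields, so the names in `ClaimRecord` (`key`, `sex`, `patient_id`, `index_code`, `date`) are hypothetical guesses based on the two sample lines, and `parse_record` is an illustrative helper, not part of any NHIRD tool.

```python
from typing import NamedTuple

class ClaimRecord(NamedTuple):
    key: str          # first field; its meaning is not stated on the slide
    sex: str          # assumed: "M" or "F"
    patient_id: str   # assumed: the unique ID used for grouping
    index_code: str   # fourth field; meaning is an assumption
    date: str         # assumed: a date in YYYY-MM-DD form
    diagnoses: tuple  # pipe-delimited diagnosis codes, kept as strings

def parse_record(line):
    """Split one comma-separated record; the last field is a '|'-separated code list."""
    key, sex, patient_id, index_code, date, codes = line.strip().split(",")
    # The sample lines end with a trailing '|', so drop empty pieces.
    diagnoses = tuple(code for code in codes.split("|") if code)
    return ClaimRecord(key, sex, patient_id, index_code, date, diagnoses)

rec = parse_record("192505,F,KT71864585,01340,2002-07-10,01100|01340|29532|0113|0119|")
print(rec.patient_id, rec.diagnoses)
```

Codes are kept as strings because leading zeros (for example `01340`) are significant.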

  20. Bottleneck - Grouping by ID
      - [Diagram labels: WWW; Send grid command; Grid (Globus); Grouping 25,015,172 records (bottleneck); Jan, Feb, Mar, Apr, May, June, Jul, Aug, Sep, Oct, Nov, Dec; Result DB]

  21. Solution - Sorting and segmenting database for grid architecture
      - [Diagram labels: WWW; Send grid command; No grouping needed; Grid (Globus); Grouping (repeated); 1900, 1901, 1902, 1903, 1904, …, 1995, 1996, 1997, 1998, 1999, 2000; Result DB]

  22. Our experience
      - Divide NHIRD by month and year of birthdates
      - Divide NHIRD into 1,212 small databases
        - 12 months * 101 years (from 1900 to 2000) = 1,212 segments
      - Easily scale up - linear acceleration
      - Low machine specification requirement
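
The birth-month segmentation can be sketched as below. The idea illustrated is that a patient has a single birthdate, so all of that patient's records land in one segment and each segment can be grouped by ID independently on a separate grid node. The record layout `(patient_id, birth_year, birth_month, diagnosis)` and the function names are assumptions made for this example, not the presenters' implementation.

```python
from collections import defaultdict

def segment_key(birth_year, birth_month):
    """Name a segment by birth year and month, e.g. (1923, 5) -> '192305'."""
    return f"{birth_year:04d}{birth_month:02d}"

# 12 months * 101 birth years (1900 to 2000 inclusive) = 1,212 segments.
all_keys = [segment_key(y, m) for y in range(1900, 2001) for m in range(1, 13)]
print(len(all_keys))  # 1212

def partition(records):
    """Route each record to the segment of the patient's birth year and month."""
    segments = defaultdict(list)
    for record in records:
        _, birth_year, birth_month, _ = record
        segments[segment_key(birth_year, birth_month)].append(record)
    return segments

def group_by_patient(segment):
    """Group one segment's records by patient ID; needs no data from other segments."""
    grouped = defaultdict(set)
    for patient_id, _, _, diagnosis in segment:
        grouped[patient_id].add(diagnosis)
    return grouped

# Hypothetical records: (patient_id, birth_year, birth_month, diagnosis code).
records = [("A", 1923, 5, "4019"), ("B", 1925, 5, "01340"), ("A", 1923, 5, "460")]
segments = partition(records)
print(sorted(segments))                                    # ['192305', '192505']
print(sorted(group_by_patient(segments["192305"])["A"]))   # ['4019', '460']
```

Because no segment depends on another, adding machines shortens the run roughly in proportion, which matches the linear acceleration reported on the slide.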

  23. Comorbidity
      - About 10 diagnoses per person in 3 years
      - Clusters of comorbidity are being identified and pre-calculated
      - 1 TB of comorbidity data processed over 7 days on a 100-PC grid
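
One simple way to pre-calculate comorbidity is to count, for every pair of diagnoses, how many patients carry both. The sketch below shows that idea only; the slides do not say how the presenters define or cluster comorbidity, and `count_comorbid_pairs` and the sample patients are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_comorbid_pairs(patients):
    """Count, per unordered diagnosis pair, the number of patients having both."""
    pair_counts = Counter()
    for diagnoses in patients.values():
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        for pair in combinations(sorted(set(diagnoses)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical grouped data: patient ID -> diagnosis codes seen for that patient.
patients = {
    "ID1": ["4019", "4011", "460"],
    "ID2": ["4019", "460"],
    "ID3": ["4019"],
}
counts = count_comorbid_pairs(patients)
print(counts[("4019", "460")])  # 2
```

With about 10 distinct diagnoses per person, each patient contributes 45 pairs, which gives a sense of why the workload grows quickly.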

  24. Endometriosis and Neoplasm of uncertain behavior of ovary
      - [Figure labels: Old; Young]

  25. Endometriosis

  26. Conclusion
      - Linear improvement of performance is achievable if the data are properly segmented
      - A heatmap for visualization of frequency distribution over season and patient age is useful for huge data sets
      - A geographical relationship of frequency distribution can also be visualized

  27. Conclusion (cont.)
      - Comorbidity is one area that has great potential but is very computation-intensive
      - Complete comorbidity data can be crossed with genome, haplome and bibliome data to achieve greater utility

  28. Thank you
