11/02/2016

ÇUKUROVA UNIVERSITY
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

Meta Learning on Small Biomedical Datasets

Turgay Ibrikci (Presenter), Esra Mahsereci Karabulut, Jean Dieu Uwisengeyimana
from Cukurova University, TURKEY

The 7th International Conference on Information Science and Application (ICISA2016), Feb 15-18, 2016, Ho Chi Minh City, Vietnam

Outline
Background
• Biomedical Informatics
• Big Data / Small Data
• Machine Learning Algorithms
Problem
• How to classify small medical data with machine learning
Material & Methods
• Datasets & Features
• Meta Learning Algorithms
• WEKA
Results & Discussions
• The ROC area results
• The F-measure results
Conclusions
• Methods
• Datasets

Background -> Biomedical informatics
Biomedical informatics is the field of science in which all kinds of medical data, computer science, and information technology merge to form a single discipline.
[Figure: fields related to biomedical informatics: Biology, Genetics, Proteomics, Pharmacogenomics, Medicine, Medical care, Clinical data, Computer science, Machine Learning, Algorithms, Mathematics, Statistics, Data Science, Informatics]

Background -> Small Data / Big Data
Small Data
• Small data is data in an accessible, informative, and actionable form.
• Small data typically answers a specific question or addresses a specific problem.
Big Data
• Big data can be described as information assets with high volume, high velocity, high variety, high veracity, and high variability.

Background -> Machine Learning
• Machine Learning: gives computers the ability to learn without being explicitly programmed.
• Machine learning is a subfield of computer science that plays a growing role in a wide range of critical applications such as data mining, pattern recognition, expert systems, and genomics, where it has contributed to a vastly improved understanding of the human genome.
• Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.

Background -> Machine Learning
• Types of Machine Learning: There are many different machine learning algorithms. They are mostly of two types:
• Supervised Learning
The system learns from a predefined set of examples, each with an input and its desired output, so the goal is to learn a general rule that maps inputs to outputs. It provides powerful tools for prediction and classification.
• Unsupervised Learning
No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Clustering, anomaly detection, and dimension reduction are key techniques for unsupervised learning.
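As a small illustration of the supervised setting described above (learning a rule that maps inputs to outputs from labelled examples), the sketch below uses a 1-nearest-neighbour rule. It is not taken from the slides: the function name `nearest_neighbor_predict`, the two-feature toy records, and the labels are all hypothetical and chosen only for illustration.

```python
import math


def nearest_neighbor_predict(train_inputs, train_labels, query):
    """Return the label of the training example closest to `query`.

    The "general rule" learned here is simply: copy the desired output
    of the most similar labelled example (Euclidean distance).
    """
    best_index = min(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], query),
    )
    return train_labels[best_index]


if __name__ == "__main__":
    # Hypothetical toy data: two numeric features per record, one label each.
    inputs = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.1)]
    labels = ["negative", "negative", "positive", "positive"]
    print(nearest_neighbor_predict(inputs, labels, (1.1, 0.9)))
    print(nearest_neighbor_predict(inputs, labels, (4.1, 4.0)))
```

An unsupervised method would receive only `inputs`, without `labels`, and would have to find the two groups on its own.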
Problem -> How to classify small medical data with machine learning
http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
• Medical data is a collection of hospital/clinical records that helps an expert, who may be a medical doctor and/or a technical person, machine, or algorithm, to make decisions.
• Meta learning, a term introduced by Donald B. Maudsley, refers to learning algorithms that are applied to data to understand the interaction between the mechanism of learning and the concrete contexts.
• Meta learning provides a methodology that allows systems to become more effective through experience.
• Meta learning differs from base learning in the scope of the level of adaptation.

Material
• Datasets*
  • Arrhythmia
  • Heart disease (Cleveland)
  • Vertebral column (2C)
  • CTG
  • Diabetes (Pima Indians)
  • Mammographic mass
  • Parkinson
  • Wisconsin breast cancer
• WEKA

Material -> Datasets
Datasets*                         Instances   Attributes   Classes
Arrhythmia [6]                    452         279          2
Heart disease (Cleveland) [7]     303         13           5
Vertebral column (2C) [8]         310         6            2
CTG [9]                           2126        21           3
Diabetes (Pima Indians) [10]      768         8            2
Mammographic mass [11]            961         5            2
Parkinson [12]                    194         22           2
Wisconsin breast cancer [13]      699         9            2
*All of these datasets are taken from the UCI Machine Learning Repository.

Material -> Weka
• Waikato Environment for Knowledge Analysis (WEKA)
• It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
• 100+ algorithms for classification
• 75 for data preprocessing
• 25 to assist with feature selection
• 20 for clustering, finding association rules, etc.
The Explorer:
• Preprocess data
• Classification
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization

Methods
• Bagging
• Dagging
• Decorate
• Random Forest
• Filtered Classification

Methods
• Bagging: a bootstrap method that improves the accuracy of the model by using multiple randomly resampled copies of the training set [14].
• The main point of the bagging algorithm is that averaging the misclassification errors over different subsets of the data gives a better estimate of the predictive ability of a learning method. Bagging thus aims to reduce the error rate by reducing the variance of the base classifier.
• Dagging: similar to Bagging, but as input to each member of the ensemble it uses disjoint stratified folds of the training data instead of bootstrap samples [15].
• Decorate: Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples directly builds ensembles of diverse classifiers by using specially constructed artificial training examples.
• It is a simple and general meta-learner that can use any strong learner as a base classifier to build diverse ensembles [16].
• Rotation Forest: a method for generating classifier ensembles based on feature extraction.
• The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble simultaneously. Diversity is promoted through the feature extraction for each base classifier. Decision trees are most often chosen because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier [17].
• Filtered Classification: the filter is built using only the training data and is then applied to the test data. The test data are processed by the filter without changing their structure [18].
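The difference between how Bagging and Dagging feed data to their ensemble members can be sketched as follows. This is a minimal illustration under stated assumptions, not WEKA's implementation: the function names `bootstrap_samples` and `disjoint_stratified_folds`, the round-robin stratification scheme, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def bootstrap_samples(n_items, n_members, rng):
    """Bagging-style input: every ensemble member receives n_items
    indices drawn at random WITH replacement, so samples overlap."""
    return [
        [rng.randrange(n_items) for _ in range(n_items)]
        for _ in range(n_members)
    ]


def disjoint_stratified_folds(labels, n_members, rng):
    """Dagging-style input: the indices are split into disjoint folds,
    and each class is spread evenly over the folds (stratification)."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_members)]
    offset = 0  # continue the round-robin across classes to balance sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % n_members].append(index)
        offset += len(indices)
    return folds


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0] * 6 + [1] * 4  # hypothetical two-class toy labels
    print("bagging :", bootstrap_samples(len(labels), 3, rng))
    print("dagging :", disjoint_stratified_folds(labels, 2, rng))
```

Each index list would then be used to train one copy of the base classifier, and the members' predictions would be combined, for example by majority vote.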