New approaches for improving Data - PowerPoint PPT Presentation

Ph.D. Thesis
New approaches for improving data mining feature selection techniques
Supervised by: Pr. Y. Slimani, Pr. G. Goncalves
Elaborated by: M. A. Esseghir
21/09/06
M. T. Hsu

Outline
- Introduction
- Feature selection problem
- Existing approaches
- The proposed approaches
- Research topics and perspectives
- Conclusion

Introduction
- The ability of machines to store ever-increasing volumes of data outpaces their ability to analyze them.
- Applied data mining techniques:
  - Computational cost: grows with the number of features
  - Classification accuracy: affected by high dimensionality
- Identification of representative features to build classification models.

Feature Selection (FS) problem
Definition 1
"Feature selection studies how to select a subset or list of attributes or variables that are used to construct models describing data." (Huan Liu, IEEE senior member)
Definition 2
A process that chooses an optimal subset of features according to a certain criterion.
Objectives
- Identify salient features.
- Discard irrelevant, redundant, and noisy data.
- Enhance model comprehensibility.
- Avoid model overfitting.
- Improve classification and response-time (time and complexity) capabilities.

Existing Approaches
- Wrappers and filters
- Filters: select subsets using their general characteristics (intrinsic properties).
  - Search: forward and backward search based on one criterion.
  - Dependency measures
  - Information measures
  - Consistency measures
- Wrappers: apply a learning algorithm to evaluate selected subsets.
  - Search: exhaustive, random, heuristic (GA, SA, HC, GrS).
  - Evaluation: ANN, ID3, C4.5, NB, SVM.
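The contrast between the two families can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filter criterion (absolute Pearson correlation with the label), the wrapper learner (leave-one-out 1-nearest-neighbour), the function names, and the toy data are all hypothetical choices, not taken from the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_rank(X, y):
    """Filter: rank features by |correlation| with the label; no learner involved."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def wrapper_score(X, y, subset):
    """Wrapper: leave-one-out accuracy of a 1-NN classifier restricted to `subset`."""
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            d = sum((row[j] - other[j]) ** 2 for j in subset)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

# Toy data (assumed): feature 0 tracks the label, feature 1 is mostly noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 1.5], [1.0, 4.5], [1.1, 2.0]]
y = [0, 0, 0, 1, 1, 1]

ranking = filter_rank(X, y)            # feature indices, best first
score_f0 = wrapper_score(X, y, [0])    # accuracy using feature 0 only
score_f1 = wrapper_score(X, y, [1])    # accuracy using feature 1 only
```

The filter never trains a model, which is why it is cheap; the wrapper retrains (here, re-queries) the learner for every candidate subset, which is where its evaluation cost comes from.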

[Figure: FS process (Huan Liu), with exploration and validation stages]

Existing Approaches (2)
Filters
  Advantages: simple to implement; low search cost, O(N^2); independent criterion
  Drawbacks: do not perform as well; one feature at a time
Wrappers
  Advantages: high subset quality; improves classification; all features are considered
  Drawbacks: exponential exploration search (2^n); high evaluation cost; unsuited to large data sets
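The slide gives the two search costs without derivation. One plausible reading, assumed here, is that O(N^2) refers to a sequential forward search (N candidates in the first step, N-1 in the second, and so on), while 2^n counts every possible subset. The helper names below are hypothetical.

```python
def forward_search_evaluations(n):
    """Worst-case evaluations for sequential forward selection:
    n + (n - 1) + ... + 1 = n(n + 1) / 2, which is O(n^2)."""
    return n * (n + 1) // 2

def exhaustive_evaluations(n):
    """Number of non-empty feature subsets an exhaustive search must evaluate."""
    return 2 ** n - 1

for n in (10, 20, 30):
    print(n, forward_search_evaluations(n), exhaustive_evaluations(n))
```

For 20 features this is 210 evaluations against 1,048,575, which illustrates why exhaustive wrapper search does not scale to large feature sets.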

The proposed approaches
- Genetic Algorithm (GA)
  - Standard
  - Memetic algorithms: hybrid global + local search
- Parallel FS for high-dimensional data
  - Island model
  - Multi-agent system
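A minimal sketch of how a GA can search over feature subsets, with an optional memetic (local search) step. The slides do not specify the encoding, operators, or fitness, so everything here is an assumption: a bit-mask chromosome, binary tournament selection, one-point crossover, bit-flip mutation, elitism, and a toy fitness function standing in for a real wrapper evaluation.

```python
import random

def toy_fitness(mask, relevant=(0, 2, 5)):
    """Hypothetical stand-in for a wrapper evaluation: reward relevant
    features and penalise every selected feature slightly."""
    hits = sum(mask[j] for j in relevant)
    return hits - 0.1 * sum(mask)

def local_search(mask, fitness):
    """Memetic step: bit-flip hill climbing until no single flip improves."""
    best, best_fit = mask[:], fitness(mask)
    improved = True
    while improved:
        improved = False
        for j in range(len(best)):
            cand = best[:]
            cand[j] ^= 1
            f = fitness(cand)
            if f > best_fit:
                best, best_fit, improved = cand, f, True
    return best

def genetic_fs(n_features, fitness, pop_size=20, generations=30,
               p_mut=0.05, memetic=False, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]                     # elitism keeps the best mask
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            if memetic:
                child = local_search(child, fitness)
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best_standard = genetic_fs(8, toy_fitness)
best_memetic = genetic_fs(8, toy_fitness, memetic=True)
```

In a real setting `toy_fitness` would be replaced by a classifier's validation accuracy, which makes each call expensive; that cost is one motivation for the parallel (island, multi-agent) variants listed above.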

The proposed approaches (2)
- Ant Colony Optimizer (ACO):
  - AS and ACS adaptation:
    - Complete graph
    - Nodes correspond to attributes
    - Polarized edges
- Hybrid search:
  - Combining wrappers and filters
  - Correlation-guided search
  - Discarding redundant features.
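A rough Ant System style sketch of the idea, assuming a complete graph whose nodes are features, pheromone on edges, and a filter score used as the heuristic that guides each ant. The slides do not give the update rules or explain the polarized edges, so this sketch does not attempt to model them; `aco_fs`, `quality`, `heuristic`, and all parameter values are hypothetical.

```python
import random

def aco_fs(n_features, quality, heuristic, subset_size=3, n_ants=10,
           iterations=20, rho=0.2, seed=0):
    """Each ant walks the complete feature graph and collects `subset_size`
    features. `quality` must return a non-negative score for a subset and
    `heuristic` holds one positive filter score per feature."""
    rng = random.Random(seed)
    # Pheromone on every edge of the complete feature graph.
    tau = [[1.0] * n_features for _ in range(n_features)]
    best, best_q = None, float("-inf")
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):
            current = rng.randrange(n_features)
            tour = [current]
            while len(tour) < subset_size:
                candidates = [j for j in range(n_features) if j not in tour]
                # Transition weight: edge pheromone times filter heuristic.
                weights = [tau[current][j] * heuristic[j] for j in candidates]
                current = rng.choices(candidates, weights=weights)[0]
                tour.append(current)
            q = quality(tour)
            tours.append((tour, q))
            if q > best_q:
                best, best_q = sorted(tour), q
        # Evaporation on all edges, then deposit proportional to quality.
        for i in range(n_features):
            for j in range(n_features):
                tau[i][j] *= 1.0 - rho
        for tour, q in tours:
            for a, b in zip(tour, tour[1:]):
                tau[a][b] += q
                tau[b][a] += q
    return best, best_q

# Toy setup (assumed): features 0, 2 and 5 are the useful ones.
RELEVANT = {0, 2, 5}

def quality(subset):
    return sum(1 for j in subset if j in RELEVANT)

heuristic = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7]  # e.g. per-feature filter scores
best_subset, best_quality = aco_fs(6, quality, heuristic)
```

Using a cheap filter score as the heuristic and a wrapper-style `quality` for the pheromone deposit is one way to read "combining wrappers and filters"; other combinations are possible.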

Research topics and perspectives
- New feature selection search strategies, based on metaheuristic adaptations, such as:
  - Multi-agent genetic algorithms
  - Ant colony optimization (ACO)
  - Particle swarm optimizer (PSO)
  - Cultural algorithms.
- Improving evaluation quality: multi-objective optimization.
- Parallelization, distribution, load balancing, integration into a common framework (DM grid service)
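As an illustration of what a multi-objective evaluation could look like, the sketch below keeps the Pareto-optimal candidates when two objectives are traded off. The choice of objectives (maximize accuracy, minimize subset size) and the candidate values are assumptions for illustration, not results from the thesis.

```python
def dominates(a, b):
    """a and b are (accuracy, n_features) pairs. a dominates b if it is at
    least as good on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

# Made-up (accuracy, n_features) pairs for five candidate subsets.
candidates = [(0.90, 10), (0.88, 4), (0.85, 6), (0.90, 12), (0.70, 2)]
front = pareto_front(candidates)
```

Here `(0.85, 6)` is dropped because `(0.88, 4)` is both more accurate and smaller, and `(0.90, 12)` is dropped because `(0.90, 10)` matches its accuracy with fewer features.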

Research topics (2)
- Hybridization of wrapper and filter approaches.
- New feature selection approaches for unsupervised classification.

Conclusion
- FS is a multi-disciplinary research topic:
  - Statistics; optimization; data mining
- FS is an essential KDD step to face new data mining challenges.
  - High dimensionality, biological data, streaming data mining
- FS poses new challenges to the data mining community.
  - New efficient search strategies, hybrid strategies.
