Application of Multi-Objective Metaheuristic Algorithms in Data Mining
Presented by: Dr Beatriz de la Iglesia
University of East Anglia, Norwich, Norfolk, UK
email: bli@cmp.uea.ac.uk
UKKDD’07

Overview
- Why use Multi-Objective (MO) algorithms?
- An introduction to MO optimisation
- MO algorithms for classification
- Conclusions

Data Mining
Data Mining is a step in the KDD process. It consists of the application of particular data mining algorithms to extract higher level information in the form of a model or a set of patterns from a large dataset.

Model selection
- Many models can fit the same data.
- Data mining is concerned with the improvement (optimisation) of the model to obtain the best prediction or description of the data, depending on the objectives of the KDD process.

Marketing example
The goal is to predict whether a customer will buy a product given their sex, country and age (classification).

Sex   Country   Age   Buy? (goal/class)
M     France    25    Yes
M     England   21    Yes
F     France    23    Yes
F     England   34    Yes
F     France    30    No
M     Germany   21    No
M     Germany   20    No
F     Germany   18    No
F     France    34    No
M     France    55    No

Source: Freitas and Lavington (1998); Data Mining with EAs, CEC99.

Different models
[Figure: Decision Tree. The root node "country?" branches to Germany (no), England (yes) and France; the France branch tests "age?", with <= 25 giving yes and > 25 giving no. Internal branching nodes and leaf nodes are labelled.]
[Figure: Neural Network. Inputs sex, country and age; an input layer, a hidden layer and an output layer; outputs Buy?-yes and Buy?-no.]

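To make the decision tree concrete, the sketch below writes it as a plain function and checks it against the ten records of the marketing example. The branch outcomes are read off the slide figure, so the exact mapping is an assumption, and the names `RECORDS` and `tree_predict` are illustrative only.

```python
# Records from the marketing example: (sex, country, age, buy).
RECORDS = [
    ("M", "France", 25, "yes"),
    ("M", "England", 21, "yes"),
    ("F", "France", 23, "yes"),
    ("F", "England", 34, "yes"),
    ("F", "France", 30, "no"),
    ("M", "Germany", 21, "no"),
    ("M", "Germany", 20, "no"),
    ("F", "Germany", 18, "no"),
    ("F", "France", 34, "no"),
    ("M", "France", 55, "no"),
]


def tree_predict(sex, country, age):
    """Decision tree as read from the figure (assumed branch mapping).

    The tree never tests `sex`; the argument is kept so the function
    accepts a full record.
    """
    if country == "Germany":
        return "no"
    if country == "England":
        return "yes"
    # Remaining branch is France: split on age.
    return "yes" if age <= 25 else "no"


correct = sum(tree_predict(s, c, a) == buy for s, c, a, buy in RECORDS)
print(f"{correct}/{len(RECORDS)} records classified correctly")
```

A perfect score on the data used to build the tree says nothing yet about how it performs on new customers.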
Optimising the model/patterns
- Data Mining is an optimisation process.
- We search for the best model or patterns according to some evaluation criteria.
- This normally requires adjusting parameters of the algorithm.

Generalisation
- Models should not only fit the data used to build them (train set) but also the real-world process that is generating the data.
- Only then may we get a model that will generalise to other samples from the real-world process.
- We use an independent sample (test set) drawn from the real-world data to test the performance of the model on new data.
- The test set must not be compromised when building the model. A validation set should be used for any testing of the model in the intermediary stages.

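One simple way to keep the three samples apart is to shuffle the data once and cut it into disjoint partitions. The sketch below assumes a 60/20/20 split and a fixed seed; the function name `three_way_split` and the fractions are illustrative choices, not taken from the slides.

```python
import random


def three_way_split(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                      # touched only for the final estimate
    validation = shuffled[n_test:n_test + n_val]  # used for intermediate model testing
    train = shuffled[n_test + n_val:]             # used to build the model
    return train, validation, test


data = list(range(100))
train, validation, test = three_way_split(data)
print(len(train), len(validation), len(test))
```

Any parameter tuning would then consult `validation` only, leaving `test` untouched until the model is final.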
Model selection criteria
- Most selections involve more than one criterion and there are often conflicts or trade-offs between different criteria.
- E.g. a couple are buying a house:
  - She wants a very modern house with “wow” factor and gadgets
  - He wants a house with many rooms for family growth
  - They both want to find the house of their dreams as cheaply as possible

Multi-objective problem

Cost         Rooms   "wow" factor
£250,000     2       4
£300,000     4       5
£500,000     5       4
£500,000     4       3
£550,000     3       10
£900,000     4       3
£900,000     5       7
£1,000,000   10      5

Multi-Objective Data Mining
- In data mining there are also many conflicting criteria for model evaluation.
- E.g.
  - Decision trees and Neural Nets may be evaluated by their complexity and their generalisation error.
  - Association rules may be evaluated by their support and confidence.
  - Clustering solutions may evaluate entropy and purity or other measures of clustering quality.

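As an illustration of two such criteria, the sketch below computes support and confidence for two rules over the marketing example records. It assumes the common definitions (support: fraction of all records matching both antecedent and consequent; confidence: that count divided by the number of records matching the antecedent); the helper name `support_and_confidence` is hypothetical.

```python
# Marketing example records: (sex, country, age, buy).
RECORDS = [
    ("M", "France", 25, "yes"), ("M", "England", 21, "yes"),
    ("F", "France", 23, "yes"), ("F", "England", 34, "yes"),
    ("F", "France", 30, "no"), ("M", "Germany", 21, "no"),
    ("M", "Germany", 20, "no"), ("F", "Germany", 18, "no"),
    ("F", "France", 34, "no"), ("M", "France", 55, "no"),
]


def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Rule 1: country = Germany -> buy = no
germany_no = support_and_confidence(
    RECORDS, lambda r: r[1] == "Germany", lambda r: r[3] == "no")

# Rule 2: country = France -> buy = yes
france_yes = support_and_confidence(
    RECORDS, lambda r: r[1] == "France", lambda r: r[3] == "yes")

print("Germany -> no :", germany_no)
print("France  -> yes:", france_yes)
```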
Multi-Objective Optimisation
- Given two solutions with different objective values, it is not always possible to state categorically that one solution is better than the other.
- Multi-objective algorithms must find the set of all such trade-off solutions.
- The user can then select a solution according to preference.

MO Optimisation
- Given a problem with $n$ objectives $f_1, \ldots, f_n$, each of which is to be maximised, solution $a$ dominates solution $b$ if
  $$f_i(a) \ge f_i(b) \quad \forall i \in \{1, \ldots, n\} \quad \text{and} \quad \exists j \in \{1, \ldots, n\} \ \text{such that} \ f_j(a) > f_j(b).$$
- Given a set of solutions $S$, a solution $a \in S$ is non-dominated if there is no solution $s \in S$ that dominates $a$.

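The dominance definition translates almost directly into code. The sketch below applies it to the eight houses from the house-buying example; since the definition assumes every objective is maximised, cost is negated first. The function names are illustrative, and the quadratic filter is only meant for small sets.

```python
def dominates(a, b):
    """True if a dominates b, with every objective to be maximised."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(solutions):
    """Keep the solutions that no other solution in the set dominates."""
    return [a for a in solutions
            if not any(dominates(s, a) for s in solutions)]


# Houses from the example: (cost in pounds, rooms, "wow" factor).
HOUSES = [
    (250_000, 2, 4), (300_000, 4, 5), (500_000, 5, 4), (500_000, 4, 3),
    (550_000, 3, 10), (900_000, 4, 3), (900_000, 5, 7), (1_000_000, 10, 5),
]

# Negate cost so that all three objectives are maximised.
objectives = [(-cost, rooms, wow) for cost, rooms, wow in HOUSES]
front = [(-neg_cost, rooms, wow)
         for neg_cost, rooms, wow in non_dominated(objectives)]

for house in front:
    print(house)
```

Under these assumptions the £300,000 house dominates both the £500,000 house with 4 rooms and the £900,000 house with 4 rooms, so those two drop out and the remaining six form the non-dominated set.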
Pareto Front
[Figure: Pareto front]

MO algorithms
- Should approximate the Pareto-front.
- Should provide a good spread of solutions in the Pareto-front.
- Evolutionary Algorithms (EAs) are well suited to MO optimisation as they deal with a population of solutions.
- Many alternative EA approaches including
  - Aggregating functions
  - Lexicographical ranking
  - Pareto Dominance (e.g. PAES, NSGA II, SPEA 2)

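The first two approaches can be contrasted on the house example. The sketch below is not an evolutionary algorithm; it only shows how an aggregating function (here a weighted sum, with cost expressed in units of £100,000) and a lexicographical ranking each reduce the objectives to a single choice. The weights, the cost scaling and the function names are assumptions made for illustration.

```python
# Houses from the example: (cost in pounds, rooms, "wow" factor).
HOUSES = [
    (250_000, 2, 4), (300_000, 4, 5), (500_000, 5, 4), (500_000, 4, 3),
    (550_000, 3, 10), (900_000, 4, 3), (900_000, 5, 7), (1_000_000, 10, 5),
]


def weighted_score(house, weights):
    """Aggregate the three objectives into one number (higher is better)."""
    cost, rooms, wow = house
    w_cost, w_rooms, w_wow = weights
    return -w_cost * cost / 100_000 + w_rooms * rooms + w_wow * wow


def best_by_weighted_sum(houses, weights):
    return max(houses, key=lambda h: weighted_score(h, weights))


def best_lexicographic(houses, priority):
    """Rank by objectives in priority order; later ones only break ties.

    `priority` lists indexes into the maximised vector (-cost, rooms, wow).
    """
    def key(house):
        maximised = (-house[0], house[1], house[2])
        return tuple(maximised[i] for i in priority)
    return max(houses, key=key)


print(best_by_weighted_sum(HOUSES, (1, 1, 1)))  # equal weights
print(best_by_weighted_sum(HOUSES, (1, 3, 1)))  # rooms weighted more heavily
print(best_lexicographic(HOUSES, (0, 1, 2)))    # cost first, then rooms, then wow
```

Each call returns a single house, and changing the weights or the priority order changes which one, whereas a Pareto dominance approach returns the whole set of trade-offs and leaves the choice to the user.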
Classification
- In classification a model is sought which can assign a class to each instance in the database. It relies on historical labelled data.
- Nugget discovery or partial classification seeks to find patterns that represent a “strong” description of a predefined class.
- Particularly relevant for “minority classes”.