1  Redundant Feature Elimination for Multi-Class Problems

Annalisa Appice, Michelangelo Ceci
Dipartimento di Informatica, Università degli Studi di Bari, Italy
Simon Rawles, Peter Flach
Department of Computer Science, University of Bristol, UK
2  Redundant feature reduction

• REFER: an efficient, scalable, logic-based method for eliminating Boolean features which are redundant for multi-class classifier learning.
  – Why? Size of hypothesis space, predictive performance, model comprehensibility.
  – Distinct from feature selection.
3  Overview of this talk

• Redundant feature reduction
  – What is feature redundancy?
  – Doing multi-class reduction
• Related approaches
• Theoretical and experimental results
• Summary
• Current and future work
4  Example: Redundancy of features

        f1  f2  f3  class
   e1    1   1   0    a
   e2    0   1   0    a
   e3    0   0   0    a
   e4    0   0   0    b
   e5    1   0   0    b

A fixed number of Boolean features; each example has one of several class labels (‘multi-class’).
5  Discriminating a against b

(Table as on slide 4.)
True values in examples of class a make a feature better for distinguishing a from b in a classification rule.
6  Discriminating a against b

(Table as on slide 4.)
False values in examples of class b make a feature better for distinguishing a from b in a rule.
7  Discriminating a against b

(Table as on slide 4.)
f2 covers f1, and f3 is useless (never true on class a); hence f1 and f3 are redundant. Negated features are not automatically considered.
8  More formally...

For discriminating class a examples from class b:
• f covers g if T_a(g) ⊆ T_a(f) and F_b(g) ⊆ F_b(f).
• A feature is redundant if another feature covers it.

        f1  f2  class
   e1    1   1    a       T_a(f2) = {e1, e2}     T_a(f1) = {e1}
   e2    0   1    a       F_b(f2) = {e4, e5}     F_b(f1) = {e4}
   e3    0   0    a
   e4    0   0    b       (a is the ‘positive class’ here)
   e5    1   0    b
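A minimal executable sketch of this coverage test and the resulting redundancy elimination, assuming a 0/1 feature matrix X and a label vector y; the function names, data layout, and tie-break for mutually covering features are illustrative choices, not taken from the paper:

```python
import numpy as np

def covers(X, y, f, g, pos, neg):
    """f covers g for discriminating class `pos` from class `neg`:
    T_pos(g) ⊆ T_pos(f) and F_neg(g) ⊆ F_neg(f).
    X is an (examples x features) 0/1 matrix, y the class labels,
    f and g are column indices."""
    T = lambda j: set(np.where((y == pos) & (X[:, j] == 1))[0])
    F = lambda j: set(np.where((y == neg) & (X[:, j] == 0))[0])
    return T(g) <= T(f) and F(g) <= F(f)

def reduce_two_class(X, y, feats, pos, neg):
    """Drop every feature covered by another, keeping one of any
    mutually covering pair (a tie-break the slides leave open)."""
    kept = []
    for g in feats:
        if any(covers(X, y, f, g, pos, neg) for f in kept):
            continue                                   # g is redundant
        kept = [f for f in kept if not covers(X, y, g, f, pos, neg)]
        kept.append(g)                                 # g may supersede kept ones
    return kept

# The table from the example slides: only f2 (index 1) survives.
X = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0]])
y = np.array(["a", "a", "a", "b", "b"])
print(reduce_two_class(X, y, [0, 1, 2], "a", "b"))     # -> [1]
```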
9  Neighbourhoods of examples

• A way to upgrade to multi-class data.
• Each class is partitioned into subsets of similar examples (neighbourhoods).
  – REFER-N finds non-redundant features between each pair of neighbourhoods of differing class, in turn.
  – It builds up the list of non-redundant features as it goes.
• Efficient, achieves more reduction, logic-based.
• (Code sketches of both steps follow the figures below.)
10–25  Neighbourhood construction

[Figure, built up over slides 10–25: the examples are grouped one neighbourhood at a time (numbered 1–5) into groups of similar examples with the same class label.]
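A sketch of the construction step shown in the figure, continuing the earlier code. The similarity criterion (Hamming distance to a seed of at most `max_dist`) and the deterministic seed choice are assumptions; the slides show the grouping but not the measure used:

```python
import numpy as np

def build_neighbourhoods(X, y, max_dist=2):
    """Greedily partition each class into groups of similar examples.
    Returns a list of (class_label, member_index_array) pairs."""
    neighbourhoods = []
    unassigned = set(range(len(y)))
    while unassigned:
        seed = min(unassigned)                 # assumed deterministic seed choice
        members = [i for i in unassigned
                   if y[i] == y[seed]                       # same class label
                   and np.sum(X[i] != X[seed]) <= max_dist] # similar to the seed
        neighbourhoods.append((y[seed], np.array(members)))
        unassigned -= set(members)
    return neighbourhoods
```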
26–28  Neighbourhood comparison

[Figure, built up over slides 26–28: all pairs of neighbourhoods of differing class are compared.]
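Putting the pieces together, a sketch of the REFER-N main loop: run the two-class reduction between every pair of neighbourhoods with differing class labels and accumulate the union of the surviving features. It reuses `covers`/`reduce_two_class` and `build_neighbourhoods` from the sketches above; ordering already-selected features first is one way to implement the stated preference for features already found non-redundant, while the union step and pair ordering are otherwise assumptions:

```python
from itertools import combinations
import numpy as np

def refer_n(X, y, max_dist=2):
    """REFER-N sketch: pairwise two-class reduction over neighbourhoods."""
    selected = set()
    pairs = combinations(build_neighbourhoods(X, y, max_dist), 2)
    for (ca, ia), (cb, ib) in pairs:
        if ca == cb:
            continue                          # only pairs of differing class
        idx = np.concatenate([ia, ib])        # examples of this neighbourhood pair
        # Prefer features already found non-redundant (they come first).
        feats = sorted(range(X.shape[1]), key=lambda j: j not in selected)
        selected |= set(reduce_two_class(X[idx], y[idx], feats, ca, cb))
    return sorted(selected)
```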
29  Ancestry of REFER

• REDUCE (Lavrač et al. 1999)
  – Feature reduction for propositionalised ILP datasets
  – Preserves learnability of a complete and consistent hypothesis
• REFER uses a variant of REDUCE
  – Redundant features are found between the examples in each neighbourhood pair
  – Prefers features already found non-redundant
30  Related multi-class filters

• FOCUS for noise-free Boolean data (Almuallim & Dietterich 1991)
  – Exhaustive evaluation of all feature subsets
  – Time complexity of O(n^p)
• SCRAP relevance filter (Raman 2003)
  – Also uses a neighbourhood approach
  – No guarantee that selected features (still) discriminate among all classes
31  Theoretical results

• REFER preserves the learnability of a complete and consistent theory.
  – If a complete and consistent rule existed in the original data, it still exists in the reduced data.
• REFER is efficient. Its time complexity is
  – linear in the number of examples,
  – quadratic in the number of features.
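Reading the two stated orders together (with m examples and p features), the overall cost has the form below; the combined expression is our inference from the slide's claims, not a formula stated on it:

```latex
% m = number of examples, p = number of features.
% Each coverage test is one pass over the examples, O(m),
% and at most O(p^2) ordered feature pairs are tested:
T_{\mathrm{REFER}}(m, p) = O\!\left(m \cdot p^{2}\right)
```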
32  Experimental results

• Mutagenesis data from SINUS
  – Feature set greatly reduced (13,118 → 44)
  – Accuracy still competitive (approx. 85%)

[Figure: number of reduced features (0–120) plotted against number of original features (0–15,000), for REFER vs. REDUCE.]
33  Experimental results

• Thirteen UCI benchmark datasets
  – Compared with LVF, CFS and ReliefF using discrete/discretised data
  – Generally conservative
  – Faster: fastest on 8 out of 13 datasets, very close on 3 more
  – Competitive predictive accuracy using several classifiers:

                 JRip   NB   C4.5   SVM
   Winner           6    7      3     6
   Within 1%        3    0      2     4
34  Experimental results

• Reuters-21578: large-scale, high-dimensionality, sparse data
  – 16,582 preprocessed features were reduced to 1,450.
• REFER supports parallel execution well: it runs in parallel on subsets of the feature set and then once more on the combination of the survivors (a sketch follows).
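A sketch of that parallel scheme, reusing the `refer_n` sketch from earlier. The chunking strategy and pool size are assumptions; only the split-reduce-recombine structure comes from the slide:

```python
from multiprocessing import Pool
import numpy as np

def _reduce_chunk(args):
    """Run REFER on one vertical slice of the data (a subset of features)."""
    X, y, feat_idx = args
    kept = refer_n(X[:, feat_idx], y)            # local column indices
    return [int(feat_idx[j]) for j in kept]      # map back to global indices

def refer_parallel(X, y, n_chunks=4):
    """Reduce disjoint feature subsets independently, then reduce once more
    on the combination of the survivors.
    (On Windows, call this under an `if __name__ == "__main__":` guard.)"""
    chunks = np.array_split(np.arange(X.shape[1]), n_chunks)
    with Pool(n_chunks) as pool:
        partial = pool.map(_reduce_chunk, [(X, y, c) for c in chunks])
    survivors = np.array(sorted(set().union(*partial)), dtype=int)
    final = refer_n(X[:, survivors], y)          # second pass on the combination
    return [int(survivors[j]) for j in final]
```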
35  Summary

• A method for eliminating redundant Boolean features for multi-class classification tasks.
• Uses logical coverage of examples.
• Efficient and scalable, requiring less time than the three feature selection algorithms we compared against.
• Amenable to parallel execution.
36  Current and future investigations

• Interaction between feature selection and feature reduction
  – Benefits of combining the two
• Noise handling using non-pure neighbourhoods (‘relaxed REFER’)
  – Overcoming sensitivity to noise
• REFER for example reduction
37  Questions
39  Average reduction on UCI data

[Figure: number of reduced features (0–160) plotted against number of original features (0–200), for LVF, CFS, ReliefF and REFER.]
40  Effect of choice of starting point

[Figure: two bar charts over the datasets Aud, Brid, Car, F1C, F1M, F3C, F3M, Mus, Nur, Post, Tic, Pim, Yea, showing the number of reduced features (top, 0–120) and the number of neighbourhoods constructed (bottom, 0–1000).]
41  Comparison of running times

                                               Time (s)
   Dataset         # instances  # features    LVF    CFS   ReliefF    REFER
   Audiology               398         184   3.37   0.80      3.84     0.72
   Bridge                  108          83   0.89   0.38      0.67     0.22
   Car                    1728          21   1.94   0.44     15.92     0.50
   Flare1066/C            1066          40   2.62   0.48     11.51     0.61
   Flare1066/M            1066          42   0.82   0.51     11.63     0.20
   Flare323/C              323          37   0.72   0.38      1.19     0.12
   Flare323/M              323          36   0.80   0.39      1.25     0.21
   Mushroom               8124         116  29.48   5.30   1838.36     1.66
   Nursery               12960          27  34.24   1.64   1038.31    20.38
   Post-operative           90          23   0.33   0.30      0.32     0.08
   Tic-tac-toe             950          27   1.03   0.37      5.49     0.20
   Pima                    768         120   12.2      1      14.1   2.6537
   Yeast                  1484         120     55   19.1      57.1  26.7132

Machine spec: Pentium IV 1.4 GHz PC running Windows XP.