
Aggregation and Selection in Relational Data Mining - Celine Vens



  1. Aggregation and Selection in Relational Data Mining
     Celine Vens, Anneleen Van Assche, Hendrik Blockeel, Sašo Džeroski
     Department of Computer Science - K.U.Leuven
     Department of Knowledge Technologies - Jozef Stefan Institute, Slovenia

  2. Outline
     ◮ Introduction
     ◮ Aggregation and Selection
     ◮ Relational Decision Trees
     ◮ Relational Random Forests
     ◮ Experimental Results
     ◮ Conclusions

  3. Relational Data Mining
     ◮ Data Mining: searching for patterns in (large) databases.
     ◮ Propositional (Classical) Data Mining:
        ◮ data is stored in a single table
        ◮ patterns involve intra-tuple relations
     ◮ Relational Data Mining:
        ◮ data is stored in multiple tables (relational database)
        ◮ patterns involve inter-tuple or inter-table relations
        ◮ how to deal with 1-n or m-n relations (sets)?

  4. Working Example
     Current relational learners: 2 approaches to dealing with sets

  5. Outline
     ◮ Introduction
     ◮ Aggregation and Selection
     ◮ Relational Decision Trees
     ◮ Relational Random Forests
     ◮ Experimental Results
     ◮ Conclusions

  6. First approach: Aggregation
     ◮ Use SQL-like aggregation to summarize each set, yielding one big table
     ◮ Apply a classical data mining technique (e.g. decision tree inducer)
     ◮ Optimized for highly non-determinate domains
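The aggregation approach can be sketched in a few lines of Python. This is only an illustration under assumed data: the person/book schema, the `price` attribute, and the chosen aggregate functions are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy data: a 1-n relation from persons to the books they own.
persons = ["ann", "bob", "eve"]
books = [  # (owner, genre, price)
    ("ann", "computer", 40.0),
    ("ann", "computer", 55.0),
    ("ann", "novel", 12.0),
    ("bob", "novel", 9.0),
    ("bob", "novel", 15.0),
]

def aggregate_books(persons, books):
    """Summarize each person's set of books into one fixed-length row."""
    prices = defaultdict(list)
    for owner, _genre, price in books:
        prices[owner].append(price)
    table = {}
    for person in persons:
        owned = prices[person]
        table[person] = {
            "book_count": len(owned),
            "max_price": max(owned) if owned else None,
            "avg_price": sum(owned) / len(owned) if owned else None,
        }
    return table

# One row per person: this single table can be fed to a propositional learner.
flat = aggregate_books(persons, books)
```

Note that the genre of individual books is lost in `flat`; only summaries of the whole set survive.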

  7. Second approach: Selection
     ◮ Apply a relational data mining technique (e.g. relational decision tree inducer)
     ◮ Test for existence of specific elements in the set
     ◮ Optimized for structurally complex domains
     ◮ e.g. ILP: Inductive Logic Programming
        ◮ database and patterns in Prolog
        ◮ possibility to add background knowledge
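As a rough illustration of a selective test, the sketch below checks whether a person's set of books contains at least one element with a given property. The facts and predicate names are hypothetical stand-ins for Prolog-style `book(Owner, Title)` and `genre(Title, Genre)` relations; an ILP system would express the same test as a conjunction of literals.

```python
# Hypothetical facts, mirroring Prolog-style book(Owner, Title) and genre(Title, Genre).
book = [("ann", "sicp"), ("ann", "emma"), ("bob", "dune")]
genre = {"sicp": "computer", "emma": "novel", "dune": "novel"}

def exists_book_with_genre(person, wanted):
    """True if some element of the person's book set has the wanted genre."""
    return any(owner == person and genre[title] == wanted for owner, title in book)

# Existence test evaluated for every person in the toy database.
has_computer_book = {p: exists_book_with_genre(p, "computer") for p in ("ann", "bob")}
```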

  8. Example concepts
     1. Persons that have two books.
     2. Persons that have a computer book.
     3. Persons that have two computer books.
     How to express concept 3?
     ◮ Selective methods need an aggregate function in the background knowledge.
     ◮ Aggregating methods need a separate relation for each genre.
     Solution: combine aggregation and selection in the context of relational data mining
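The three example concepts can be written as one function in which a selection on the set elements is applied before the aggregate is computed. This is a minimal sketch on made-up data; reading "two books" as "at least two" is an assumption, since the slide does not say whether the count is exact.

```python
# Hypothetical toy relation: (owner, genre) pairs, one per owned book.
books = [
    ("ann", "computer"), ("ann", "computer"), ("ann", "novel"),
    ("bob", "novel"), ("bob", "computer"),
    ("eve", "novel"),
]

def count_books(person, genre=None):
    """Count a person's books, optionally keeping only one genre (the selection)."""
    return sum(
        1 for owner, g in books
        if owner == person and (genre is None or g == genre)
    )

def concept1(person):  # aggregation only
    return count_books(person) >= 2

def concept2(person):  # selection only (existence test)
    return count_books(person, "computer") >= 1

def concept3(person):  # aggregation over a selected subset
    return count_books(person, "computer") >= 2

results = {p: (concept1(p), concept2(p), concept3(p)) for p in ("ann", "bob", "eve")}
```

Only `concept3` needs both ingredients: `bob` has two books and a computer book, but not two computer books.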

  9. Outline
     ◮ Introduction
     ◮ Aggregation and Selection
     ◮ Relational Decision Trees
        ◮ Decision Trees
        ◮ Relational Decision Trees
        ◮ Combining Aggregation and Selection
     ◮ Relational Random Forests
     ◮ Experimental Results
     ◮ Conclusions

  10. Decision Trees
      ◮ One of the most widely used and practical data mining methods
      ◮ Each internal node contains a test on some attribute
      ◮ Each leaf contains a prediction
      ◮ Classification of new instance

  11. Decision Trees: learning them
      ◮ Divide & conquer algorithm
      ◮ Pseudocode:
          grow_node(Node, Examples):
            IF stop criterion:
              assign majority class from Examples to Node
            ELSE
              generate all possible tests for Node
              associate best test with Node
              grow two child nodes Left and Right
              split Examples into ExamplesPass and ExamplesFail
              grow_node(Left, ExamplesPass)
              grow_node(Right, ExamplesFail)
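The pseudocode on this slide can be turned into a small runnable sketch. Several details are assumptions that the slide leaves open: tests have the form `attribute <= threshold`, the best test minimizes weighted entropy, and growth stops on pure or very small nodes. All function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(examples):
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def generate_tests(examples):
    """All tests of the form (attribute, threshold) seen in the examples."""
    return sorted({(a, v) for features, _ in examples for a, v in features.items()})

def split(examples, test):
    attr, threshold = test
    passed = [e for e in examples if e[0][attr] <= threshold]
    failed = [e for e in examples if e[0][attr] > threshold]
    return passed, failed

def weighted_entropy(examples, test):
    parts = split(examples, test)
    return sum(len(p) / len(examples) * entropy(p) for p in parts if p)

def grow_node(examples, min_size=2):
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop criterion (an assumption): pure node or too few examples.
    if len(set(labels)) == 1 or len(examples) < min_size:
        return {"leaf": majority}
    # Keep only tests that send at least one example to each side.
    tests = [t for t in generate_tests(examples) if all(split(examples, t))]
    if not tests:
        return {"leaf": majority}
    best = min(tests, key=lambda t: weighted_entropy(examples, t))
    passed, failed = split(examples, best)
    return {"test": best,
            "left": grow_node(passed, min_size),
            "right": grow_node(failed, min_size)}

def classify(tree, features):
    while "leaf" not in tree:
        attr, threshold = tree["test"]
        tree = tree["left"] if features[attr] <= threshold else tree["right"]
    return tree["leaf"]

# Toy training set: (features, class label).
examples = [({"book_count": 0}, "no"), ({"book_count": 1}, "no"),
            ({"book_count": 2}, "yes"), ({"book_count": 3}, "yes")]
tree = grow_node(examples)
```

Here the left child receives the examples that pass the test, matching the `ExamplesPass` / `ExamplesFail` split in the pseudocode.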

  12. Relational Decision Trees: learning them
      ◮ Upgrade of classical algorithm: Tilde [Blockeel and De Raedt ’98]
      ◮ Trees are relational: contain first order logic literals in test of internal node
      ◮ Selective approach (ILP)
      ◮ Tests can introduce variables: possible tests may differ at each node

  13. Adding aggregation
      ◮ User specifies basic components: aggregate functions, sets to be aggregated, query to generate set to be aggregated
      ◮ Aggregate conditions are created, using discretization
      ◮ Aggregate conditions are added to the set of possible tests
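One way to picture how aggregate conditions could be generated is sketched below. The discretization used here (midpoints between observed aggregate values, capped at a few thresholds) is an assumption for illustration; the slides do not specify which discretization scheme is used. The data and all names are hypothetical.

```python
# Hypothetical sets to aggregate: the book prices per person.
price_sets = {"ann": [40.0, 55.0, 12.0], "bob": [9.0, 15.0], "eve": []}

# User-specified aggregate functions; None marks "undefined on the empty set".
agg_functions = {
    "count": len,
    "max": lambda s: max(s) if s else None,
}

def aggregate_conditions(example_sets, functions, max_thresholds=3):
    """Create candidate tests of the form agg(set) >= threshold."""
    conditions = []
    for name, func in functions.items():
        values = sorted({v for v in map(func, example_sets.values()) if v is not None})
        # Crude discretization (assumption): midpoints between observed values.
        midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
        step = max(1, len(midpoints) // max_thresholds)
        for threshold in midpoints[::step][:max_thresholds]:
            conditions.append((name, threshold))
    return conditions

def holds(condition, elements, functions):
    name, threshold = condition
    value = functions[name](elements)
    return value is not None and value >= threshold

# These conditions would be added to the candidate tests at each node.
conditions = aggregate_conditions(price_sets, agg_functions)
```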

  14. Adding selections to aggregation: first manner
      ◮ If a node contains an aggregation, any node in its left subtree can add a selection within that aggregate condition
      ◮ Local search within aggregate condition
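A refinement of an aggregate condition can be pictured as follows: the condition keeps its aggregate and threshold, and a selection is added to the query that generates the aggregated set. This is a minimal sketch restricted to a count aggregate; the class and its fields are hypothetical and not Tilde's actual representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CountCondition:
    """count{elements matching all selections} >= threshold."""
    threshold: int
    selections: tuple = ()  # (attribute, value) pairs each counted element must match

    def holds(self, elements):
        matching = [e for e in elements
                    if all(e.get(a) == v for a, v in self.selections)]
        return len(matching) >= self.threshold

    def refine(self, attribute, value):
        """Add one selection inside the aggregate; the threshold is unchanged."""
        return CountCondition(self.threshold, self.selections + ((attribute, value),))

# Hypothetical book sets for two persons.
ann_books = [{"genre": "computer"}, {"genre": "computer"}, {"genre": "novel"}]
bob_books = [{"genre": "novel"}, {"genre": "computer"}]

two_books = CountCondition(2)                            # test higher in the tree
two_computer_books = two_books.refine("genre", "computer")  # refinement in left subtree
```

Because the refined test is only considered for examples that already satisfy `two_books`, the search stays local to that aggregate condition.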

  15. Adding selections to aggregation: second manner
      ◮ Lookahead
         ◮ technique to look ahead in refinement lattice
         ◮ add several literals at once
         ◮ computationally expensive
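To give a feel for why lookahead is expensive, the sketch below enumerates candidate refinements that add up to `max_literals` literals at once. Treating a refinement as an unordered combination of literals is a simplification, and the literal strings are hypothetical.

```python
from itertools import combinations

def refinements_with_lookahead(literals, max_literals):
    """Candidate refinements that add between 1 and max_literals literals at once."""
    return [
        combo
        for k in range(1, max_literals + 1)
        for combo in combinations(literals, k)
    ]

# Hypothetical candidate literals at some node.
literals = ["book(P,B)", "genre(B,computer)", "price(B,X)", "X > 30"]

no_lookahead = refinements_with_lookahead(literals, 1)   # one literal per test
one_step = refinements_with_lookahead(literals, 2)       # up to two literals
two_steps = refinements_with_lookahead(literals, 3)      # up to three literals
```

With only four literals the candidate set already grows from 4 to 10 to 14 tests, and each candidate must be evaluated on the examples at the node.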

  16. Relational Decision Trees with aggregation and selection: learning them
      ◮ Pseudocode:
          grow_node(Node, Examples):
            IF stop criterion:
              assign majority class from Examples to Node
            ELSE
              generate all possible first order tests for Node:
                usual tests
                aggregate functions
                refinement of aggregate function higher in tree
              associate best test with Node
              grow two child nodes Left and Right
              split Examples into ExamplesPass and ExamplesFail
              grow_node(Left, ExamplesPass)
              grow_node(Right, ExamplesFail)

  17. Relational Decision Trees with aggregation and selection: problem
      ◮ Number of tests at each node in the tree grows very fast
      ◮ Need some way to deal with it
      ◮ Make use of technique from classical data mining: Random Forests [Breiman ’01]
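The two randomization steps of a random forest that matter for this problem can be sketched as below: each tree is grown on a bootstrap sample, and each node considers only a random subset of the candidate tests. The subset size (square root of the number of tests) is a common default and an assumption here, not a setting taken from the slides; the function names are hypothetical.

```python
import random
from math import isqrt

def bootstrap_sample(examples, rng):
    """Sample len(examples) examples with replacement (one sample per tree)."""
    return rng.choices(examples, k=len(examples))

def sample_tests(tests, rng, k=None):
    """Pick a random subset of the candidate tests to evaluate at one node."""
    if k is None:
        k = max(1, isqrt(len(tests)))  # assumed default subset size
    return rng.sample(tests, k)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
candidate_tests = list(range(100))  # stand-ins for 100 first order tests
subset = sample_tests(candidate_tests, rng)
sample = bootstrap_sample(["e1", "e2", "e3", "e4"], rng)
```

Evaluating 10 tests instead of 100 at every node is what makes a large test space affordable; the final prediction would be a majority vote over the trees.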

  18. Outline
      ◮ Introduction
      ◮ Aggregation and Selection
      ◮ Relational Decision Trees
      ◮ Relational Random Forests
         ◮ Random Forests
         ◮ Relational Random Forests
      ◮ Experimental Results
      ◮ Conclusions
