Formal Methods for Mining Structured Objects Gemma Casas Garriga - PowerPoint PPT Presentation

Formal Methods for Mining Structured Objects. Gemma Casas Garriga, Ph.D. Software Program, Universitat Politècnica de Catalunya. Ph.D. dissertation, 8 June 2006. Sections: Introduction, Lattice Theory, Horn Axiomatizations, Poset Construction, Conclusions.


  1. Formal Methods for Mining Structured Objects. Gemma Casas Garriga, Ph.D. Software Program, Universitat Politècnica de Catalunya. Ph.D. dissertation, 8 June 2006. Advised by José L. Balcázar.

  2. Outline. (1) Introduction: Mining Structured Data; (2) Lattice Theory for Sequences; (3) Horn Axiomatizations for Sequences; (4) Partial Order Construction; (5) Conclusions and Future Research.


  4. Preliminaries: data description. Consider the universe of patterns. Patterns are subclasses of directed graphs. [Figure: example directed-graph patterns over the labels A, B, D, F.] Consider a set of data D = {d_1, ..., d_n}, where each object d_i can be described by a pattern.

  5. Preliminaries: some definitions. Given two patterns G and G′, we say that G is a subpattern of G′, denoted G ⪯ G′, if there is a morphism from G to G′. [Figure: example patterns over the labels A, B, C related by ⪯.] The subpattern relation ⪯ defines a partial order on the set of patterns, namely an exponentially large lattice of patterns. [Figure: lattice of the subsets of {A, B, C, D}, from {} at the bottom to ABCD at the top.]

  6. Preliminaries: some definitions. If G ⪯ G′, we say that G is more general than G′, or that G′ is more specific than G. [Figure: lattice of the subsets of {A, B, C, D}, from {} at the bottom to ABCD at the top.]

  7. The basic problem: mining descriptions on the data. The complete lattice of patterns organized by ⪯ defines the pattern space from which to identify valid hypotheses for our data D = {d_1, ..., d_n}. [Figure: lattice of the subsets of {A, B, C, D}.] The support of a pattern G in the data D is the number of instances d ∈ D such that G ⪯ d.


  9. Example: the itemset case. The data D does not exhibit any structure. A well-known problem is to find all patterns whose support in D is above a specified minimum threshold, namely the frequent itemsets. The complete pattern space is formed only by trivial orders. [Figure: lattice of the subsets of {A, B, C, D}.]
     Id    Sets
     d_1   {A, B, C, D}
     d_2   {A, C, D}
     d_3   {A, B, D}
     d_4   {A, C}
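
The support count and the frequent-itemset problem on this slide can be sketched in a few lines of Python. This is only an illustration on the four objects of the table, not the method of the thesis: the names `DATA`, `support` and `frequent_itemsets` are made up for the example, the search is a naive scan of every subset, and "over the threshold" is read as "greater than or equal to", which is an assumption.

```python
from itertools import combinations

# Objects d_1..d_4 from the itemset example; each object is a plain set of items.
DATA = {
    "d1": {"A", "B", "C", "D"},
    "d2": {"A", "C", "D"},
    "d3": {"A", "B", "D"},
    "d4": {"A", "C"},
}


def support(pattern, data):
    """Count the objects d in `data` such that `pattern` is a subset of d."""
    return sum(1 for d in data.values() if pattern <= d)


def frequent_itemsets(data, min_support):
    """Naively scan every non-empty subset of the items (exponential in the item count)."""
    items = sorted(set().union(*data.values()))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo), data)
            if count >= min_support:  # assumption: threshold comparison is >=
                frequent[frozenset(combo)] = count
    return frequent


if __name__ == "__main__":
    for itemset, count in frequent_itemsets(DATA, 3).items():
        print("".join(sorted(itemset)), count)
```

With a minimum support of 3, the scan keeps A, C, D, AC and AD; every other subset is contained in at most two objects.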

  10. Example: the sequential case. The data D corresponds to sequences of itemsets. In this case the pattern space is formed by general partial orders. Sequences are often used within the data mining community as a prototypical example of a structured domain. [Figure: the three input sequences drawn as chains, together with example partial-order patterns over A, B, D, F.]
      Seq id   Input sequences
      d_1      ⟨(AE)(C)(D)(A)⟩
      d_2      ⟨(D)(ABE)(F)(BCD)⟩
      d_3      ⟨(D)(A)(B)(F)⟩
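
To make the sequential data concrete, here is a small sketch of support counting for sequence patterns over the three input sequences. The slide does not spell out when one sequence is contained in another, so the code assumes the usual definition from sequential pattern mining: the itemsets of the pattern must embed, in order, into distinct itemsets of the data sequence. The helper names `seq`, `is_contained` and `support` are hypothetical.

```python
def seq(text):
    """Parse "(AE)(C)(D)(A)" into a tuple of frozensets, one per itemset."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))


def is_contained(s, d):
    """True if the itemsets of s embed, in order, into distinct itemsets of d."""
    pos = 0
    for itemset in s:
        # Greedy scan: take the earliest itemset of d that includes this one.
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True


# Input sequences d_1, d_2, d_3 from the table.
DATA = {
    1: seq("(AE)(C)(D)(A)"),
    2: seq("(D)(ABE)(F)(BCD)"),
    3: seq("(D)(A)(B)(F)"),
}


def support(s, data):
    """Number of input sequences that contain the pattern s."""
    return sum(1 for d in data.values() if is_contained(s, d))


if __name__ == "__main__":
    for text in ["(D)(A)", "(A)(D)", "(B)(F)"]:
        print(text, support(seq(text), DATA))
```

Under that assumed definition, ⟨(D)(A)⟩ is contained in all three sequences, while ⟨(A)(D)⟩ and ⟨(B)(F)⟩ are each contained in two: the order of the itemsets matters, which is exactly what the plain itemset view cannot express.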

  11. Observations. The pattern space defined by ⪯ grows combinatorially with the structure exhibited by the data D. Moreover, many patterns may be redundant for describing D. [Figure: the three input sequences drawn as chains, together with example partial-order patterns over A, B, D, F.]

  12. Lattice theory: compacting the exponential lattice of patterns. Formal Concept Analysis (FCA) is employed to compact all the relationships of binary data into a Galois lattice without information loss. The final Galois lattice is a closure system. The good structural properties of the Galois lattice define a connection with classical propositional Horn theory. An important limitation of this approach is that the classical propositional description is unable to reflect any structure.
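
For binary (itemset) data, the closure system behind the Galois lattice can be illustrated with the standard FCA closure: the closure of an itemset is the intersection of all objects that contain it. The sketch below is an assumption-laden toy, not the construction used in the thesis; it reuses the four sets of the itemset example as `DATA`, and the names `closure` and `closed_sets` are invented for the illustration.

```python
from itertools import combinations

# Binary data: the four objects of the itemset example, reused here as an assumption.
DATA = [frozenset("ABCD"), frozenset("ACD"), frozenset("ABD"), frozenset("AC")]


def closure(itemset, data=DATA):
    """Intersection of all objects containing `itemset` (standard FCA closure)."""
    itemset = frozenset(itemset)
    covering = [d for d in data if itemset <= d]
    if not covering:
        # Convention: an itemset contained in no object closes to the full item set.
        return frozenset().union(*data)
    return frozenset.intersection(*covering)


def closed_sets(data=DATA):
    """Closures of all subsets of the items: the nodes of the Galois lattice."""
    items = sorted(frozenset().union(*data))
    return {
        closure(combo, data)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }


if __name__ == "__main__":
    for closed in sorted(closed_sets(), key=lambda c: (len(c), sorted(c))):
        print("".join(sorted(closed)))
```

On this data the 16 subsets of {A, B, C, D} collapse to 6 closed sets (A, AC, AD, ABD, ACD, ABCD), which gives a feel for the compaction the slide refers to. Every itemset has the same support as its closure, so no support information is lost.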


  14. This thesis. Studies the properties of the closure system obtained when mining structured objects in the form of sequences. Provides a theoretical basis for:
      ⋆ Horn theory for ordered models
      ⋆ Partial order construction from sequences
      ⋆ Lattice theory for partial order structures

  15. Outline. (1) Introduction: Mining Structured Data; (2) Lattice Theory for Sequences; (3) Horn Axiomatizations for Sequences; (4) Partial Order Construction; (5) Conclusions and Future Research.

  16. Formal Concept Analysis for sequences: ordered data.
      Seq id   Input sequences
      d_1      ⟨(AE)(C)(D)(A)⟩
      d_2      ⟨(D)(ABE)(F)(BCD)⟩
      d_3      ⟨(D)(A)(B)(F)⟩
      Observation. The intersection of a collection of sequences returns a set of sequences.
      Example. The intersection of ⟨(AD)(C)(B)⟩ and ⟨(A)(B)(D)(C)⟩ is the set of sequences {⟨(A)(C)⟩, ⟨(A)(B)⟩, ⟨(D)(C)⟩}.
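
The observation can be checked by brute force. The sketch below reads "intersection" as the set of maximal sequences contained in every input sequence, which is an interpretation consistent with the example on the slide rather than a definition quoted from it. It enumerates all subsequences of the first input, so it is only suitable for toy inputs; `seq`, `is_contained`, `subsequences` and `intersection` are hypothetical helper names.

```python
from itertools import combinations, product


def seq(text):
    """Parse "(AD)(C)(B)" into a tuple of frozensets, one per itemset."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))


def is_contained(s, d):
    """True if the itemsets of s embed, in order, into distinct itemsets of d."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True


def subsequences(d):
    """All non-empty subsequences of d: pick a subset of each itemset, drop empty picks."""
    choices = []
    for itemset in d:
        items = sorted(itemset)
        choices.append([
            frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)
        ])
    result = set()
    for pick in product(*choices):
        s = tuple(part for part in pick if part)
        if s:
            result.add(s)
    return result


def intersection(sequences):
    """Maximal sequences contained in every sequence of a non-empty collection."""
    first, *rest = sequences
    common = {s for s in subsequences(first) if all(is_contained(s, d) for d in rest)}
    # Keep only the sequences that are not contained in another common sequence.
    return {s for s in common if not any(s != t and is_contained(s, t) for t in common)}


if __name__ == "__main__":
    for s in intersection([seq("(AD)(C)(B)"), seq("(A)(B)(D)(C)")]):
        print("".join("(" + "".join(sorted(part)) + ")" for part in s))
```

Run on the two sequences of the example, the function returns exactly the three sequences listed on the slide. No single common sequence contains the other two, which is why the intersection has to be a set.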

  17. Formal Concept Analysis for sequences: derivation operators. We define the following two operators φ and ψ, where 𝒪 is the set of object identifiers and 𝒮 the set of sequences.
      For a set O ⊆ 𝒪 of objects: φ(O) = {s ∈ 𝒮 | s maximally contained in d_i, for all i ∈ O}.
      Correspondingly, for a set S ⊆ 𝒮 of sequences: ψ(S) = {i ∈ 𝒪 | s ⊆ d_i, for all s ∈ S}.
      Example.
      Seq id   Input sequences
      d_1      ⟨(AE)(C)(D)(A)⟩
      d_2      ⟨(D)(ABE)(F)(BCD)⟩
      d_3      ⟨(D)(A)(B)(F)⟩
      ⊲ φ({1, 3}) = {⟨(D)(A)⟩}
      ⊲ ψ({⟨(AE)(D)⟩, ⟨(AE)(C)⟩}) = {1, 2}
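
The two derivation operators can be tried out on the three input sequences with a brute-force sketch. It assumes that "maximally contained" means maximal among the sequences contained in every selected object, and that containment is the usual order-preserving embedding of itemsets; neither reading is spelled out on the slide. All function names are hypothetical, and `phi` enumerates every subsequence of one object, so this is a toy for small data only.

```python
from itertools import combinations, product


def seq(text):
    """Parse "(AE)(C)(D)(A)" into a tuple of frozensets, one per itemset."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))


def is_contained(s, d):
    """True if the itemsets of s embed, in order, into distinct itemsets of d."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True


def subsequences(d):
    """All non-empty subsequences of d (exponential, toy inputs only)."""
    choices = []
    for itemset in d:
        items = sorted(itemset)
        choices.append([
            frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)
        ])
    picks = (tuple(part for part in pick if part) for pick in product(*choices))
    return {s for s in picks if s}


# Input sequences d_1, d_2, d_3 from the table, keyed by object identifier.
DATA = {
    1: seq("(AE)(C)(D)(A)"),
    2: seq("(D)(ABE)(F)(BCD)"),
    3: seq("(D)(A)(B)(F)"),
}


def phi(object_ids, data=DATA):
    """Maximal sequences contained in d_i for every i in a non-empty set of ids."""
    ids = sorted(object_ids)
    first, rest = data[ids[0]], [data[i] for i in ids[1:]]
    common = {s for s in subsequences(first) if all(is_contained(s, d) for d in rest)}
    return {s for s in common if not any(s != t and is_contained(s, t) for t in common)}


def psi(sequences, data=DATA):
    """Identifiers of the objects that contain every sequence in `sequences`."""
    return {i for i, d in data.items() if all(is_contained(s, d) for s in sequences)}


if __name__ == "__main__":
    print(phi({1, 3}))
    print(psi({seq("(AE)(D)"), seq("(AE)(C)")}))
```

Both calls reproduce the slide's example: `phi({1, 3})` yields the single sequence ⟨(D)(A)⟩, and `psi` applied to ⟨(AE)(D)⟩ and ⟨(AE)(C)⟩ yields the objects 1 and 2.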
