Boolean Factor Analysis of Multi-Relational Data
Marketa Krmelova, Martin Trnecka
Palacky University, Olomouc, Czech Republic
Krmelova M., Trnecka M. (DAMOL), Boolean Factor Analysis of Multi-Relational Data, October 16, 2013
Motivation
- Boolean factor analysis (BFA) is an established method for the analysis and preprocessing of Boolean data.
- The basic task in BFA: find new variables (factors) that explain or describe the original input data.
- Finding factors is an important step in understanding and managing data.
- The Boolean nature of the data is beneficial here, especially for the interpretability of the results.
- BFA is suited to a single Boolean data table with just one relation between objects and attributes.
- Many real-world data sets are more complex than a single data table.
- We propose a new approach to BFA that is tailored to multi-relational data.
Multi-Relational Data
- Multi-relational data usually consist of several data tables that are interconnected by relations.
- The relations are crucial: they carry additional information about the relationship between the data tables.
- This information is important for understanding the data as a whole.
- Examples: social networks, a dating agency database.
Boolean Factor Analysis
Consider an $n \times m$ object-attribute matrix $C$ with entries $C_{ij} \in \{0, 1\}$ expressing whether object $i$ has attribute $j$ or not. The goal of Boolean matrix factorization (BMF) is to find a decomposition
$$C = A \circ B$$
of $C$ into a product of an $n \times k$ object-factor matrix $A$ over $\{0, 1\}$ and a $k \times m$ factor-attribute matrix $B$ over $\{0, 1\}$. The product $\circ$ is the Boolean matrix product, defined by
$$(A \circ B)_{ij} = \bigvee_{l=1}^{k} A_{il} \cdot B_{lj},$$
where $\bigvee$ denotes maximum (the truth function of logical disjunction) and $\cdot$ is the usual product (the truth function of logical conjunction).
For example, the following matrix can be decomposed into two Boolean matrices with $k < m$:
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix} \circ \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
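The Boolean matrix product defined above can be sketched in a few lines of Python. This is a minimal illustration, not code from the authors: the function name `boolean_product` and the small matrices are assumptions chosen for the example, with $k = 2$ factors and $m = 3$ attributes.

```python
def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = max over l of A[i][l] * B[l][j]."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    m = len(B[0])
    return [
        [max(A[i][l] * B[l][j] for l in range(k)) for j in range(m)]
        for i in range(len(A))
    ]


# Illustrative 3x2 object-factor matrix and 2x3 factor-attribute matrix.
A = [[0, 1],
     [1, 1],
     [1, 0]]
B = [[1, 0, 1],
     [1, 1, 0]]

C = boolean_product(A, B)
for row in C:
    print(row)
```

Each row of the result is the disjunction of the rows of `B` selected by the ones in the corresponding row of `A`, which is why an object that belongs to both factors gets the union of their attributes.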
Boolean Factor Analysis via FCA
- An optimal decomposition of a Boolean matrix can be found via formal concept analysis (FCA).
- Factors are represented by formal concepts.
- The aim is to decompose the matrix $C$ into a product $A_{\mathcal{F}} \circ B_{\mathcal{F}}$, where
$$\mathcal{F} = \{\langle A_1, B_1 \rangle, \ldots, \langle A_k, B_k \rangle\} \subseteq \mathcal{B}(X, Y, C)$$
and $\mathcal{B}(X, Y, C)$ is the set of all formal concepts of the context $\langle X, Y, C \rangle$.
- Denote by $A_{\mathcal{F}}$ and $B_{\mathcal{F}}$ the $n \times k$ and $k \times m$ binary matrices defined by
$$(A_{\mathcal{F}})_{il} = \begin{cases} 1 & \text{if } i \in A_l \\ 0 & \text{if } i \notin A_l \end{cases} \qquad (B_{\mathcal{F}})_{lj} = \begin{cases} 1 & \text{if } j \in B_l \\ 0 & \text{if } j \notin B_l \end{cases}$$
for $l = 1, \ldots, k$. In other words, $A_{\mathcal{F}}$ is composed of the characteristic vectors of the sets $A_l$, and similarly for $B_{\mathcal{F}}$.
- A set of factors is a set $\mathcal{F}$ of formal concepts of $\langle X, Y, C \rangle$ for which $C = A_{\mathcal{F}} \circ B_{\mathcal{F}}$ holds. Such a set exists for every $C$.
- Because a factor can be seen as a formal concept, we can consider the intent part (denoted by $\mathrm{intent}(F)$) and the extent part (denoted by $\mathrm{extent}(F)$) of a factor $F$.
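Since factors are formal concepts, it may help to see how a candidate pair of object and attribute sets can be checked. The sketch below uses the standard FCA derivation operators: a pair $\langle A, B \rangle$ is a formal concept when $B$ is exactly the set of attributes shared by all objects in $A$, and $A$ is exactly the set of objects having all attributes in $B$. The context and the names `up`, `down`, and `is_concept` are illustrative assumptions, not taken from the slides.

```python
# Hypothetical 3x3 context: each object maps to the set of attributes it has.
CONTEXT = {
    "x1": {"y1", "y2"},
    "x2": {"y1", "y2", "y3"},
    "x3": {"y1", "y3"},
}
ATTRIBUTES = {"y1", "y2", "y3"}


def up(objects):
    """Attributes shared by every object in the given set."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def down(attributes):
    """Objects that have every attribute in the given set."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def is_concept(extent, intent):
    """A pair is a formal concept iff each part determines the other."""
    return up(extent) == intent and down(intent) == extent


print(is_concept({"x1", "x2"}, {"y1", "y2"}))  # extent and intent close on each other
print(is_concept({"x1"}, {"y1"}))              # x1 also has y2, so the intent is too small
```

In matrix terms, a formal concept is a maximal rectangle full of ones, which is what makes concepts natural building blocks for covering the ones of $C$.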
Boolean Factor Analysis of Multi-Relational Data
Our setting:
- We have two Boolean data tables $C_1$ and $C_2$, which are interconnected by a relation $R_{C_1 C_2}$.
- This relation is over the objects of the first data table $C_1$ and the attributes of the second data table $C_2$, i.e. it is an objects-attributes relation.
- In general, we can also define an objects-objects relation or an attributes-attributes relation.
Our goal: find factors that explain the original data and that take into account the relation $R_{C_1 C_2}$ between the data tables.
Definition
A relation factor (pair factor) on data tables $C_1$ and $C_2$ is a pair $\langle F_1^i, F_2^j \rangle$, where $F_1^i \in \mathcal{F}_1$ and $F_2^j \in \mathcal{F}_2$ ($\mathcal{F}_i$ denotes the set of factors of data table $C_i$), that satisfies the relation $R_{C_1 C_2}$.
There are several ways to define the meaning of "satisfying the relation" in the Definition.
Narrow Approach
$F_1^i$ and $F_2^j$ form a pair factor $\langle F_1^i, F_2^j \rangle$ if
$$\bigcap_{k \in \mathrm{extent}(F_1^i)} R_k \neq \emptyset \quad \text{and} \quad \bigcap_{k \in \mathrm{extent}(F_1^i)} R_k \subseteq \mathrm{intent}(F_2^j),$$
where $R_k$ is the set of attributes that are in relation with object $k$.
This definition is stated for an object-attribute relation; other types of relations can be handled in a similar way.
Wide Approach
$F_1^i$ and $F_2^j$ form a pair factor $\langle F_1^i, F_2^j \rangle$ if
$$\left( \bigcap_{k \in \mathrm{extent}(F_1^i)} R_k \right) \cap \mathrm{intent}(F_2^j) \neq \emptyset.$$
This definition is stated for an object-attribute relation; other types of relations can be handled in a similar way.
α-approach
For any $\alpha \in [0, 1]$, $F_1^i$ and $F_2^j$ form a pair factor $\langle F_1^i, F_2^j \rangle$ if
$$\frac{\left| \left( \bigcap_{k \in \mathrm{extent}(F_1^i)} R_k \right) \cap \mathrm{intent}(F_2^j) \right|}{\left| \bigcap_{k \in \mathrm{extent}(F_1^i)} R_k \right|} \geq \alpha.$$
This definition is stated for an object-attribute relation; other types of relations can be handled in a similar way.
For $\alpha = 0$, with $\geq$ replaced by $>$, we get the wide approach; for $\alpha = 1$ we get the narrow one.
Lemma
For $\alpha_1 > \alpha_2$, the set of relation factors obtained with $\alpha_1$ is a subset of the set of relation factors obtained with $\alpha_2$.
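The three conditions can be compared side by side in code. The sketch below is an illustration under stated assumptions rather than the authors' implementation. It reads the big operator as the intersection of the sets $R_k$ over the extent. The extents and intents are those of the factors listed in the example later in the deck. The relation `R` is a hypothetical stand-in invented for this sketch, not the table from the slides. Treating an empty intersection as "no pair factor" in the α-approach is also an assumption, since the ratio is undefined there.

```python
from itertools import product

# Extents of the factors of C_W and intents of the factors of C_M.
W_EXTENTS = {
    "W1": {"Abby", "Daphne"},
    "W2": {"Becky", "Daphne"},
    "W3": {"Abby", "Claire", "Daphne"},
}
M_INTENTS = {
    "M1": {"undergraduate", "wants kids"},
    "M2": {"athlete", "is attractive"},
    "M3": {"athlete"},
    "M4": {"wants kids", "is attractive"},
}
# HYPOTHETICAL relation (woman -> attributes she looks for); illustration only.
R = {
    "Abby": {"undergraduate", "wants kids"},
    "Becky": {"athlete", "is attractive"},
    "Claire": {"undergraduate", "wants kids", "is attractive"},
    "Daphne": {"athlete", "undergraduate", "wants kids", "is attractive"},
}


def common_related(extent):
    """Attributes related to every object of the extent (intersection of R_k)."""
    sets = [R[k] for k in extent]
    return set.intersection(*sets) if sets else set()


def narrow(extent, intent):
    common = common_related(extent)
    return bool(common) and common <= intent


def wide(extent, intent):
    return bool(common_related(extent) & intent)


def alpha_ok(extent, intent, alpha):
    common = common_related(extent)
    if not common:  # ratio undefined; assumed to mean "not a pair factor"
        return False
    return len(common & intent) / len(common) >= alpha


def pairs(condition):
    """All (W factor, M factor) pairs that satisfy the given condition."""
    return {
        (w, m)
        for (w, ext), (m, itt) in product(W_EXTENTS.items(), M_INTENTS.items())
        if condition(ext, itt)
    }


narrow_pairs = pairs(narrow)
wide_pairs = pairs(wide)
print(sorted(narrow_pairs))
print(sorted(wide_pairs))
# Lemma check: a larger alpha never yields more relation factors.
print(pairs(lambda e, i: alpha_ok(e, i, 0.6)) <= pairs(lambda e, i: alpha_ok(e, i, 0.5)))
```

With this stand-in relation every intersection has two elements, so the ratio can only be 0, 0.5, or 1. The 0.6-approach then coincides with the narrow one and the 0.5-approach with the wide one.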
Simple Example
Let us have two data tables $C_W$ and $C_M$. $C_W$ represents women and their characteristics, and $C_M$ represents men and their characteristics.
[Tables $C_W$ (rows Abby, Becky, Claire, Daphne), $C_M$ (rows Adam, Ben, Carl, Dave), and $R_{C_W C_M}$ (rows Abby, Becky, Claire, Daphne); each table has the columns athlete, undergraduate, wants kids, is attractive, with × marking the incidences.]
Moreover, we consider the relation $R_{C_W C_M}$ between the objects of the first data table and the attributes of the second data table. In this case, it could be a relation with the meaning "a woman is looking for a man with these characteristics".
Factors Obtained via GreConD Algorithm
Factors of data table $C_W$ are:
- $F_W^1 = \langle \{\text{Abby, Daphne}\}, \{\text{undergraduate, wants kids, is attractive}\} \rangle$
- $F_W^2 = \langle \{\text{Becky, Daphne}\}, \{\text{athlete, wants kids}\} \rangle$
- $F_W^3 = \langle \{\text{Abby, Claire, Daphne}\}, \{\text{undergraduate, is attractive}\} \rangle$
Factors of data table $C_M$ are:
- $F_M^1 = \langle \{\text{Ben, Carl}\}, \{\text{undergraduate, wants kids}\} \rangle$
- $F_M^2 = \langle \{\text{Adam}\}, \{\text{athlete, is attractive}\} \rangle$
- $F_M^3 = \langle \{\text{Adam, Carl}\}, \{\text{athlete}\} \rangle$
- $F_M^4 = \langle \{\text{Dave}\}, \{\text{wants kids, is attractive}\} \rangle$
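The three factors of $C_W$ listed above can be turned into the characteristic matrices $A_{\mathcal{F}}$ and $B_{\mathcal{F}}$ and multiplied back together. The following sketch does this under two assumptions that are not stated in the slides: that the factorization is exact, so the product equals $C_W$, and a particular ordering of objects and attributes.

```python
# Object and attribute orderings are assumptions; they only fix row/column order.
OBJECTS = ["Abby", "Becky", "Claire", "Daphne"]
ATTRIBUTES = ["athlete", "undergraduate", "wants kids", "is attractive"]

# Factors of C_W as (extent, intent) pairs, as listed on the slide.
FACTORS_W = [
    ({"Abby", "Daphne"}, {"undergraduate", "wants kids", "is attractive"}),
    ({"Becky", "Daphne"}, {"athlete", "wants kids"}),
    ({"Abby", "Claire", "Daphne"}, {"undergraduate", "is attractive"}),
]

k = len(FACTORS_W)

# A_F: one row per object, one column per factor (characteristic vectors of extents).
A_F = [[int(obj in extent) for extent, _ in FACTORS_W] for obj in OBJECTS]
# B_F: one row per factor, one column per attribute (characteristic vectors of intents).
B_F = [[int(attr in intent) for attr in ATTRIBUTES] for _, intent in FACTORS_W]

# Boolean product A_F o B_F: an entry is 1 if some factor covers it.
C_W = [
    [max(A_F[i][l] * B_F[l][j] for l in range(k)) for j in range(len(ATTRIBUTES))]
    for i in range(len(OBJECTS))
]

# Read the product back as "object -> set of attributes".
rows = {
    obj: {ATTRIBUTES[j] for j, bit in enumerate(C_W[i]) if bit}
    for i, obj in enumerate(OBJECTS)
}
for obj in OBJECTS:
    print(obj, sorted(rows[obj]))
```

Daphne belongs to all three extents, so her row is the union of all three intents. Becky belongs only to the second factor and gets exactly its intent.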
Joining Factors into Relational Factors
We use the so far unused relation $R_{C_W C_M}$ between $C_W$ and $C_M$ to join factors of $C_W$ with factors of $C_M$ into relational factors. For the approaches defined above, we get the results shown below. We write them as binary relations, i.e. $F_W^i$ and $F_M^j$ belong to the relational factor $\langle F_W^i, F_M^j \rangle$ iff $F_W^i$ and $F_M^j$ are in relation.
[Four tables: Narrow approach, Wide approach, 0.6-approach, 0.5-approach; each has rows $F_W^1$, $F_W^2$, $F_W^3$ and columns $F_M^1$, $F_M^2$, $F_M^3$, $F_M^4$, with × marking the pairs that are in relation.]
Interpretation
The relational factor $\langle F_W^i, F_M^j \rangle$ can be interpreted in the following ways:
- Women who belong to the extent of $F_W^i$ like men who belong to the extent of $F_M^j$. In this example, the factor $\langle F_W^1, F_M^1 \rangle$ says that Abby and Daphne should like Ben and Carl.
- Women who belong to the extent of $F_W^i$ like men with the characteristics in the intent of $F_M^j$. In this example, the factor $\langle F_W^1, F_M^1 \rangle$ says that Abby and Daphne should like undergraduate men who want kids.
- Women with the characteristics in the intent of $F_W^i$ like men who belong to the extent of $F_M^j$. In this example, the factor $\langle F_W^1, F_M^1 \rangle$ says that undergraduate, attractive women who want kids should like Ben and Carl.
- Women with the characteristics in the intent of $F_W^i$ like men with the characteristics in the intent of $F_M^j$. In this example, the factor $\langle F_W^1, F_M^1 \rangle$ says that undergraduate, attractive women who want kids should like undergraduate men who want kids.