CMPT741/459 Assignments Tutorial Zicun Cong Yu Yang October 30, 2015
A1Q3
Generator: an itemset $I$ is a generator if (1) $sup(I) \ge min\_sup$ and (2) $\nexists I' \subset I$ such that $sup(I') = sup(I)$
(a) Relation between closed patterns and generators
(b) Algorithm for finding all generators
A1Q3: Relation
◮ Equivalence class: the set of itemsets contained in exactly the same set of transactions
◮ Each equivalence class has exactly one closed pattern (proof by contradiction)
◮ Each equivalence class may have multiple generators
“Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns”, AAAI 2006
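A1Q3: Algorithm
◮ Apriori property: if some $I_1 \subset I$ is not a generator, then $I$ is not a generator either
◮ $f_D(I)$: the set of transactions containing $I$
◮ Suppose $I_1 \subset I$ is not a generator
◮ Then there exists $I_2 \subset I_1$ with $f_D(I_2) = f_D(I_1)$
◮ $I_2 \cup (I \setminus I_1) \subset I$
◮ $f_D(I) = f_D(I_1) \cap f_D(I \setminus I_1) = f_D(I_2) \cap f_D(I \setminus I_1) = f_D(I_2 \cup (I \setminus I_1))$
◮ Hence a proper subset of $I$ has the same support as $I$, so $I$ is not a generator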
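The Apriori property above suggests a level-wise search. The following is a minimal sketch, not the assignment's reference solution. The function name `mine_generators` is hypothetical, support is counted by a plain scan, and the empty itemset is assumed to count as a proper subset (so an item that occurs in every transaction is not a generator). Comparing a candidate only against its immediate subsets is enough, because support is monotone along any chain of subsets.

```python
def mine_generators(transactions, min_sup):
    """Return {itemset (frozenset): support} for all frequent generators."""
    transactions = [frozenset(t) for t in transactions]
    if len(transactions) < min_sup:
        return {}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    # Level 0 (assumption): the empty itemset is contained in every transaction.
    generators = {frozenset(): len(transactions)}
    level = [frozenset()]
    while level:
        # Extend each generator of the current level by one item.
        candidates = {g | {item} for g in level for item in items if item not in g}
        next_level = []
        for cand in candidates:
            subsets = [cand - {item} for item in cand]
            # Apriori pruning: every immediate subset must itself be a generator.
            if any(s not in generators for s in subsets):
                continue
            sup = support(cand)
            # Generator test: support strictly below that of every immediate subset.
            if sup >= min_sup and all(sup < generators[s] for s in subsets):
                generators[cand] = sup
                next_level.append(cand)
        level = next_level
    return generators


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
    for itemset, sup in mine_generators(db, 1).items():
        print(sorted(itemset), sup)
```

In this toy database `a` occurs in every transaction, so it has the same support as the empty itemset and is pruned, which in turn prunes every superset of `a`.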
A1Q4
Misleading Rule: an association rule $X \to Y$ is a misleading rule if $sup(X \cup Y) \ge \alpha$ and $\beta \le conf(X \to Y) < \frac{sup(Y)}{|TD|}$, where $|TD|$ is the total number of transactions.
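The definition can be checked directly on a small database. This is a sketch under the assumption that `sup` is an absolute transaction count (which is why it is divided by the number of transactions); the function name `is_misleading` and the tea/coffee data are made up for illustration.

```python
def is_misleading(transactions, X, Y, alpha, beta):
    """Check sup(X u Y) >= alpha and beta <= conf(X -> Y) < sup(Y) / |TD|."""
    X, Y = frozenset(X), frozenset(Y)
    n = len(transactions)

    def sup(itemset):
        # Absolute support: number of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= set(t))

    sup_x = sup(X)
    if n == 0 or sup_x == 0:
        return False
    sup_xy = sup(X | Y)
    conf = sup_xy / sup_x
    return sup_xy >= alpha and beta <= conf < sup(Y) / n


if __name__ == "__main__":
    # Hypothetical data: coffee is in 9 of 10 baskets, tea in 4, both in 3.
    db = [{"tea", "coffee"}] * 3 + [{"tea"}] + [{"coffee"}] * 6
    print(is_misleading(db, {"tea"}, {"coffee"}, alpha=3, beta=0.7))
```

Here conf(tea → coffee) = 3/4 passes the confidence threshold, yet it is lower than the 9/10 chance of seeing coffee in any basket, so the rule is flagged.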
A2Q2 (a) Give a small example where there are one fact table and two dimension tables. (b) Compute an iceberg cube where the aggregate function is monotonic using the universal table. (c) Identify and reduce redundancy.
A2Q2(a) Figure: A Star Schema
A2Q2(b) Figure: Universal Table
A2Q2(b) Figure: BUC, min sup = 3
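Since the figure itself is not reproduced here, the following minimal sketch shows the bottom-up (BUC-style) recursion on a universal table with count as the aggregate. The function name `buc` and the example rows are hypothetical; real BUC implementations also order dimensions and partition in place, which this sketch does not attempt.

```python
def buc(rows, min_sup):
    """Return {cell: count} for all cells with count >= min_sup.

    rows: list of equal-length tuples (the universal table).
    A cell is a tuple with '*' in every dimension that is aggregated away.
    """
    n_dims = len(rows[0]) if rows else 0
    cube = {}

    def expand(part, cell, start):
        cube[cell] = len(part)
        for d in range(start, n_dims):
            # Partition the current rows on dimension d.
            groups = {}
            for row in part:
                groups.setdefault(row[d], []).append(row)
            for value, group in groups.items():
                # Count only shrinks as a cell is specialised, so a partition
                # below min_sup cannot contain any iceberg cell: prune it.
                if len(group) >= min_sup:
                    expand(group, cell[:d] + (value,) + cell[d + 1:], d + 1)

    if len(rows) >= min_sup:
        expand(rows, ("*",) * n_dims, 0)
    return cube


if __name__ == "__main__":
    table = [("c1", "d1"), ("c1", "d1"), ("c1", "d2"), ("c2", "d1")]
    for cell, count in buc(table, 2).items():
        print(cell, count)
```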
A2Q2(c)
◮ Storage redundancy: in a dimension table, the values of the non-key attributes are determined entirely by the value of the primary key of that table
◮ Computation redundancy: the same portion of the universal table is searched repeatedly (e.g. for $c_1 d_1 e_1$ and $c_1 d_1 f_1$)
A2Q2(c): Reduce Redundancy
◮ $sup(c)$: determined solely by searching the fact table
◮ Dimension table: only the primary key is needed when searching the fact table
◮ Idea: find local iceberg cells on the dimension tables first, then index the local icebergs by their corresponding primary keys (signature)
A2Q2(c): Algorithm
◮ Propagate information to dimension tables
◮ Local icebergs: $sum(Count) \ge 3$
Figure: A Star Schema
A2Q2(c): Algorithm
◮ Equivalence class: the set of local icebergs with the same signature (e.g. $e_1$, $f_1$ and $e_1 f_1$)
◮ Join signatures from different dimension tables to obtain global iceberg cells
“Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses”, SDM 2005
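The two phases can be illustrated with a brute-force sketch for two dimension tables. It is a simplified illustration of the idea, not the algorithm from the SDM 2005 paper. The function names, the table layout (a dict from primary key to attribute tuple, and a fact table of key pairs) and the data are all assumptions made for this example.

```python
from itertools import product


def local_icebergs(dim_table, fact_keys, min_sup):
    """Group local iceberg cells of one dimension table by signature.

    dim_table: {primary_key: tuple of attribute values}
    fact_keys: this dimension's foreign key for each fact row
    Returns {signature (frozenset of primary keys): [cells]}.
    """
    # Propagate counts from the fact table to the primary keys.
    key_count = {}
    for k in fact_keys:
        key_count[k] = key_count.get(k, 0) + 1
    n_attrs = len(next(iter(dim_table.values())))
    domains = [sorted({row[a] for row in dim_table.values()}) + ["*"]
               for a in range(n_attrs)]
    by_signature = {}
    for cell in product(*domains):
        # Signature: primary keys of the dimension rows matching the cell.
        sig = frozenset(k for k, row in dim_table.items()
                        if all(c == "*" or c == v for c, v in zip(cell, row)))
        if sum(key_count.get(k, 0) for k in sig) >= min_sup:
            by_signature.setdefault(sig, []).append(cell)
    return by_signature


def global_icebergs(fact, locals_a, locals_b, min_sup):
    """Join signatures of two dimensions; fact is a list of (key_a, key_b)."""
    result = {}
    for sig_a, cells_a in locals_a.items():
        for sig_b, cells_b in locals_b.items():
            # One fact-table scan per signature pair serves the whole class.
            count = sum(1 for ka, kb in fact if ka in sig_a and kb in sig_b)
            if count >= min_sup:
                for cell_a, cell_b in product(cells_a, cells_b):
                    result[cell_a + cell_b] = count
    return result


if __name__ == "__main__":
    dim_a = {1: ("c1", "d1"), 2: ("c1", "d2")}
    dim_b = {10: ("e1", "f1"), 20: ("e2", "f1")}
    fact = [(1, 10), (1, 10), (1, 10), (2, 20)]
    la = local_icebergs(dim_a, [ka for ka, _ in fact], 3)
    lb = local_icebergs(dim_b, [kb for _, kb in fact], 3)
    print(global_icebergs(fact, la, lb, 3))
```

Cells that share a signature are counted once and reuse the same result, which is where the computation redundancy is removed.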
A3Q2
Give a counterexample showing that KMeans may fail to find the optimal clustering w.r.t. $\sum_{o \in D} dist(o, c_o)$
◮ Find an example that has two stable clusterings with different losses
A3Q2 Figure: A Counter Example
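One possible counterexample (not necessarily the one in the figure) is the four corners of a 4 × 1 rectangle with k = 2. The sketch below runs plain Lloyd iterations from two different initialisations and reports the loss $\sum_{o \in D} dist(o, c_o)$ with Euclidean distance. The helper names are hypothetical, and the code assumes no cluster ever becomes empty.

```python
from math import dist


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]


def centroids(points, labels, k):
    """Mean of the points in each cluster (assumes no cluster is empty)."""
    out = []
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        out.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
    return out


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations until the centers stop moving."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = centroids(points, labels, len(centers))
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    loss = sum(dist(p, centers[label]) for p, label in zip(points, labels))
    return labels, centers, loss


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (0, 1), (4, 1)]
    # Left/right split versus bottom/top split: both are fixed points.
    print(kmeans(pts, [(0, 0.5), (4, 0.5)]))
    print(kmeans(pts, [(2, 0), (2, 1)]))
```

Both runs stop immediately because neither clustering changes under reassignment, yet the left/right split has loss 2 while the bottom/top split has loss 8. Using squared distances instead gives 1 versus 16, so the conclusion is the same.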
A3Q5
Show that $I \times J$ is a bicluster with coherent values iff for any $i_1, i_2 \in I$ and $j_1, j_2 \in J$, $e_{i_1 j_1} - e_{i_2 j_1} = e_{i_1 j_2} - e_{i_2 j_2}$.
A3Q5: Necessity
◮ Bicluster: for any $i \in I$ and any $j \in J$, $e_{ij} = c + \alpha_i + \beta_j$
◮ $e_{i_1 j_1} - e_{i_2 j_1} = c + \alpha_{i_1} + \beta_{j_1} - c - \alpha_{i_2} - \beta_{j_1} = \alpha_{i_1} - \alpha_{i_2}$
◮ $e_{i_1 j_2} - e_{i_2 j_2} = c + \alpha_{i_1} + \beta_{j_2} - c - \alpha_{i_2} - \beta_{j_2} = \alpha_{i_1} - \alpha_{i_2} = e_{i_1 j_1} - e_{i_2 j_1}$
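A3Q5: Sufficiency
◮ $e_{iJ} = \frac{\sum_{j \in J} e_{ij}}{|J|}$, $e_{Ij} = \frac{\sum_{i \in I} e_{ij}}{|I|}$, $e_{IJ} = \frac{\sum_{i \in I} \sum_{j \in J} e_{ij}}{|I||J|}$
◮ $e_{ij} - e_{iJ} - e_{Ij} + e_{IJ} = 0$
◮ This is due to $e_{i_1 j_1} - e_{i_2 j_1} = e_{i_1 j_2} - e_{i_2 j_2}$: set $i_1 = i$, $j_1 = j$ and average both sides over $i_2 \in I$ and $j_2 \in J$ to get $e_{ij} - e_{Ij} = e_{iJ} - e_{IJ}$
◮ Hence $e_{ij} = e_{iJ} + e_{Ij} - e_{IJ}$, i.e. $e_{ij} = c + \alpha_i + \beta_j$ with $\alpha_i = e_{iJ}$, $\beta_j = e_{Ij}$, $c = -e_{IJ}$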
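Both directions can be sanity-checked numerically. The sketch below assumes a small matrix with integer entries (so that exact `Fraction` arithmetic applies); the function names and the example matrices are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def is_coherent(e):
    """Check e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2] for all indices."""
    rows, cols = range(len(e)), range(len(e[0]))
    return all(e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2]
               for i1, i2 in product(rows, repeat=2)
               for j1, j2 in product(cols, repeat=2))


def additive_decomposition(e):
    """Return (c, alpha, beta) built from the row, column and overall means."""
    n, m = len(e), len(e[0])
    row_mean = [Fraction(sum(row), m) for row in e]                  # e_iJ
    col_mean = [Fraction(sum(e[i][j] for i in range(n)), n)
                for j in range(m)]                                   # e_Ij
    total_mean = Fraction(sum(map(sum, e)), n * m)                   # e_IJ
    return -total_mean, row_mean, col_mean


if __name__ == "__main__":
    e = [[1, 3, 6], [2, 4, 7], [11, 13, 16]]
    c, alpha, beta = additive_decomposition(e)
    print(is_coherent(e))
    print(all(e[i][j] == c + alpha[i] + beta[j]
              for i in range(3) for j in range(3)))
```

For a matrix that violates the condition, such as `[[1, 2], [3, 5]]`, `is_coherent` returns `False` and the mean-based decomposition no longer reproduces the entries.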