
CMPT741/459 Assignments Tutorial, Zicun Cong and Yu Yang, October 30, 2015



  1. CMPT741/459 Assignments Tutorial Zicun Cong Yu Yang October 30, 2015

  2. A1Q3 Generator: an itemset $I$ is a generator if (1) $\mathrm{sup}(I) \ge \mathit{min\_sup}$ and (2) $\nexists\, I' \subset I$ such that $\mathrm{sup}(I') = \mathrm{sup}(I)$. (a) Relation between closed patterns and generators (b) Algorithm for finding all generators

  3. A1Q3: Relation
     ◮ Equivalence class: a set of itemsets contained in exactly the same set of transactions
     ◮ Each equivalence class has exactly one closed pattern (proof by contradiction)
     ◮ An equivalence class may have multiple generators
     Reference: “Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns”, AAAI 2006
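The relation on this slide can be illustrated on a tiny database. The following is a minimal sketch, assuming a hypothetical four-transaction database `DB`; the helper names (`tidset`, `closed_patterns`, `generators`) are illustrative and not from the tutorial, and the `min_sup` condition is ignored for simplicity. Itemsets are grouped by the set of transactions that contain them; within each group, the maximal itemset is the closed pattern and the minimal itemsets are the generators.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: transaction id -> set of items.
DB = {1: {"a", "b", "c"}, 2: {"a", "b", "c"}, 3: {"a"}, 4: {"b"}}


def tidset(itemset):
    """Return the ids of the transactions that contain every item of itemset."""
    return frozenset(t for t, items in DB.items() if itemset <= items)


# Group all non-empty itemsets with non-empty tidsets into equivalence classes.
items = sorted(set().union(*DB.values()))
classes = defaultdict(list)
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        tids = tidset(itemset)
        if tids:
            classes[tids].append(itemset)


def closed_patterns(members):
    """Maximal members of a class (no proper superset in the same class)."""
    return [m for m in members if not any(m < other for other in members)]


def generators(members):
    """Minimal members of a class (no proper subset in the same class)."""
    return [m for m in members if not any(other < m for other in members)]


cls = classes[frozenset({1, 2})]
closed = closed_patterns(cls)
gens = generators(cls)
```

On this toy database the class of transactions {1, 2} contains {c}, {a, b}, {a, c}, {b, c} and {a, b, c}; its single closed pattern is {a, b, c} and its generators are {c} and {a, b}.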

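  4. A1Q3: Algorithm
     ◮ Apriori property: every subset of a generator is also a generator
     ◮ $f_D(I)$: the set of transactions containing $I$
     ◮ Suppose $I_1 \subset I$ is not a generator
     ◮ Then there exists $I_2 \subset I_1$ with $f_D(I_2) = f_D(I_1)$
     ◮ $I_2 \cup (I \setminus I_1) \subset I$
     ◮ $f_D(I) = f_D(I_1) \cap f_D(I \setminus I_1) = f_D(I_2) \cap f_D(I \setminus I_1) = f_D(I_2 \cup (I \setminus I_1))$
     ◮ Hence $I$ has a proper subset with the same support, so $I$ is not a generator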
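The Apriori property above suggests a level-wise search. The following is a minimal sketch of one possible implementation, not necessarily the solution presented in the tutorial. The function names are hypothetical, and the code assumes the convention that the empty itemset is a generator whose support is the number of transactions. A candidate is kept only if all of its immediate subsets are generators and its support is strictly smaller than each of theirs.

```python
def support(db, itemset):
    """Number of transactions in db that contain itemset."""
    return sum(1 for t in db if itemset <= t)


def mine_generators(db, min_sup):
    """Return {generator: support} for all non-empty generators in db."""
    items = sorted(set().union(*db))
    # Assumption: the empty itemset is treated as a generator with support |db|.
    level = {frozenset(): len(db)}
    result = {}
    while level:
        next_level = {}
        # Extend every generator of the current level by one item.
        candidates = {g | {x} for g in level for x in items if x not in g}
        for cand in candidates:
            subsets = [cand - {x} for x in cand]
            # Apriori pruning: every immediate subset must be a generator.
            if any(s not in level for s in subsets):
                continue
            sup = support(db, cand)
            # Frequent, and strictly less supported than every immediate subset.
            if sup >= min_sup and all(sup < level[s] for s in subsets):
                next_level[cand] = sup
        result.update(next_level)
        level = next_level
    return result


db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a"}, {"b"}]
found = mine_generators(db, min_sup=2)
```

Checking only the immediate subsets is enough because support never increases when items are added: if the support of a candidate is strictly below that of every subset with one item fewer, it is strictly below that of every proper subset.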

  5. A1Q4 Misleading Rule: an association rule $X \to Y$ is a misleading rule if $\mathrm{sup}(X \cup Y) \ge \alpha$ and $\beta \le \mathrm{conf}(X \to Y) < \frac{\mathrm{sup}(Y)}{|TD|}$, where $|TD|$ is the total number of transactions.
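The definition can be checked directly on a small database. The sketch below assumes the usual definition conf(X → Y) = sup(X ∪ Y) / sup(X), with supports as absolute counts; the function names and the tea/coffee data are hypothetical illustrations, not part of the assignment.

```python
def support(db, itemset):
    """Number of transactions in db that contain itemset."""
    return sum(1 for t in db if itemset <= t)


def is_misleading(db, x, y, alpha, beta):
    """Check sup(X u Y) >= alpha and beta <= conf(X -> Y) < sup(Y) / |TD|."""
    sup_x = support(db, x)
    if sup_x == 0:
        return False  # confidence is undefined when X never occurs
    sup_xy = support(db, x | y)
    conf = sup_xy / sup_x
    return sup_xy >= alpha and beta <= conf < support(db, y) / len(db)


# Hypothetical data: 10 transactions, coffee in 9, tea in 4, both in 3.
db = [{"tea", "coffee"}] * 3 + [{"tea"}] + [{"coffee"}] * 6
misleading = is_misleading(db, {"tea"}, {"coffee"}, alpha=2, beta=0.7)
```

Here conf(tea → coffee) = 0.75 passes the confidence threshold 0.7, yet it is lower than the overall fraction of coffee transactions, 0.9, so buying tea actually makes coffee less likely.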

  6. A2Q2 (a) Give a small example with one fact table and two dimension tables. (b) Compute an iceberg cube on the universal table, where the aggregate function is monotonic. (c) Identify and reduce redundancy.

  7. A2Q2(a) Figure: A Star Schema

  8. A2Q2(b) Figure: Universal Table

  9. A2Q2(b) Figure: BUC, $\mathit{min\_sup} = 3$
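Since the figure itself is not reproduced here, the following is a minimal sketch of BUC-style bottom-up computation with count as the aggregate. The table, the dimension names `A` and `B`, and the function name `buc` are hypothetical and do not correspond to the tutorial's example. A partition is expanded only if it has at least `min_sup` rows, which is valid because the count of a cell can only shrink as more dimensions are fixed.

```python
from collections import defaultdict


def buc(rows, dims, min_sup, start=0, cell=None, out=None):
    """Return {cell: count} for every cell with count >= min_sup.

    rows is a list of dicts, dims the ordered dimension names, and a cell
    is a sorted tuple of (dimension, value) pairs; () is the apex cell.
    """
    cell = {} if cell is None else cell
    out = {} if out is None else out
    if len(rows) < min_sup:
        return out  # prune: no refinement of this cell can reach min_sup
    out[tuple(sorted(cell.items()))] = len(rows)
    for d in range(start, len(dims)):
        # Partition the current rows on dimension d.
        parts = defaultdict(list)
        for row in rows:
            parts[row[dims[d]]].append(row)
        for value, part in parts.items():
            buc(part, dims, min_sup, d + 1, {**cell, dims[d]: value}, out)
    return out


# Hypothetical universal table with two dimensions.
rows = (
    [{"A": "a1", "B": "b1"}] * 3
    + [{"A": "a1", "B": "b2"}, {"A": "a2", "B": "b1"}]
)
cube = buc(rows, ["A", "B"], min_sup=3)
```

With `min_sup = 3`, the cells for `a2` and `b2` are pruned immediately, so their refinements are never examined.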

  10. A2Q2(c)
      ◮ Storage redundancy: within a dimension table, the values of the non-key attributes are fully determined by the primary key
      ◮ Computation redundancy: the same portion of the universal table is searched repeatedly (e.g. for $c_1 d_1 e_1$ and $c_1 d_1 f_1$)

  11. A2Q2(c): Reduce Redundancy
      ◮ $\mathrm{sup}(c)$ is determined solely by searching the fact table
      ◮ Dimension table: only the primary key is needed when searching the fact table
      ◮ Idea: find local iceberg cells on the dimension tables first, then index the local icebergs by their corresponding primary keys (their signature)

  12. A2Q2(c): Algorithm
      ◮ Propagate information (the counts from the fact table) to the dimension tables
      ◮ Local icebergs: $\mathrm{sum}(\mathit{Count}) \ge 3$
      Figure: A Star Schema

  13. A2Q2(c): Algorithm
      ◮ Equivalence class: a set of local icebergs with the same signature (e.g. $e_1$, $f_1$ and $e_1 f_1$)
      ◮ Join signatures from different dimension tables to obtain global iceberg cells
      Reference: “Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses”, SDM 2005
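The three steps (propagate counts, find local icebergs, join signatures) can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of the cited paper: local icebergs are found by brute force over attribute subsets, the star schema is a hypothetical one with two dimension tables, and all names (`local_icebergs`, `cross_table_icebergs`, the keys and attributes) are made up for the example. Only cells that combine both dimensions are produced by the join; each local iceberg is itself a global iceberg cell over a single dimension.

```python
from collections import Counter
from itertools import combinations, product


def local_icebergs(dim, key_counts, min_sup):
    """Return {cell: signature} for cells of one dimension table.

    dim maps primary key -> {attribute: value}; key_counts holds the number
    of fact rows per primary key; a signature is the frozenset of primary
    keys matching the cell.
    """
    attrs = sorted(next(iter(dim.values())))
    result = {}
    for k in range(1, len(attrs) + 1):
        for chosen in combinations(attrs, k):
            groups = {}
            for pk, row in dim.items():
                cell = tuple((a, row[a]) for a in chosen)
                groups.setdefault(cell, set()).add(pk)
            for cell, pks in groups.items():
                if sum(key_counts.get(pk, 0) for pk in pks) >= min_sup:
                    result[cell] = frozenset(pks)
    return result


def cross_table_icebergs(fact, dim1, dim2, min_sup):
    """fact is a list of (key1, key2) pairs referencing dim1 and dim2."""
    # Step 1: propagate fact-table counts to the dimension tables.
    counts1 = Counter(k1 for k1, _ in fact)
    counts2 = Counter(k2 for _, k2 in fact)
    # Step 2: local iceberg cells, indexed by their signatures.
    local1 = local_icebergs(dim1, counts1, min_sup)
    local2 = local_icebergs(dim2, counts2, min_sup)
    # Step 3: join signatures; cells sharing a signature share one lookup.
    cache, cube = {}, {}
    for (c1, s1), (c2, s2) in product(local1.items(), local2.items()):
        if (s1, s2) not in cache:
            cache[(s1, s2)] = sum(1 for k1, k2 in fact if k1 in s1 and k2 in s2)
        if cache[(s1, s2)] >= min_sup:
            cube[c1 + c2] = cache[(s1, s2)]
    return local1, local2, cube


# Hypothetical star schema.
dim1 = {"k1": {"C": "c1", "D": "d1"}, "k2": {"C": "c1", "D": "d2"}}
dim2 = {"m1": {"E": "e1", "F": "f1"}, "m2": {"E": "e2", "F": "f2"}}
fact = [("k1", "m1"), ("k1", "m1"), ("k2", "m1"), ("k2", "m2")]
local1, local2, cube = cross_table_icebergs(fact, dim1, dim2, min_sup=3)
```

In this example the local icebergs $e_1$, $f_1$ and $e_1 f_1$ all have the signature {m1}, so the fact table is scanned once for the three of them.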

  14. A3Q2: Give a counterexample showing that KMeans may not find the optimal clustering w.r.t. $\sum_{o \in D} \mathrm{dist}(o, c_o)$
      ◮ Find an example that has two stable clusterings with different losses

  15. A3Q2 Figure: A Counter Example
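The figure is not reproduced here, so the following is one possible counterexample, not necessarily the one shown on the slide. It assumes four points on the corners of a 4 × 1 rectangle and runs a plain Lloyd iteration (the helper `lloyd` is hypothetical) from two different initializations. Both runs converge immediately, but to clusterings with different values of the loss from the previous slide.

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's iteration; return the final centers and sum of dist(o, c_o)."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # stable: assignments and centers no longer change
        centers = new_centers
    loss = sum(min(math.dist(p, c) for c in centers) for p in points)
    return centers, loss


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
# Initialization A: one center per short side (left pair, right pair).
centers_a, loss_a = lloyd(points, [(0, 0.5), (4, 0.5)])
# Initialization B: one center per long side (bottom pair, top pair).
centers_b, loss_b = lloyd(points, [(2, 0), (2, 1)])
```

Clustering A has loss 2 and clustering B has loss 8. B is stable because every point is at distance 2 from its own center and at distance $\sqrt{5}$ from the other one, so KMeans started at B never reaches the better clustering A.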

  16. A3Q5: Show that $I \times J$ is a bicluster with coherent values iff for any $i_1, i_2 \in I$ and $j_1, j_2 \in J$, $e_{i_1 j_1} - e_{i_2 j_1} = e_{i_1 j_2} - e_{i_2 j_2}$.

  17. A3Q5: Necessity
      ◮ Bicluster: for any $i \in I$ and any $j \in J$, $e_{ij} = c + \alpha_i + \beta_j$
      ◮ $e_{i_1 j_1} - e_{i_2 j_1} = c + \alpha_{i_1} + \beta_{j_1} - c - \alpha_{i_2} - \beta_{j_1} = \alpha_{i_1} - \alpha_{i_2}$
      ◮ $e_{i_1 j_2} - e_{i_2 j_2} = c + \alpha_{i_1} + \beta_{j_2} - c - \alpha_{i_2} - \beta_{j_2} = \alpha_{i_1} - \alpha_{i_2} = e_{i_1 j_1} - e_{i_2 j_1}$
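The necessity direction can also be checked numerically. The sketch below builds a matrix of the form $e_{ij} = c + \alpha_i + \beta_j$ from hypothetical values of $c$, $\alpha$ and $\beta$ and verifies the equal-differences condition for all index pairs; the function name `has_equal_differences` is illustrative only.

```python
from itertools import product

# Hypothetical parameters of a bicluster with coherent values.
c = 1.0
alpha = [0.0, 2.0, 5.0]
beta = [0.0, 1.0, 3.0, 4.0]
e = [[c + a + b for b in beta] for a in alpha]


def has_equal_differences(e, tol=1e-9):
    """Check e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2] for all indices."""
    rows, cols = range(len(e)), range(len(e[0]))
    return all(
        abs((e[i1][j1] - e[i2][j1]) - (e[i1][j2] - e[i2][j2])) <= tol
        for i1, i2 in product(rows, repeat=2)
        for j1, j2 in product(cols, repeat=2)
    )


coherent = has_equal_differences(e)

# Perturbing a single entry breaks the condition.
perturbed = [row[:] for row in e]
perturbed[0][0] += 1.0
broken = has_equal_differences(perturbed)
```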

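  18. A3Q5: Sufficiency
      ◮ $e_{iJ} = \frac{\sum_{j \in J} e_{ij}}{|J|}$, $e_{Ij} = \frac{\sum_{i \in I} e_{ij}}{|I|}$, $e_{IJ} = \frac{\sum_{i \in I} \sum_{j \in J} e_{ij}}{|I||J|}$
      ◮ $e_{ij} - e_{iJ} - e_{Ij} + e_{IJ} = 0$
      ◮ Due to $e_{i_1 j_1} - e_{i_2 j_1} = e_{i_1 j_2} - e_{i_2 j_2}$
      ◮ $e_{ij} = e_{iJ} + e_{Ij} - e_{IJ}$, i.e. $e_{ij} = c + \alpha_i + \beta_j$ with $\alpha_i = e_{iJ}$, $\beta_j = e_{Ij}$ and $c = -e_{IJ}$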
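The slide states that the residue $e_{ij} - e_{iJ} - e_{Ij} + e_{IJ}$ vanishes because of the equal-differences condition, without spelling out the step. One way to fill it in, as a sketch using only the row, column and overall means defined on the slide, is to average the condition first over $i_2$ and then over $j_2$:

```latex
\begin{align*}
e_{i j} - e_{i_2 j} &= e_{i j_2} - e_{i_2 j_2}
  && \text{for all } i, i_2 \in I,\ j, j_2 \in J \\
e_{i j} - e_{I j} &= e_{i j_2} - e_{I j_2}
  && \text{(average both sides over } i_2 \in I\text{)} \\
e_{i j} - e_{I j} &= e_{i J} - e_{I J}
  && \text{(average both sides over } j_2 \in J\text{)} \\
e_{i j} - e_{i J} - e_{I j} + e_{I J} &= 0
\end{align*}
```

Rearranging the last line gives $e_{ij} = e_{iJ} + e_{Ij} - e_{IJ}$, which has the required form $c + \alpha_i + \beta_j$.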
