A General Model for OLAP of Complex Data



  1. A General Model for OLAP of Complex Data
     Jian Pei, State University of New York at Buffalo, USA
     http://www.cse.buffalo.edu/faculty/jianpei/

  2. Outline
     • Motivation
     • GOLAP – a general OLAP model
     • Applying GOLAP on complex data
     • Conclusions
     Jian Pei: Mining Phenotypes and Pattern-based Clusters from Microarray Data 2

  3. OLAP on Relational Data
     Operations:
     • Roll-up
     • Drill-down
     • Slice, dice, pivot (rotate)

     Base table (dimensions: Store, Product, Season; measure: Sales):

     Store | Product | Season | Sales
     S1    | P1      | Spring | 6
     S1    | P2      | Spring | 12
     S2    | P1      | Fall   | 9

     Aggregate cells, from the base level up to the apex (s = Spring, f = Fall):
     • (S1,P1,s):6  (S1,P2,s):12  (S2,P1,f):9
     • (S1,*,s):9  (S1,P1,*):6  (*,P1,s):6  (S1,P2,*):12  (*,P2,s):12  (S2,*,f):9  (S2,P1,*):9  (*,P1,f):9
     • (S1,*,*):9  (*,*,s):9  (*,P1,*):7.5  (*,P2,*):12  (*,*,f):9  (S2,*,*):9
     • (*,*,*):9
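The aggregate values on this slide are consistent with AVG as the measure (for example, (*,P1,*) = (6 + 9) / 2 = 7.5), although the slide does not name the aggregate function. Under that assumption, a minimal Python sketch that reproduces all the cells could look like this; the names `ROWS` and `cube_avg` are illustrative and do not come from the slides.

```python
from collections import defaultdict
from itertools import product

# Base table from the slide: (store, product, season) -> sales.
ROWS = [
    (("S1", "P1", "s"), 6),
    (("S1", "P2", "s"), 12),
    (("S2", "P1", "f"), 9),
]

def cube_avg(rows):
    """Return {cell: AVG(sales)} for every non-empty cell of the cube.

    AVG is an assumption inferred from the numbers on the slide.
    """
    groups = defaultdict(list)
    for dims, sales in rows:
        # Each mask says, per dimension, whether to generalize the value to '*'.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple("*" if m else d for d, m in zip(dims, mask))
            groups[cell].append(sales)
    return {cell: sum(v) / len(v) for cell, v in groups.items()}

cube = cube_avg(ROWS)
```

Rolling up corresponds to replacing a dimension value with `*`; drilling down is the reverse lookup in the same dictionary.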

  4. Why Is OLAP Desirable?
     • Multi-level, multi-dimensional summarization
       – Identify multi-level, multi-dimensional trends, changes and exceptions
     • Can we conduct OLAP on complex data?
       – Data types: strings, time series, sequences, XML documents, …
       – “What are the major patterns among the gene expressions that are similar to the given new sample?”

  5. Gene Expression Matrix
     [Figure: gene expression matrix; rows are genes, columns are samples/time]

     w11 w12 w13
     w21 w22 w23
     w31 w32 w33

  6. Can We OLAP Gene Expression Data?
     • Gene expression data are matrices
       – Oh, they can be treated as a relational table! ☺
     • Syntax problem: what should be the measure?
       – SUM, MAX, MIN, AVG? They do not make sense!
       – The patterns are wanted
     • Semantic problem: what should be the OLAP operations?
       – What does it mean to generalize (roll up) a sample/gene?

  7. Good News, We Are Not Far Away
     • Two major issues in defining an OLAP model
       – How to partition the data into summarization units at various levels?
       – How to summarize the data?
     • The summarization units for OLAP should form some nice hierarchical structure
       – What about a lattice?
       – It’s nice

  8. GOLAP – A General OLAP Model
     • Base database: a set of objects
     • Grouping function g
       – Maps a set of query objects in the base database to the smallest summarization unit covering the query set
       – Containment: a summarization unit is still in the base database
       – Monotonicity: Q1 ⊆ Q2 ⇒ g(Q1) ⊆ g(Q2)
       – Closure: a summarization unit is self-closed
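To make the three properties concrete, here is a hedged Python sketch of one possible grouping function over the relational example: g(Q) returns the base tuples that match the most specific cell covering Q. This particular g, the reading of closure as g(g(Q)) = g(Q), and the choice g(∅) = ∅ are assumptions made for illustration; the slides do not spell them out.

```python
from itertools import combinations

# Base database: the three (store, product, season) tuples of the relational example.
BASE = frozenset({("S1", "P1", "s"), ("S1", "P2", "s"), ("S2", "P1", "f")})

def g(query, base=BASE):
    """Base tuples matching the most specific cell that covers `query`."""
    query = frozenset(query)
    if not query:
        return frozenset()  # assumption: the empty query maps to the empty unit
    # Keep a dimension value only if every query tuple agrees on it; else use '*'.
    cell = tuple(col[0] if len(set(col)) == 1 else "*" for col in zip(*query))
    return frozenset(t for t in base if all(c in ("*", v) for c, v in zip(cell, t)))

def all_subsets(items):
    """Every subset of `items`, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def satisfies_golap_properties(g, base):
    """Brute-force check of containment, closure and monotonicity on a small base."""
    queries = all_subsets(base)
    for q1 in queries:
        if not g(q1) <= base:                    # containment
            return False
        if g(g(q1)) != g(q1):                    # closure, read as idempotence
            return False
        for q2 in queries:
            if q1 <= q2 and not g(q1) <= g(q2):  # monotonicity
                return False
    return True
```

The exhaustive check is exponential in the size of the base database, so it is only meant for toy inputs like this one.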

  9. Grouping Function and Class
     • Class: a subset of objects S such that g(S) = S
     [Figure labels: a class; a larger class; the whole base database itself is a class]

  10. Grouping Function – Lattice
     • The classes generated by a grouping function form a lattice
     • Good news: containment, monotonicity and closure are sufficient to get a nice hierarchical structure!
     • Member function: from class to the set of members
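The lattice claim can be checked by brute force on a small example. The sketch below uses a hypothetical toy grouping function (the smallest contiguous range of integer object ids covering the query), enumerates every class S with g(S) = S, and takes the intersection of two classes as their meet and g of their union as their join. The base database, the function names, and treating the empty set as a class are all assumptions for illustration.

```python
from itertools import combinations

BASE = frozenset(range(1, 5))  # hypothetical base database: objects 1..4

def g(query):
    """Toy grouping function: the smallest contiguous id range covering the query."""
    query = frozenset(query)
    if not query:
        return frozenset()  # assumption: the empty query maps to the empty unit
    return frozenset(x for x in BASE if min(query) <= x <= max(query))

def classes(base, g):
    """All classes (sets S with g(S) == S), found by applying g to every subset."""
    items = sorted(base)
    subsets = (frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r))
    return {g(q) for q in subsets}

def meet(c1, c2):
    """Greatest lower bound of two classes: their intersection."""
    return c1 & c2

def join(c1, c2, g):
    """Least upper bound of two classes: the smallest class covering both."""
    return g(c1 | c2)

CLASSES = classes(BASE, g)
```

For this toy g the classes are exactly the contiguous ranges plus the empty set, and both `meet` and `join` always land on another class.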

  11. Summarization Function
     • A mapping from a set of objects to a summary
       – A set of sequences → the sequential patterns
       – A set of time series → the dominant pattern
       – A set of XML trees → the frequent subtrees

  12. OLAP Operations
     • Given
       – A grouping function
       – A summarization function
     • OLAP operations
       – Summarize: return the summary of the smallest class covering the query set
       – Roll up: return the summary of the smallest class covering the query set and the current class
       – Drill down: return the summary of the smallest class covering the current class except for the query set
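Read literally, the three operations compose the grouping function g with the summarization function f. The following Python sketch applies that reading to the relational example, with AVG of sales as a stand-in summarization function; the function names, the choice of g, and returning `None` for an empty class are assumptions rather than definitions taken from the slides.

```python
# Base database from the relational example: tuple -> sales.
BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def g(query):
    """Grouping function: base tuples matching the most specific cell covering `query`."""
    query = frozenset(query)
    if not query:
        return frozenset()
    cell = tuple(col[0] if len(set(col)) == 1 else "*" for col in zip(*query))
    return frozenset(t for t in BASE if all(c in ("*", v) for c, v in zip(cell, t)))

def f(objects):
    """Summarization function (assumed here to be AVG of sales); None for an empty class."""
    if not objects:
        return None
    return sum(BASE[o] for o in objects) / len(objects)

def summarize(query):
    """Summary of the smallest class covering the query set."""
    return f(g(query))

def roll_up(current, query):
    """Summary of the smallest class covering the current class and the query set."""
    return f(g(frozenset(current) | frozenset(query)))

def drill_down(current, query):
    """Summary of the smallest class covering the current class minus the query set."""
    return f(g(frozenset(current) - frozenset(query)))
```

For example, rolling up from the class {(S1,P1,s)} with the query {(S1,P2,s)} lands on the class of the cell (S1,*,s), whose average is 9.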

  13. GOLAP Model and Data Warehouse
     • GOLAP model (g, f)
       – g: grouping function
       – f: summarization function
     • G-warehouse {(c, f(c))}
       – c is a class
     • If (g1, f1) and (g2, f2) are two GOLAP models, then ((g1, g2), (f1, f2)) is also a GOLAP model
     • GOLAP on relational data is consistent with the traditional OLAP model
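One way to see the consistency with traditional OLAP is to materialize a G-warehouse {(c, f(c))} for the relational example and check that every cube cell is answered by the class of tuples it covers. The sketch below assumes AVG as the summarization function and the cell-based grouping function; `matches`, `build_warehouse` and the other names are illustrative and not taken from the slides.

```python
from itertools import product

# Base database from the relational example: tuple -> sales.
BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def matches(cell):
    """Base tuples covered by a cell such as ('S1', '*', 's')."""
    return frozenset(t for t in BASE if all(c in ("*", v) for c, v in zip(cell, t)))

def g(query):
    """Grouping function: base tuples matching the most specific cell covering `query`."""
    query = frozenset(query)
    if not query:
        return frozenset()
    cell = tuple(col[0] if len(set(col)) == 1 else "*" for col in zip(*query))
    return matches(cell)

def f(objects):
    """Summarization function, assumed here to be AVG of sales."""
    return sum(BASE[o] for o in objects) / len(objects)

def build_warehouse():
    """Return (all non-empty cube cells, G-warehouse mapping each class to its summary)."""
    cells = set()
    for t in BASE:
        for mask in product((False, True), repeat=len(t)):
            cells.add(tuple("*" if m else d for d, m in zip(t, mask)))
    warehouse = {}
    for cell in cells:
        c = g(matches(cell))  # the class that answers this cell
        warehouse[c] = f(c)
    return cells, warehouse

CELLS, WAREHOUSE = build_warehouse()
```

In this toy case the 18 cube cells collapse onto 6 non-empty classes, because several cells cover the same set of base tuples and therefore share one summary.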

  14. Applying GOLAP on Complex Data
     • How to find a meaningful grouping function?
       – Use clusters from hierarchical clustering
     • What kind of hierarchical clustering can lead to a grouping function in GOLAP?
       – Each cluster contains a subset of objects
       – The hierarchy covers every object
       – The whole set of objects is the root cluster
       – Ancestor/descendant relation based on containment
       – For any two clusters c1 and c2, c1 ∩ c2 is a cluster if it is not empty
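The requirements above can be turned into a small check, and a hierarchy that passes it yields a grouping function directly: g(Q) is the intersection of all clusters containing Q, which is itself a cluster because the hierarchy is closed under non-empty intersection. The Python sketch below uses a hypothetical four-object hierarchy; the object names, `is_valid_hierarchy`, and the assumption that every query is a subset of the objects are illustrative only.

```python
from functools import reduce

OBJECTS = frozenset("abcd")  # hypothetical objects

# Hypothetical cluster hierarchy: root, two inner clusters, four leaves.
CLUSTERS = {frozenset(s) for s in ("abcd", "ab", "cd", "a", "b", "c", "d")}

def is_valid_hierarchy(objects, clusters):
    """Check the listed requirements for a hierarchy to induce a grouping function."""
    subsets_ok = all(c <= objects for c in clusters)        # clusters are subsets of objects
    covers_all = frozenset().union(*clusters) == objects    # every object is covered
    has_root = objects in clusters                          # whole set is the root cluster
    closed = all((c1 & c2) in clusters                      # non-empty intersections are clusters
                 for c1 in clusters for c2 in clusters if c1 & c2)
    return subsets_ok and covers_all and has_root and closed

def g(query, clusters=CLUSTERS):
    """Smallest cluster containing `query` (assumes `query` is a subset of the objects)."""
    query = frozenset(query)
    if not query:
        return frozenset()  # assumption: the empty query maps to the empty unit
    containing = [c for c in clusters if query <= c]
    return reduce(frozenset.intersection, containing)
```

Ancestor/descendant relations need no separate bookkeeping in this representation, since they follow from set containment between clusters.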

  15. Fixing the Clustering Methods
     • Many hierarchical clustering methods, but not all, satisfy the requirements
       – The requirement “c1 ∩ c2 is a cluster” may be violated by some methods
     • Fix: add the non-empty intersections of clusters as “intermediate clusters”
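A minimal Python sketch of this fix, assuming clusters are represented as sets of object ids: keep adding non-empty pairwise intersections until nothing new appears. The function name `add_intermediate_clusters` and the example clusters are hypothetical.

```python
def add_intermediate_clusters(clusters):
    """Close a set of clusters under non-empty pairwise intersection."""
    closed = {frozenset(c) for c in clusters}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since `closed` grows inside the loop.
        snapshot = list(closed)
        for c1 in snapshot:
            for c2 in snapshot:
                inter = c1 & c2
                if inter and inter not in closed:
                    closed.add(inter)  # new intermediate cluster
                    changed = True
    return closed

# Two overlapping clusters under a common root violate the intersection requirement.
OVERLAPPING = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"b", "c", "d"}]
FIXED = add_intermediate_clusters(OVERLAPPING)
```

Here the only cluster added is {b, c}, the overlap of the two inner clusters; the loop terminates because the number of possible subsets is finite.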

  16. GeneXplorer: A GOLAP System
     • OLAP on gene expression time series data
     • Uses a hierarchical clustering
       – Based on the attraction tree, the index structure of the G-data warehouse
     • Coherent patterns as summarization
     • Basic operations
       – Roll up
       – Drill down
       – Slice

  17. Towards Interactive Exploration of Gene Expression Patterns
     • Mine hierarchical clusters of co-expressed genes and coherent patterns

  18. Indexing Clusters

  19. Interactive Exploration on Iyer’s Data Set

  20. Comparison with Other Methods

     Pattern | GeneXplorer(9) | Adapt(7) | CLICK(7) | CAST(9)
     1       | 0.993          | 0.956    | 0.884    | 0.955
     2       | 0.957          | 0.911    | 0.991    | 0.887
     3       | 0.984          | 0.993    | 0.994    | 0.997
     4       | 0.980          | 0.984    | 0.883    | 0.968
     5       | 0.958          | 0.855    | 0.868    | 0.855
     6       | 0.952          | 0.989    | 0.970    | 0.984
     7       | 0.967          | 0.976    | 0.990    | 0.719
     8       | 0.991          | 0.997    | 0.914    | 0.999
     9       | 0.702          | 0.824    | 0.844    | 0.800
     10      | 0.974          | 0.981    | 0.976    | 0.996

     Each cell gives the similarity between the pattern reported by an approach and the corresponding pattern in the ground truth.

  21. Other Features of GeneXplorer
     • Model adjustment
       – GOLAP models as plug-ins
       – Users can change the grouping function and the summarization function
     • Gene annotation panel
       – Link patterns to ground truth from public annotations
       – Pattern and object visualization

  22. Conclusions
     • Problem: how to construct a general model for OLAP on complex data?
     • Solution: GOLAP – a general model
       – Consistent with traditional OLAP on relational data
       – Can handle complex data
     • A case study: GeneXplorer

  23. Future Work
     • Is it necessary to introduce new OLAP operations for complex data?
       – Data/application oriented or general?
     • Efficient implementation of G-warehouse
     • Data integration based on general OLAP on complex data

  24. Thank You!
     http://www.cse.buffalo.edu/faculty/jianpei/
