outline
play

Outline What is de c ision suppor t An Overview of Data What is a - PDF document

Outline What is de c ision suppor t An Overview of Data What is a data war e house Warehousing and OLAP Why we ne e d it, and how it diffe r s fr om a re gular RDBMS Technology Diffe r e nc e be twe e n OL AP and OL T


  1. Outline • What is de c ision suppor t An Overview of Data • What is a data war e house Warehousing and OLAP • Why we ne e d it, and how it diffe r s fr om a re gular RDBMS Technology • Diffe r e nc e be twe e n OL AP and OL T P • T ypic al OL AP ar c hite c tur e • Database De sign Me thodology • Star and Snowflake sc he ma s Pre se nte r: Otto • Challe nge s of mate r ialize d vie ws Disc ussion: Jim • Imple me ntation of the OL AP Se r ve r Ma r. 07 2006 • Me tadata r e quir e me nts 2 Motive for a Data Warehouse What is decision support • Busine sse s ha ve a lot of data, ope r ational data and fac ts. • De c ision suppor t syste ms are a c lass of • T his da ta is usually in diffe re nt da ta ba se s c ompute r ize d infor mation syste ms that and in diffe r e nt physic al plac e s. suppor t de c ision making ac tivitie s. • De c ision support syste ms usually re quire • De c ision make r c onsolidating data form ma ny s ne e d to ac c e ss he te r oge ne ous sour c e s: the se might infor mation (data that ha s be e n inc lude e xte r nal sour c e s. summar ize d) vir tually on the single site . • T -Suc h us stoc k mar ke t fe e ds. his ac c e ss ne e ds to be fa st re gar dle ss of the size of the data, and how old the data is. 3 4 What is data warehouse Difference between OLAP and OLTP OL T P OL AP • Data war e housing pr ovide s ar c hite c tur e s Cle r k, IT pr ofe ssional Knowle dg e wor ke r Use r s and tools for busine ss e xe c utive s to Day to day ope rations De c ision suppor t F unc tion syste matic ally or ganize , unde r stand a nd DB De sign Applic ation- or ie nte d Subje c t-or ie nte d use the ir data to make str ate gic Cur re nt, up-to-date Histor ic al, summar ize d, de c isions. – Jiawe i Han Data de taile d. multidime nsional,… r e pe titive Ad- hoc • A data war Usage e house is a subje c t- or ie nte d, Ac c e ss Re ad/ wr ite L ots of sc a ns inte gr ate d, time - var iant, a nd non-vola tile Shor t, simple Comple x que r y Unit of work c olle c tion of data in suppor t of tr ansa c tion manage me nt’s de c ision ma king pr oc e ss. te ns Millions # re c ac c e sse d thousands Hundr e ds # use rs 100 MB-GB 100 GB- T B DB size Data from different data Changes as new Major subject Retains the T ransac tion throughput Que r y thr oug hput Me tr ic sources. data trickle in areas history 5 6 1

  2. Why Do we Separate DW From DB ? Typical OLAP architecture • Pe r for manc e re asons: – OL AP r e quir e s spe c ia l data organization that suppor ts multidime nsional vie ws. – OL AP que rie s would de grade ope r ationa l DB. – OL AP is r e ad only. – No c onc urre nc y c ontrol a nd re c ove ry. • De c ision suppor t r e quir e s histor ic al data. • De c ision suppor t r e quir e s c onsolidate d data. 7 8 Database Design Methodology Example of Star Schema • Most data wa re house s use a star sc he ma to re pre se nt the multi-dime nsional mode l. • E ac h dime nsion is r e pr e se nte d by a dime nsion-table that de sc r ibe s it. • A fac t- table c onne c ts to all dime nsion- table s with a multiple join. E ac h tuple in the fac t-table c onsists of a pointe r to e ac h of the dime nsion-table s. • T he links be twe e n the fac t-table in the c e ntr e and the dime nsion-table s form a shape like a star. (Star Sc he ma) 9 10 Database Design Methodology (con ’ t) Example of Snowflake Schema • E ac h dime nsion is r e pr e se nte d by one table . � Un-nor malize d (intr oduc e s r e dundanc y) E x: (Vanc ouve r, BC, Cana da, North Ame r ic a) (Vic toria , BC, Ca nada, North Ame r ic a) malize dime nsion table s � Nor Snowflake Sc he ma 11 12 2

  3. What I s the Best Design ?? Challenges of Materialized views Pe r for manc e be nc hmar king c an be T he c halle nge s in e xploiting mate r ialize d de te r mine what is the be st de sign. vie ws a re like in using indic e s. – Ide ntify the vie ws to mate r ialize . – E xploit the mate r ialize d vie ws to answe r que r ie s. • Snowflake sc he ma: E a sie r to maintain – E ffic ie ntly upda te the ma te r ia lize d vie ws during loa d dime nsion table s whe n dime nsion table and r e fr e sh. ar e ve r y lar ge (r e duc e s ove r all spac e ). T he c ur r e ntly industr ia l solution- Vie ws c onsist of joins of the fa c t table with a subse t of dime nsion ta ble s. But it’s a little more c omple x, • Star sc he ma : Mor e e ffe c tive for data if using the snowflake sc he ma . c ube br owsing (le ss joins). Why ? ? � ( muc h joins ) 13 14 I mplementation of the OLAP Server I mplementation of the OLAP Server ROL AP: Re la tiona l OL AP – da ta is store d in MOL AP: table s in r e lational database or e xte nde d- Multidime nsiona l OL AP - imple me nts the r e lational database s. T he y use an RDBMS to multidime nsional vie w by stor ing data in manage the war e house data and spe c ia l multidime nsiona l da ta struc ture s. aggr e gations using ofte n a star sc he ma. • T he y suppor t e xte nsions to SQL . Advantage : F ast inde xing to pr e -c ompute d Advantage : Sc alable . aggr e gations. Only value s are stor e d. Disadvantage : No dir e c t ac c e ss to c e lls. Disadvantage : Not ve r y sc alable . 15 16 Metadata requirements Metadata requirements • Busine ss me ta data • Administr ative me tadata – Busine ss te r − Sour ms a nd de finitions c e data base and the ir c onte nts – Owne r − Ba c k- e nd a nd front- e nd tools ship of data – Charging polic ie s − De finitions of the wa re house sc he ma − Pr e - de fine d que rie s and r e por ts • Ope r − Da ta ma rk loc ations a nd c onte nts ational me tadata − Data r – Data line age : histor e fr e sh and purging polic ie s y of migr a te d data and − Use r profile s a nd use r a c c e ss c ontr se que nc e of tra nsforma tions a pplie d ol polic ie s – Curre nc y of da ta : a c tive , a rc hive d, purg e d – Monitoring informa tion: wa re house usa g e statistic s, e r r or r e por ts, audit trails 17 18 3

  4. Discussion • How doe s the he te rog e ne ity in da ta wa re house s diffe r fr om the topic s tha t we ’ve disc usse d in data inte gration? What ar e some a pplic ations that you would use data inte gr a tion for? A data war e house ? Can you think of any a pplic ations for whic h “both” would be a g ood solution? • Do you think that mate r ia lize d vie ws ar e mor e impor tant or on an RDBMS? Why? Whic h one do you think is e asie r to use mate r ia lize d vie ws in? why? • Do you think that star sc he mas are more use ful in da ta war e house s than in RDMMSs? Why or why not? 19 4

Recommend


More recommend