horizontal or vertical storage
play

Horizontal or Vertical Storage A fact table for data warehousing is - PowerPoint PPT Presentation

Horizontal or Vertical Storage A fact table for data warehousing is often fat Tens of even hundreds of dimensions/attributes A query is often about only a few attributes Horizontal storage: tuples are stored one by one Vertical


  1. Horizontal or Vertical Storage • A fact table for data warehousing is often fat – Tens of even hundreds of dimensions/attributes • A query is often about only a few attributes • Horizontal storage: tuples are stored one by one • Vertical storage: tuples are stored by attributes A 1 A 2 … A 100 A 1 A 2 … A 100 x 1 x 2 … x 100 x 1 x 2 … x 100 … … … … … … … … z 1 z 2 … z 100 z 1 z 2 … z 100 Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 1

  2. Horizontal Versus Vertical • Find the information of tuple t – Typical in OLTP – Horizontal storage: get the whole tuple in one search – Vertical storage: search 100 lists • Find SUM(a 100 ) GROUP BY {a 22 , a 83 } – Typical in OLAP – Horizontal storage (no index): search all tuples O(100n), where n is the number of tuples – Vertical storage: search 3 lists O(3n), 3% of the horizontal storage method • Projection index: vertical storage Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 2

  3. Only Vertical Storage Is Not Enough Performance'of'Column3Oriented'Op$miza$ons' 45" –Late" Materializa:on" 40" 35" –Compression" Run$me'(sec)' 30" –Join"Op:miza:on" 25" 20" –Tuple@at@a@:me" 15" 10" Baseline" 5" 0" Column"Store" Row"Store" Abadi et al. The Design and Implementation of Modern Column-Oriented Database Systems. Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 3

  4. C-Store • ROS: Read Optimized Store (column oriented) • WOS: Write Optimized Store (row oriented, mainly for input) • Each column may be stored multiple times in different sort orders • Projection: a group of columns sorted on the same attribute Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 4

  5. Different Projections (prodid,date,region | region,date) (saleid,date,region | date) saleid date region prodid date region 1/6/08 East 17 1/6/08 West 5 1 1 22 1/6/08 East 9 2/5/08 East 2 2 2/12/08 East 6 1/8/08 South 4 3 3 98 1/13/08 South 12 1/20/08 North 4 4 2/4/08 North 12 1/20/08 North 5 5 5 4 1/24/08 South 7 1/8/08 South 6 6 1/13/08 South 7 14 2/2/08 West 7 22 1/24/08 South 8 7 2/4/08 North 8 3 1/6/08 West 9 8 2/5/08 East 9 18 2/2/08 West 10 10 11 2/12/08 East 6 (a) Sales Projection Sorted By Date (b) Sales Projection Sorted By Region, Date Abadi et al. The Design and Implementation of Modern Column-Oriented Database Systems. Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 5

  6. MonetDB SQL XQuery SPARQL front − end front − end front − end BAT Algebra BAT: Binary Association Table BAT ’byear’ BAT ’name’ 1907 0 0 John Wayne\0 0 1927 1 11 Roger Moore\0 1 1 0 Bob Fosse\0 1927 23 2 2 2 1 Will Smith\0 1968 3 33 3 select(byear,1927) (memory mapped) simple memory array (virtual) dense surrogates MonetDB back − end Abadi et al. The Design and Implementation of Modern Column-Oriented Database Systems. Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 6

  7. Vectorized Processing • Query: SELECT AVG(A) FROM R WHERE A < 100 • Full materialization – Scan column A once, find the qualified rows – Compute average on the qualified rows – Drawback: large intermediate results • Vectorized processing – Get N tuples in a shot – Compute the sum and count of the N tuples – N is set so that the vector can fit into the L1 cache Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 7

  8. Why Vectorized Processing? • Reduced interpretation cost • Better cache locality • Compiler optimization – SIMD instructions • Block algorithms • Parallel memory access • Profiling and adaptive execution Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 8

  9. Late Materialization • Keep data in columns until late in query evaluation – Use intermediate position lists Initial Status Query and Query Plan (MAL Algebra) Relation R Relation S select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40 Rb Rc Sa Sb Ra 1. inter1 = select( Ra,5,20 ) 3 12 12 17 11 2. inter2 = reconstruct( Rb,inter1 ) 16 34 34 49 35 3. inter3 = select( inter2,30,40 ) 56 75 53 58 62 4. join_input_R = reconstruct( Rc,inter3 ) 9 45 23 99 44 5. inter4 = select( Sa,55,65 ) 11 49 78 64 29 6. inter5 = reconstruct( Sb,inter4 ) 27 58 65 37 78 7. join_input_S = reverse( inter5 ) 8 97 33 53 19 8. join_res_R_S = join( join_input_R,join_input_S ) 41 75 21 61 81 9. inter6 = voidTail( join_res_R_S ) 19 42 29 32 26 10. inter7 = reconstruct( Ra,inter6 ) 35 55 0 50 23 11. result = sum (inter7) Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 9

  10. Example select(Ra,5,20) reconstruct(Rb,inter1) Ra Rb inter2 inter1 inter1 2 2 3 12 2 34 4 4 16 34 4 45 5 5 56 75 5 49 7 7 9 45 7 97 9 9 11 49 9 42 27 58 8 97 41 75 19 42 35 55 (1) (2) Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 10

  11. Example select(inter2,30,40) reconstruct(Rc,inter3) inter2 inter3 inter3 Rc join_input_R 2 34 4 4 4 23 12 4 45 5 5 5 78 34 5 49 9 9 9 29 53 7 97 23 9 42 78 65 33 21 29 0 (2) (3) (4) Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 11

  12. Example select(Sa,55,65) reverse(inter5) reconstruct(Sb,inter4) Sa Sb inter5 inter5 join_input_S inter4 inter4 3 3 17 3 62 3 62 3 62 11 5 5 49 5 29 5 29 29 5 35 7 7 58 7 19 7 19 19 7 62 8 8 99 8 81 8 81 81 8 44 10 10 64 10 23 10 23 10 23 29 37 78 53 19 61 81 32 26 50 23 (5) (6) (7) Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 12

  13. Example join(join_input_R,join_input_S) voidTail(join_res_R_S) join_input_R join_input_S join_res_ R_S inter6 join_res_ R_S 3 62 4 10 23 4 4 4 10 29 5 5 78 9 5 9 5 9 7 19 9 29 81 8 10 23 (8) (9) Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 13

  14. Example reconstruct(Ra,inter6) sum(inter7) inter7 inter6 Ra inter7 result 9 9 4 28 3 19 19 9 16 56 9 11 27 8 41 19 35 (10) (11) Figure 4.1: An example of a select-project- Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 14

  15. Example of Joins 42 38 1 2 36 42 2 4 = 42 1 46 3 2 44 36 5 1 38 Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 15

  16. DATA CUBE Model Year Color Sales Chevy 1990 blue 62 Chevy 1990 red 5 CUBE Chevy 1990 white 95 Chevy 1990 ALL 154 Chevy 1991 blue 49 Chevy 1991 red 54 Chevy 1991 white 95 Chevy 1991 ALL 198 Chevy 1992 blue 71 Chevy 1992 red 31 Chevy 1992 white 54 Chevy 1992 ALL 156 Chevy ALL blue 182 SALES Chevy ALL red 90 Chevy ALL white 236 Model Year Color Sales Chevy ALL ALL 508 CUBE Chevy 1990 red 5 Ford 1990 blue 63 Chevy 1990 white 87 Ford 1990 red 64 Chevy 1990 blue 62 Ford 1990 white 62 Chevy 1991 red 54 Ford 1990 ALL 189 Ford 1991 blue 55 Chevy 1991 white 95 Ford 1991 red 52 Chevy 1991 blue 49 Ford 1991 white 9 Chevy 1992 red 31 Ford 1991 ALL 116 Chevy 1992 white 54 Ford 1992 blue 39 Chevy 1992 blue 71 Ford 1992 red 27 Ford 1990 red 64 Ford 1992 white 62 Ford 1992 ALL 128 Ford 1990 white 62 Ford ALL blue 157 Ford 1990 blue 63 Ford ALL red 143 Ford 1991 red 52 Ford ALL white 133 Ford 1991 white 9 Ford ALL ALL 433 Ford 1991 blue 55 ALL 1990 blue 125 ALL 1990 red 69 Ford 1992 red 27 ALL 1990 white 149 Ford 1992 white 62 ALL 1990 ALL 343 Ford 1992 blue 39 ALL 1991 blue 106 SELECT Model, Year, Color, SUM(sales) AS Sales ALL 1991 red 104 FROM Sales ALL 1991 white 110 ALL 1991 ALL 314 ALL 1992 blue 110 WHERE Model in {'Ford', 'Chevy'} ALL 1992 red 58 AND Year BETWEEN 1990 AND 1992 ALL 1992 white 116 ALL 1992 ALL 284 GROUP BY CUBE(Model, Year, Color); ALL ALL blue 339 ALL ALL red 233 ALL ALL white 369 ALL ALL ALL 941 Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 16

  17. MOLAP Date 2Qtr 1Qtr 3Qtr 4Qtr sum TV U.S.A PC VCR Country sum Canada Mexico sum Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 17

  18. Pros and Cons • Easy to implement • Fast retrieval • Many entries may be empty if data is sparse • Costly in space Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 18

  19. ROLAP – Data Cube in Table • A multi-dimensional database Base table Dimensions Measure Dimensions Measure Store Product Season Sales Store Product Season AVG(Sales) S1 P1 Spring 6 S1 P1 Spring 6 S1 P2 Spring 12 S1 P2 Spring 12 S2 P1 Fall 9 S2 P1 Fall 9 S1 * Spring 9 … … … … * * * 9 Cubing Jian Pei: CMPT 741/459 Data Warehousing and OLAP (3) 19

Recommend


More recommend