physical database design
play

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & - PowerPoint PPT Presentation

Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann) Alternative File Organizations File organization = Method of arranging a file of records on external storage Goal:


  1. Physical Database Design Ramakrishnan & Gehrke, Chapter 17 & 18 340151 Big Data & Cloud Services (P. Baumann)

  2. Alternative File Organizations  File organization = Method of arranging a file of records on external storage • Goal: quickly find records needed by query  Several alternatives • Heap files • Sorted Files • Indexes 340151 Big Databases & Cloud Services (P. Baumann) 2

  3. Index Selection Guidelines  Understand workload: • Queries vs. update • What relations (sizes!), attributes, conditions, joins (selectivity!), …?  Attributes in WHERE clause are candidates for index keys • Exact match condition suggests hash index, range query suggests tree index • Consider multi-attribute search keys for several WHERE clause conditions • Order of attributes important for range queries  Choose indexes that benefit as many queries as possible • impact on updates: Indexes make queries faster, updates slower • require disk space  understand how DBMS evaluates queries & creates query evaluation plans 340151 Big Databases & Cloud Services (P. Baumann) 3

  4. Decisions to Make  What indexes? • Which relations? What field(s) search key? Several indexes? • For each index, what kind of an index should it be?  Change conceptual schema? guided by workload, in addition to redundancy issues • Consider alternative normalized schemas? (many choices!) • “undo’’ some decompositions, settle for a lower normal form, such as 3NF? (denormalization) • Horizontal partitioning, replication, views ...see manuals  If made after a database is in use, called schema evolution 340151 Big Databases & Cloud Services (P. Baumann) 4

  5. Example Schema Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val) Depts (Did, Budget, Report) Suppliers (Sid, Address) Parts (Pid, Cost) Projects (Jid, Mgr)  Contracts = CSJDPQV; ICs: JP C, SD P; C is primary key • superkey? • What normal form? 340151 Big Databases & Cloud Services (P. Baumann) 5

  6. Denormalization Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)  Suppose following query is important: • “Value of contract less than department budget?”  To speed up, add field budget B (from Departments) to Contracts • New FD for Dept./Budget: Did B • Contracts no longer in 3NF  might choose to modify Contracts • sufficiently important? No good performance otherwise? • i.e., by indexes, choosing alternative 3NF schema 340151 Big Databases & Cloud Services (P. Baumann) 6

  7. Decomposition of a BCNF Relation  Suppose { SDP, CSJDQV } in BCNF • no reason to decompose further (assuming that all known ICs are FDs)  However, suppose that these queries are important • “Find the contracts held by supplier S” • “Find the contracts that department D is involved in”  Decomposing CSJDQV further into CS, CD and CJQV: • could speed up these queries (Why?) • following query is slower: “Find total value of all contracts held by supplier S.” 340151 Big Databases & Cloud Services (P. Baumann) 7

  8. Masking Conceptual Schema Changes CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS SELECT * FROM LargeContracts UNION SELECT * FROM SmallContracts  Contracts  LargeContracts + SmallContracts can be masked by view  queries with condition val>10000: preferable LargeContracts for efficient execution 340151 Big Databases & Cloud Services (P. Baumann) 8

  9. Tuning Queries and Views  query runs slower than expected? check if index needs to be re-built or statistics too old  DBMS may not be executing plan you had in mind. Common problems: • Selections involving null values • Selections involving arithmetic or string expressions • Selections involving OR conditions • Lack of evaluation features like index-only strategies or certain join methods or poor size estimation  Check plan used, adjust choice of indexes or rewrite query/view • Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY 340151 Big Databases & Cloud Services (P. Baumann) 9

  10. PS: A Moderately Complex Query 340151 Big Databases & Cloud Services (P. Baumann) 10

  11. Key Performance Factors  Ref: discussion "what are the key points to improve the query performance" on the LinkedIn Database list, 2012-07-20 340151 Big Databases & Cloud Services (P. Baumann) 11

  12. Summary  Many alternative file organizations, each appropriate in some situation  If selection queries frequent: sort file or build an index • Hash vs tree indexes vs sorted files  Understand workload & DBMS query plans 340151 Big Databases & Cloud Services (P. Baumann) 12

Recommend


More recommend