Data Management Systems Access Methods Denormalized tables Pages - PowerPoint PPT Presentation

Data Management Systems • Access Methods Denormalized tables • Pages and Blocks • Indexing Log structured databases • Access Methods in context No indexes Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich Access Methods in context 1

Alternative designs • There are many different versions of what we have studied regarding access methods • New and old ideas keep being applied because the context changes: • Denormalized tables = store more than one table in the same blocks • It is a form of materialized view with a join already performed • Useful if tables are more often accessed together than on their own • Log Structured representation = do not store tuples but only updates to them • Main memory databases for OLTP • Cloud storage • Avoid creating indexes • Cloud databases (Snowflake) • Database Cracking (MonetDB) Access Methods in context 2

Denormalized tables Access Methods in context 3

Merging tables in advance • Normal forms are rules used to eliminate redundancy in the schema of a database • They force to split tables into differnet tables until each one represents a distinct concept • Good idea to save space but often tables will be joined in almost every query • Some systems allow to cluster tables into the same segment so that blocks contain data form both tables (indexed by the common attributes/keys) Access Methods in context 4

Clustered Tables (Oracle) • Clustered tables make sense when the tables are really processed together most of the time • Reduces I/O, saves space • Tuples that are related are stored together in the same block, reducing access time • It is like a materialized join • Updates can become expensive Access Methods in context 5

Log structured databases Access Methods in context 6

Log structured file • Instead of storing tuples and modifying them as needed, keep a record of how the data was modified (a log): • Inserts records the entire tuple • Delete records that the tuple is invalidated • Updates records the attributes that have been modified • New entries are appended at the end of the file Block INSERT id = 17, Val = 3 New entries DELETE id = 11 appended at the end UPDATE id = 25, Val = 7 UPDATE id = 98, Val = 14 Access Methods in context 7

Log structured database • It is much faster to update a file sequentially than to do random accesses (even in SSDs) • In cloud storage, file storage is typically append only • It minimizes the cost of making the data persistent (very expensive for OLTP databases with many transactions) • Not for heavy query analytics but very good for in memory transactional workloads Access Methods in context 8

Optimizing log structured representations • Periodically compact the log: • Remove the history and add one entry for every tuple • Index the tuples and their modifications (entry could contain pointer to previous entry referring to that tuple) • It actually matches what often happens in many applications. • No updates in place • Record history of operations • Periodically apply all operations • Increasingly being used in a number of databases, especially cloud databases Access Methods in context 9

No indexes Access Methods in context 10

Snowflake: Micro-partitions instead of indexes • Micro- partitions are Snowflake’s name for extents • What is interesting is how they are organized to facilitate query processing • Size ranges between 50 and 500 MB (before compression, the data is always compressed when in S3) • Each micro partition has metadata describing what is inside • The metadata can be read without reading the whole micro-partition • The metadata is used to read just the part of the micro-partition that is relevant • Data in the micro-partition is stored in columnar form (by columns not by rows) Storage in Context 11

https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html Storage in Context 12

Pruning based on metadata • The header for a micro-partition contains information about the data. SELECT * FROM T WHERE age > 45 • If header contains the min and max age in the data, we can decide we do not need that micro-partition simply by looking at the header • Same as indexing built in the micropartitions Storage in Context 13

Database Cracking • Pioneered in MonetDB (open source column store) • Do not create an index, instead, build the index incrementally while the data is being processed • Based on creating a copy of the column • Do it in successive passes so that the cost is not too high • Initial queries expensive, later cost is amortized as work has already been done Access Methods in context 14

Data Management Systems Access Methods Denormalized tables Pages - PowerPoint PPT Presentation

Data Management Systems Access Methods Denormalized tables Pages and Blocks Indexing Log structured databases Access Methods in context No indexes Gustavo Alonso Institute of Computing Platforms Department of Computer Science

Systems Systems Systems Integration Systems Integration Systems Systems Integration Systems

Types of Expert Systems Interpretation Systems Prediction Systems Diagnosis Systems

DataCamp Data Types for Data Science DataCamp Data Types for Data Science Data types Data type

CMPUT 391 Database Management Systems Spatial Data Management 1 Dr. Jrg Sander, 2006

BSD Data Systems Report to the School Board, April 2019 Data-Driven ESSA Best Practices Data

Data Collection and Data Management saverio . giallorenzo @gmail.com 1 Web Science Data

PRESENTS PRESENTS TRUE HOTEL MANAGEMENT SYSTEM TRUE HOTEL MANAGEMENT SYSTEM MANAGEMENT FEATURE

PUBLIC FINANCE MANAGEMENT SYSTEMS MANAGEMENT SYSTEMS AND RISK MODELLING IN TURKISH TREASURY

Database Management Objectives of Lecture 10 Systems Spatial Data Management Spatial Data

Bracing Systems Bracing Systems 1 1 Rod Bracing Rod Bracing 2 2 Wind Bracing Systems Wind

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Database Management Objectives of Lecture 5 Systems Data Warehousing and OLAP Data Warehousing

Environmental Health Science Data Streams Data Streams Health Data Health Data Brian S.

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

CS378 Introduction to Data Mining Data Exploration and Data Preprocessing Li Xiong Data

Data Preparation Discretization Data cleaning (Data pre-processing) Data

Camera Calibration (Compute Camera Matrix P) Shao-Yi Chien Department of Electrical

Redshift & Final Project Monday, March 20, 2017 Agenda Announcements Redshift Primer

John Sumsion FamilySearch Cassandra for online systems Introduction to Family Tree

Database Tuning Module 5, Lectures 6 and 7 Database Management Systems, R. Ramakrishnan 1

Open Source is Broken Don Goodman-Wilson ka tsu don @DEGoodmanWilson .tech @DEGoodmanWilson

Counting Contingency Tables Igor Pak, UCLA Combinatorics Seminar, OSU, September 17, 2020 1

Database Optimization say about 97% of the time: premature optimization is the root of all

for Microservices Mauro Servienti | @mauroservienti @mauroservienti | #apiconf2018 100% remote

Data Management Systems Access Methods Denormalized tables Pages - PowerPoint PPT Presentation

Data Management Systems Access Methods Denormalized tables Pages and Blocks Indexing Log structured databases Access Methods in context No indexes Gustavo Alonso Institute of Computing Platforms Department of Computer Science

Systems Systems Systems Integration Systems Integration Systems Systems Integration Systems

Types of Expert Systems Interpretation Systems Prediction Systems Diagnosis Systems

DataCamp Data Types for Data Science DataCamp Data Types for Data Science Data types Data type

CMPUT 391 Database Management Systems Spatial Data Management 1 Dr. Jrg Sander, 2006

BSD Data Systems Report to the School Board, April 2019 Data-Driven ESSA Best Practices Data

Data Collection and Data Management saverio . giallorenzo @gmail.com 1 Web Science Data

PRESENTS PRESENTS TRUE HOTEL MANAGEMENT SYSTEM TRUE HOTEL MANAGEMENT SYSTEM MANAGEMENT FEATURE

PUBLIC FINANCE MANAGEMENT SYSTEMS MANAGEMENT SYSTEMS AND RISK MODELLING IN TURKISH TREASURY

Database Management Objectives of Lecture 10 Systems Spatial Data Management Spatial Data

Bracing Systems Bracing Systems 1 1 Rod Bracing Rod Bracing 2 2 Wind Bracing Systems Wind

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Database Management Objectives of Lecture 5 Systems Data Warehousing and OLAP Data Warehousing

Environmental Health Science Data Streams Data Streams Health Data Health Data Brian S.

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

CS378 Introduction to Data Mining Data Exploration and Data Preprocessing Li Xiong Data

Data Preparation Discretization Data cleaning (Data pre-processing) Data

Camera Calibration (Compute Camera Matrix P) Shao-Yi Chien Department of Electrical

Redshift &amp; Final Project Monday, March 20, 2017 Agenda Announcements Redshift Primer

John Sumsion FamilySearch Cassandra for online systems Introduction to Family Tree

Database Tuning Module 5, Lectures 6 and 7 Database Management Systems, R. Ramakrishnan 1

Open Source is Broken Don Goodman-Wilson ka tsu don @DEGoodmanWilson .tech @DEGoodmanWilson

Counting Contingency Tables Igor Pak, UCLA Combinatorics Seminar, OSU, September 17, 2020 1

Database Optimization say about 97% of the time: premature optimization is the root of all

for Microservices Mauro Servienti | @mauroservienti @mauroservienti | #apiconf2018 100% remote

Redshift & Final Project Monday, March 20, 2017 Agenda Announcements Redshift Primer