CS377: Database Systems
Distributed Databases
Department of Mathematics and Computer Science, Emory University
Centralized DBMS on a Network
[Figure: sites connected by a communication network]
Distributed DBMS Environment
[Figure: sites connected by a communication network]
Distributed Database System
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
• A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D-DBMS
Distributed Database System
The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication as shown below.
[Figure: fragments of EMPLOYEE, PROJECT, and WORKS_ON stored at the sites]
Distributed DBMS Promises
• Transparent management of distributed, fragmented, and replicated data
• Improved reliability/availability through distributed transactions
• Improved performance
• Easier and more economical system expansion
Distributed DBMS Issues
• Distributed database design
  - How to distribute the database
• Query processing
  - Optimize cost = data transmission + local processing
• Concurrency control
  - Synchronization of concurrent accesses
  - Consistency and isolation of transactions' effects
  - Deadlock management
• Reliability
  - How to make the system resilient to failures
  - Atomicity and durability
Distributed database design
• Data distribution
  - Top-down: mostly used when designing systems from scratch
  - Bottom-up: used when the databases already exist at a number of sites
• Unit of distribution
  - Relation
  - Fragments of relations (sub-relations)
    - Data are inherently fragmented, e.g. by locality
    - Allows concurrent execution of a number of transactions that access different portions of a relation
Example
Employee relation E(#, name, loc, sal, …)
• 40% of queries: Qa: select * from E where loc=Sa and ...
• 40% of queries: Qb: select * from E where loc=Sb and ...
Motivation: two sites, Sa and Sb
[Figure: Qa → Sa, Sb ← Qb]
Fragmentation Alternatives – Horizontal
• PROJ1: projects with budgets less than $200,000
• PROJ2: projects with budgets greater than or equal to $200,000
[Figure: the PROJ table and its horizontal fragments PROJ1 and PROJ2]
Fragmentation Alternatives – Vertical
• PROJ1: information about project budgets
• PROJ2: information about project names and locations
[Figure: the PROJ table and its vertical fragments PROJ1 and PROJ2]
Data Fragmentation, Replication and Allocation
• Horizontal fragmentation
  - A horizontal subset of a relation, containing the tuples that satisfy a selection condition.
  - E.g., the Employee relation with selection condition (DNO = 5).
  - Can be specified by a σ_Ci(R) operation in the relational algebra.
• Complete horizontal fragmentation
  - A set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R: every tuple in R satisfies (C1 OR C2 OR … OR Cn).
  - Disjoint complete horizontal fragmentation: no tuple in R satisfies (Ci AND Cj) where i ≠ j.
• How to reconstruct R from complete horizontal fragments?
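The definitions above can be made concrete with a small sketch. It is only an illustration: the sample tuples, the attribute names (`ssn`, `name`, `dno`), and the fragment names are assumptions, not part of the slides. Each fragment is σ_Ci(R); the completeness and disjointness checks follow the two conditions stated above, and reconstruction is taken to be the UNION of the fragments.

```python
# Hypothetical EMPLOYEE tuples; attribute names are assumed for illustration.
EMPLOYEE = [
    {"ssn": 1, "name": "A", "dno": 5},
    {"ssn": 2, "name": "B", "dno": 4},
    {"ssn": 3, "name": "C", "dno": 5},
    {"ssn": 4, "name": "D", "dno": 1},
]

# Selection conditions C1..Cn, one per fragment (fragment names are made up).
CONDITIONS = {
    "EMP_D5": lambda t: t["dno"] == 5,
    "EMP_OTHER": lambda t: t["dno"] != 5,
}


def fragment(relation, conditions):
    """Build one horizontal fragment sigma_Ci(R) per condition Ci."""
    return {name: [t for t in relation if cond(t)]
            for name, cond in conditions.items()}


def is_complete(relation, conditions):
    """Every tuple satisfies (C1 OR C2 OR ... OR Cn)."""
    return all(any(cond(t) for cond in conditions.values()) for t in relation)


def is_disjoint(relation, conditions):
    """No tuple satisfies (Ci AND Cj) for i != j."""
    return all(sum(1 for cond in conditions.values() if cond(t)) <= 1
               for t in relation)


def reconstruct(fragments, key="ssn"):
    """UNION of the fragments; duplicates (same key) collapse to one tuple."""
    merged = {}
    for tuples in fragments.values():
        for t in tuples:
            merged[t[key]] = t
    return sorted(merged.values(), key=lambda t: t[key])


fragments = fragment(EMPLOYEE, CONDITIONS)
print(is_complete(EMPLOYEE, CONDITIONS), is_disjoint(EMPLOYEE, CONDITIONS))
print(reconstruct(fragments) == EMPLOYEE)
```

Deduplicating by key in `reconstruct` matters only when the fragmentation is complete but not disjoint; with disjoint fragments the union has no duplicates to remove.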
Three common horizontal partitioning techniques
• Round robin
• Hash partitioning
• Range partitioning
• Round robin
[Figure: tuples t1, t2, t3, t4, t5, ... of R spread across D0, D1, D2]
• Hash partitioning
  Tuples of R are placed on D0, D1, D2 by a hash function h on the key k:
  t1 → h(k1)=2 → D2
  t2 → h(k2)=0 → D0
  t3 → h(k3)=0 → D0
  t4 → h(k4)=1 → D1
  ...
• Range partitioning
  Tuples of R are placed on D0, D1, D2 by comparing attribute A against a partitioning vector:
  t1: A=5
  t2: A=8
  t3: A=2
  t4: A=3
  ...
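A minimal sketch of the three partitioning techniques, written as plain functions over in-memory lists. The attribute values for t1..t4 are the ones from the range-partitioning slide; the partitioning vector `[4, 7]`, the use of Python's built-in `hash`, and the function names are assumptions made for illustration only.

```python
from bisect import bisect_left


def round_robin(tuples, n):
    """Assign the i-th tuple to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key, h=hash):
    """Assign each tuple to partition h(key) mod n."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[h(key(t)) % n].append(t)
    return parts


def range_partition(tuples, vector, key):
    """Assign each tuple by comparing its key to a sorted partitioning vector.

    With vector [v0, v1]: key <= v0 goes to partition 0,
    v0 < key <= v1 to partition 1, and key > v1 to partition 2.
    """
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        parts[bisect_left(vector, key(t))].append(t)
    return parts


def attr_a(t):
    """Partitioning attribute A of a (name, A) tuple."""
    return t[1]


# (name, A) pairs; the A values are taken from the range-partitioning slide.
R = [("t1", 5), ("t2", 8), ("t3", 2), ("t4", 3)]

print(round_robin(R, 3))
print(hash_partition(R, 3, attr_a))
print(range_partition(R, [4, 7], attr_a))  # [4, 7] is an assumed vector
```

Round robin balances partition sizes but gives no help locating a tuple by value; hash partitioning supports exact-match lookups on the key; range partitioning also supports range queries, at the risk of skew if the vector is poorly chosen.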
Data Fragmentation, Replication and Allocation
• Vertical fragmentation
  - A vertical subset of a relation that contains a subset of its columns.
  - E.g., for the Employee relation: a vertical fragment of Name, Bdate, Sex.
  - Can be specified by a Π_Li(R) operation in the relational algebra.
  - Each fragment must include the primary key attribute of the parent relation Employee.
• Complete vertical fragmentation
  - A set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R.
  - L1 ∪ L2 ∪ ... ∪ Ln = ATTRS(R)
  - Li ∩ Lj = PK(R) for any i ≠ j
• How to reconstruct R from complete vertical fragments?
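The two conditions for complete vertical fragmentation can be checked mechanically, and reconstruction can be sketched as a join of the fragments on the primary key. The sample tuples, the `salary` attribute, and the fragment names below are assumptions for illustration; only Name, Bdate, and Sex come from the slide.

```python
# Hypothetical EMPLOYEE tuples; "ssn" is assumed to be the primary key.
EMPLOYEE = [
    {"ssn": 1, "name": "A", "bdate": "1970-01-01", "sex": "F", "salary": 100},
    {"ssn": 2, "name": "B", "bdate": "1980-02-02", "sex": "M", "salary": 200},
]
PK = "ssn"

# Projection lists L1..Ln; every list carries the primary key.
LISTS = {
    "EMP_PERSONAL": ["ssn", "name", "bdate", "sex"],
    "EMP_PAY": ["ssn", "salary"],
}


def project(relation, attrs):
    """Build the vertical fragment pi_Li(R)."""
    return [{a: t[a] for a in attrs} for t in relation]


def is_complete_vertical(all_attrs, lists, pk):
    """Check L1 U ... U Ln = ATTRS(R) and Li n Lj = {PK} for i != j."""
    sets = [set(attrs) for attrs in lists.values()]
    covers_all = set().union(*sets) == set(all_attrs)
    share_only_pk = all(sets[i] & sets[j] == {pk}
                        for i in range(len(sets))
                        for j in range(i + 1, len(sets)))
    return covers_all and share_only_pk


def reconstruct(fragments, pk):
    """Join the fragments on the primary key by merging tuples with equal keys."""
    merged = {}
    for tuples in fragments.values():
        for t in tuples:
            merged.setdefault(t[pk], {}).update(t)
    return sorted(merged.values(), key=lambda t: t[pk])


fragments = {name: project(EMPLOYEE, attrs) for name, attrs in LISTS.items()}
print(is_complete_vertical(EMPLOYEE[0].keys(), LISTS, PK))
print(reconstruct(fragments, PK) == EMPLOYEE)
```

Because every fragment of a complete vertical fragmentation contains every primary-key value, merging by key here gives the same result as a natural join of the fragments.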
Data Fragmentation, Replication and Allocation
• Mixed (hybrid) fragmentation
  - A combination of vertical fragmentation and horizontal fragmentation.
  - Achieved by SELECT-PROJECT operations, represented by Π_Li(σ_Ci(R)).
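As a short illustration of Π_Li(σ_Ci(R)), the sketch below first selects the tuples that satisfy a condition and then keeps only the listed attributes. The sample relation and attribute names are hypothetical.

```python
def mixed_fragment(relation, condition, attrs):
    """Compute pi_attrs(sigma_condition(relation)): select, then project."""
    return [{a: t[a] for a in attrs} for t in relation if condition(t)]


# Hypothetical EMPLOYEE tuples, assumed for illustration.
EMPLOYEE = [
    {"ssn": 1, "name": "A", "dno": 5, "salary": 100},
    {"ssn": 2, "name": "B", "dno": 4, "salary": 200},
    {"ssn": 3, "name": "C", "dno": 5, "salary": 300},
]

# Names of the employees in department 5, keeping the primary key.
d5_names = mixed_fragment(EMPLOYEE, lambda t: t["dno"] == 5, ["ssn", "name"])
print(d5_names)
```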
Data Fragmentation, Replication and Allocation
• Fragmentation schema
  - A definition of a set of fragments (horizontal, vertical, or mixed) from which the original database can be reconstructed.
• Allocation schema
  - The distribution of fragments to the sites of the distributed database. The database can be fully replicated, partially replicated, or partitioned.
• Data replication
  - Full replication: the database is replicated to all sites.
  - Partial replication: some selected part is replicated.
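One way to picture an allocation schema is as a mapping from each fragment to the set of sites that store a copy. The sketch below classifies such a mapping into the three cases named above. The site and fragment names are hypothetical, and treating "partitioned" as "every fragment stored at exactly one site" is an assumption about the intended meaning.

```python
def classify(allocation, sites):
    """Classify an allocation schema given as {fragment: set of sites}."""
    copies = [set(where) for where in allocation.values()]
    if all(c == set(sites) for c in copies):
        return "fully replicated"      # every fragment at every site
    if all(len(c) == 1 for c in copies):
        return "partitioned"           # every fragment at exactly one site
    return "partially replicated"      # some fragments have extra copies


SITES = {"S1", "S2", "S3"}  # hypothetical site names

full = {"EMP_D5": {"S1", "S2", "S3"}, "EMP_OTHER": {"S1", "S2", "S3"}}
split = {"EMP_D5": {"S1"}, "EMP_OTHER": {"S2"}}
partial = {"EMP_D5": {"S1", "S2"}, "EMP_OTHER": {"S3"}}

for schema in (full, split, partial):
    print(classify(schema, SITES))
```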