Fast and Scalable Relational Division on Database Systems André S. Gonzaga, Robson L. F. Cordeiro
1. Introduction 2. Contributions 3. Background a. Genetic Data b. Relational Division 4. Related Work 5. Proposed Algorithms a. Index-Division b. Division Data Generator 6. Experiments a. Synthetic Data b. Case Study 7. Conclusion
INTRODUCTION Relational Division allows simple representations of queries involving the concept of “for all”
INTRODUCTION 1. To select candidates having all the skills for a given job
INTRODUCTION 2. To select the diseases that have all the given symptoms
INTRODUCTION 3. To select the animals that have all the desired genetic conditions
INTRODUCTION 1. Relational Algebra: 2. RDBMS / SQL: a. There is no explicit operator for it. b. It can be implemented in SQL in several ways. c. Most of the time, relational division is used indirectly.
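Since SQL has no division operator, the query has to be phrased indirectly. The sketch below shows two commonly used formulations, run through Python's built-in `sqlite3` module. The tables `candidate_skill` and `job_skill` and their rows are hypothetical, made up to mirror the "candidates having all the skills for a given job" example; they are not taken from the slides, and these two queries are not necessarily the implementations the authors evaluated.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate_skill (candidate TEXT, skill TEXT);
    CREATE TABLE job_skill (skill TEXT);
    INSERT INTO candidate_skill VALUES
        ('ana', 'sql'), ('ana', 'python'), ('ana', 'java'),
        ('bob', 'sql'),
        ('eva', 'python'), ('eva', 'sql');
    INSERT INTO job_skill VALUES ('sql'), ('python');
""")

# Formulation 1: double negation
# ("there is no required skill that the candidate lacks").
NOT_EXISTS = """
    SELECT DISTINCT c.candidate
    FROM candidate_skill AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM job_skill AS j
        WHERE NOT EXISTS (
            SELECT 1 FROM candidate_skill AS c2
            WHERE c2.candidate = c.candidate AND c2.skill = j.skill))
    ORDER BY c.candidate
"""

# Formulation 2: count the matched required skills and
# compare the count with the size of the divisor.
COUNTING = """
    SELECT c.candidate
    FROM candidate_skill AS c
    JOIN job_skill AS j ON j.skill = c.skill
    GROUP BY c.candidate
    HAVING COUNT(DISTINCT c.skill) =
           (SELECT COUNT(DISTINCT skill) FROM job_skill)
    ORDER BY c.candidate
"""

by_not_exists = [row[0] for row in conn.execute(NOT_EXISTS)]
by_counting = [row[0] for row in conn.execute(COUNTING)]
print(by_not_exists, by_counting)
```

On this toy data both queries return the same candidates. They are not interchangeable in every case: with an empty divisor table, the double-negation form returns every candidate, while the counting form returns none, because the join produces no rows.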
CONTRIBUTIONS 1. Evaluate the division implementations available in RDBMSs under different use cases. 2. Investigate which aspects of the data affect the execution time of each implementation. 3. Propose a new algorithm to answer relational division queries. 4. Perform a case study that selects genetic data using relational division.
BACKGROUND | Genetic Data SNP - Single Nucleotide Polymorphism ● Variations in the genome among individuals in which the least frequent allele has an abundance of 1% or greater. ● Some SNPs are reported to be strongly associated with diseases or with the development of specific traits of the individual. ● SNPs represent about 90% of all genetic variation among individuals.
BACKGROUND | Genetic Data SNP - Single Nucleotide Polymorphism Codified as: SNP: position along the chromosome. Alleles: 11, 12, 21, 22
BACKGROUND | Genetic Data SNP - Single Nucleotide Polymorphism Codified as: Genetic data of the individual: position along the chromosome. Alleles: 11, 12, 21, 22
BACKGROUND | Relational Division ● It is the only algebraic operator that corresponds directly to Universal Quantification (∀) from the Relational Calculus. ● The division operation is a derived operator.
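Because division is a derived operator, it can be rewritten using only projection, Cartesian product, and set difference. A common textbook formulation is given below. The names are assumptions for illustration and do not come from the slides: R(A, B) is the dividend, S(B) is the divisor, and A is the set of attributes kept in the quotient.

```latex
R \div S \;=\; \pi_{A}(R) \;-\; \pi_{A}\bigl( (\pi_{A}(R) \times S) - R \bigr)
```

Reading from the inside out: \pi_{A}(R) \times S pairs every candidate with every required value, subtracting R leaves the pairs that are missing, and projecting on A gives the candidates that miss at least one value. Removing those from \pi_{A}(R) leaves the quotient.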
BACKGROUND | Relational Division Worked example (figure): DIVIDEND ÷ DIVISOR = QUOTIENT
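The roles of dividend, divisor, and quotient can be illustrated with a small in-memory example. This is a minimal sketch on made-up data, not the example from the original figure: the function `divide` and the tuples below are hypothetical. It also cross-checks the result against the textbook rewriting of division as projection, Cartesian product, and set difference.

```python
def divide(dividend, divisor):
    """Return every a such that (a, b) is in dividend for all b in divisor."""
    groups = {}
    for a, b in dividend:
        groups.setdefault(a, set()).add(b)
    required = set(divisor)
    # A group qualifies when it contains every required value.
    return {a for a, values in groups.items() if required <= values}


# Hypothetical data: (group id, value) pairs and the required values.
dividend = {(1, "x"), (1, "y"), (2, "x"), (3, "x"), (3, "y"), (3, "z")}
divisor = {"x", "y"}

quotient = divide(dividend, divisor)

# Same result via projection, Cartesian product, and set difference.
candidates = {a for a, _ in dividend}                 # projection on the group id
all_pairs = {(a, b) for a in candidates for b in divisor}
missing = all_pairs - dividend                        # required pairs that are absent
algebraic = candidates - {a for a, _ in missing}

print(sorted(quotient), sorted(algebraic))
```

Group 2 is excluded because it lacks the value "y". Group 3 still qualifies even though it carries an extra value, since division only requires that the divisor be contained in the group.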
RELATED WORK
PROPOSED ALGORITHMS ● We developed a new algorithm for the division operation
PROPOSED ALGORITHMS | Index-Division Valid groups: {1, 2, 3}
PROPOSED ALGORITHMS | Data Generator 1. Cardinality: the number of tuples in the dividend relation R1 and in the divisor relation R2; 2. Number of individuals: the number of groups of tuples representing the individuals evaluated in the operation; 3. Correlation: the percentage of all individuals that satisfy every requirement in R2 and are therefore part of the result; 4. Variability: the differences in size between individuals, which adjusts the number of tuples in each group.
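To make the four parameters concrete, here is a minimal sketch of how such a generator could be parameterized. It is not the authors' generator: the function `generate_division_data`, its signature, and the way each parameter is applied are assumptions made for illustration. In this sketch the cardinality of R1 is not passed in directly but results from the number of individuals, the hypothetical `group_size`, and the variability. It also assumes `divisor_size >= 1` and `group_size >= 1`.

```python
import random


def generate_division_data(n_individuals, divisor_size, group_size,
                           correlation, variability, seed=0):
    """Hypothetical generator returning (r1, r2) as lists of tuples.

    r1 holds (individual, item) pairs; r2 holds the required items.
    correlation and variability are fractions in [0, 1].
    """
    rng = random.Random(seed)
    required = list(range(divisor_size))             # items in the divisor R2
    spread = int(group_size * variability)           # maximum deviation in group size
    # Extra items never appear in R2; the pool covers the largest group.
    extras = list(range(divisor_size, divisor_size + group_size + spread))
    n_valid = round(n_individuals * correlation)     # individuals in the quotient

    r1 = []
    for ind in range(n_individuals):
        size = max(1, group_size + rng.randint(-spread, spread))
        if ind < n_valid:
            # Valid individual: all required items plus random extras.
            size = max(size, divisor_size)
            items = required + rng.sample(extras, size - divisor_size)
        else:
            # Invalid individual: the first required item is never included.
            allowed = required[1:] + extras
            items = rng.sample(allowed, min(size, len(allowed)))
        r1.extend((ind, item) for item in items)

    r2 = [(item,) for item in required]
    return r1, r2


r1, r2 = generate_division_data(n_individuals=100, divisor_size=5,
                                group_size=10, correlation=0.25,
                                variability=0.5)

# Check the correlation by dividing r1 by r2 in memory.
groups = {}
for ind, item in r1:
    groups.setdefault(ind, set()).add(item)
required_items = {item for (item,) in r2}
valid = [ind for ind, items in groups.items() if required_items <= items]
print(len(r1), len(r2), len(valid))
```

With a correlation of 0.25 and 100 individuals, exactly 25 groups contain every item of R2, so the quotient of the generated relations has 25 entries regardless of the seed.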
EXPERIMENTS 1. Synthetic data: R1: [100,000, 1,000,000] tuples R2: [1, 1,000] tuples Correlation: 0% to 100% Variability: 0% to 100% 2. Genetic data: 4,100 animals, 10,000 SNPs, > 40,000,000 tuples! http://qtl-mas-2012.kassiopeagroup.com/en/index.php
CONCLUSION We believe that implementing Index-Division inside the core of the DBMS could achieve the best performance on relational division queries.
Thanks for your attention!