Fast and Scalable Relational Division on Database Systems

  1. Fast and Scalable Relational Division on Database Systems. André S. Gonzaga, Robson L. F. Cordeiro

  2. 1. Introduction 2. Contributions 3. Background a. Genetic Data b. Relational Division 4. Related Work 5. Proposed Algorithms a. Index-Division b. Division Data Generator 6. Experiments a. Synthetic Data b. Case Study 7. Conclusion

  3. INTRODUCTION Relational division provides a simple way to express queries involving the concept of “for all”.

  4. INTRODUCTION 1. To select candidates having all the skills for a given job

  5. INTRODUCTION 2. To select the diseases that have all the given symptoms

  6. INTRODUCTION 3. To select the animals that have all the desired genetic conditions

  8. INTRODUCTION 1. Relational Algebra: has an explicit division operator, R1 ÷ R2. 2. RDBMS / SQL: a. There is no explicit operator for it. b. There are several possible implementations in SQL. c. In most cases, relational division is expressed indirectly.
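To make “several possible implementations in SQL” concrete, here is a minimal sketch of two well-known SQL formulations of division, run on an in-memory SQLite database through Python's standard sqlite3 module. The table names candidate_skill and job_skill and the sample rows are hypothetical, chosen to match the job-skills example; these are textbook formulations, not necessarily the exact variants evaluated in the talk.

```python
import sqlite3

# Hypothetical schema: candidate_skill(candidate, skill) is the dividend,
# job_skill(skill) is the divisor (the skills required for one job).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate_skill (candidate TEXT, skill TEXT,
                              PRIMARY KEY (candidate, skill));
CREATE TABLE job_skill (skill TEXT PRIMARY KEY);
INSERT INTO candidate_skill VALUES
  ('ana', 'sql'), ('ana', 'python'), ('ana', 'java'),
  ('bruno', 'sql'), ('bruno', 'java'),
  ('carla', 'sql'), ('carla', 'python');
INSERT INTO job_skill VALUES ('sql'), ('python');
""")

# Formulation 1: double negation.
# "There is no required skill that the candidate lacks."
NOT_EXISTS_QUERY = """
SELECT DISTINCT c.candidate
FROM candidate_skill AS c
WHERE NOT EXISTS (
    SELECT 1 FROM job_skill AS j
    WHERE NOT EXISTS (
        SELECT 1 FROM candidate_skill AS c2
        WHERE c2.candidate = c.candidate AND c2.skill = j.skill
    )
)
ORDER BY c.candidate
"""

# Formulation 2: counting. Keep only rows that match the divisor, then
# require each candidate's match count to equal the divisor cardinality.
COUNT_QUERY = """
SELECT c.candidate
FROM candidate_skill AS c
JOIN job_skill AS j ON c.skill = j.skill
GROUP BY c.candidate
HAVING COUNT(DISTINCT c.skill) = (SELECT COUNT(*) FROM job_skill)
ORDER BY c.candidate
"""

not_exists_result = [row[0] for row in conn.execute(NOT_EXISTS_QUERY)]
count_result = [row[0] for row in conn.execute(COUNT_QUERY)]
print(not_exists_result)  # ['ana', 'carla']
print(count_result)       # ['ana', 'carla']
```

The two queries agree here, but they differ when the divisor is empty: the NOT EXISTS version returns every candidate (the condition is vacuously true), while the COUNT version returns nothing because the join produces no rows.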

  13. CONTRIBUTIONS 1. Evaluate the implementations of relational division in RDBMSs across different use cases. 2. Investigate which characteristics of the data affect the execution time of each implementation. 3. Propose a new algorithm for relational division queries. 4. Perform a case study that selects genetic data using relational division.

  18. BACKGROUND | Genetic Data: SNP (Single Nucleotide Polymorphism) ● Variations in the genome among individuals in which the least frequent allele has a frequency of 1% or greater. ● Some SNPs are reported to be strongly associated with diseases or with the development of specific traits in the individual. ● SNPs represent about 90% of all genetic variation among individuals.

  19. BACKGROUND | Genetic Data: SNP (Single Nucleotide Polymorphism). Encoded as: [figure: a SNP, identified by its position along the chromosome; alleles: 11, 12, 21, 22]

  20. BACKGROUND | Genetic Data: SNP (Single Nucleotide Polymorphism). Encoded as: [figure: the genetic data of the individual; position along the chromosome; alleles: 11, 12, 21, 22]
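One way to see how genotype data becomes a dividend relation is to flatten each individual's genotype calls into one tuple per SNP. The sketch below assumes a hypothetical layout of {individual: [genotype code at position 0, 1, ...]} and the codes 11, 12, 21, 22 from the slide; the function name to_dividend and the sample animals are illustrative only.

```python
from typing import Dict, List, Tuple

# Genotype codes as listed on the slide (each digit is one allele, 1 or 2).
VALID_GENOTYPES = {"11", "12", "21", "22"}

Row = Tuple[str, int, str]  # (individual, snp_position, genotype)


def to_dividend(genotypes: Dict[str, List[str]]) -> List[Row]:
    """Flatten {individual: [genotype at SNP 0, SNP 1, ...]} into
    (individual, snp_position, genotype) tuples, one per SNP call."""
    rows: List[Row] = []
    for individual, calls in genotypes.items():
        for position, code in enumerate(calls):
            if code not in VALID_GENOTYPES:
                raise ValueError(f"unexpected genotype {code!r} for {individual}")
            rows.append((individual, position, code))
    return rows


# Hypothetical data: three animals genotyped at four SNP positions.
animals = {
    "A1": ["11", "12", "22", "21"],
    "A2": ["11", "22", "22", "11"],
    "A3": ["12", "12", "22", "21"],
}
dividend = to_dividend(animals)
print(len(dividend))  # 12
print(dividend[:2])   # [('A1', 0, '11'), ('A1', 1, '12')]
```

Because this layout yields one tuple per (animal, SNP) pair, 4,100 animals genotyped at 10,000 SNPs give 41,000,000 tuples, which is where a dividend of more than 40 million rows comes from.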

  23. BACKGROUND | Relational Division ● It is the only relational algebra operator that directly corresponds to universal quantification (∀) in the relational calculus. ● Division is a derived operator: it can be expressed using projection, Cartesian product, and set difference.
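For reference, the usual textbook definition of division as a derived operator is shown below, assuming a dividend R with attributes A ∪ B and a divisor S with attributes B (A and B are notation introduced here, not taken from the slides). The first form states the universal quantification; the second rewrites it with projection, Cartesian product, and difference.

```latex
R \div S = \{\, t \mid t \in \pi_A(R) \,\land\, \forall s \in S:\ (t, s) \in R \,\}
         = \pi_A(R) - \pi_A\big( (\pi_A(R) \times S) - R \big)
```

Read right to left: π_A(R) × S lists every pairing a candidate would need, subtracting R leaves the pairings that are missing, and projecting those onto A gives the candidates to discard.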

  24-32. BACKGROUND | Relational Division [Figure, built up over several slides: the DIVIDEND relation, the DIVISOR relation, and the resulting QUOTIENT]
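The DIVIDEND / DIVISOR / QUOTIENT example can be reproduced with a small Python sketch that follows the derived-operator definition π_A(R) − π_A((π_A(R) × S) − R) literally, on plain sets. The diseases-and-symptoms data are hypothetical, and this is a naive reference implementation (it materializes the full product) useful for checking results, not an efficient algorithm.

```python
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (A value, B value)


def divide(dividend: Set[Pair], divisor: Set[Hashable]) -> Set[Hashable]:
    """R ÷ S computed as pi_A(R) - pi_A((pi_A(R) x S) - R)."""
    candidates = {a for a, _ in dividend}                         # pi_A(R)
    required = {(a, b) for a in candidates for b in divisor}      # pi_A(R) x S
    missing = required - dividend                                 # pairs a candidate lacks
    disqualified = {a for a, _ in missing}                        # pi_A(missing)
    return candidates - disqualified


# Hypothetical example: diseases (A) and the symptoms they present (B).
dividend = {
    ("flu", "fever"), ("flu", "cough"), ("flu", "fatigue"),
    ("cold", "cough"), ("cold", "sneezing"),
    ("covid", "fever"), ("covid", "cough"),
}
divisor = {"fever", "cough"}
print(sorted(divide(dividend, divisor)))  # ['covid', 'flu']
```

With an empty divisor this returns every candidate, matching the textbook semantics of R ÷ ∅.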

  33. RELATED WORK

  36. PROPOSED ALGORITHMS ● We developed a new algorithm for the division operation: Index-Division.

  37-42. PROPOSED ALGORITHMS | Index-Division [Figure, animated over several slides: a step-by-step execution of Index-Division, labelled “Valid groups: {1, 2, 3}”]
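The slides present Index-Division only as an animation, so its internals cannot be recovered here. The sketch below is a hypothetical index-based way to compute division that shrinks a set of “valid groups” as each divisor value is checked, in the spirit of the “Valid groups: {1, 2, 3}” frames. The functions build_index and index_division, the rarest-value-first ordering, and the sample data are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def build_index(dividend: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Hypothetical inverted index: value -> set of groups that contain it."""
    index: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for group, value in dividend:
        index[value].add(group)
    return dict(index)


def index_division(index: Dict[Hashable, Set[Hashable]],
                   divisor: Iterable[Hashable]) -> Set[Hashable]:
    """Return the groups whose index entries cover every divisor value."""
    valid: Optional[Set[Hashable]] = None
    # Check the rarest values first so the set of valid groups shrinks early.
    for value in sorted(set(divisor), key=lambda v: len(index.get(v, ()))):
        groups = index.get(value, set())
        valid = set(groups) if valid is None else valid & groups
        if not valid:
            break  # no group can still qualify
    # Convention chosen here: an empty divisor yields an empty result.
    return valid if valid is not None else set()


# Hypothetical groups 1, 2, 3 with their values.
dividend = [(1, "a"), (1, "b"), (1, "c"),
            (2, "a"), (2, "b"),
            (3, "a"), (3, "c")]
index = build_index(dividend)
print(index_division(index, {"a", "b", "c"}))  # {1}
print(sorted(index_division(index, {"a"})))     # [1, 2, 3]
```

Intersecting per-value group sets lets the search stop as soon as the valid set becomes empty, without scanning the remaining values. Returning an empty set for an empty divisor is a choice made in this sketch; the textbook definition would instead return every group.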

  47. PROPOSED ALGORITHMS | Data Generator 1. Cardinality: the number of tuples in the dividend relation R1 and in the divisor relation R2; 2. Number of individuals: the number of groups of tuples representing the individuals evaluated in the operation; 3. Correlation: the percentage of all individuals that satisfy every requirement in R2 and are therefore part of the result; 4. Variability: the difference in size between individuals, obtained by adjusting the number of tuples in each group.
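A minimal sketch of a generator driven by the four parameters above. Everything concrete here is an assumption rather than the authors' generator: the function name generate, integer values, the choice that a non-qualifying individual lacks exactly one divisor value, and uniform noise to vary group sizes. R1's cardinality is matched exactly only at 0% variability; above that it is matched on average.

```python
import random
from typing import Dict, List, Set, Tuple


def generate(r1_cardinality: int, r2_cardinality: int, num_individuals: int,
             correlation: float, variability: float,
             seed: int = 0) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Hypothetical division data generator.

    correlation and variability are fractions in [0, 1]. Returns
    (dividend, divisor): dividend holds (individual, value) tuples,
    divisor holds the required values.
    """
    if r2_cardinality < 1 or num_individuals < 1:
        raise ValueError("need at least one divisor value and one individual")
    rng = random.Random(seed)
    divisor = list(range(r2_cardinality))          # required values 0 .. r2-1
    avg_size = r1_cardinality // num_individuals   # mean tuples per individual
    spread = int(avg_size * variability)           # max deviation from the mean
    num_matching = round(correlation * num_individuals)

    dividend: List[Tuple[int, int]] = []
    for individual in range(num_individuals):
        # Group size varies uniformly around the mean but never drops
        # below the divisor size, so every group can hold the required values.
        size = max(r2_cardinality, avg_size + rng.randint(-spread, spread))
        if individual < num_matching:
            values = list(divisor)                 # has every required value
        else:
            missing = rng.choice(divisor)          # lacks exactly one value
            values = [v for v in divisor if v != missing]
        # Pad with noise values (>= r2_cardinality) that never match the divisor.
        values.extend(range(r2_cardinality, r2_cardinality + size - len(values)))
        dividend.extend((individual, v) for v in values)
    return dividend, divisor


dividend, divisor = generate(r1_cardinality=1000, r2_cardinality=5,
                             num_individuals=20, correlation=0.25,
                             variability=0.0)
groups: Dict[int, Set[int]] = {}
for individual, value in dividend:
    groups.setdefault(individual, set()).add(value)
quotient = [i for i, vals in groups.items() if set(divisor) <= vals]
print(len(dividend))  # 1000
print(len(quotient))  # 5
```

Making non-qualifying individuals miss only one divisor value is a deliberately hard case: every such group looks almost like a match, so an algorithm cannot reject it early by chance.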

  50. EXPERIMENTS 1. Synthetic data: R1: 100,000 to 1,000,000 tuples; R2: 1 to 1,000 tuples; Correlation: 0% to 100%; Variability: 0% to 100%. 2. Genetic data: 4,100 animals and 10,000 SNPs, more than 40,000,000 tuples! http://qtl-mas-2012.kassiopeagroup.com/en/index.php

  51. EXPERIMENTS

  53. CONCLUSION We believe that implementing Index-Division inside the core of the DBMS could achieve the best performance for relational division queries.

  54. Thanks for your attention!
