modelling and designing a database
play

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk - PowerPoint PPT Presentation

Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk Modelling and designing a database A well thought out database will consist of a co-ordinated set of tables that satisfy a set of requirements specified by the intended users .


  1. Modelling and Designing a Database Bela Tiwari btiwari@ceh.ac.uk

  2. Modelling and designing a database A well thought out database will consist of a co-ordinated set of tables that satisfy a set of requirements specified by the intended users .

  3. Modelling and designing a database • Define clearly the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design

  4. Modelling and designing a database • Define the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design

  5. Modelling and designing a database • Define clearly the purpose of the database • Define the data requirements • Prepare a data model • Translate the data model into a database design • Implement the design COMPUTER

  6. What’s wrong with a spreadsheet? • Redundant data • Inconsistent data • Quality control • Control over effects of changing the data • Inflexible • Sharing challenges

  7. My made-up data table Database Sequence Type Name ID Organism Name Phosphorylase Swissprot Protein KPB1_Rabit Rabbit B kinase alpha regulatory chain Troponin T Trt3_rabit UniProt Protein Glycogen Swissprot Proten PHS2_RABIT rabbit phosphorylase Troponin I Swissprot protein TRIC_RABIT rabbit muscle OCPHOS2 rabbit EMBL Nucleotide phosphorylase mrna dbEST Nucleotide CK829726 Rabbit pol protein Q8MJF7 Rabbit TrEMBL protein Rabbit EMBL Nucleotide OXPKA Rabbit phosphorylase Glycogen 1ABB rabit PDB protein_structure Phosphorylase

  8. Redundant data Database Sequence Type Name ID Organism Name Phosphorylase Swissprot Protein KPB1_Rabit Rabbit B kinase alpha regulatory chain Troponin T Trt3_rabit UniProt Protein Glycogen Swissprot Peptide PHS2_RABIT rabbit phosphorylase Troponin I Swissprot protein TRIC_RABIT rabbit muscle OCPHOS2 rabbit EMBL Nucleotide phosphorylase mrna dbEST Nucleotide CK829726 Rabbit pol protein Q8MJF7 Rabbit TrEMBL protein Rabbit EMBL Nucleotide OXPKA Rabbit phosphorylase Glycogen 1ABB rabit PDB protein_structure Phosphorylase

  9. Inconsistent data Database Sequence Type Name ID Organism Name Phosphorylase Swissprot Protein KPB1_Rabit Rabbit B kinase alpha regulatory chain Troponin T Trt3_rabit UniProt Protein Glycogen Swissprot Peptide PHS2_RABIT rabbit phosphorylase Troponin I Swissprot protein TRIC_RABIT rabbit muscle OCPHOS2 rabbit EMBL Nucleotide phosphorylase mrna dbEST Nucleotide CK829726 Rabbit pol protein Q8MJF7 Rabbit TrEMBL protein Rabbit EMBL Nucleotide OXPKA Rabbit phosphorylase Glycogen 1ABB rabit PDB protein_structure Phosphorylase

  10. Quality control Database Sequence Type Name ID Organism Name Phosphorylase Swissprot Protein KPB1_Rabit Rabbit B kinase alpha regulatory chain Troponin T Trt3_rabit UniProt Protein Glycogen Swissprot Peptide PHS2_RABIT rabbit phosphorylase Troponin I Swissprot protein TRIC_RABIT rabbit muscle OCPHOS2 rabbit EMBL Nucleotide phosphorylase mrna dbEST Nucleotide CK829726 Rabbit pol protein Q8MJF7 Rabbit TrEMBL protein Rabbit EMBL Nucleotide OXPKA Rabbit phosphorylase Glycogen 1ABB rabit PDB protein_structure Phosphorylase

  11. Control over the effects of changing data Database Sequence Type Name ID Organism Name Phosphorylase Swissprot Protein KPB1_Rabit Rabbit B kinase alpha regulatory chain Troponin T Trt3_rabit UniProt Protein Glycogen Swissprot Peptide PHS2_RABIT rabbit phosphorylase Troponin I Swissprot protein TRIC_RABIT rabbit muscle OCPHOS2 rabbit EMBL Nucleotide phosphorylase mrna dbEST Nucleotide CK829726 Rabbit pol protein Q8MJF7 Rabbit TrEMBL protein Rabbit EMBL Nucleotide OXPKA Rabbit phosphorylase Glycogen 1ABB rabit PDB protein_structure Phosphorylase

  12. Modelling and designing a database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database

  13. Modelling and designing a database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement PEN and PAPER schema and database

  14. A systematic approach will lead to a database that meets your needs! poor design == tears later

  15. Establishing requirements Consult with the users of your system and agree upon • what the users need the database to do • what data needs to be stored to accomplish this Include in your considerations the following: • Data that can be derived should not be stored e.g. if you are storing the birth year of a person, you should not also store their age; it can easily be derived • The type of questions people will ask, and whether new types of data are likely to be added to the database in future influence how you choose to organise your data. In the end, you want your database to be useful for what it needs to do!

  16. Define the data requirements What persistent data is to be stored in the database? Write a description of what is required of the database, including the kinds of data that should be stored and what questions users would like to be able to ask. Information about the limits on the data and any issues like whether a certain type of data is required or not should also be included.

  17. Define the data requirements Example: Initial Statement of Database Requirements A database is required that will hold information about sequence entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism, or public database name and retrieve a listing of all relevant sequences. They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database. Questions: • Does a sequence from a particular public database have to be stored in order for an entry for that public database to be included? • Do they want to store the sequence data? (implications for size and keeping up to date – can you set up an automated retrieval system from a local or remote copy of the public database?) • Do the users want to search for just organism names, or do they wish to search for strains as well? • Do they wish to store only formal names, or common names also? • Is there additional information the users would like to retrieve about the organisms, databases, features, etc that are not stated in this initial description?

  18. Does a sequence from a particular public database have to be stored in order for an entry for that public database to be added? No Do they want to store the sequence data? No – automated retrieval preferred Do the users want to search for just organism names, or do they wish to search for strains as well? Strains as well Do they wish to store only formal names, or common names also? Common names as well Is there additional information the users would like to retrieve about the organisms, databases, features, etc that are not stated in this initial description? Yes – they would like to be able to retrieve a list of public database names stored, along with the type of data that is stored in that database. They would also like to see if the genome sequence of an organism is available. Revised Statement of Database Requirements A database is required that will hold information about sequence entries from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism name, formal or common , and/or strain , or public database name and retrieve a listing of all relevant sequences as well as whether the full genome sequence of the organism in question is publicly available . They should also be able to retrieve any sequences containing particular feature types. For each public database sequence listed, the users would like to see the URL for that database and the type of data it stores . The sequence data itself does not need to be stored in the database.

  19. Entity Relationship Modelling A way of formally representing your database requirements. Your model will include a list of entity types, constraints, and assumptions and an E-R diagram. Entity: a representation of a thing that there is a need to record data about Relationship: an association between entities that needs to be recorded E-R diagram: a pictorial representation of your data requirements

  20. Entity Relationship Modelling What are the “things” of interest in this description? Statement of Database Requirements A database is required that will hold information about sequences from public databases of interest along with the type of features in that sequence. The users should be able to search for a database accession number, organism name, formal or common, and/or strain, or public database name and retrieve a listing of all relevant sequences as well as whether the full genome sequence of the organism in question is publicly available. They should also be able to retrieve any sequences containing particular feature types. For each public database sequences listed, the users would like to see the URL for that database and the type of data it stores. The sequence data itself does not need to be stored in the database.

Recommend


More recommend