nebc database course 2008 implementing a relational
play

NEBC Database Course 2008 Implementing a Relational Database Tim - PowerPoint PPT Presentation

NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema


  1. NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk

  2. Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database

  3. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence

  4. Implementing our database • Most of hard work is already done • Create relations using SQL • Define the data types for our columns • Define primary and foreign keys • Add constraints • Add any appropriate default values

  5. Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName

  6. Create Table relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename , datatype , url );

  7. Add data types Reminder: • Numerical • integer,float,numeric • String/Text • varchar,text • Date/Time • timestamp,date • Boolean

  8. Add data types relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) );

  9. Primary Keys relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) );

  10. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(50), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism );

  11. Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) );

  12. feature table CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) );

  13. organism table CREATE TABLE organism ( organismnumber integer species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

  14. Constraints • Constraints restrict the values that can be inserted or updated in columns • Types of constraints • NOT NULL • UNIQUE • Simply add to column definition url varchar(100) NOT NULL or url varchar(100) UNIQUE • NOT NULL and UNIQUE implicit on primary key

  15. Constraints • CHECK constraint numberoflegs integer CHECK (numberoflegs>2)

  16. publicdatabase table CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) );

  17. Constraints • To keep links between tables working you need to preserve the matching values – referential integrity • automatically set up when you declare the primary and foreign keys • This will prevent you from deleting a record with a primary key before you have deleted all the child foreign key records

  18. Constraints • Example from 'BigHit' database

  19. Constraints

  20. organism table CREATE TABLE organism ( organismnumber integer, species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); How shall we create the unique primary key values for organismnumber?

  21. Sequences • Sequence is a database object in PostgreSQL which gives you an automatically incrementing numeric value (equivalent to 'autonumber' in Access) CREATE SEQUENCE my_seq (can specify increment,min and max) SELECT NEXTVAL('my_seq')

  22. Default values • Still don't want to have to select the value each time • Can set a default value for column which is automatically filled in every time a record is inserted CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

  23. Create your database • To create your database run your SQL table and other object creation statements in a single script • Example - demodatabase.sql • Be sure create tables in the right order • Can't create table that refers to a primary key in a table that doesn't exist yet • You also need data...

  24. Populate your database • Insert data using INSERT sql statements INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit'); • Default values will inserted automatically CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), ...

  25. Populate your database • Be sure to insert data in correct order • Don't try and insert a foreign key value when the primary key value hasn't been inserted yet • Run the demodatabase.sql script

  26. Querying your database • Now that your database is set up and data has been inserted we can query it

  27. Database Sequence Type ID Accession Organism Name Number Swissprot Protein Phosphorylase B KPB1_Rabit Rabbit kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein TRIC_RABIT Troponin I EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein Q8MJF7 Rabbit pol protein EMBL Nucleotide OXPKA Rabbit Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase

  28. Querying your database

  29. Querying your database

  30. Querying your database

  31. Querying your database • What have we gained? • No data redundancy • Data is consistent • Enforced quality control – no missing data • Only have to change data once • Flexibility to run a variety of queries

  32. Views • Views are queries that are saved in the database as objects • Appear much like a table which can be queried in the same way • Good if underlying query is very complex CREATE VIEW viewname AS query

  33. Views

  34. Views

  35. Indexes • Searching data by scanning is slow • Indexes make this searching faster • Implicit indexes are set up for primary keys as these are used a lot for searching data

  36. Indexes • An index can be created on any column CREATE INDEX orgname_idx on organism (commonname) • An index is helpful on a column that is regularly searched on (i.e. Used in the WHERE clause)

  37. Index worked example • There are more movies in the file demodata/moremovies.csv • These are already loaded into the database table 'demodata.moremovies' • in PGAdmin3: • INSERT INTO bighit.movie (SELECT * FROM demodata.moremovies); • SELECT * FROM movie where rating = 'U';

  38. Index worked example • Explain the query • Now make an index: • CREATE INDEX myindex ON movie (rating); • Now explain the original query again • This works for very complex queries!

  39. Complex Query Analysis

  40. The “MART” Strategy • Normalisation is the process of removing data redundancy from your database design • But it adds complexity • Views can make querying simpler • Indexes can make querying faster • But... Sometimes this is not enough. Maintaining a summary table for quick querying is known as 'denormalisation' • Many large databases (eg. EnsEMBL) resort to this

  41. More features... • stored procedures • triggers • cascading updates • custom types • custom functions • extension modules • load balancing • replication • ...

Recommend


More recommend