Implementing a Relational Database Joe Wood anjw@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Implementing our database • Most of hard work is already done • Create relations using SQL • Define the data types for our columns • Define primary and foreign keys • Add constraints • Add any appropriate default values Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Create Table relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename datatype url ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Add data types Reminder: • Numerical • integer,float,numerical • String/Text • varchar,text • Date/Time • timestamp,date • Boolean Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Add data types relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Primary Keys relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
feature table CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
organism table CREATE TABLE organism ( organismnumber integer species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Constraints • Constraints restrict the values that can be inserted or updated in columns • Types of constraints • NOT NULL • UNIQUE • Simply add to column definition url varchar(100) NOT NULL or url varchar(100) UNIQUE Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Constraints • NOT NULL and UNIQUE implicit when primary key • CHECK constraint numberoflegs integer check (numberoflegs>2) Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
publicdatabase table CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Constraints • Foreign key constraints • To keep links between tables working you need to preserve the matching values – referential integrity • These constraints automatically set up for you by RDBMS when you declare the primary and foreign keys • This will prevent you from deleting a record with a primary key before you have deleted all the child foreign key records Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Constraints • Example from 'BigHit' database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Constraints Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
organism table CREATE TABLE organism ( organismnumber integer, species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); How shall we create the unique primary key values for organismnumber? Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Sequences • Could do complicated checks to work out the last number entered, add 1 and insert for each new record • Sequence is a database object in PostgreSQL which essentially an automatically incrementing numeric value (equivalent to 'autonumber' in Access) CREATE SEQUENCE 'my_seq' (can specify increment,min and max) SELECT NEXTVAL('my_seq') Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Default values • Still don't want to have to select the value each time • Can set a default value for column which is automatically filled in every time a record is inserted CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Create your database • To create your database run your SQL table and other object creation statements in a single script • Example - demodatabase.sql • Be sure create tables in the right order • Can't create table that refers to a primary key in a table that doesn't exist yet • You also need data... Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Populate your database • Insert data using INSERT sql statements INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit'); • Default values will inserted automatically CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), ... Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Populate your database • Be sure to insert data in correct order • Don't try and insert a foreign key value when the primary key value hasn't been inserted yet • Run the demodatabase.sql script Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Querying your database • Now that your database is set up and data has been inserted we can query it Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Database Sequence Type ID Accession Organism Name Number Swissprot Protein Phosphorylase B KPB1_Rabit Rabbit kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein TRIC_RABIT Troponin I OCPHOS2 rabbit EMBL Nucleotide rabbit muscle phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein Q8MJF7 Rabbit pol protein EMBL Nucleotide OXPKA Rabbit Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Querying your database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Recommend
More recommend