NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk
Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema Specification implement schema and database
Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence
Implementing our database • Most of hard work is already done • Create relations using SQL • Define the data types for our columns • Define primary and foreign keys • Add constraints • Add any appropriate default values
Our design so far: relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName
Create Table relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename , datatype , url );
Add data types Reminder: • Numerical • integer,float,numeric • String/Text • varchar,text • Date/Time • timestamp,date • Boolean
Add data types relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) );
Primary Keys relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) );
Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(50), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism );
Foreign Keys relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) );
feature table CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) );
organism table CREATE TABLE organism ( organismnumber integer species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );
Constraints • Constraints restrict the values that can be inserted or updated in columns • Types of constraints • NOT NULL • UNIQUE • Simply add to column definition url varchar(100) NOT NULL or url varchar(100) UNIQUE • NOT NULL and UNIQUE implicit on primary key
Constraints • CHECK constraint numberoflegs integer CHECK (numberoflegs>2)
publicdatabase table CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) );
Constraints • To keep links between tables working you need to preserve the matching values – referential integrity • automatically set up when you declare the primary and foreign keys • This will prevent you from deleting a record with a primary key before you have deleted all the child foreign key records
Constraints • Example from 'BigHit' database
Constraints
organism table CREATE TABLE organism ( organismnumber integer, species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) ); How shall we create the unique primary key values for organismnumber?
Sequences • Sequence is a database object in PostgreSQL which gives you an automatically incrementing numeric value (equivalent to 'autonumber' in Access) CREATE SEQUENCE my_seq (can specify increment,min and max) SELECT NEXTVAL('my_seq')
Default values • Still don't want to have to select the value each time • Can set a default value for column which is automatically filled in every time a record is inserted CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );
Create your database • To create your database run your SQL table and other object creation statements in a single script • Example - demodatabase.sql • Be sure create tables in the right order • Can't create table that refers to a primary key in a table that doesn't exist yet • You also need data...
Populate your database • Insert data using INSERT sql statements INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit'); • Default values will inserted automatically CREATE TABLE organism ( organismnumber integer DEFAULT NEXTVAL('my_seq'), ...
Populate your database • Be sure to insert data in correct order • Don't try and insert a foreign key value when the primary key value hasn't been inserted yet • Run the demodatabase.sql script
Querying your database • Now that your database is set up and data has been inserted we can query it
Database Sequence Type ID Accession Organism Name Number Swissprot Protein Phosphorylase B KPB1_Rabit Rabbit kinase alpha regulatory chain UniProt Protein Troponin T Trt3_rabit Swissprot Proten Glycogen PHS2_RABIT rabbit phosphorylase Swissprot protein TRIC_RABIT Troponin I EMBL Nucleotide rabbit muscle OCPHOS2 rabbit phosphorylase mrna dbEST Nucleotide CK829726 Rabbit TrEMBL protein Q8MJF7 Rabbit pol protein EMBL Nucleotide OXPKA Rabbit Rabbit phosphorylase PDB protein_structure Glycogen 1ABB rabit Phosphorylase
Querying your database
Querying your database
Querying your database
Querying your database • What have we gained? • No data redundancy • Data is consistent • Enforced quality control – no missing data • Only have to change data once • Flexibility to run a variety of queries
Views • Views are queries that are saved in the database as objects • Appear much like a table which can be queried in the same way • Good if underlying query is very complex CREATE VIEW viewname AS query
Views
Views
Indexes • Searching data by scanning is slow • Indexes make this searching faster • Implicit indexes are set up for primary keys as these are used a lot for searching data
Indexes • An index can be created on any column CREATE INDEX orgname_idx on organism (commonname) • An index is helpful on a column that is regularly searched on (i.e. Used in the WHERE clause)
Index worked example • There are more movies in the file demodata/moremovies.csv • These are already loaded into the database table 'demodata.moremovies' • in PGAdmin3: • INSERT INTO bighit.movie (SELECT * FROM demodata.moremovies); • SELECT * FROM movie where rating = 'U';
Index worked example • Explain the query • Now make an index: • CREATE INDEX myindex ON movie (rating); • Now explain the original query again • This works for very complex queries!
Complex Query Analysis
The “MART” Strategy • Normalisation is the process of removing data redundancy from your database design • But it adds complexity • Views can make querying simpler • Indexes can make querying faster • But... Sometimes this is not enough. Maintaining a summary table for quick querying is known as 'denormalisation' • Many large databases (eg. EnsEMBL) resort to this
More features... • stored procedures • triggers • cascading updates • custom types • custom functions • extension modules • load balancing • replication • ...
Recommend
More recommend