EGTDC Database Course 2004 Introduction to Databases Tim Booth : tbooth@ceh.ac.uk Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Course Overview What you will take home: What is a relational database? ● Why would you use one? ● How do you interact with one? ● How do you design and build one? ● What is the theory behind good relational design? ● What software is available? ● How can I build user interfaces, or display my data on the web? ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
What is a database? Most general definition: ● “A database is an integrated collection of data organised to meet the needs of one or more users” Databases stored on computer may be of various types: ● Flat file database such as a comma-separated ● file Indexed file database such as BLAST dbs ● 'Flat' relational database such as Access .mdb ● Database managed by a dedicated application ● (RDBMS) such as PostgreSQL or Oracle. Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Benefits of a Relational Approach Database structure reflects the structure of the data ● Eliminate redundant data ● Data validation ● Powerful querying via SQL ● Security, data integrity and multi-user access ● Compatibility with a huge range of existing software and development ● tools Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
What is a Relational Database? A relational database is a collection of tables. Each table has rows and ● columns, and looks rather like a spreadsheet. The movie table in the example database: ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
The Example Database The 'bighit' database represents a video store. ● Each person has access to an individual copy of this database, and we ● will explore it as the course progresses. The next slide shows a selection of tables in the database, and links ● between them. Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
The Example Database Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
More about database tables Tables are properly called 'relations'. ● A table holds a collection of records or 'tuples'. These are shown as rows ● when you view the data. Each record has a fixed set of fields or 'attributes', each of which contains ● data of a specified type. These appear as columns in the data view. Inside the database, the order of the rows and columns is not significant. ● The database sorts them when you ask to display the data. A database can have many different tables, corresponding to things we ● want to store data about. Other entities can also live in your database – we shall meet them later! ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Access the sample database Each user has a private copy of the sample database. We will log in now. ● Access the command line by right-clicking the desktop and selecting 'New ● Terminal'. To start the PGAdminIII tool type: ● pgadmin & Select the server in the tree view, login with user/pass = ivgws0?/sushi ● In the tree browser, open 'databases' then 'ivgws0?'. ● Open the 'bighit' schema, then the 'tables' link. ● Select the 'movie' table – note the definition which appears in the right ● pane. Right click and select 'view data'. ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Introducing SQL SQL is a language we will use to get at the data in the database ● Generally human readable ● Retrieve information with SELECT statements ● eg. SELECT * FROM MOVIE A result set comes back. ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Bring up an SQL Editor Window Click the pencil icon to get a new SQL editor window. ● In the new window, use the eraser icon to clear any text already in the ● input box. Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Talking SQL Type the query shown (1) and hit the green arrow (2) to run it. ● 2 2 1 1 Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
More about SQL SQL is the Structured Query Language ● The language and structure of databases are intimately linked. ● Sometimes pronounced 'sequel' – especially by Microsoft devotees. ● Invented by Dr. Edgar Frank 'Ted' Codd (1924-2003), who was from ● Oxford, and set down the foundations of the relational database concept while working in the IBM San Jose Research Laboratory. The classic paper 'A relational model of data for large shared data banks' ● appeared in the ACM (Association for Computing Machinery) magazine, June 1970 Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Being More Specific The general form of basic SELECT statements: ● SELECT title, year SELECT WHAT FROM movie FROM WHICH TABLE WHERE year > 1990 WHERE SOME TEST ORDER BY year ORDER BY WHICH COLUMN The last two parts are both optional. ● SELECT * is a special case to get all columns. ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Modifying Data INSERT, DELETE and UPDATE are the commands to modify data. ● INSERT INSERT INTO reservation INTO TABLE (accountid, movieid, datereserved) (COLUMNS) VALUES (23, 15, '12 June') VALUES (VALUES) DELETE FROM DELETE FROM reservation TABLE where accountid = 23 WHERE CONDITION Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Modifying Data 2 INSERT, DELETE and UPDATE are the commands to modify data. ● UPDATE movie UPDATE TABLE SET SET genre = 'fantasy', COLUMN1 = VALUE1, rating = 'PG' COLUMN2 = VALUE2 WHERE id = 15 WHERE CONDITION Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Getting Exactly What You Want In a RDB you try not to store redundant or derived data. ● For example, we store the year of release of a movie, but not the age. ● We can specify a calculation within the SELECT statement to calculate ● the age of a movie on-the-fly. A) select title, year from movie B) select title, extract(YEAR FROM now()) - year from movie 'extract' and 'now' are functions. ● Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Name That Column Column aliases allow you to specify the name of a column retrieved with ● SELECT. Running the previous example, we see that the second column gets ● called '?column?', because the database has no idea what to call the result of the formula. We can simply say: ● select title, extract(YEAR FROM now()) - year as age from movie Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Learning by Example SQL allows you to derive data by supplying a variety of functions and ● constructs. Rather than just talking through them, we will introduce some by ● example in the exercises. Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Relational Databases in Biology Some databases, such as the NCBI Taxonomy, can be downloaded and ● set up on a local machine. ftp://ftp.ncbi.nih.gov/pub/taxonomy/ Ensembl permits direct access to the core database, which we will look at ● later. The document describing how to get access is here: http://cvsweb.sanger.ac.uk/cgi−bin/cvsweb.cgi/ ~checkout~/ensembl/docs/tutorial/ensembl_tutorial.pdf For large datasets, deploying an RDBMS can provide fast, flexible access ● to the data. Most data-portal websites have such a database at their heart. Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Credits The Bighit database is adapted from the website of Greg Riccardi - ● http://riccardi.pearsoncmg.com/dbmgmt/. Examples and quotes from Database Design and Management, by Rob ● Eadie (Continuum, 1998). Programming the Perl DBI, Alligator Descartes & Tim Bunce, (O'Reilly ● 2000) Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Recommend
More recommend