NEBC Database Course 2008 Welcome Introduction to Databases Tim Booth : tbooth@ceh.ac.uk
Course Overview What you will take home: What is a relational database? ● Why would you use one? ● How do you interact with one? ● How do you design and build one? ● What is the theory behind good relational design? ● What software is available? ● How can I build user interfaces, or display my data on the web? ●
What is a database? Most general definition: ● “A database is an integrated collection of data organised to meet the needs of one or more users” Databases stored on computer may be of various types: ● Flat file database such as a comma- ● separated file Indexed file database such as BLAST dbs ● 'Flat' relational database such as ● Access .mdb Database managed by a dedicated ● application (RDBMS) such as PostgreSQL or Oracle.
Benefits of a Relational Approach Database structure reflects the structure of the data ● Eliminate redundant data ● Data validation ● Powerful querying via SQL ● Security, data integrity and multi-user access ● Compatibility with a huge range of existing software and ● development tools
What is a Relational Database? A relational database is a collection of tables. Each table has ● rows and columns, and looks rather like a spreadsheet. The movie table in the example database: ●
The Example Database The 'bighit' database represents a video store. ● Each person has access to an individual copy of this database, ● and we will explore it as the course progresses. The next slide shows a selection of tables in the database, and ● links between them.
The Example Database
More about database tables Tables are properly called 'relations'. ● A table holds a collection of records or 'tuples'. These are ● shown as rows when you view the data. Each record has a fixed set of fields or 'attributes', each of ● which contains data of a specified type. These appear as columns in the data view. Inside the database, the order of the rows and columns is not ● significant. The database sorts them when you ask to display the data. A database can have many different tables, corresponding to ● things we want to store data about. Other entities can also live in your database – we shall meet ● them later!
Access the sample database Each user has a private copy of the sample database. We will ● log in now. Start the pgAdmin III application from under System Tools in ● the main menu. File → Add Server ● Select the server in the tree view, login with user/pass ● In the tree browser, open 'databases' then 'student0?'. ● Open the 'bighit' schema, then the 'tables' link. ● Select the 'movie' table – note the definition which appears in ● the right pane. Right click and select View data → View all rows. ●
Introducing SQL SQL is a language we will use to get at the data in the database ● Generally human readable ● Retrieve information with SELECT statements ● eg. SELECT * FROM MOVIE A result set comes back. ●
Bring up an SQL Editor Window Click the pencil icon to get a new SQL editor window. ● In the new window, use the eraser icon to clear any text ● already in the input box.
Talking SQL Type the query shown (1) and hit the green arrow (2) to run it. ● 2 2 1 1
More about SQL SQL is the Structured Query Language ● The language and structure of databases are intimately linked. ● Sometimes pronounced 'sequel' – especially by Microsoft ● devotees. Invented by Dr. Edgar Frank 'Ted' Codd (1924-2003), who was ● from Oxford, and set down the foundations of the relational database concept while working in the IBM San Jose Research Laboratory. The classic paper 'A relational model of data for large shared ● data banks' appeared in the ACM (Association for Computing Machinery) magazine, June 1970
Being More Specific The general form of basic SELECT statements: ● SELECT title, year SELECT WHAT FROM movie FROM WHICH TABLE WHERE year > 1990 WHERE SOME TEST ORDER BY year ORDER BY WHICH COLUMN The last two parts are both optional. ● SELECT * is a special case to get all columns. ●
Modifying Data INSERT, DELETE and UPDATE are the commands to modify ● data. INSERT INSERT INTO reservation INTO TABLE (accountid, movieid, (COLUMNS) datereserved) VALUES (VALUES) VALUES (23, 15, '12 June') DELETE FROM DELETE FROM reservation TABLE where accountid = 23 WHERE CONDITION
Modifying Data 2 INSERT, DELETE and UPDATE are the commands to modify ● data. UPDATE movie UPDATE TABLE SET SET genre = 'fantasy', COLUMN1 = VALUE1, rating = 'PG' COLUMN2 = VALUE2 WHERE id = 15 WHERE CONDITION
Getting Exactly What You Want In a RDB you try not to store redundant or derived data. ● For example, we store the year of release of a movie, but not ● the age. We can specify a calculation within the SELECT statement to ● calculate the age of a movie on-the-fly. A) SELECT title, year from movie B) SELECT title, extract(YEAR FROM now()) - year FROM movie 'extract' and 'now' are functions. ●
Name That Column Column aliases allow you to specify the name of a column ● retrieved with SELECT. Running the previous example, we see that the second column ● gets called '?column?', because the database has no idea what to call the result of the formula. We can simply say: ● SELECT title, extract(YEAR FROM now()) - year as age FROM movie
Learning by Example SQL allows you to derive data by supplying a variety of ● functions and constructs. Rather than just talking through them, we will introduce some ● by example in the exercises.
Relational Databases in Biology Some databases, such as the NCBI Taxonomy, can be ● downloaded and set up on a local machine. ftp://ftp.ncbi.nih.gov/pub/taxonomy/ Ensembl permits direct access to the core database, which we ● will look at later. The document describing how to get access is here: http://cvsweb.sanger.ac.uk/cgi−bin/cvsweb.cgi/ ~checkout~/ensembl/docs/tutorial/ensembl_tutorial.pdf For large datasets, deploying an RDBMS can provide fast, ● flexible access to the data. Most data-portal websites have such a database at their heart.
Credits The Bighit database is adapted from the website of Greg ● Riccardi - http://riccardi.pearsoncmg.com/dbmgmt/. Examples and quotes from Database Design and Management, ● by Rob Eadie (Continuum, 1998). Programming the Perl DBI, Alligator Descartes & Tim Bunce, ● (O'Reilly 2000) Course originally delivered in 2004; developed by Tim Booth, ● Joe Wood, Bela Tiwari and Dan Swan at the EGTDC.
Recommend
More recommend