Relational Algebra Spring 2012 Instructor: Hassan Khosravi
Querying relational databases Lecture given by Dr. Widom on querying Relational Models 2.2
2.1 An Overview of Data Models 2.1.1 What is a Data Model? 2.1.2 Important Data Models 2.1.3 The Relational Model in Brief 2.1.4 The Semi-structured Model in Brief 2.1.5 Other Data Models 2.1.6 Comparison of Modeling Approaches 2.3
2.1.1 What is a Data Model? Data model is a notion for describing data or information. Real World Math Model: Structure of the data (tuples) 1. Operations on the data – queries to retrieve and modify information 2. Constraints on the data – year has to be integer, name is string . 3. Important data models The relational Model The semi-structured data model XML 2.4
Relational Model in Brief Title Year Length genre Gone with the wind 1939 231 Drama Star Wars 1977 124 SciFi Wayne’s world 1992 95 comedy Relational model is based on tables Operations: query, modify Constraints: year is Integer between 1930-2012 The structure may appear to resemble an array of structs in C where the column headers are the field names and each row represent the values of one struct in the array. Distinction in scales of relations Not normally implemented as main-memory structure Take into consideration to access relations on hard drive 2.5
Semi structure data resembles trees or graphs rather than tables or arrays. The Semi-structured Model in Brief Operations usually involve following in <Movies> the tree. <Movie title=“Gone with the Wind”> Find the movies with the comedy genre. <Year>1939</Year> <Length>231</Length> <Genre>drama</Genre> Constraints often involve data types of </Movie> values associated with a tag. <Movie title=“Star Wars”> Values associated with the length tag are integers <title= Wars > <Year>1977</Year> <Length>124</Length> <Genre>sciFi</Length> </Movie> <Movie title=“Wayne’s World”> <Year>1992</Year> <Length>95</Length> <Genre>comedy</Genre> </Movie> </Movies> 2.6
Comparison of Modeling Approaches Semi-structured models have more flexibility than relations. However, the relational model is still preferred in DBMS’s. 1. Efficiency of access to data and efficiency of modifications to that data are more important than flexibility 2. ease of use is more important than flexibility. SQL enables the programmer to express their wishes at very high level. The strongly limited set of operations can be optimized to run very fast 2.7
Basics of the Relational model Title Year Length genre Gone with the wind 1939 231 Drama Star Wars 1977 124 SciFi Wayne’s world 1992 95 comedy Attributes: columns of a relation are named attributes. Schema: the name of the relation and the set of attributes Movies(title, year, length, genre) Tuples: The rows of a relation, other than the header Domains: the value for each attribute must be atomic (can not be structure). Each attribute has a domain of values. 2.8
Equivalent Representations of a Relation Relations are sets of tuples not lists of tuples. The order of tuples does not matter. Attributes could be reordered too. Title Year Length genre Gone with the wind 1939 231 Drama Star Wars 1977 124 SciFi Wayne’s world 1992 95 comedy Year Genre Title length 1977 SciFi Star Wars 124 Comedy Wayne’s World 1992 95 1939 Drama Gone With the Wind 231 How many different ways can we present the given relation? 2.9
Relation Instances and Keys A set of tuples for a given relation is called an instance of that relation. It is expected for the instance of the relation to change over time. New movies are added to the table It is less common for the schema of a relation to change. It is hard to add a new value for all the current tuples if a new attribute is added to the schema. Keyes of relations Key constraints: A set of attributes form a key if we do not allow two tuples in a relation instance to have the same value. We indicate the attributes that form a key by underlining them Movies(title, year, length, genre) Key most be true for all possible instances of a relation not a specific instance. Genre is not a key What if our data does not have a key? Generate artificial ID. Student Number 2.10
Database Schema about Movies MovieExec ( Movies( name: string, title: string; address : string Year : integer, cert# : integer Length : integer, netWorth : integer Genre : string, ) studioName : string, producerC# : integer ) Studio ( name: string, Moviestar ( address : string name : string, pressC# : integer address : string, ) gender : char, birthdate : date ) StarsIn ( MovieTitle: string, Movieyear : integer Starname : string ) 2.11
Defining a Relation Schema in SQL 2.3.1 Relations in SQL 2.3.2 Data Types 2.3.3 Simple Table Declarations 2.3.4 Modifying Relation Schemas 2.3.5 Default Values 2.3.6 Declaring Keys 2.3.7 Exercises for Section 2.3 2.12
2.3.1 Relations in SQL SQL also pronounced (sequel) is the principal language used to describe and manipulate relational database SQL makes a distinction between three kinds of relations Stored relations (tables): this relations are tables that exist in the database we can query and modify Views: are relations defined by a computation. They are not stored but constructed. We just query them (chapter 8) Temporary tables: are constructed by SQL language processor during optimization. These are not stored nor seen by the user 2.13
Data Types Char(n): a fixed-length string of up to n characters. Char(5) of foo is stored “ foo ” Varchar(n): a variable-length string of up to n characters Varchar(5) of foo is stored “ foo ” Bit(n), Varbit(n) fixed and variable string of upto n bits. Boolean: True False and although it would surprise George Boole Unknown Int or Integer: typical integer values Float or real: typical real values Decimal(6,2) could be 0123.45 Date and time: essentially char strings with constraints. 2.14
2.3.3 Simple Table Declarations CREATE TABLE Movie ( Movies( title: string; title VARCHAR(255), Year : integer, year INTEGER, Length : integer, Genre : string, length INTEGER, studioName : string, producerC# : integer inColor CHAR(1), ) studioName CHAR(50), producerC# INTEGER, ); CREATE TABLE MOVIESTAR ( Moviestar ( NAME CHAR(30), name : string, address : string, ADDRESS VARCHAR2(50), gender : char, GENDER CHAR(6) , birthdate : date ) BIRTHDATE DATE ); 2.15
Modifying Relation Schemas We can delete a table R by the following SQL command Drop table R; We can modify a table by the command Alter Table MovieStar ADD phone CHAR(16); Alter Table MovieStar Drop birthdate; Defaults values To use the default character ? As the default for an unknown gender. Earliest possible date for Unknown Birthdate . DATE ‘0000 -00- 00’ Gender CHAR(1) DEFAULT ‘?’, Birthdate DATE DEFAULT DATE ‘0000 -00- 00’, ALTER TABLE MovieStar ADD phone CHAR (16) DEFAULT ‘ unlisted’; 2.16
2.3.6 Declaring Keys Two ways to declare keys in CRATE table statement Primary key can not be null Unique can be null Replace primary with unique in examples to get the example with unique CREATE TABLE MovieStar ( name CHAR (30) Primary Key, address VARCHAR (255), gender CHAR(1), birthdate DATE ); CREATE TABLE MovieStar ( name CHAR (30), address VARCHAR (255), gender CHAR(1), birthdate DATE PRIMARY KEY (name) ); 2.17
Example 2.7 The Relation Movie, whose key is the pair of attributes ‘title and year’ must be declared like this CREATE TABLE Movies( title CHAR(100), year INTEGER, length INTEGER, genre CHAR(10), studiName CHAR(30), producerC# INTEGER, PRIMARY KEY (title,year) ); 2.18
Quick summary Lecture given by Dr. Widom on Relational Model definition 2.19
2.4 An Algebraic Query Language 2.4.1 Why Do We Need a Special Query Language? 2.4.2 What is an Algebra? 2.4.3 Overview of Relational Algebra 2.4.4 Set Operations on Relations 2.4.5 Projection 2.4.6 Selection 2.4.7 Cartesian Product 2.4.8 Natural Joins 2.4.9 Theta-Joins 2.4.10 Combining Operations to Form Queries 2.4.11 Naming and Renaming 2.4.12 Relationships Among Operations 2.4.13 A Linear Notation for Algebraic Expressions 2.4.14 Exercises for Section 2.4 2.20
Why Do We Need a Special Query Language? Why not just use C or java instead of introducing relational algebra ? Relational algebra is useful because it is less powerful than C and Java. One of the only areas where non-Turing-complete languages make sense. Relational algebra CANNOT determine whether the number of tuples are odd or even Being less powerful is helpful because Ease of programming Ease of compilation – Ease of optimization 2.21
Recommend
More recommend