storing data review
play

Storing Data Review Data collection is an important issue Dirty - PDF document

Storing Data Review Data collection is an important issue Dirty data Multiple sources Data collection Data consolidation Data cleaning Data integration CMPT 354: Database I -- Storing Data 2 What Is the Next


  1. Storing Data

  2. Review • Data collection is an important issue – Dirty data – Multiple sources • Data collection • Data consolidation – Data cleaning – Data integration CMPT 354: Database I -- Storing Data 2

  3. What Is the Next After We Get Data? • Store data – Holders: a paper notebook, a spreadsheet, a relational database, … – We focus on relational tables in this course • Does a database need some structure? If so, how to construct the structure? • How can a user store data into a database? • Can we prevent two students sharing a same student-id from happening? CMPT 354: Database I -- Storing Data 3

  4. Outline • How to create a relational table? • How to insert data into a table? • Integration constraints: making the data correct • Data first or schema first? CMPT 354: Database I -- Storing Data 4

  5. Storing Data in Tables • For each web page, store the information about – URL, language, last update time, number of outgoing hyperlinks • A spreadsheet works! URL Language Last update # of out links www.sfu.ca HTML Sept. 5 45 CMPT 354: Database I -- Storing Data 5

  6. Tables • Relational databases store data in tables – A database can have many tables • Schema: WebPage(URL, language, update_date, n_out_links) CMPT 354: Database I -- Storing Data 6

  7. Create a Table • CREATE statement CREATE TABLE WebPage ( URL CHAR(256), language CHAR(40), update_date DATETIME, n_out_links INT ) DATETIME is the DATE type in SQL Server CMPT 354: Database I -- Storing Data 7

  8. Data Types (1) • Character strings – Fixed length: CHAR(n) – Varying length, up to n characters: VARCHAR(n) • Bit strings – Fixed length: BIT(n) – Varying length, up to n bits: BIT VARYING(n) • Boolean values BOOLEAN – TRUE, FALSE, or UNKNOWN CMPT 354: Database I -- Storing Data 8

  9. Data Types (2) • Integers INT or INTEGER – SHORTINT is also supported in some systems • Floating-point numbers – FLOAT or REAL – DOUBLE PRECESION – Real numbers with a fixed decimal point: DECIMAL(n, d) • 123.45 is a possible value of type DECIMAL(5, 2) • Dates and times – Exact time: DATE and TIME – Durations of time: INTERVAL CMPT 354: Database I -- Storing Data 9

  10. Table and Data • Table itself is not user data! – Is table data? • Table is a container of data • Data is represented as tuples in tables – Tuple (‘www.sfu.ca’, ‘HTML’, 9/5/2006, 45) CMPT 354: Database I -- Storing Data 10

  11. Insert a Tuple into a Table INSERT INTO WebPage(URL, language, update_date, n_out_links) VALUES (‘SFU’, ‘HTML’, ’9/5/2007’, 45) • The attributes and the corresponding values must be in the same order CMPT 354: Database I -- Storing Data 11

  12. Storing Complex Data • Multimedia objects are often huge – Music, video, e-books, … • BLOB: binary large object block • Complex data can be stored as BLOB’s – Applications are in charge of handling and explanation • Databases only store and retrieve BLOB’s as whole CMPT 354: Database I -- Storing Data 12

  13. Example • Extend the WebPage table to store web page content ALTER TABLE WebPage ADD content VARCHAR(MAX) CMPT 354: Database I -- Storing Data 13

  14. Storage in SQL Server – In Row Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true CMPT 354: Database I -- Storing Data 14

  15. Storage in SQL Server – Out Row Figure from http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true CMPT 354: Database I -- Storing Data 15

  16. Tips on Straightforward Design • In many cases, we can start with one table containing all attributes we want to store – Universal table as will be explained later • NULL value: value unknown, unavailable or non-applicable ALTER TABLE WebPage ADD title CHAR(256) INSERT INTO WebPage(URL, language, update_date, n_out_links, title) VALUES (‘www.my_page.net’, ‘HTML’, ‘9/5/2007’, 2, NULL) CMPT 354: Database I -- Storing Data 16

  17. Integrity Constraints – NOT NULL • Each web page must have a unique URL – NULL is not allowed in attribute URL CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL ) CMPT 354: Database I -- Storing Data 17

  18. Integrity Constraints – Primary Key • Each web page must have a unique URL, an update time, and the number of out links – The language of an empty page can be undetermined CREATE TABLE WebPage ( URL CHAR(256) NOT NULL, language CHAR(40), update_date DATETIME NOT NULL, n_out_links INT NOT NULL, primary key (URL) ) CMPT 354: Database I -- Storing Data 18

  19. Data Integrity and Constraints • Integrity constraints: ensure accuracy and consistency of data in a database • Referential integrity: the properties that should be satisfied when data entries are referenced – Not NULL – Primary key – More will be discussed later CMPT 354: Database I -- Storing Data 19

  20. Schema-First or Data-First • Schema first approaches – Design a schema, collect data according to the schema – Traditional, well controlled applications: student information systems, library loan systems • Data first approaches – Data exists before a database is designed, or evolving data – More and more large applications: integrating data sources on the web (e.g., a database of blogs) CMPT 354: Database I -- Storing Data 20

  21. Summary • Creating tables in relational databases • Inserting tuples into a table • Specifying integrity constraints CMPT 354: Database I -- Storing Data 21

  22. To-Do List • Check the data types in SQL Sever – Some data types in standard SQL may not be available in SQL Server. Please find their correspondences • Create a table for web pages in SQL Server – Record URL, language, update time, number of outgoing links, and title – Use VARCHAR to store content CMPT 354: Database I -- Storing Data 22

  23. Advanced To-Do List • How can you extend the web page table to store the outgoing links? – Each outgoing link is a URL • SQL Server 2005 comes with a database AdventureWorks. Try to understand what data is stored in the database CMPT 354: Database I -- Storing Data 23

Recommend


More recommend