USC "Entrepreneurship" -- Information Modeling Instructor: Peter Baumann email: p.baumann@jacobs-university.de tel: -3178 office: room 60, Research 1 USC Entrepreneurship / Information Modeling (P. Baumann)
Roadmap ! Data Modeling ! Relational Databases ! OLAP & Data Warehousing USC Entrepreneurship / Information Modeling (P. Baumann) 2
Data Modeling ! So we have databases for maintaining & searching data ! ...but these need some structure • Ex: in a letter, distinguish name of sender / name of recipient ! What do we need to express? • "Sales rep X (phone Y, email Z) is contact person for customer C" • Hm... Objects would be fine to express, plus relationships between objects; and we need properties ! Entity-Relationship Model (ER-M) [Peter Chen, 1970s] • Entities, relationships, attributes USC Entrepreneurship / Information Modeling (P. Baumann) 3
Entity-Relationship Model: Basics ! Entity: Real-world object distinguishable from other objects 'John Doe' • Simple attribute values 5 314-159-XY (strings, numbers) 'Mia Mae' 'Mia Mae' 2 265-352-XY 'Mia Mae' 2 123-456-XY 'Mia Mae' 2 123-456-XY [John Doe] 2 123-456-XY [Mia Mae] [Mia Mae] ! Entity set: collection of similar entities [Mia Mae] [Mia Mae] • All entities in an entity set have same set of attributes name ssn lot • unique, identifying key Employees USC Entrepreneurship / Information Modeling (P. Baumann) 4
ER Model Basics (Contd.) ! Relationship: association among two or more entities • Ex: " Attishoo works_in Pharmacy_department" ! Relationship Set: Collection of similar relationships • each relationship instance in R involves entities e1 ∈ E1, ..., en ∈ En name ssn lot Employees dname name since ssn did budget lot super- subor- visor dinate Works_In Departments Employees Reports_To USC Entrepreneurship / Information Modeling (P. Baumann) 5
ISA (`is a’) Hierarchies ! A ISA B: every A entity is also a B entity ("A inherits from B") • A entities have attributes like B entities have, plus maybe more • A is called subclass, B superclass name ssn lot ! Purpose: information reuse (tidy up schema) Employees hours_worked ISA contractid hourly_wages Contract_Emps Hourly_Emps USC Entrepreneurship / Information Modeling (P. Baumann) 6
Summary: ER ! ER model popular for database design ! …but even better: UML (Unified Modeling Language, www.uml.org) • More powerful, more exact, standardized, many tools USC Entrepreneurship / Information Modeling (P. Baumann) 7
Roadmap ! Data Modeling ! Relational Databases ! OLAP & Data Warehousing USC Entrepreneurship / Information Modeling (P. Baumann) 8
Relational Databases ! database = set of tables ("relations") ! Ex: Students table: Students sid name login gpa tuple attribute USC Entrepreneurship / Information Modeling (P. Baumann) 9
Querying Relational Databases • SQL: „structured [english] query language“ [Codd 1970] • Can retrieve, insert, update, delete data • Result of SQL query is table again • standardised by ISO ! Queries can be written intuitively, DBMS responsible for efficient evaluation key: precise (mathematical) semantics • USC Entrepreneurship / Information Modeling (P. Baumann) 10
SQL @ Work ! "names of all students Students: with GPA less than 3.6" sid name login gpa ----------------------------- ! In SQL: 53666 Jones jones@cs 3.4 SELECT name 53688 Smith smith@eecs 3.2 FROM Students 53650 Smith smith@math 3.8 WHERE gpa < 3.6 name ----- Jones Smith USC Entrepreneurship / Information Modeling (P. Baumann) 11
Wrap-Up: Databases & SQL ! "relational databases" + SQL dominate market • more natural than earlier, procedural query languages • Although looking innocent, SQL has rigid mathematical semantics definition ! Don't worry – normally, IT specialists will phrase your queries • also can map ER diagrams to table schemas USC Entrepreneurship / Information Modeling (P. Baumann) 12
Roadmap ! Data Modeling ! Relational Databases ! OLAP & Data Warehousing USC Entrepreneurship / Information Modeling (P. Baumann) 13
The Big Picture: DBs in the Enterprise ! IT to help knowledge worker (executive, manager, analyst) to make faster & better decisions • Sales volumes by region and product category for the last year? • How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years? • Which orders should we fill to maximize revenues? • Will a 10% discount increase sales volume sufficiently? • Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay? ! → Decision Support Systems (DSS) • Data Warehousing, OLAP elements of DSS USC Entrepreneurship / Information Modeling (P. Baumann) 14
Data Warehouse ! Management problem: dozens to hundreds of disparate databases – how to keep overview? ! Approach: database which is… separate from organization’s operational databases • Primarily used for organizational decision support • ! Def: Data Warehouse = [W.H. Inmon, 1994] subject-oriented, integrated, time-varying, non-volatile • USC Entrepreneurship / Information Modeling (P. Baumann) 15
Data Warehouse: Role & Position Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: Etc… • “Farmers” – they know • “Explorers” - unpredictable Copied, Organized, summarized Data Data Mining Warehouse USC Entrepreneurship / Information Modeling (P. Baumann) 16
Data Warehouse: IT Perspective USC Entrepreneurship / Information Modeling (P. Baumann) 17
OLAP ! OLAP = "Online Analytical Processing" ! Goal: support ad-hoc querying for business analyst • familiar with spreadsheets ! → extend spreadsheet analysis model to work with warehouse data • Large data sets • Semantically enriched to understand business terms (e.g., time, geography) • Combined with reporting features ! Multidimensional view of data USC Entrepreneurship / Information Modeling (P. Baumann) 18
Multidimensional Data Model ! Data cube City NY = set of facts (points) in n-D space LA SF ! Space spanned by measures Juice 10 Product = set of dimension axes Cola 47 30 Milk • aka coordinate system 12 Cream ! dimension attributes 3/1 3/2 3/3 3/4 at different granularity Date • Hierarchy: street > county > city Sales volume • Lattice: date > month > year, as a function of date > week > year date, city, product USC Entrepreneurship / Information Modeling (P. Baumann) 19
Operations in n-D Data Model ! Aggregation (roll-up) • dimension reduction: e.g., total sales by city • summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year ! Navigation to detailed data (drill-down) • (sales - expense) by city • top 3% of cities by average income ! Selection (slice) defines a subcube • e.g., sales where city = Palo Alto and date = 1/15/96 ! Visualization Operations • e.g., Pivot USC Entrepreneurship / Information Modeling (P. Baumann) 20
Aggregation Data Warehousing, Decision Support & OLAP Readings in Database Systems, 3rd Edition Stonebraker & Hellerstein, eds. USC Entrepreneurship / Information Modeling (P. Baumann) 21
Visualization by Spreadsheet USC Entrepreneurship / Information Modeling (P. Baumann) 22
Visualization By Graphics USC Entrepreneurship / Information Modeling (P. Baumann) 23
Pivot Tables: Handling n-D Spreadsheets USC Entrepreneurship / Information Modeling (P. Baumann) 24
Data Warehousing / OLAP Market USC Entrepreneurship / Information Modeling (P. Baumann) 25
Wrap-Up: Biz Information Modeling ! Entity-Relationship Model, UML • Describe structure of a miniworld ("universe of discourse") • Easy mapping to... ! (Relational) databases • Tables, SQL ! Data Warehouse • heterogeneous sources → subject-focused agglomeration ! OLAP • Multi-dimensional cubes → Decision Support USC Entrepreneurship / Information Modeling (P. Baumann) 26
Recommend
More recommend