
CUGS Core - Databases



  1. CUGS Core - Databases
     Patrick Lambrix, Linköpings universitet

     Outline
     • Introduction: storing and accessing data
     • Semi-structured data
     • Information integration
     • Object-oriented and object-relational databases

     Work method
     For each topic:
     • introductory presentation by topic responsible
     • in smaller groups: reading papers, discussion guided by predefined questions, summary
     • each smaller group presents their summary, final discussion moderated by topic responsible

     Requirements
     • Responsible for a topic (presentation + questions) (ca 60 hours)
     • Participation in smaller discussion groups
     • Take-home exam (ca 40 hours)

     Databanks/Databases
     • One of many ways to store data in electronic form
     • used in every-day life: bank, reservation of hotel or travel, library search, bar codes
     • new applications: multimedia databases, geographic information systems, real-time databases

     Databank
     • DataBank Management System (DBMS): a collection of programs that allows a user to create and maintain a databank
     • databank system = physical databank + DBMS

  2. Issues
     • What information is stored?
     • How is the information stored? (high and low level)
     • How is the information accessed? (user level, system level)
     • How is a databank recovered after a crash?

     Databanks
     [Figure, labels only: Real life information; Model; Queries/updates; Answers; Databank; Databank Management System (Processing of queries and updates; Access to stored data); Physical databank]

     Issues
     • How to keep track of changes of the data over time?
     • How can several users access and update information in a databank at the same time?
     • How can a user access information in several databanks at the same time?

     Persons
     • databank administrator
     • databank designer
     • 'end user'
     • application programmer
     • DBMS designer
     • developer of tools
     • operator, maintenance

     Example record
     DEFINITION  Homo sapiens adrenergic, beta-1-, receptor
     ACCESSION   NM_000684
     SOURCE ORGANISM human
     REFERENCE   1
       AUTHORS   Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka
       TITLE     Cloning of the cDNA for the human beta 1-adrenergic receptor
     REFERENCE   2
       AUTHORS   Frielle, Kobilka, Lefkowitz, Caron
       TITLE     Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes

     What information is stored?
     • Model of reality
       - Entity-Relationship model (ER)
       - Unified Modeling Language (UML)

  3. Entity-relationship
     [ER diagram: entity type PROTEIN (attributes protein-id, accession, definition, source) and entity type ARTICLE (attributes article-id, title, author), connected by the relationship Reference with cardinality m (PROTEIN side) to n (ARTICLE side)]

     Entity-Relationship
     • entities and attributes
     • entity types
     • key attributes
     • relations
     • cardinality constraints

     How is the information stored? (high level)
     How is the information accessed? (user level)
     • Text (IR)
     • Semi-structured data
     • Data models (DB)
     • Rules + Facts (KB)
     [Figure: arrow labeled "structure" and "precision" alongside the list]

     Text - Information Retrieval
     • Search based on words
     • conceptual models: boolean, vector, probabilistic, …
     • file models: flat files, inverted files, ...

     IR - File model: inverted file
     [Figure: an inverted file (columns WORD, HITS, LINK; entries adrenergic 32, cloning 53, receptor 22) links to a postings file (columns DOC #, LINK), which links to a document file (DOCUMENTS: Doc1, Doc2, …)]

     Vector model (simplified)
     [Figure: Doc1 (1,1,0), Doc2 (0,1,0) and query Q (1,1,1) drawn as vectors on the axes cloning, adrenergic, receptor]
     sim(d,q) = (d . q) / (|d| x |q|)
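The similarity measure of the simplified vector model can be tried out on the vectors from the slide. The following is a minimal Python sketch, not part of the original slides; the function name `cosine_similarity` and the zero-vector fallback are assumptions made for illustration.

```python
import math

def cosine_similarity(d, q):
    """Return (d . q) / (|d| x |q|); 0.0 if either vector is all zeros (assumed convention)."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm_d = math.sqrt(sum(di * di for di in d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

# Document and query vectors as given on the slide.
docs = {"Doc1": (1, 1, 0), "Doc2": (0, 1, 0)}
query = (1, 1, 1)

# Score every document against the query and rank by decreasing similarity.
scores = {name: cosine_similarity(vec, query) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these vectors Doc1 shares two terms with the query and Doc2 only one, so Doc1 is ranked first.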

  4. Databases
     • Relational databases:
       - model: tables + relational algebra
       - query language (SQL)
     • Object-oriented databases:
       - model: persistent objects, messages, encapsulation, inheritance
       - query language (e.g. OQL)

     Relational databases

     PROTEIN
     PROTEIN-ID  ACCESSION  DEFINITION                                  SOURCE
     1           NM_000684  Homo sapiens adrenergic, beta-1-, receptor  human

     REFERENCE
     PROTEIN-ID  ARTICLE-ID
     1           1
     1           2

     ARTICLE
     ARTICLE-ID  AUTHOR     TITLE
     1           Frielle    Cloning of the cDNA for the human …
     1           Collins    Cloning of the cDNA for the human …
     1           Daniel     Cloning of the cDNA for the human …
     1           Caron      Cloning of the cDNA for the human …
     1           Lefkowitz  Cloning of the cDNA for the human …
     1           Kobilka    Cloning of the cDNA for the human …
     2           Frielle    Human beta 1- and beta 2-adrenergic receptors
     2           Kobilka    Human beta 1- and beta 2-adrenergic receptors
     2           Lefkowitz  Human beta 1- and beta 2-adrenergic receptors
     2           Caron      Human beta 1- and beta 2-adrenergic receptors

     Relational databases
     PROTEIN and REFERENCE as above; ARTICLE is split into two tables.

     ARTICLE-AUTHOR
     ARTICLE-ID  AUTHOR
     1           Frielle
     1           Collins
     1           Daniel
     1           Caron
     1           Lefkowitz
     1           Kobilka
     2           Frielle
     2           Kobilka
     2           Lefkowitz
     2           Caron

     ARTICLE-TITLE
     ARTICLE-ID  TITLE
     1           Cloning of the cDNA for the human beta 1-adrenergic receptor
     2           Human beta 1- and beta 2-adrenergic receptors: structurally and functionally related receptors derived from distinct genes

     SQL
     Table used: PROTEIN (as above)

       select source
       from protein
       where accession = 'NM_000684';

     SQL
     Tables used: PROTEIN, REFERENCE, ARTICLE-TITLE (as above)

       select title
       from protein, article-title, reference
       where protein.accession = 'NM_000684'
         and protein.protein-id = reference.protein-id
         and reference.article-id = article-title.article-id;

     From relational to object model
     • CASE
     • CAD
     • office automation
     • multimedia applications
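The two SQL queries above can be run against the example tables. The sketch below uses Python's standard `sqlite3` module with an in-memory database. It is an illustration under assumptions: the hyphens in table and column names are replaced by underscores (`protein_id`, `article_title`) so the identifiers need no quoting, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema mirroring PROTEIN, REFERENCE and ARTICLE-TITLE (underscored names are an assumption).
conn.executescript("""
CREATE TABLE protein (protein_id INTEGER PRIMARY KEY, accession TEXT,
                      definition TEXT, source TEXT);
CREATE TABLE reference (protein_id INTEGER, article_id INTEGER);
CREATE TABLE article_title (article_id INTEGER PRIMARY KEY, title TEXT);
""")

# Rows taken from the slide tables.
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    (1, "NM_000684", "Homo sapiens adrenergic, beta-1-, receptor", "human"),
)
conn.executemany("INSERT INTO reference VALUES (?, ?)", [(1, 1), (1, 2)])
conn.executemany(
    "INSERT INTO article_title VALUES (?, ?)",
    [
        (1, "Cloning of the cDNA for the human beta 1-adrenergic receptor"),
        (2, "Human beta 1- and beta 2-adrenergic receptors: structurally and "
            "functionally related receptors derived from distinct genes"),
    ],
)

# First query: the source of the protein with the given accession number.
sources = [row[0] for row in conn.execute(
    "SELECT source FROM protein WHERE accession = ?", ("NM_000684",))]

# Second query: titles of the articles referenced by that protein (three-table join).
titles = [row[0] for row in conn.execute(
    """
    SELECT title
    FROM protein, article_title, reference
    WHERE protein.accession = ?
      AND protein.protein_id = reference.protein_id
      AND reference.article_id = article_title.article_id
    ORDER BY article_title.article_id
    """,
    ("NM_000684",),
)]

print(sources)
print(titles)
```

The accession number is passed as a bound parameter here; written inline it would be a quoted string literal, as in the queries above. The `ORDER BY` clause is an addition to make the result order predictable.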

  5. Object-Oriented Databases (OODB)
     • World is modeled using objects.
     • An object has a state (value) and a behavior (operations).
     • Persistent objects - permanent storage (sometimes transient objects are allowed)

     Object
     • An object has an object identifier (OID) that is not visible to the user.
     • OID cannot be changed.
     • object versus value (a value has no OID)
     • object structure can be arbitrarily complex (atom, tuple, set, list, bag, array)

     Example - object state
     • o1(id1, tuple, <accession: NM_000684, source: human, definition: 'Homo sapiens adrenergic …', reference: o2>)
     • o2(id2, set, {o3, o4})

     Example - object state
     • o3(id3, tuple, <title: 'Cloning of …', author: o5>)
     • o4(id4, tuple, <title: 'Human beta-1 …', author: o6>)
     • o5(id5, list, [Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka])
     • o6(id6, list, [Frielle, Kobilka, Lefkowitz, Caron])
     Remark: These examples do not use a standard syntax

     [Figure: object graph. A protein object with SOURCE human, ACCESSION NM_000684, DEFINITION "Homo sapiens adrenergic, beta-1-, receptor" and REFERENCE pointing to a set of two article objects. The first article has TITLE "Cloning of …" and an AUTHOR list Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka; the second has TITLE "Human beta-1 …" and an AUTHOR list Frielle, Kobilka, Lefkowitz, Caron.]

     Classes
       define class protein
         type tuple ( accession: string;
                      source: string;
                      definition: string;
                      reference: set(article); );
         operations
           create-protein(string, string, string, set(article)): protein;
           get-accession: string;
           get-source: string;
           get-definition: string;
           get-references: set(article);
           add-reference(article): void;
       end protein;
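The object-state example (o1 to o6) can be mimicked with a toy in-memory store in which every object gets a system-generated OID and objects refer to each other by OID. This is only a sketch: `ObjectStore`, `create`, `state` and `titles_of` are hypothetical names, the OIDs are handed out in creation order (so they do not match the numbering on the slides), and unlike in a real OODB the OIDs are visible to the program.

```python
import itertools

class ObjectStore:
    """Toy store (hypothetical): each object is an OID plus a structure kind and a state."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._objects = {}  # OID -> (kind, state)

    def create(self, kind, state):
        oid = f"id{next(self._counter)}"  # generated once, never changed afterwards
        self._objects[oid] = (kind, state)
        return oid

    def state(self, oid):
        return self._objects[oid][1]

store = ObjectStore()

# Author lists and articles first, because the protein refers to them by OID.
o5 = store.create("list", ["Frielle", "Collins", "Daniel", "Caron", "Lefkowitz", "Kobilka"])
o6 = store.create("list", ["Frielle", "Kobilka", "Lefkowitz", "Caron"])
o3 = store.create("tuple", {"title": "Cloning of …", "author": o5})
o4 = store.create("tuple", {"title": "Human beta-1 …", "author": o6})
o2 = store.create("set", {o3, o4})
o1 = store.create("tuple", {
    "accession": "NM_000684",
    "source": "human",
    "definition": "Homo sapiens adrenergic …",
    "reference": o2,
})

def titles_of(protein_oid):
    """Follow protein -> reference set -> articles and collect the titles."""
    article_oids = store.state(store.state(protein_oid)["reference"])
    return [store.state(oid)["title"] for oid in sorted(article_oids)]

# Object versus value: a second list with the same value is still a different object.
twin = store.create("list", list(store.state(o6)))

print(titles_of(o1))
print(twin != o6, store.state(twin) == store.state(o6))
```

The last line shows the distinction between object and value: the two author lists have equal states but different OIDs.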

  6. Classes
       define class article
         type tuple ( title: string;
                      author: list(string); );
         operations
           create-article(string, list(string)): article;
           get-title: string;
           get-authors: list(string);
           print-article-info: string;
       end article;

     Example program
       program
         variables: article1, article2, protein1;
       begin
         article1 := create-article('Cloning…', list(Frielle, Collins, Daniel, Caron, Lefkowitz, Kobilka));
         protein1 := create-protein(NM_000684, human, 'Homo sapiens adrenergic …', set(article1));
         article2 := create-article('Human beta-1…', list(Frielle, Kobilka, Lefkowitz, Caron));
         protein1.add-reference(article2);
       end;

     Operations
     • encapsulation: operation = interface + body
       - interface: how is the operation called? What is the result of the operation?
         > visible to user, used in programs
       - body: how is the operation implemented?
         > invisible for user
     • program is based on message passing

     Inheritance
     • journal-article subtype-of article: journal-name, journal-volume, page-numbers
       journal-article inherits all attributes and operations from article and in addition has journal-name, journal-volume and page-numbers as attributes
     • human-protein subtype-of protein (source = 'human')

     Operator overloading
     • The same operator name can be used for different implementations
     • example:
       print-article-info for article prints information on title and author.
       print-article-info for journal-article prints information on title, author and also on the journal's name, volume and page numbers.

     Query language OQL
     • select … from … where
       select distinct … from … where
     • iterator variables
     • path expressions
     • struct
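The class definitions, the example program, inheritance and the print-article-info example above can be approximated in Python. This is a sketch under assumptions, not the notation of the slides: operation names use underscores, the reference set is kept as a list to preserve insertion order, the second article is created as a journal article to show the subtype, and the journal name, volume and page numbers are placeholder values invented for illustration. In Python terms, what the slides call operator overloading is realised here as method overriding in a subclass.

```python
class Article:
    def __init__(self, title, authors):
        # State is kept in underscore attributes; programs use the operations below.
        self._title = title
        self._authors = list(authors)

    def get_title(self):
        return self._title

    def get_authors(self):
        return list(self._authors)

    def print_article_info(self):
        return f"{self._title} / {', '.join(self._authors)}"


class JournalArticle(Article):
    """Subtype of Article: inherits its attributes and operations, adds journal data."""

    def __init__(self, title, authors, journal_name, journal_volume, page_numbers):
        super().__init__(title, authors)
        self._journal_name = journal_name
        self._journal_volume = journal_volume
        self._page_numbers = page_numbers

    def print_article_info(self):
        # Same operation name, different implementation.
        base = super().print_article_info()
        return f"{base} / {self._journal_name} {self._journal_volume}, pp. {self._page_numbers}"


class Protein:
    def __init__(self, accession, source, definition, references):
        self._accession = accession
        self._source = source
        self._definition = definition
        self._references = list(references)  # the slides declare set(article)

    def get_references(self):
        return list(self._references)

    def add_reference(self, article):
        self._references.append(article)


# Counterpart of the example program.
article1 = Article("Cloning…",
                   ["Frielle", "Collins", "Daniel", "Caron", "Lefkowitz", "Kobilka"])
protein1 = Protein("NM_000684", "human", "Homo sapiens adrenergic …", [article1])
article2 = JournalArticle("Human beta-1…",
                          ["Frielle", "Kobilka", "Lefkowitz", "Caron"],
                          journal_name="Example Journal",  # placeholder, not real data
                          journal_volume="0",              # placeholder
                          page_numbers="0-0")              # placeholder
protein1.add_reference(article2)

# The same message is sent to both objects; each answers with its own implementation.
infos = [ref.print_article_info() for ref in protein1.get_references()]
for line in infos:
    print(line)
```

The caller never checks which kind of article it holds; sending `print_article_info` is enough, which is the message-passing view described under Operations.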
