Internet Databases
Chapter 22
Database Management Systems, 2nd Edition. R. Ramakrishnan and Johannes Gehrke

HTML
❖ Simple markup language
❖ Text is annotated with language commands called tags, usually consisting of a start tag and an end tag

HTML Example: Book Listing
<HTML><BODY>
Fiction:
<UL><LI>Author: Milan Kundera</LI>
<LI>Title: Identity</LI>
<LI>Published: 1998</LI>
</UL>
Science:
<UL><LI>Author: Richard Feynman</LI>
<LI>Title: The Character of Physical Law</LI>
<LI>Hardcover</LI>
</UL></BODY></HTML>

Web Pages with Database Contents
❖ Web pages contain the results of database queries. How do we generate such pages?
– Web server creates a new process for a program that interacts with the database.
– Web server communicates with this program via CGI (Common Gateway Interface).
– Program generates the result page with content from the database.
– Other protocols: ISAPI (Microsoft Internet Server API), NSAPI (Netscape Server API)

Application Servers
❖ In CGI, each page request results in the creation of a new process: very inefficient
❖ Application server: Piece of software between the web server and the applications
❖ Functionality:
– Hold a set of pre-forked threads or processes for performance
– Database connection pooling (reuse a set of existing connections)
– Integration of heterogeneous data sources
– Transaction management involving several data sources
– Session management

Other Server-Side Processing
❖ Java Servlets: Java programs that run on the server and interact with the server through a well-defined API.
❖ JavaBeans: Reusable software components written in Java.
❖ Java Server Pages and Active Server Pages: Code inside a web page that is interpreted by the web server
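Generating a page with database contents comes down to running a query and wrapping each result row in markup. The following is a minimal Python sketch of that step, assuming a hypothetical `books` table in an in-memory SQLite database (the table, its columns, and the function names are illustrative, not from the text); it produces a listing equivalent to the HTML book-listing example.

```python
import sqlite3
from html import escape
from itertools import groupby


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the two books from the example.
    The table name and columns are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (genre TEXT, author TEXT, title TEXT,"
        " published INTEGER, format TEXT)"
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
        [
            ("Fiction", "Milan Kundera", "Identity", 1998, None),
            ("Science", "Richard Feynman", "The Character of Physical Law", None, "Hardcover"),
        ],
    )
    return conn


def render_booklist(conn: sqlite3.Connection) -> str:
    """Run a query and emit one <UL> list per genre, escaping text values."""
    rows = conn.execute(
        "SELECT genre, author, title, published, format FROM books ORDER BY genre"
    ).fetchall()
    parts = ["<HTML><BODY>"]
    for genre, group in groupby(rows, key=lambda row: row[0]):
        parts.append(f"{escape(genre)}:")
        parts.append("<UL>")
        for _, author, title, published, fmt in group:
            parts.append(f"<LI>Author: {escape(author)}</LI>")
            parts.append(f"<LI>Title: {escape(title)}</LI>")
            if published is not None:
                parts.append(f"<LI>Published: {published}</LI>")
            if fmt is not None:
                parts.append(f"<LI>{escape(fmt)}</LI>")
        parts.append("</UL>")
    parts.append("</BODY></HTML>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_booklist(build_db()))
```

In a CGI setup, a program like this would be started by the web server for each request and its output returned as the result page.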
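Database connection pooling, one of the application-server functions listed above, avoids opening a new connection for every page request. A hypothetical Python sketch (the `ConnectionPool` class and `handle_request` helper are made up for illustration, with SQLite standing in for the database): connections are opened once and handed out per request, so ten requests reuse two connections.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical minimal pool: opens `size` connections up front and
    lends them out repeatedly instead of connecting once per request."""

    def __init__(self, database: str, size: int = 4):
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False allows a connection created here to be
            # used later by a different worker thread
            self._idle.put(sqlite3.connect(database, check_same_thread=False))
        self.opened = size  # total number of connections ever created

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for the next request

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


def handle_request(pool: ConnectionPool, n: int) -> int:
    """Stand-in for a page request that needs one query."""
    with pool.connection() as conn:
        return conn.execute("SELECT ?", (n,)).fetchone()[0]


if __name__ == "__main__":
    pool = ConnectionPool(":memory:", size=2)
    results = [handle_request(pool, i) for i in range(10)]
    print(results, "connections opened:", pool.opened)
    pool.close()
```

The blocking `get` is a deliberate choice: when all connections are in use, a request waits instead of opening another connection, which bounds the load placed on the database.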

Beyond HTML: XML
❖ Extensible Markup Language (XML): "Extensible HTML"
❖ Confluence of SGML and HTML: The power of SGML with the simplicity of HTML
❖ Allows definition of new markup languages, called document type declarations (DTDs)

XML: Language Constructs
❖ Elements
– Main structural building blocks of XML
– Start and end tag
– Must be properly nested
❖ Element can have attributes that provide additional information about the element
❖ Entities: like macros, represent common text.
❖ Comments
❖ Document type declarations (DTDs)

Booklist Example in XML
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE BOOKLIST SYSTEM "booklist.dtd">
<BOOKLIST>
<BOOK genre="Fiction">
<AUTHOR>
<FIRST>Milan</FIRST><LAST>Kundera</LAST>
</AUTHOR>
<TITLE>Identity</TITLE>
<PUBLISHED>1998</PUBLISHED>
</BOOK>
<BOOK genre="Science" format="Hardcover">
<AUTHOR>
<FIRST>Richard</FIRST><LAST>Feynman</LAST>
</AUTHOR>
<TITLE>The Character of Physical Law</TITLE>
</BOOK></BOOKLIST>

XML: DTDs
❖ A DTD is a set of rules that defines the elements, attributes, and entities that are allowed in the document.
❖ An XML document is well-formed if it is properly nested; it need not have an associated DTD.
❖ An XML document is valid if it has a DTD and the document follows the rules in the DTD.
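The nesting rule behind well-formedness can be checked with a stack: every end tag must close the most recently opened element. The sketch below is a simplified Python illustration; its regex-based tag scanner is an assumption for teaching purposes (it skips comments, CDATA sections, and `>` inside attribute values), whereas a real application would use an XML parser such as `xml.etree.ElementTree`, which rejects documents that are not well-formed.

```python
import re

# Simplified tag scanner (illustrative assumption): matches start tags, end
# tags and empty-element tags such as <power/>. The <?xml ...?> declaration
# and <!DOCTYPE ...> lines are skipped because '?' and '!' cannot start a
# name. Comments, CDATA and '>' inside attribute values are NOT handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


def properly_nested(doc: str) -> bool:
    """Return True if every end tag closes the most recently opened element."""
    stack = []
    for closing, name, self_closing in TAG.findall(doc):
        if self_closing:
            continue  # <power/> opens and closes in a single tag
        if not closing:
            stack.append(name)  # start tag: remember it
        elif not stack or stack.pop() != name:
            return False  # end tag does not match the innermost open element
    return not stack  # elements still open at the end are also an error


if __name__ == "__main__":
    ok = '<BOOKLIST><BOOK genre="Fiction"><TITLE>Identity</TITLE></BOOK></BOOKLIST>'
    crossed = "<BOOKLIST><BOOK><TITLE>Identity</BOOK></TITLE></BOOKLIST>"
    print(properly_nested(ok), properly_nested(crossed))  # True False
```

This checks well-formedness only; validity additionally requires comparing the document against the rules of its DTD.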
An Example DTD
<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR, TITLE, PUBLISHED?)>
<!ELEMENT AUTHOR (FIRST, LAST)>
<!ELEMENT FIRST (#PCDATA)>
<!ELEMENT LAST (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT PUBLISHED (#PCDATA)>
<!ATTLIST BOOK genre (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK format (Paperback|Hardcover) "Paperback">
]>

Domain-Specific DTDs
❖ Development of standardized DTDs for specialized domains enables data exchange between heterogeneous sources
❖ Example: Mathematical Markup Language (MathML)
– Encodes mathematical material on the web
– In HTML: <IMG SRC="xysq.gif" ALT="(x+y)^2">
– In MathML:
<apply> <power/>
  <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply>
  <cn>2</cn>
</apply>
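Python's standard-library XML parsers do not perform DTD validation, so the sketch below hand-translates the rules of the example DTD into Python to show what "valid" means: each element's sequence of child elements must match its content model, and BOOK's attributes must take declared values. The `CONTENT` and `ATTRS` tables and the `validate` function are hypothetical illustrations, not a general DTD validator; for instance, stray text inside element-only content is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the example DTD, hand-translated into regular
# expressions over the space-separated names of an element's children.
CONTENT = {
    "BOOKLIST": r"(BOOK( BOOK)*)?",        # (BOOK)*
    "BOOK": r"AUTHOR TITLE( PUBLISHED)?",  # (AUTHOR, TITLE, PUBLISHED?)
    "AUTHOR": r"FIRST LAST",               # (FIRST, LAST)
    # (#PCDATA): no child elements allowed
    "FIRST": r"", "LAST": r"", "TITLE": r"", "PUBLISHED": r"",
}

# Attribute declarations: name -> (allowed values, default); None marks #REQUIRED.
ATTRS = {
    "BOOK": {
        "genre": ({"Science", "Fiction"}, None),
        "format": ({"Paperback", "Hardcover"}, "Paperback"),
    },
}


def validate(elem: ET.Element, errors: list[str]) -> list[str]:
    """Collect violations of the rules above for elem and its descendants."""
    children = " ".join(child.tag for child in elem)
    model = CONTENT.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element {elem.tag}")
    elif re.fullmatch(model, children) is None:
        errors.append(f"{elem.tag}: children {children!r} do not match the content model")
    declared = ATTRS.get(elem.tag, {})
    for name in elem.attrib:
        if name not in declared:
            errors.append(f"{elem.tag}: undeclared attribute {name}")
    for name, (allowed, default) in declared.items():
        value = elem.get(name, default)  # fall back to the declared default
        if value is None:
            errors.append(f"{elem.tag}: missing required attribute {name}")
        elif value not in allowed:
            errors.append(f"{elem.tag}: {name}={value!r} is not one of {sorted(allowed)}")
    for child in elem:
        validate(child, errors)
    return errors


if __name__ == "__main__":
    doc = ET.fromstring(
        '<BOOKLIST><BOOK genre="Fiction"><AUTHOR><FIRST>Milan</FIRST>'
        "<LAST>Kundera</LAST></AUTHOR><TITLE>Identity</TITLE>"
        "<PUBLISHED>1998</PUBLISHED></BOOK></BOOKLIST>"
    )
    print(validate(doc, []) or "valid")
```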

XML-QL: Querying XML Data
❖ Goal: High-level, declarative language that allows manipulation of XML documents
❖ No standard yet
❖ Example query in XML-QL:
WHERE
  <BOOK>
    <NAME><LAST>$l</LAST></NAME>
  </BOOK> IN "www.booklist.com/books.xml"
CONSTRUCT <RESULT> $l </RESULT>

XML-QL (Contd.)
A more complicated example:
WHERE <BOOK> $b </BOOK> IN "www.booklist.com/books.xml",
      <AUTHOR> $n </AUTHOR>
      <PUBLISHED> $p </PUBLISHED> IN $b
CONSTRUCT <RESULT>
            <PUBLISHED> $p </PUBLISHED>
            WHERE <LAST> $l </LAST> IN $n
            CONSTRUCT <LAST> $l </LAST>
          </RESULT>

Semi-structured Data
❖ Data with partial structure
❖ All data models for semi-structured data use some type of labeled graph
❖ We introduce the object exchange model (OEM):
– Object is triple (label, type, value)
– Complex objects are decomposed hierarchically into smaller objects

Example: Booklist Data in OEM
[Figure: OEM graph of the booklist data. Labeled edges BOOK, AUTHOR, TITLE, PUBLISHED, and FORMAT lead to the atomic values Milan Kundera, Identity, 1998, Richard Feynman, Hardcover, and The Character of Physical Law.]

Indexing for Text Search
❖ Text database: Collection of text documents
❖ Important class of queries: Keyword searches
– Boolean queries: Query terms connected with AND, OR and NOT. Result is list of documents that satisfy the boolean expression.
– Ranked queries: Result is list of documents ranked by their "relevance".
– IR: Precision (percentage of retrieved documents that are relevant) and recall (percentage of relevant objects that are retrieved)

Inverted Files
❖ For each possible query term, store an ordered list (the inverted list) of document identifiers that contain the term.
❖ Query evaluation: Intersection or Union of inverted lists.
❖ Example: Agent AND James: intersecting <1,2> and <1> gives <1>

RID  Document
1    Agent James
2    Mobile agent

Word    Inverted List
Agent   <1,2>
James   <1>
Mobile  <2>
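The inverted-file scheme can be sketched in Python on the two example documents. Assumptions (not from the text): terms are lower-cased and split on whitespace, which matches the table listing "agent" for both documents, and the helper names are hypothetical. AND and OR are evaluated by merging sorted inverted lists; a small precision/recall helper follows the definitions given under "Indexing for Text Search".

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lower-cased term to the sorted list of RIDs that contain it."""
    postings = defaultdict(set)
    for rid, text in docs.items():
        for term in text.lower().split():
            postings[term].add(rid)
    return {term: sorted(rids) for term, rids in postings.items()}


def intersect(a: list[int], b: list[int]) -> list[int]:
    """AND: merge two sorted inverted lists, keeping RIDs present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: list[int], b: list[int]) -> list[int]:
    """OR: merge two sorted inverted lists, keeping RIDs present in either."""
    i = j = 0
    out = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:  # same RID in both lists: emit it once
            out.append(a[i])
            i += 1
            j += 1
    return out


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = relevant hits / retrieved; recall = relevant hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    index = build_inverted_index({1: "Agent James", 2: "Mobile agent"})
    print(index)
    print(intersect(index["agent"], index["james"]))  # Agent AND James
    print(union(index["james"], index["mobile"]))     # James OR Mobile
```

Because both inverted lists are kept sorted, each merge runs in time linear in the combined list lengths.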

Signature Files
❖ Index structure (the signature file) with one data entry for each document
❖ Hash function hashes words to bit-vector.
❖ Data entry for a document (the signature of the document) is the OR of all hashed words.
❖ Signature S1 matches signature S2 if S2 & S1 = S2

Signature Files: Query Evaluation
❖ Boolean query consisting of conjunction of words:
– Generate query signature Sq (the OR of the hashed query words)
– Scan signatures of all documents.
– If signature S matches Sq, then retrieve document and check for false positives.
❖ Boolean query consisting of disjunction of k words:
– Generate k query signatures S1, …, Sk
– Scan signature file to find documents whose signature matches any of S1, …, Sk
– Check for false positives

Signature Files: Example
Word    Hash
Agent   1010
James   1100
Mobile  0001

RID  Document      Signature
1    Agent James   1110
2    Mobile agent  1011

Summary
❖ Publishing databases on the web requires server-side processing such as CGI scripts, Servlets, ASP, or JSP
❖ XML is an emerging document description standard that allows the definition of new DTDs. Query languages for XML documents such as XQL are emerging.
❖ Text databases have gained importance with the proliferation of text data on the web. Boolean queries can be efficiently evaluated using an inverted index or a signature file. Evaluation of ranked queries is a more difficult problem.
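The signature-file example can be traced in Python. As an assumption, the word hashes from the example table serve as a fixed lookup in place of a real hash function, and `HASH`, `SIGFILE`, and the query helpers are hypothetical names. Signatures are built by OR-ing word hashes; a document matches when S & Sq = Sq; candidates are then re-checked against the document text to discard false positives.

```python
from functools import reduce

# Word hashes copied from the example table (stand-in for a real hash
# function; only these three words can be hashed here).
HASH = {"agent": 0b1010, "james": 0b1100, "mobile": 0b0001}

DOCS = {1: "Agent James", 2: "Mobile agent"}


def signature(words) -> int:
    """OR together the hashes of all words."""
    return reduce(lambda acc, w: acc | HASH[w], words, 0)


def matches(s: int, sq: int) -> bool:
    """S matches Sq if every bit set in Sq is also set in S."""
    return (s & sq) == sq


def words_of(rid: int) -> set[str]:
    return set(DOCS[rid].lower().split())


# The signature file: one entry per document.
SIGFILE = {rid: signature(words_of(rid)) for rid in DOCS}


def conjunctive_query(words: list[str]) -> list[int]:
    sq = signature(words)
    candidates = [rid for rid, s in SIGFILE.items() if matches(s, sq)]
    # A match may be a false positive, so verify against the document itself.
    return [rid for rid in candidates if set(words) <= words_of(rid)]


def disjunctive_query(words: list[str]) -> list[int]:
    sigs = [signature([w]) for w in words]  # one query signature per word
    candidates = [rid for rid, s in SIGFILE.items() if any(matches(s, sk) for sk in sigs)]
    return [rid for rid in candidates if set(words) & words_of(rid)]


if __name__ == "__main__":
    for rid, s in SIGFILE.items():
        print(rid, format(s, "04b"))
    print(conjunctive_query(["agent", "james"]))
    print(disjunctive_query(["james", "mobile"]))
```

The final membership check is what makes the answer exact: different words can set overlapping bits, so a signature match only says a document might contain the query words.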
