Benefits?
The fact that an XSLT stylesheet is a well-formed XML document has a number of advantages, however:
• Stylesheets can be used as the input or output of a transformation (this is surprisingly common in practice)
• XSLT can be embedded in other XML-based languages, and can in turn have other XML-based languages embedded within it. For example, this enables XSLT to support embedded schemas (a schema embedded within a stylesheet) in a way that XQuery cannot. Similarly, XSLT can be easily embedded in pipeline processing languages such as Orbeon's XPL.
• Because XSLT is XML, rather than merely mimicking XML, the same parser technology can be reused, the whole range of XML techniques can be used when writing stylesheets (for example, use of external entities and CDATA sections) and there are no surprises in store for a user who knows the rules of XML.
...
Another benefit I have seen from using XML syntax is that it makes the grammar of XSLT much more easily extensible than that of XQuery. Because it tries to make do without any reserved words, and because it mixes a number of different syntactic styles, the grammar of XQuery is a delicate creature. Adding new features like a full-text search capability requires very careful analysis to ensure that no grammatical ambiguities are introduced. By contrast, it's very easy to extend XSLT with new instructions or new attributes, without any risk of ambiguities or backwards incompatibilities. This means that it's quite possible for such extensions to be implemented by vendors (or even by third parties) as well as by the XSL Working Group itself.
Source: Comparing XSLT and XQuery, http://www.mscs.mu.edu/~praveen/Teaching/fa05/AdvDb/PaperTeams/XSLT2XQTeam/Comparing%20XSLT%20and%20XQuery.htm
Monday, 29 October 2012
Syntax! (a fragment of XSLT)
    <xsl:template match="pig-rescue" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <html>
        <head>
          <link rel="stylesheet" type="text/css" href="bdr.css" />
          <title>The Pigs</title>
        </head>
        <body>
          <div align="center">
            <h1>The Pigs At Belly Draggers Ranch</h1>
          </div>
          <ul>
            <xsl:apply-templates
                select="animal[position() mod $perPage = 1]"
                mode="indexList" />
          </ul>
        </body>
      </html>
    </xsl:template>
• Literal constructors (Data! XML!): the html, head, body, ... elements
• XSLT syntax (Code! XML!): xsl:template, xsl:apply-templates
• XPath syntax (Code! Not XML!): the expression in the select attribute
Example from: http://www.xml.com/lpt/a/1549
Syntax! (a fragment of XQuery)
    let $doc := fn:input()/pig-rescue
    return (
      <html>
        <head>
          <link rel="stylesheet" type="text/css" href="bdr.css" />
          <title>The Pigs</title>
        </head>
        <body>
          <div align="center">
            <h1>The Pigs At Belly Draggers Ranch</h1>
          </div>
          <ul>
            { local:make-name-list($doc/animal) }
          </ul>
        </body>
      </html>
    )
• Literal constructors (Data! XML!): the html, head, body, ... elements
• XQuery syntax (Code! Not XML!): let ... return, the enclosed expression { ... }
• XPath syntax (Code! Not XML!): fn:input()/pig-rescue, $doc/animal
Example from: http://www.xml.com/lpt/a/1549
Verbosity? (Simple examples)
• Goes both ways, depending...
• Declaring a parameter/variable:
    XSLT:   <xsl:param name="perPage" select="'4'"/>
    XQuery: declare variable $perPage as xs:integer := 4;
• Building a string from values:
    XSLT:   <xsl:value-of select="$filename"/>#a<xsl:value-of select="$start+position()-1"/>
    XQuery: { $filename }#a{$start + $pos - 1}
Is XSLT Schema Aware?
• Information from a schema can be used both
  – statically: when the stylesheet is compiled, and
  – dynamically: during evaluation of the stylesheet to transform a source document.
• In a stylesheet (e.g., in XPath expressions and patterns), we may refer to named types from a schema (e.g., Person from <xs:complexType name="Person">)
• The conformance rules for XSLT 2.0 distinguish between a
  – basic XSLT processor and a
  – schema-aware XSLT processor
  – in <oXygen/>, you have both
• Helpful: http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html
XSLT: stylesheet
• a stylesheet tells an XSLT processor how to transform a source tree into a result tree (or text)
• via template rules, which associate patterns with templates
• which are then used by an XSLT processor as follows:
  – match the pattern against elements in the source tree
  – instantiate the corresponding template to create parts of the result tree
XSLT: stylesheet
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:stylesheet>
Alternatively:
    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mine="...">
      top-level-elements
    </xsl:transform>
An xsl:stylesheet can have zero or more of each of the following elements in (almost) any order:
    xsl:import            xsl:decimal-format
    xsl:include           xsl:namespace-alias
    xsl:strip-space       xsl:attribute-set
    xsl:preserve-space    xsl:variable
    xsl:output            xsl:param
    xsl:key               xsl:template
(some of these are covered later and in more detail)
XSLT elements: template rule
• (most important element!) a template rule is of the form
    <xsl:template match="expression"      (the pattern)
                  name="qname"            (optional)
                  priority="number"       (optional)
                  mode="qname">           (optional)
      parameter-list
      template-def                        (the template)
    </xsl:template>
• parameter-list is a list of zero or more xsl:param elements
• as expression, an XPath location path can be used
  – with some restrictions, e.g., it must evaluate to a node set
  – for XSLT 1.0, use XPath 1.0,
  – for XSLT 2.0, use XPath 2.0
• template-def is an XML fragment that makes use of other XSLT elements
  – including instructions such as xsl:apply-templates or xsl:copy-of
XSLT elements: template rules
    <xsl:template match="expression" name="qname" priority="number" mode="qname">
      parameter-list
      template-def
    </xsl:template>
• Example:
    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>
  when applied to “<emph>important</emph>”, yields
    <fo:inline-sequence font-weight="bold">important</fo:inline-sequence>
• careful:
  – there are various built-in template rules
  – there is a default prioritisation on template rules
  – it is the XSLT processor that fires the template rules
• we will see later what elements we can use in template-def
XSLT elements: processing model, sketched
• an XSLT processor takes an XML document d with associated stylesheet s
• processes the (XPath DM) tree (possibly a PSVI if schema-aware, SA) corresponding to d
• in a depth-first manner
  – thus we always have a context node
• applies those template rules to the context node that
  – match the context node and
  – have highest priority
• thereby generating the result tree according to the template rules
• the easiest way to generate output is to use literal result elements, such as fo:inline-sequence in the previous example:
    <xsl:template match="emph">
      <fo:inline-sequence font-weight="bold">
        <xsl:apply-templates/>
      </fo:inline-sequence>
    </xsl:template>
XSLT elements: processing model by example
consider the following source tree:
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="my.xsl"?>
    <people>
      <person age="41">
        <name>
          <first>Harry</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
      <person age="43">
        <name>
          <first>Tony</first>
          <last>Potter</last>
        </name>
        <address>4 Main Road </address>
      </person>
    </people>
[Figure: the corresponding tree. root has the processing instruction and people as children; people has two person children; the first person has name, age=41 and address; name has first (Harry) and last (Potter); address has the text 4 Main Road.]
XSLT elements: processing model by example
consider this source tree with the following XSLT stylesheet:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
    </xsl:stylesheet>
what does this seemingly empty (no template rules!) stylesheet produce?
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
XSLT elements: processing model by example (tricky!)
the previous stylesheet was only seemingly empty because XSLT processors employ built-in template rules:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="*|/">                                   (1)
        <xsl:apply-templates/>                                     (2)
      </xsl:template>
      <xsl:template match="text()|@*">                             (3)
        <xsl:value-of select="."/>                                 (4)
      </xsl:template>
      <xsl:template match="processing-instruction()|comment()"/>   (5)
    </xsl:stylesheet>
(1) for all element & document nodes
(2) don’t do anything but apply templates to all child nodes
(3) for all text and attribute nodes
(4) return their value
(5) ignore p-i & comments
thus templates are applied to all nodes (element, root, text,..) except attribute and namespace nodes
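The effect of the built-in rules on the example document can be mimicked outside XSLT. The following is a minimal Python sketch, not an XSLT processor: it assumes the standard-library xml.etree.ElementTree module, the function name apply_builtin_rules is made up for illustration, and the source document is shortened to one person. It recurses through element nodes and emits text nodes; attributes are never visited, so age="41" does not appear in the result.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example source document (an assumption for brevity).
SOURCE = (
    '<people><person age="41"><name><first>Harry</first>'
    '<last>Potter</last></name><address>4 Main Road</address>'
    '</person></people>'
)

def apply_builtin_rules(elem):
    """Mimic the built-in rules: recurse into child elements, emit text nodes."""
    out = [elem.text or ""]                      # text before the first child
    for child in elem:
        out.append(apply_builtin_rules(child))   # rule for element nodes
        out.append(child.tail or "")             # text following the child
    return "".join(out)

result = apply_builtin_rules(ET.fromstring(SOURCE))
print(result)  # HarryPotter4 Main Road
```

Only the text content survives, which is what the "seemingly empty" stylesheet produces for the element and text nodes of the source tree.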
XSLT elements: processing model by example
Built-in template rules:
    (b) <xsl:template match="*|/">
          <xsl:apply-templates select="node()"/>
        </xsl:template>
• select="node()" is the default for “apply-templates”
• node() matches any node other than an attribute node and the root node
• if you want your stylesheet to consider attribute nodes, you must override this default, e.g. like this:
    (1) <xsl:template match="*|/">
          <xsl:apply-templates select="node()|@*"/>
        </xsl:template>
• If we use template rule (1), then it overrides built-in (b), hence now rules are applied to all nodes (element, root, text,..) including attribute nodes but still except namespace nodes
XSLT elements: processing model by example
what does this slightly more elaborate stylesheet yield?
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="person">
        <xsl:text> Person found! </xsl:text>
      </xsl:template>
    </xsl:stylesheet>
Note: <xsl:text> superfluous here, but helpful
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
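How a user-defined rule interacts with the built-in rules can be sketched in Python. This is an illustration under assumptions, not how an XSLT processor is implemented: rules are keyed by element name only (no patterns, priorities or modes), the names transform and rules are hypothetical, and the source document is shortened and written without whitespace between elements.

```python
import xml.etree.ElementTree as ET

def transform(elem, rules):
    """Apply a user rule if one matches the element, else the built-in behaviour."""
    rule = rules.get(elem.tag)
    if rule is not None:
        return rule(elem)                        # user rule wins over built-in
    out = [elem.text or ""]                      # built-in: copy text nodes ...
    for child in elem:
        out.append(transform(child, rules))      # ... and recurse into children
        out.append(child.tail or "")
    return "".join(out)

doc = ET.fromstring(
    "<people>"
    "<person age='41'><name><first>Harry</first></name></person>"
    "<person age='43'><name><first>Tony</first></name></person>"
    "</people>"
)
# Counterpart of <xsl:template match="person"> with a fixed text result.
rules = {"person": lambda e: " Person found! "}
output = transform(doc, rules)
print(output)
```

Because the person rule does not apply templates to its children, the names below person never reach the output.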
XSLT elements: processing model by example
we can make use of “functions” to retrieve the “value” of a node:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="person">
        Person found called:
        <xsl:value-of select="name"/>
      </xsl:template>
    </xsl:stylesheet>
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
XSLT elements: processing model by example
we can conveniently copy a node and its complete sub-tree:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="people">
        <family>
          <xsl:copy-of select="child::*"/>
        </family>
      </xsl:template>
    </xsl:stylesheet>
alternatively, I could have used
    <xsl:copy-of select="*"/>
    <xsl:copy-of select="person"/>
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
The identity transform
• XSLT stylesheet that outputs the original document
  – i.e., an identity function
    • f(x) = x
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
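For comparison, the same copy-everything recursion can be written by hand. The sketch below is Python with the standard-library xml.etree.ElementTree module; the function name identity and the sample document are made up for illustration, and comments and processing instructions are not covered (ElementTree drops them when parsing by default).

```python
import xml.etree.ElementTree as ET

def identity(elem):
    """Return a deep copy of elem: same tag, attributes, text and children."""
    copy = ET.Element(elem.tag, dict(elem.attrib))   # like <xsl:copy> plus @*
    copy.text, copy.tail = elem.text, elem.tail      # text nodes
    for child in elem:
        copy.append(identity(child))                 # like apply-templates on node()
    return copy

source = ET.fromstring(
    '<people><person age="41"><name>Harry</name></person></people>'
)
result = identity(source)
print(ET.tostring(result, encoding="unicode"))
```

Unlike the stylesheet, where the processor drives the traversal, the recursion here is spelled out explicitly.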
XSLT elements: processing model by example
we can re-name elements and filter out data:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="person">
        <myFriend>
          <xsl:apply-templates select="@*|*|text()"/>
        </myFriend>
      </xsl:template>
      <xsl:template match="@*|text()|*">
        <xsl:copy>
          <xsl:apply-templates select="@*|text()|*"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="address"/>
    </xsl:stylesheet>
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
XSLT elements: processing model by example
we can even apply several rules to the same elements using modes for rules:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs" version="2.0">
      <xsl:template match="/people">
        <html><body><ol>
          <xsl:apply-templates select="person" mode="o"/>
        </ol>
        <xsl:apply-templates select="person" mode="f"/>
        </body></html>
      </xsl:template>
      <xsl:template match="person" mode="o">
        <li> <xsl:value-of select="name/first"/> <xsl:value-of select="name/last"/></li>
      </xsl:template>
      <xsl:template match="person" mode="f">
        <p> Last name: <xsl:value-of select="name/last"/>
            Age: <xsl:value-of select="@age"/> </p>
      </xsl:template>
    </xsl:stylesheet>
[Figure: the source tree (root, people, person, name, age=41, address, first, last).]
XSLT instructions: value-of
    <xsl:value-of select="expression"/>
• is one of the generating instructions provided by XSLT
• it returns, for the first node selected through expression, the string value that corresponds to that node, where the string value of
  – a text node is its text
  – an attribute node is its value
  – an element or root node is the concatenation of the string values of all its descendant text nodes
• ...all this is a bit more tricky if you use SA XSLT
  – because then, we have more than “text” in text nodes, and need to take types into account...
XSLT elements: generating instructions
• literal result elements: a simple way to create new nodes, e.g., in
    <xsl:template match="person">
      <Employee>
        <xsl:apply-templates/>
      </Employee>
    </xsl:template>
• <xsl:text>: to produce pure text (and raise an error if elements are produced), e.g., in
    <xsl:template match="person">
      <xsl:text> Person found! </xsl:text>
    </xsl:template>
• <xsl:element name="qname">: to create a new element called qname in the result tree, whose content is produced by the child nodes of that instruction, e.g. in
    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
  handy for producing elements with attributes and namespaces
XSLT elements: generating instructions
• <xsl:attribute> to produce an attribute, e.g., in
    <xsl:template match="person">
      <xsl:element name="Employee">
        <xsl:attribute name="alter">
          <xsl:value-of select="@age"/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
• (already seen) <xsl:value-of select="expression"/> returns, for each node selected through expression, the string value that corresponds to that node, where the string value of a
  – text node is its text
  – attribute node is its value
  – element or root node is the concatenation of the string values of all its descendant text nodes
XSLT elements: generating instructions
• <xsl:copy-of select="expression"/> produces a copy of the node set selected through expression. It can be used to reuse fragments of the source document. Careful:
  – <xsl:value-of> converts a fragment into a string before copying it into the result tree
  – <xsl:copy-of> copies the complete fragment based on the (required) select attribute, without first converting the fragment into a string
  – e.g.,
    <xsl:template match="people">
      <family><xsl:copy-of select="*"/></family>
    </xsl:template>
• <xsl:copy use-attribute-sets=".."> simply copies the current node and then applies the template (in case it contains a template as child nodes)
  – namespaces are included automatically in the copy
  – attributes are not automatically included, they can be included via the “use-attribute-sets” attribute
  – e.g.,
    <xsl:template match="people">
      <family>
        <xsl:for-each select="person">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:for-each>
      </family>
    </xsl:template>
• <xsl:number> can be used to produce running numbers -- beyond this class
XSLT elements: More Control Structures
• Conditionals (<http://www.w3.org/TR/xslt20/#conditionals>)
  – <xsl:if>, <xsl:choose>
  – “if” in XPath (2.0) expressions
• Repetition (<http://www.w3.org/TR/xslt20/#for-each>)
  – <xsl:for-each>
  – “for” in XPath (2.0) expressions
• Called templates (<http://www.w3.org/TR/xslt20/#named-templates>)
  – You can name templates and then call them by name
    • With parameters
  – Interrupt or restart the template control flow
• Functions! (<http://www.w3.org/TR/xslt20/#stylesheet-functions>)
  – <xsl:function> defines them
  – Use them like XQuery functions in XPath expressions
XSLT Function Example
    <xsl:function name="str:reverse" as="xs:string">
      <xsl:param name="sentence" as="xs:string"/>
      <xsl:sequence select="
        if (contains($sentence, ' '))
        then concat(str:reverse(substring-after($sentence, ' ')),
                    ' ',
                    substring-before($sentence, ' '))
        else $sentence"/>
    </xsl:function>

    <xsl:template match="/">
      <output>
        <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
      </output>
    </xsl:template>
http://www.w3.org/TR/xslt20/#stylesheet-functions
XSLT Function Example
    <xsl:function name="str:reverse" as="xs:string">
      <xsl:param name="sentence" as="xs:string"/>
      <xsl:choose>
        <xsl:when test="contains($sentence, ' ')">
          <xsl:sequence select="concat(str:reverse(substring-after($sentence, ' ')),
                                       ' ',
                                       substring-before($sentence, ' '))"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:sequence select="$sentence"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:function>
http://www.w3.org/TR/xslt20/#stylesheet-functions
(XQuery version)
    declare namespace str = "http://ex.org";
    declare namespace xs = "http://www.w3.org/2001/XMLSchema";

    declare function str:reverse($sentence as xs:string) as xs:string {
      if (contains($sentence, ' '))
      then concat(str:reverse(substring-after($sentence, ' ')),
                  ' ',
                  substring-before($sentence, ' '))
      else $sentence
    };

    str:reverse('DOG BITES MAN')
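To check what str:reverse computes, the recursion can be traced in any language. Below is a small Python sketch of the same logic; the name reverse_words is made up, and str.partition stands in for the substring-before / substring-after pair.

```python
def reverse_words(sentence: str) -> str:
    """Reverse the order of space-separated words, recursively."""
    if " " in sentence:
        # head = substring-before(sentence, ' '), tail = substring-after(sentence, ' ')
        head, _, tail = sentence.partition(" ")
        return reverse_words(tail) + " " + head
    return sentence  # a single word is its own reversal

print(reverse_words("DOG BITES MAN"))  # MAN BITES DOG
```

Each call peels off the first word and appends it after the reversed remainder, so the word order (not the characters) is reversed.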
Functions Compared
XSLT:
    <xsl:transform
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:str="http://example.com/namespace"
        version="2.0"
        exclude-result-prefixes="str">
      <xsl:function name="str:reverse" as="xs:string">
        <xsl:param name="sentence" as="xs:string"/>
        <xsl:sequence select="
          if (contains($sentence, ' '))
          then concat(str:reverse(substring-after($sentence, ' ')),
                      ' ',
                      substring-before($sentence, ' '))
          else $sentence"/>
      </xsl:function>
      <xsl:template match="/">
        <output>
          <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
        </output>
      </xsl:template>
    </xsl:transform>
XQuery:
    declare namespace str = "http://ex.org";
    declare namespace xs = "http://www.w3.org/2001/XMLSchema";
    declare function str:reverse($sentence as xs:string) as xs:string {
      if (contains($sentence, ' '))
      then concat(str:reverse(substring-after($sentence, ' ')),
                  ' ',
                  substring-before($sentence, ' '))
      else $sentence
    };
    str:reverse('DOG BITES MAN')
Verbosity encore: Identity Transform
XSLT:
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
XQuery (explicit recursion!):
    declare function local:copy($element as element()) {
      element { node-name($element) } {
        $element/@*,
        for $child in $element/node()
        return
          if ($child instance of element())
          then local:copy($child)
          else $child
      }
    };
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
Verbosity encore: Identity Transform
XSLT:
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- remove all social security numbers -->
    <xsl:template match="PersonSSNID"/>
XQuery (Eek!):
    declare function local:copy-filter-elements(
        $element as element(),
        $element-name as xs:string*) as element() {
      element { node-name($element) } {
        $element/@*,
        for $child in $element/node()[not(name(.) = $element-name)]
        return
          if ($child instance of element())
          then local:copy-filter-elements($child, $element-name)
          else $child
      }
    };
http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT
XSLT ⇔ XQuery
• Both Turing complete
  – Share XPath
  – Share a ton of functions
  – Theoretically, equivalent!
• XQuery to XSLT
  – Have to cope with FLWOR
    • Use control structures!
• XSLT to XQuery
  – Have to encode template rules!
    • Lots of work!
    • See http://www.mscs.mu.edu/~praveen/Teaching/fa05/AdvDb/PaperTeams/XSLT2XQTeam/Presentation.ppt for one approach
(See also: Kay, Comparing XSLT and XQuery)
XSLT …
• many more things are provided by XSLT,
• you are cordially invited to
  – find out more about them
  – experiment with schema awareness
    • see nice features and complications
  – experiment with namespaces
  – (and with SA and namespaces)
  – get your own experiences using <oXygen/>
  – have a look, e.g., at the influence of the order of template rules on the result!
  – think about how one would compare XSLT and XQuery
    • their (dis)advantages
    • when would you use/recommend which?
    • do we need both?
Bit more tree grammar!
Last week...
...we designed our first “schema validator” algorithm:
    Tree T, Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• for local tree grammars first
• that can be implemented by
  – walking the DOM tree in a depth-first, left-to-right way, or
  – using a SAX parser to do it in a streaming fashion
• thus uses memory space linear in the depth of the input tree
• that uses stacks to keep track of
  – each rule that a node’s validation needs to check against: R, written on the way down, checked on the way up (local ⇒ unique!)
  – the result of child nodes’ validations: which non-terminal symbols did they validate with?
This week...
    Tree T, Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
...we expand the algorithm
• first to single-type tree grammars
  – this automatically gives us a validator for the structural aspects of WXS
  – will be rather straightforward
• then to general tree grammars
  – this automatically gives us a validator for Relax NG schemas
  – will be more tricky: we’ll still use stacks to keep track of
    • all rules that a node’s validation needs to check against: R, written on the way down, checked on the way up
    • the result of child nodes’ validations: which non-terminal symbols did they validate with?
This week...
    Tree T, Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• All three algorithms can be implemented by
  – walking the DOM tree in a depth-first, left-to-right way, or
  – using a SAX parser to do it in a streaming fashion
• thus use memory space linear in the depth of the input tree
  – which is quite impressive/surprising for general/Relax NG
✓ we have already seen that
  • DTDs ⤳ local grammars
  • WXS ⤳ single-type grammars
  • RelaxNG ⤳ regular grammars
  ➡ DTDs are structurally weaker than WXS
  ➡ WXS is structurally weaker than RelaxNG
[Figure: nested classes of tree grammars, Loc inside ST inside Reg.]
..so why restrict to single-type?
• ...if validation of more expressive schema languages is equally cheap?!
• Single-type-ness of a schema language ensures uniqueness of the PSVI
  – because each accepted tree only has a unique run
  – thus answers to schema-aware queries are unambiguously determined
    (1 doc validating against 1 schema results in exactly 1 PSVI)
[Figure: schema-aware query processor. XML doc. and Schema go into a schema-aware parser, which produces the PSVI (tree adorned with default values & types); the PSVI and the Query go into the query processor, which produces the Answer.]
..so why restrict to single-type?
• ...if validation of more expressive schema languages is equally cheap?!
• Single-type-ness of a schema language ensures uniqueness of the PSVI
  – because each accepted tree only has a unique run
  – thus answers to schema-aware queries are unambiguously determined
☞ Remember: G is not single-type, where G = (N, Σ, S, P) with
    N = {S1, S2, B}
    Σ = {a, b}
    S = {S1, S2}
    P = { S1 ⟶ a B,
          S2 ⟶ a B,
          B ⟶ b ϵ }
  and G has 2 runs on the tree with root a (at position ε) and a single child b (at position 0), namely:
    run 1: ε ↦ S1, 0 ↦ B
    run 2: ε ↦ S2, 0 ↦ B
...what does a query for nodes/elements of ‘type’ S1 return?
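The ambiguity can be made concrete with a few lines of code. The Python sketch below is an illustration under assumptions: trees are encoded as (tag, children) pairs, content models are given as finite lists of allowed child sequences (enough for this grammar, not for regular expressions in general), and the names RULES, START and types are made up. It computes, bottom-up, the set of non-terminals from which a subtree can be derived.

```python
# Grammar G from the slide: non-terminal -> (tag, allowed child sequences).
START = {"S1", "S2"}
RULES = {
    "S1": ("a", [("B",)]),
    "S2": ("a", [("B",)]),
    "B":  ("b", [()]),        # B -> b with empty content
}

def types(node):
    """Return the set of non-terminals that can derive the subtree `node`."""
    tag, children = node
    child_types = [types(c) for c in children]
    found = set()
    for nt, (rule_tag, sequences) in RULES.items():
        if rule_tag != tag:
            continue
        for seq in sequences:
            # seq fits if every child can take the required non-terminal
            if len(seq) == len(children) and all(
                s in ct for s, ct in zip(seq, child_types)
            ):
                found.add(nt)
    return found

tree = ("a", [("b", [])])
root_types = types(tree) & START
print(sorted(root_types))  # ['S1', 'S2']
```

The root can be typed with S1 or with S2, so a query for nodes of ‘type’ S1 has no single well-defined answer on this document.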
how to validate documents against schemas (2) and (3)
    XML doc/Tree T, single-type Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
See the paper by Murata, Lee, Mani, Kawaguchi
Input: DOM tree for T, single-type tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of non-terminals    (nothing changed)
    R is a stack of production rules             (nothing changed)
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    if there is a production rule N → a e in P with a = E’s tag name and
       (E is root and N in S, or N occurs in RHS of topmost rule in R)    (single-type ⇒ unique rule!)
    then push N → a e onto R       (store rule for E’s content in R)
         and push ϵ onto NT        (start remembering E’s child nodes)
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule N → a e out of R                    (retrieve rule for E’s content from R)
    pop a string of non-terminals w out of NT      (retrieve E’s child nodes)
    if w matches e
    then pop a string w’ of non-terminals out of NT
         and push w’N onto NT                      (add E’s non-terminal to its predecessor siblings)
    else report “not accepted” and stop
report “accepted” and stop
    XML doc/Tree T, single-type Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• Let’s see how the algorithm works:
  – G = ({S,B,C,D}, {a,b,c}, {S}, P) with
      P = { S → a (B, B*, D),
            B → b ((C,C) | C),
            C → c (ϵ | C),
            D → c (C,C,C) }
  – ...in order to know which production rule N → c ... to choose for nodes labelled c, I need to check the rule chosen for their parent and ensure that N occurs in its RHS...
[Figure: input tree with root a; children b, b, c; and six c nodes on the level below.]
When an element E is visited on the way down,
    if there is a production rule N → a e in P with a = E’s tag name and
       (E is root and N in S, or N occurs in RHS of topmost rule in R)
    then push N → a e onto R and push ϵ onto NT
    else report “not accepted” and stop
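The single-type algorithm can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the reference implementation from the paper: non-terminals are single upper-case letters so that content models can be written as ordinary regular expressions and checked with re.fullmatch; trees are (tag, children) pairs; R stores non-terminals rather than whole rules; and the exact shape of the example tree (which b has two children) is a guess from the slide’s figure.

```python
import re

START = {"S"}
# non-terminal -> (tag, content model as a regex over non-terminal letters)
RULES = {
    "S": ("a", "BB*D"),
    "B": ("b", "CC|C"),
    "C": ("c", "C?"),      # C -> c (epsilon | C)
    "D": ("c", "CCC"),
}

def validate_single_type(tree):
    R = []    # stack: non-terminal of the rule chosen for each open element
    NT = []   # stack: string of non-terminals collected from child nodes

    def visit(node):
        tag, children = node
        # Way down: candidates are START at the root, otherwise the
        # non-terminals occurring in the RHS of the topmost rule in R.
        if R:
            allowed = {ch for ch in RULES[R[-1]][1] if ch.isalpha()}
        else:
            allowed = START
        candidates = [n for n in allowed if RULES[n][0] == tag]
        if len(candidates) != 1:      # single-type: at most one can apply
            return False
        R.append(candidates[0])
        NT.append("")
        for child in children:
            if not visit(child):
                return False
        # Way up: check the children against the content model.
        n, w = R.pop(), NT.pop()
        if re.fullmatch(RULES[n][1], w) is None:
            return False
        if NT:                        # not the root: report n to the parent
            NT[-1] += n
        return True

    return visit(tree)

leaf = ("c", [])
good = ("a", [("b", [leaf, leaf]), ("b", [leaf]), ("c", [leaf, leaf, leaf])])
bad = ("a", [("b", [leaf]), ("c", [leaf])])
print(validate_single_type(good), validate_single_type(bad))  # True False
```

Both stacks grow by one entry per open element, so their height is bounded by the depth of the tree; recursion is used here only to drive the depth-first traversal.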
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• want to implement this algorithm? Again, as for local tree grammars,
  – walk the DOM tree in a depth-first, left-to-right way, or
  – use a SAX parser and do it in a streaming fashion
• now, this was for single-type tree grammars, let’s see how this works for general tree grammars
  – we can have competing non-terminal symbols in the RHS of rules
  – how do we know with which to continue?
  – try/guess one and, if that fails, backtrack?
  – or keep track of all possibilities
    • and, as long as we have some, everything is fine..
    • which means we need some more stacks for keeping track...
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
Input: DOM tree for T, a tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of sets of non-terminals
    R is a stack of sets of production rules               (we don’t know which to use!)
    NS is a stack of sets of non-terminals, init with S    (store non-terminals from RHS of possibly applicable rules)
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    set RS to the set of production rules N → a e in P with a = E’s tag name and
       (N occurs in topmost set of NS)
    if RS is non-empty
    then push RS onto R,
         push ϵ onto NT,
         push set of all non-terminals occurring in RHS of a rule in RS onto NS
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule set RS = { N_i → a e_i | i = 1..k } out of R
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
Input: DOM tree for T, a tree grammar G = (N, Σ, S, P)
    NT is a stack of strings of sets of non-terminals
    R is a stack of sets of production rules
    NS is a stack of sets of non-terminals, init with S    (store non-terminals from RHS of possibly applicable rules)
Traverse T in a depth-first, left-to-right manner
When an element E is visited on the way down,
    set RS to the set of production rules N → a e in P with a = E’s tag name and
       (N occurs in topmost set of NS)
    if RS is non-empty
    then push RS onto R,
         push ϵ onto NT,
         push set of all non-terminals occurring in RHS of a rule in RS onto NS
    else report “not accepted” and stop
When an element E is visited on the way up,
    pop a rule set RS = { N_i → a e_i | i = 1..k } out of R
    pop a string of sets of non-terminals W_1...W_k out of NT
    set W to the set of those N_i such that there is a w_1...w_k with each w_j from W_j that matches e_i
    if W is non-empty
    then pop a string V_1...V_m of sets of non-terminals out of NT,
         push V_1...V_m W onto NT,
         pop NS
    else report “not accepted” and stop
report “accepted” and stop
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• Let’s see how the algorithm works:
  – G = ({A,B,C}, {a}, {A,B,C}, P) with
      P = { A → a (B | ϵ),          ➀
            B → a (C,C),            ➁
            C → a ((A,A,A) | ϵ) }   ➂
[Figure: input tree with four levels of a nodes: a; a; a a; a a a.]
(Algorithm: as stated above for general tree grammars.)
Stacks before the traversal starts (bottom to top):
    R:  (empty)
    NT: (empty)
    NS: {A,B,C}
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• Let’s see how the algorithm works:
  – G = ({A,B,C}, {a}, {A,B,C}, P) with
      P = { A → a (B | ϵ),          ➀
            B → a ((C,C) | B),      ➁
            C → a ((A,A,A) | ϵ) }   ➂
[Figure: input tree with four levels of a nodes: a; a; a a; a a a.]
(Algorithm: as stated above for general tree grammars.)
Root visited on the way down: RS = {➀, ➁, ➂}
Stacks after this step (bottom to top):
    R:  {➀,➁,➂}
    NT: ϵ
    NS: {A,B,C}, {A,B,C}
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• Let’s see how the algorithm works:
  – G = ({A,B,C}, {a}, {A,B,C}, P) with
      P = { A → a (B | ϵ),          ➀
            B → a ((C,C) | B),      ➁
            C → a ((A,A,A) | ϵ) }   ➂
[Figure: input tree with four levels of a nodes: a; a; a a; a a a.]
(Algorithm: as stated above for general tree grammars.)
Second-level node visited on the way down: RS = {➀, ➁, ➂}
Stacks after this step (bottom to top):
    R:  {➀,➁,➂}, {➀,➁,➂}
    NT: ϵ, ϵ
    NS: {A,B,C}, {A,B,C}, {A,B,C}
    XML doc/Tree T, any tree Grammar G  →  ValAlgo  →  “yes”, if T ∈ L(G); “no”, otherwise
• Let’s see how the algorithm works:
  – G = ({A,B,C}, {a}, {A,B,C}, P) with
      P = { A → a (B | ϵ),          ➀
            B → a ((C,C) | B),      ➁
            C → a ((A,A,A) | ϵ) }   ➂
[Figure: input tree with four levels of a nodes: a; a; a a; a a a.]
(Algorithm: as stated above for general tree grammars.)
Third-level node visited on the way down: RS = {➀, ➁, ➂}
Stacks after this step (bottom to top):
    R:  {➀,➁,➂}, {➀,➁,➂}, {➀,➁,➂}
    NT: ϵ, ϵ, ϵ
    NS: {A,B,C}, {A,B,C}, {A,B,C}, {A,B,C}
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) if RS is non-empty RS = { ➀ , ➁ , ➂ } then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ {A,B,C} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 63 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ {A,B,C} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 64 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ {A,B,C} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 65 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) RS = { ➀ , ➁ , ➂ } if RS is non-empty then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ {A,B,C} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 66 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ {A,B,C} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 67 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C},{A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 68 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) RS = { ➀ , ➁ , ➂ } if RS is non-empty then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C},{A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 69 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, {A,B,C} pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C},{A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 70 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } ϵ = W 1 ...W k in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } {A,C},{A,C},{A,C} {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 71 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) W = {C} (LHS of ➂ ) if RS is non-empty then push RS onto R, (only AAA matches a RHS, namely of ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } {A,C},{A,C},{A,C} = W 1 ...W 3 in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } ϵ {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 72 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) W = {C} (LHS of ➂ ) if RS is non-empty then push RS onto R, (only AAA matches a RHS, namely of ➂ ) push ϵ onto NT, push set of all non-terminals occurring RS = { ➀ , ➁ , ➂ } {A,C},{A,C},{A,C} = W 1 ...W 3 in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } {C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 73 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) RS = { ➀ , ➁ , ➂ } if RS is non-empty then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R { ➀ , ➁ , ➂ } ϵ {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } {C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 74 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring ϵ = W 1 ...W k RS = { ➀ , ➁ , ➂ } in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R {A,B,C} pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } {C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 75 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,C} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, ( ϵ only matches RHS of ➀ & ➂ ) push ϵ onto NT, push set of all non-terminals occurring ϵ = W 1 ...W k RS = { ➀ , ➁ , ➂ } in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with { ➀ , ➁ , ➂ } {C},{A,C} {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 76 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {B} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, (only CC matches RHS of ➁ ) push ϵ onto NT, push set of all non-terminals occurring {C},{A,C} = W 1 ...W k RS = { ➀ , ➁ , ➂ } in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with {A,B,C} each w j from W j that matches e i { ➀ , ➁ , ➂ } ϵ if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 77 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {B} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, (only CC matches RHS of ➁ ) push ϵ onto NT, push set of all non-terminals occurring {C},{A,C} = W 1 ...W k RS = { ➀ , ➁ , ➂ } in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with each w j from W j that matches e i { ➀ , ➁ , ➂ } {B} if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 78 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name W = {A,B} and (N occurs in topmost set of NS) if RS is non-empty then push RS onto R, (B matches RHS of ➀ & ➁ ) push ϵ onto NT, push set of all non-terminals occurring {B} = W 1 ...W k RS = { ➀ , ➁ , ➂ } in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with each w j from W j that matches e i if W is non-empty {A,B,C} then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 79 Monday, 29 October 2012
“yes”, if T ∈ L(G) XML doc/Tree T ValAlgo any tree Grammar G “no”, otherwise • Let’s see how algorithm works: a – G = ({A,B,C},{a},{A,B,C},P) with ➀ P = { A → a B| ϵ , a B → a (C,C) | B, ➁ a a C → a (A,A,A)| ϵ } ➂ a a a When an element E is visited on way down , set RS to the set of production rules N → a e in P with a = E’s tag name and (N occurs in topmost set of NS) “accepted”/“yes”, T is accepted by G if RS is non-empty then push RS onto R, push ϵ onto NT, push set of all non-terminals occurring in RHS of a rule in RS to NS else report “not accepted” and stop When an element E is visited on way up, pop a rule set RS = { N i → a e i | i = 1..k} out of R pop a string of sets of non-terminals W 1 ...W k out of NT set W to the set of those N i such that there is a w 1 ...w k with each w j from W j that matches e i if W is non-empty then pop a string V 1 ...V m of non-terminals out of NT {A,B,C} push V 1 ...V m W onto NT, pop NS R NT NS else report “not accepted” and stop NS report “accepted” and stop 80 Monday, 29 October 2012
ValAlgo: XML doc/tree T and any tree grammar G in; “yes” if T ∈ L(G), “no” otherwise
• Implementing this algorithm? Again, as for single-type tree grammars,
  – walk the DOM tree in a depth-first, left-to-right way, or
  – use a SAX parser and do it in a streaming fashion
• Insights gained? Validating general tree grammars
  – does not require guessing and backtracking
  – can be implemented in a streaming way
  – is a bit more tricky than validating single-type grammars,
  – but not really more complex (in terms of time/space)
    • still only space linear in the depth of the input tree
  – so, for validation purposes, the restriction to single-type is not necessary
    • feel free to describe structure in a powerful way!
  – but, for uniqueness of the PSVI,
    • we need single-type
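The stack-based algorithm from the preceding slides can be sketched in Python as follows. This is only a minimal illustration under several assumptions that are not in the slides: trees are nested `(tag, [children])` tuples, non-terminals are single letters, content models are written as Python regular expressions over those letters, and the names `StreamingValidator`, `start_element`, `end_element` and `validate` are made up for this sketch. The way-up step simply tries all choices w_1...w_n by brute force, which is fine for toy inputs but not how one would do it in earnest.

```python
import re
from itertools import product

# Rules as (non-terminal, tag, content model); the content model is a Python
# regular expression over one-letter non-terminal names (an assumed encoding).
RULES = [
    ("A", "a", "B?"),       # rule 1: A -> a (B | eps)
    ("B", "a", "CC|B"),     # rule 2: B -> a ((C,C) | B)
    ("C", "a", "(AAA)?"),   # rule 3: C -> a ((A,A,A) | eps)
]
START = {"A", "B", "C"}


class StreamingValidator:
    def __init__(self, rules, start):
        self.rules = rules
        self.R = []              # stack of rule sets
        self.NT = [[]]           # stack of strings (lists) of sets of non-terminals
        self.NS = [set(start)]   # stack of sets of currently allowed non-terminals

    def start_element(self, tag):
        """Way down: select the rules that may apply to this element."""
        rs = [r for r in self.rules if r[1] == tag and r[0] in self.NS[-1]]
        if not rs:
            raise ValueError("not accepted")
        self.R.append(rs)
        self.NT.append([])
        # All non-terminals occurring in the RHS of a selected rule.
        self.NS.append({c for _, _, model in rs for c in model if c.isalpha()})

    def end_element(self):
        """Way up: keep the non-terminals whose content model fits the children."""
        rs = self.R.pop()
        child_sets = self.NT.pop()
        # Brute force over all choices w_1...w_n with w_j taken from W_j.
        words = ["".join(choice) for choice in product(*child_sets)]
        w = {n for n, _, model in rs if any(re.fullmatch(model, s) for s in words)}
        if not w:
            raise ValueError("not accepted")
        self.NT[-1].append(w)    # append W to the parent's string
        self.NS.pop()


def validate(tree, rules=RULES, start=START):
    """tree is a nested (tag, [children]) tuple; returns True iff accepted."""
    v = StreamingValidator(rules, start)

    def walk(node):
        tag, children = node
        v.start_element(tag)
        for child in children:
            walk(child)
        v.end_element()

    try:
        walk(tree)
    except ValueError:
        return False
    return True


leaf = ("a", [])
T = ("a", [("a", [("a", [leaf, leaf, leaf]), leaf])])  # the tree traced on the slides
print(validate(T))  # prints True
```

The two methods correspond to the start-element and end-element events of a SAX-style parser, so the same class could be driven from a streaming parse instead of the recursive `walk` used here.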
Schema languages for different purposes...
• testing/describing structural constraints
  – do persons’ names have both a first and a second name?
• testing type constraints
  – is age an integer? And DoB a date?
• describing a handy PSVI
  – adding default values or type information for easy/robust querying/manipulation
• …
• single-typedness is useful for some, but not all purposes!
• locality?
• Your applications might use different schemas for different purposes
• ...and there are purposes none of our schema languages can serve:
  – in CW5, not all valid input documents were really grammars
  – checking whether non-terminals are mentioned correctly is beyond XSD’s abilities... we need an even more powerful schema language!
Other interesting questions ...closely related to validation are
• Schema emptiness:
  – given a schema/grammar S, does there exist a document/tree d such that d is valid w.r.t. S?
  – relevant as a basic consistency test for schemas
• Schema containment:
  – given schemas/grammars S1, S2, is S1 a specialization of S2?
  – i.e., is every document that is valid w.r.t. S1 also valid w.r.t. S2?
  – relevant to support tasks such as schema refinement:
    • if I say I want to refine S2,
    • then it would be nice if this intention could be verified later, to ensure that I did what I wanted
  – also solves schema equivalence: see your coursework!
• ...a lot of research in both areas
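For tree grammars, the emptiness question can be approached with the usual "productive non-terminals" fixpoint: a non-terminal is productive if its content model matches at least one string made only of productive non-terminals, and the schema is empty exactly when no start symbol is productive. The sketch below is an illustration of that idea, not something taken from the slides; the tuple encoding of content models and the function names are assumptions made for the example.

```python
# Content models as small tuples (a hypothetical encoding):
#   ("eps",)          the empty string
#   ("sym", N)        a single non-terminal N
#   ("seq", e1, e2)   e1 followed by e2
#   ("alt", e1, e2)   e1 or e2
#   ("star", e)       zero or more repetitions of e


def satisfiable(e, productive):
    """True if e matches at least one string over the given non-terminals."""
    kind = e[0]
    if kind in ("eps", "star"):      # both match the empty string
        return True
    if kind == "sym":
        return e[1] in productive
    if kind == "seq":
        return satisfiable(e[1], productive) and satisfiable(e[2], productive)
    if kind == "alt":
        return satisfiable(e[1], productive) or satisfiable(e[2], productive)
    raise ValueError(f"unknown content model: {e!r}")


def productive_nonterminals(rules):
    """rules is a list of (non-terminal, tag, content model) triples."""
    productive = set()
    changed = True
    while changed:                   # iterate until nothing new is added
        changed = False
        for n, _tag, e in rules:
            if n not in productive and satisfiable(e, productive):
                productive.add(n)
                changed = True
    return productive


def is_empty(rules, start):
    """True if no tree at all is valid w.r.t. the grammar."""
    return not (productive_nonterminals(rules) & set(start))


A, B, C = ("sym", "A"), ("sym", "B"), ("sym", "C")
slide_grammar = [
    ("A", "a", ("alt", B, ("eps",))),                       # A -> a (B | eps)
    ("B", "a", ("alt", ("seq", C, C), B)),                  # B -> a ((C,C) | B)
    ("C", "a", ("alt", ("seq", A, ("seq", A, A)), ("eps",))),  # C -> a ((A,A,A) | eps)
]
looping_grammar = [("X", "a", ("sym", "X"))]                # X -> a (X): no finite tree

print(is_empty(slide_grammar, {"A", "B", "C"}))  # prints False
print(is_empty(looping_grammar, {"X"}))          # prints True
```

Containment is a harder problem than this and is not covered by the sketch.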
Bye for now! (I’ll be around) I have enjoyed working with you, and hope you learned loads and also enjoyed the experience!
The Essence of Error
Or, so wrong it’s right
How to cope?
• With which task?
  – Authoring, aggregating, querying …
• Settle on a core representation of the model
  – Perhaps the Atom DOM
• Coerce/transform/extract other models
  – To the representative one
  – Or build software that mediates the difference
• Hope that there aren’t too many
• Advocate standards!
  – Or make them
  – “The nice thing about standards is that there are so many of them to choose from.”
    • Kent Pitman and others
Postel’s Law
“Be liberal in what you accept, and conservative in what you send.”
• Liberality
  – Many DOMs, all expressing the same thing
  – Many surface syntaxes (perhaps) for each DOM
• Conservativity
  – What should we send?
    • It depends on the receiver!
  – Minimal standards?
    • Well-formed XML?
    • Valid according to a popular schema/format?
    • HTML?
Structure and Presentation
• We’ve called this the “DOM” and “Application” layer
  – A very common application layer is “rendering”
    • Text, images
    • Like, y’know, the web
• Standard vs. default renderings
• Goes back to SGML
    <sentence style="slanted">This sentence is false.</sentence>
  – Correct rendering: This sentence is false.
  – Fallback! This sentence is false. (Still see this in XSLT!)
Why Separate Them?
• Presentation is more fluid than structure
  – The "look" may need updating
• Presentation needs may vary
  – What works for 21" screens doesn't for mobile phones
    • (Or maybe not!)
• Accessibility
  – (content should be perceivable by everyone)
• Programmatic processing needs
Another digression: CSS
• The style language for the Web
  – Strong separation of presentation
• CSS is
  – not an XML/angle brackets format
    • Oh NOES! Not another one!
  – annotative, not transformative
    • Well, sorta
  – mostly “formats” nodes
  – ubiquitous on the Web, esp. client side
  – works with arbitrary XML
    • But most clients work with (X)HTML
    • See the excellent PrinceXML formatter
Basic Component
• Rules
  – Which consist of
    • Selectors
      – Like XPath expressions
      – But only forward, with some syntactic sugar
    • Declaration blocks
      – Sets of property/value pairs
    div.title { text-align: center; font-size: 24px; }
<html><head><title>A bit of style</title></head>
<body><style type="text/css">
  .title { font-weight: bold }
  div.title { text-align: center; font-size: 24px; }
  div.entry div.title { text-align: left; font-variant: normal }
  span.date { font-style: italic }
  span.date:after { content: " by" }
  div.content { font-style: italic }
  div.content i { font-style: normal; font-weight: bold }
  #one { color: red }
</style>
<div class="title">My Weblog</div>
<div class="entry">
  <div class="title">What I Did Today</div>
  <div class="byline">
    <span class="date">Feb. 09, 2009</span>
    <span class="author">Bijan Parsia</span>
  </div>
  <div class="content" id="one">
    <p>Taught a class and it went <i>very</i> well.</p>
  </div>
</div>
</body></html>
Try it in http://software.hixie.ch/utilities/js/live-dom-viewer/
Media Types
• Different sets of rules can be contextualized to media
  – Screen, Print, Braille, Aural …
• This is done with groupings called “@media rules”
    @media print { BODY { font-size: 10pt } }
    @media screen { BODY { font-size: 12pt } }
  – Larger font size for screen
Cascading
• CSS rules cascade
  – That is, there is overriding (and non-overriding) inheritance
    • That is, rules combine in different ways
  – http://www.w3.org/TR/CSS21/cascade.html#cascade
• General principles
  – Distance to the node is significant
  – Precision of selectors is significant
  – Order of appearance is significant
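Two of these principles, precision of selectors and order of appearance, can be illustrated with a small Python sketch. It is a deliberately simplified model and not the full CSS 2.1 cascade: it assumes selectors built only from type, `.class` and `#id` parts joined by whitespace, it assumes every listed rule already matches the element in question, and it ignores origin, `!important`, pseudo-elements and inheritance. The function names are invented for the example.

```python
import re


def specificity(selector):
    """Return (ids, classes, elements) for a simple selector string."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Blank out the #id and .class parts; what is left are type selectors.
    rest = re.sub(r"[#.][\w-]+", " ", selector)
    elements = len(re.findall(r"[A-Za-z][\w-]*", rest))
    return (ids, classes, elements)


def winning_value(rules, prop):
    """rules: list of (selector, {property: value}) in order of appearance,
    all assumed to match the same element. The most specific rule wins;
    among equally specific rules the later one wins."""
    best = None
    for order, (selector, declarations) in enumerate(rules):
        if prop in declarations:
            key = (specificity(selector), order)
            if best is None or key > best[0]:
                best = (key, declarations[prop])
    return None if best is None else best[1]


# Rules from the weblog example that match the inner <div class="title">.
rules = [
    (".title", {"font-weight": "bold"}),
    ("div.title", {"text-align": "center"}),
    ("div.entry div.title", {"text-align": "left", "font-variant": "normal"}),
]
print(specificity("div.entry div.title"))   # prints (0, 2, 2)
print(winning_value(rules, "text-align"))   # prints left
print(winning_value(rules, "font-weight"))  # prints bold
```

In the weblog example this is why the entry title ends up left-aligned but still bold: the more specific rule overrides `text-align`, while `font-weight` from the less specific rule is left untouched.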
Error Handling
• XML has “draconian” error handling
  – Well-formedness error … BOOM
• CSS has “forgiving” error handling
  – “Rules for handling parsing errors” http://www.w3.org/TR/CSS21/syndata.html#parsing-errors
    • That is, how to interpret illegal documents
    • Not reporting errors, but working around them
  – E.g., “User agents must ignore a declaration with an unknown property.”
    • Replace: “ h1 { color: red; rotation: 70minutes } ”
    • With: “ h1 { color: red } ”
• Study the error handling rules!
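The "ignore a declaration with an unknown property" rule can be mimicked in a few lines of Python. This is a rough sketch only: the property whitelist is a made-up stand-in for what a real user agent knows, and splitting on `;` is a simplification that would break on values containing semicolons (for example inside strings).

```python
# Hypothetical whitelist; a real user agent knows many more properties.
KNOWN_PROPERTIES = {"color", "font-size", "font-style", "font-weight", "text-align"}


def drop_unknown_declarations(block, known=KNOWN_PROPERTIES):
    """block is the text between { and }, e.g. 'color: red; rotation: 70minutes'.
    Returns the block with all unusable declarations silently removed."""
    kept = []
    for declaration in block.split(";"):
        prop, colon, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        if colon and value and prop in known:
            kept.append(f"{prop}: {value}")
        # Anything else (unknown property, missing colon or value) is
        # ignored without reporting an error.
    return "; ".join(kept)


print(drop_unknown_declarations("color: red; rotation: 70minutes"))  # prints color: red
```

Contrast this with an XML parser, which would have to stop at the first well-formedness error instead of skipping the offending part.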
CSS Robustness
• Has to deal with Web conditions
  1. People borrowing
  2. People collaborating
  3. Different devices
  4. Different kinds of audiences (and authors)
  5. Maintainability
  6. Aesthetics
• CSS is designed for this
  – Cascading & inheritance help with 1, 2, 5
    • And importing, of course
  – @media rules help with 3-6
  – Error handling helps with 1, 2, 4
Errors!
• One person’s error is another’s data
• Errors may or may not be unusual
• Errors are relative to a norm
• Preventing errors
  – Make errors hard or impossible to make
    • Make doing things hard or impossible
  – Make doing the right thing easy and inevitable
  – Make detecting errors easy
  – Make correcting errors easy
  – Correct errors
  – Fail silently
  – Fail randomly
  – Fail differently (interop problem)
(Perceived) Affordances
• (Perceived) Affordance
  – an available action that is salient to the actor
Donald Norman, The Design of Everyday Things
Attractive Nuisances
• A dominant or attractive affordance
  – with a bad or wrong action
  – In law, “a hazardous object or condition on the land that is likely to attract children who are unable to appreciate the risk posed by the object or condition” -- ye olde Wikipedia
  – We can reformulate
    • “a hazardous or misleading language or UI feature that is likely to be misused by (even) an educated user”
• Contrast with “merely” hard to use
  – An attractive nuisance is easy to attempt, hard to use (correctly), and has bad (to catastrophic) effects