Consider the book dtd at https://www.w3.org/TR/xquery-use-cases/#xmp but replace
Question
Consider the book dtd at https://www.w3.org/TR/xquery-use-cases/#xmp but replace <!ELEMENT book (title, (author+ | editor+ ), publisher, price )> by the simpler <!ELEMENT book (title, author+ , publisher, price )>
b) The algorithm we gave for creating a relational schema to store XML documents from a DTD only works if there are no parentheses (nesting) in element definitions. More complex formulas must be reduced to this form. For example, <!ELEMENT A (B*)*> is obviously the same as <!ELEMENT A B*>. A slightly more complicated case is posed by the definition (i) <!ELEMENT A (D* | E*) >. Give another element definition (ii) <!ELEMENT A _______ > which has no nested parentheses (i.e., only uses * and comma), and which has the property that any document satisfying (i) also satisfies (ii), and (ii) is as restrictive as possible (i.e., allows as few additional valid XML documents as possible). Recall that the final goal is to be able to store elements satisfying (i).
Explanation / Answer
Assume that the XML data conforms to the following DTD:
<!DOCTYPE bib [
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
]>
First, design a relational schema for the XML data. The schema should have relations corresponding to entity sets such as Book and Author, as well as relations for the relationships between these entity sets. Only the relation names and their columns are needed, for example Author(aid, last, first); field types and key/foreign key constraints can be omitted.
The relational schema is:
Book(bid, title, publisher, price)
Author(aid, last, first)
Editor(eid, last, first, affiliation)
Writes(aid, bid)
Edits(eid, bid)
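As a rough illustration, the five relations could be created in an in-memory SQLite database as sketched below. The column types and the key/foreign-key clauses are assumptions added here (the answer lists only relation names and columns), and the editor key in Edits is written eid so that it matches Editor.

```python
import sqlite3

# Column types and constraints are illustrative assumptions;
# the answer above gives only relation names and columns.
SCHEMA = """
CREATE TABLE Book   (bid INTEGER PRIMARY KEY, title TEXT, publisher TEXT, price REAL);
CREATE TABLE Author (aid INTEGER PRIMARY KEY, last TEXT, first TEXT);
CREATE TABLE Editor (eid INTEGER PRIMARY KEY, last TEXT, first TEXT, affiliation TEXT);
CREATE TABLE Writes (aid INTEGER REFERENCES Author(aid), bid INTEGER REFERENCES Book(bid));
CREATE TABLE Edits  (eid INTEGER REFERENCES Editor(eid), bid INTEGER REFERENCES Book(bid));
"""

def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One book, one author, and the Writes row linking them.
conn.execute("INSERT INTO Book VALUES (1, 'TCP/IP Illustrated', 'Addison-Wesley', 65.95)")
conn.execute("INSERT INTO Author VALUES (1, 'Stevens', 'W.')")
conn.execute("INSERT INTO Writes VALUES (1, 1)")

# Titles written by a given author, found by joining through Writes.
titles = [r[0] for r in conn.execute(
    "SELECT b.title FROM Book b "
    "JOIN Writes w ON w.bid = b.bid "
    "JOIN Author a ON a.aid = w.aid "
    "WHERE a.last = 'Stevens'")]
```

Writes and Edits are separate relations because a book may have several authors or editors, and the same person may appear on several books.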
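Parentheses in content models work the same way as in arithmetic expressions or any other recursive expression language: in a given case they may or may not be necessary, and the meaning of an expression is defined in terms of its structure. For definition (i), the choice D* | E* allows either a run of D elements or a run of E elements. A suitable definition using only * and comma is <!ELEMENT A (D*, E*)>: every document satisfying (i) also satisfies it, and the only additional documents it admits are those in which a run of D elements is followed by a run of E elements. The symmetric form (E*, D*) would work equally well.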
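One way to sanity-check a candidate for part (b) is to treat each content model as a regular expression over child-element sequences and compare what the two models accept. The sketch below assumes the candidate <!ELEMENT A (D*, E*)> and encodes each child as a single letter. It only enumerates sequences up to a small length, so it illustrates the containment rather than proving minimality.

```python
import re
from itertools import product

# Each child element is encoded as one letter, so a content model
# becomes an ordinary regular expression over child sequences.
ORIGINAL = re.compile(r"(D*|E*)")   # <!ELEMENT A (D* | E*)>
FLATTENED = re.compile(r"D*E*")     # assumed candidate: <!ELEMENT A (D*, E*)>

def child_sequences(max_len):
    """Yield every sequence of D/E children up to max_len children."""
    for n in range(max_len + 1):
        for seq in product("DE", repeat=n):
            yield "".join(seq)

def accepted(model, max_len=4):
    """Return the set of child sequences the content model accepts."""
    return {s for s in child_sequences(max_len) if model.fullmatch(s)}

original_ok = accepted(ORIGINAL)
flattened_ok = accepted(FLATTENED)

# Sequences the flattened model allows but the original does not.
extra = sorted(flattened_ok - original_ok, key=lambda s: (len(s), s))
```

Here original_ok is a subset of flattened_ok, and every sequence in extra has at least one D followed by at least one E. These are the mixed documents the relational schema would also be able to store.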
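Sample XML data from the same use case is shown below. Only the <bib> document is described by the DTD above; the <reviews> and <prices> documents are separate data sets.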
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
<book year="1999">
<title>The Economics of Technology and Content for Digital TV</title>
<editor><last>Gerbarg</last><first>Darcy</first><affiliation>CITI</affiliation></editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>
</bib>
<reviews>
<entry>
<title>Data on the Web</title>
<price>34.95</price>
<review>A very good discussion of semi-structured database systems and XML</review>
</entry>
<entry>
<title>Advanced Programming in the Unix environment</title>
<price>65.95</price>
<review>A clear and detailed discussion of UNIX programming.</review>
</entry>
</reviews>
<prices>
<book>
<title>Advanced Programming in the Unix environment</title>
<source>www.amazon.com</source>
<price>65.95</price>
</book>
<book>
<title>Advanced Programming in the Unix environment</title>
<source>www.bn.com</source>
<price>65.95</price>
</book>
<book>
<title>TCP/IP Illustrated</title>
<source>www.amazon.com</source>
<price>65.95</price>
</book>
<book>
<title>TCP/IP Illustrated</title>
<source>www.bn.com</source>
<price>65.95</price>
</book>
<book>
<title>Data on the Web</title>
<source>www.amazon.com</source>
<price>34.95</price>
</book>
<book>
<title>Data on the Web</title>
<source>www.bn.com</source>
<price>39.95</price>
</book>
</prices>
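To connect the sample data back to the relational schema, here is a minimal sketch of how a <bib> document might be split into Book, Author and Writes rows with Python's standard xml.etree.ElementTree module. The shred function, the generated bid/aid values, and the rule that authors with the same (last, first) pair are the same person are all assumptions made for illustration. Editors would be handled in the same way.

```python
import xml.etree.ElementTree as ET

# Two books taken from the sample <bib> data above.
SAMPLE = """
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
</bib>
"""

def shred(xml_text):
    """Split a <bib> document into Book, Author and Writes rows."""
    books, authors, writes = [], {}, []
    root = ET.fromstring(xml_text)
    for bid, book in enumerate(root.findall("book"), start=1):
        books.append((bid, book.findtext("title"),
                      book.findtext("publisher"),
                      float(book.findtext("price"))))
        for a in book.findall("author"):
            name = (a.findtext("last"), a.findtext("first"))
            # Assumption: same (last, first) means the same author.
            aid = authors.setdefault(name, len(authors) + 1)
            writes.append((aid, bid))
    author_rows = [(aid, last, first) for (last, first), aid in authors.items()]
    return books, author_rows, writes

books, author_rows, writes = shred(SAMPLE)
```

Each repeated <author> child becomes one Writes row, which is how the author+ part of the content model ends up in a separate relation rather than in Book.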