2000-01-31

XML Pocket Reference

XML Pocket Reference
Eckstein, Robert
O'Reilly, 1999-10 (first edition)
ISBN 1-56592-709-5

NOTE: Link is to the 2nd Edition, which was published subsequent to this review.

The book gives a good initial feel for XML, and makes it easy to find information about specific aspects of the (proto-)standard. It suffers a little for being written in advance of the standard's finalization, but so do all the other XML books of the same era.

I'd definitely like to see a revised version of this once all the standards have been settled. I'm a big fan of the Pocket Reference series from O'Reilly, and this book makes an excellent addition to the series. Recommended for those wanting to familiarize themselves with XML in general. Other references will be necessary to get a full picture, but this is a great place to build up the initial context that will make following other treatments easier.

I liked the ISO and RFC references on page 20 (in the discussion of xml:lang).

I would have liked to see some pointers to tools for dealing with XML, since full support of XML, XSL and such in web browsers has not yet appeared.

UPDATE 2003-02-08: The first edition is out of print. You can find the second edition here.

Introduction (pages 1-2)

  • Page 2, paragraph 2: "As you read this" should be "As I write this".

XML Terminology (pages 2-14)

  • Page 5, example 1: The example refers to its DTD as "sample.dtd", but the document is "simple.xml", and example 2 on page 9 calls the corresponding DTD "simple.dtd".
  • Page 6, paragraph 4: "...that validates each of the document elements that appear inside the root element." Similar wording appears at page 16 in the discussion of the SYSTEM keyword: "...applied to all the elements inside of root element." [Underlining added]. This means that the DTD doesn't specify the attributes that are legal for the root element itself, which doesn't seem right.
  • Page 10, paragraph 4: "We'll see how to get around this shortly". It would be better to say where (or at least to say that it is done using entities).
  • Page 11, paragraph 5: "...not XML content itself." The stylesheet is XML, but it isn't target XML.
  • Page 13, paragraph 2: "has any chil-dren elements;" should be "has any child elements".
  • Page 14, figure 1: The colon after "Price" is missing. A colon should be there based on the corresponding <xsl:text> element in example 3 on page 13. Also, the prose in paragraph 2 on page 13 refers to: The text "Price: $" will now..., which is another indication that there should be a colon.

XML Reference (pages 14-23)

  • Page 15, discussion of <?xml ... ?>: Question: If the XML declaration is present, must it be the first non-whitespace text in the stream?
  • Page 16, discussion of <!DOCTYPE>: Remove "currently" from "This instruction can currently take one of two forms:". Also, the example shows "<!DOCTYPE <Book> SYSTEM...". The angle brackets around "Book" should be deleted.
  • Page 16, bottom: "If that fails, the URI is used:". Note: this assumes the existence of a network connection.
  • Page 17, example at top: The same extraneous angle brackets around "Book" occur here as on page 16.
  • Page 17, discussion of <? ... ?>: Question: are document processors supposed to ignore processing instructions they do not understand?
  • Page 18, top: "Note that you may not use entity references inside a CDATA marked section, as they will not be expanded." This really means just that they won't be expanded. You can use the character sequences all you want; they just won't map to the meanings they have elsewhere. This does leave open the question of how to end up with "]]>" inside CDATA marked sections...
  • Page 18, paragraph 1 of section "Element and Attribute Rules": "Elements can contain..." should be "Non-empty elements can contain..."
  • Page 18, last paragraph: Note: having periods in element names would have been nice for CSS2 and HTML4, so you could say <p.foo>Bar</p> instead of <p class='foo'>Bar</p>.
  • Page 18, bottom: Note: case-(in)sensitivity is a difference between HTML and XML.
  • Pages 18-19: Varying use of "element name" and "element type name". There should be only one term for this.
  • Page 19, paragraph 2: Element names like <xml...> aren't actually illegal. They are really just reserved for use with W3C-defined standards. If they were illegal, then the W3C couldn't define meanings for them, either.
  • Page 19, middle: "...plus a number of others our publishing system isn't equipped to handle." Should this be "...plus most other Unicode symbols"?
  • Page 21, "simple" under "xml:link": Is "simple" assumed if the attribute is not present?
  • Page 21, last paragraph: "This section also..." should be "That section also...". Also, the page should end "...remapping, which is discussed on the next page."
  • Page 22, "Entity References": Entities are not used solely for single characters, so it would be better to say "...for specific text in XML." Besides the common use mentioned, entities can also be used to call in the contents of an entire file.
  • Page 22, last paragraph: Note: the requirement for the terminating semicolon at the end of an entity reference differs from common HTML practice.
  • Page 23, paragraph 1: add "including PCDATA and attribute values" to the end of the sentence to reinforce.
  • Page 23, paragraph 3: Note: © is &copy; in HTML.
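
Two of the questions above (pages 15 and 18) can be checked against a real parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which is built on expat and is not something the book discusses); the <note> element and the variable names are made up for illustration. It shows that entity references inside a CDATA section come through untouched, that a literal "]]>" can be produced by splitting it across two adjacent CDATA sections, and that this parser rejects an XML declaration preceded by whitespace.

```python
import xml.etree.ElementTree as ET

# 1. Entity references inside a CDATA section are not expanded:
#    the characters come through exactly as typed.
doc = ET.fromstring("<note><![CDATA[&lt;b&gt; stays as typed]]></note>")
cdata_text = doc.text

# 2. A literal "]]>" cannot appear inside a single CDATA section,
#    but it can be split across two adjacent sections: the first
#    ends after "]]", the second starts with ">".
split = ET.fromstring("<note><![CDATA[a]]]]><![CDATA[>b]]></note>")
split_text = split.text


# 3. If the XML declaration is present, it has to come first.
#    Leading whitespace before it makes this parser raise ParseError.
def declaration_after_whitespace_parses() -> bool:
    try:
        ET.fromstring(' <?xml version="1.0"?><note/>')
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(repr(cdata_text))
    print(repr(split_text))
    print(declaration_after_whitespace_parses())
```

This only shows what one parser does; other processors may report the misplaced declaration differently.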

Document Type Definitions (pages 23-38)

  • Page 23, "Document Type Definitions": Set "well-formed" in italics. Also, "which adheres" should be "which simply adheres". Finally, some discussion about when to use attributes and when to use sub-elements, or pointers to such a discussion would be good.
  • Page 24, paragraph 3: "Element names may not start with the string xml, in any variation of upper- or lowercase" should be followed by "except for those declared by W3C".
  • Page 25, top: Note: With HTML and web browsers, sometimes newlines are kept as whitespace, interfering with readability formatting of the HTML source. In XML, this whitespace is explicitly to be ignored.
  • Page 25, paragraph 4: "...may want to specify that another elment..." should be "...may want to specify that a particular element...".
  • Page 27, paragraph 1: "...familiar to shell programmers." should be "...familiar to programmers of languages which use regular expressions (such as shell, Perl and awk)."
  • Page 27, paragraph 3: "<!ELEMENT author" should be "<!ELEMENT authors", since it contains many <authorname>s.
  • Page 28, paragraph 2: The statuscode example really doesn't give a feel for why this is useful.
  • Page 28, "Entities": As on page 22, this should say that you may create entities which stand for general text, not just single characters.
  • Page 31, paragraph 2: Note: This would be useful for images used as bullets for a list.
  • Page 31, paragraph 4: The format of the SYSTEM identifier could use some explanation.
  • Page 32, "#FIXED": Note: Useful when an attribute is deprecated, but we need to set a value for it in addition to causing a validator to complain if it is explicitly set somewhere. This way, older processors can still receive a value.
  • Page 34, top of page: Shows both standalone="yes" and <!DOCTYPE ...>. Contrast to discussion of standalone on page 16, where it says that if the attribute's value is "no", then there must be a <!DOCTYPE> instruction to refer to the DTD, implying that if the value is "yes", there should not be a DTD declaration. Specifically, the example is not standalone because it requires the companion file "sector.dtd".
  • Page 35, bottom: Note: Placing multiple attribute definitions together introduces a danger of syntax error cascade.
  • Page 38, paragraph 2: Gotcha: Because the internal subset is processed first, it cannot use any parameter entities from the external subset.
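
The comments on pages 22 and 28 (entities can stand for general text, not just single characters) and the ordering remark for page 38 can be illustrated with a short sketch. It assumes Python's standard xml.etree.ElementTree module, which reads internal-subset declarations but does not load external DTD files, so only the internal subset is demonstrated; the entity names "series" and "who" are made up. The second document declares the same entity twice to show that the first declaration encountered is the one that binds, which is why declarations in the internal subset (read first) take precedence.

```python
import xml.etree.ElementTree as ET

# A general entity standing for a phrase rather than a single character.
general = ET.fromstring(
    '<!DOCTYPE book [<!ENTITY series "Pocket Reference series">]>'
    "<book>Part of the &series;.</book>"
)
general_text = general.text

# The same entity declared twice: the first declaration is binding,
# and the later one is ignored.
duplicate = ET.fromstring(
    '<!DOCTYPE book [<!ENTITY who "first"><!ENTITY who "second">]>'
    "<book>&who;</book>"
)
duplicate_text = duplicate.text

if __name__ == "__main__":
    print(general_text)
    print(duplicate_text)
```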

The Extensible Stylesheet Language (pages 38-71)

  • Page 51, "count": Page 50 shows xsl:number being used to output a formatted number, while this page talks about using it to select nodes. Does it really serve both functions, or is one of these in error?
  • Page 55, Description of "digit-group-sep": It says "...a comma between each of three digits in a number,...". It should say "...a comma between each group of three digits in a number,...".
  • Page 57, Last paragraph: There appears to be a copy/paste error between this (xsl:arg) and xsl:macro-arg. Where it says "default=" it should say "value=". Also, "...and default value," should be "...and a value,".
  • Page 61, "xsl:copy": This could use more description, and perhaps an example.
  • Page 61, "xsl:counter": In the example, there are no newlines generated, so the output would be something like "Chapter 1:Chapter 2:Chapter 3:".
  • Page 65, paragraph 4: "...list of each chapter that..." should be "...list of the chapters that...".
  • Page 67, "Specifies the name of an..." should be "Specifies the URI of an...", like on Page 66 under xsl:import. Also, where it says "Compare to..." it should say "Contrast to", since we are focusing on the differences, not the similarities.
  • Page 67, "xsl:macro": The HTML-looking snippet "<B>WARNING! </B>" is pre-4.0 HTML. Also, shouldn't it be surrounded by <xsl:text>? In which domain is the macro: stylesheet or output?
  • Page 68, paragraph 1: Add at end: "(which is apply-templates)."
  • Page 68, paragraph 2: Add at end: "(which is excluded from the definition of 'contents' above)."
  • Page 68, last paragraph: "See the section..." should be "See the subsection".
  • Page 69, "xsl:pi": Note: Processing instructions can also be generated via xsl:text, but doing so would lose the semantic difference between general text and processing instructions.
  • Page 70, paragraph 1: The statement "all whitespace located between its opening and closing tags is insignificant" means that "True     Lies" would turn into "TrueLies", not "True Lies". Is this right?
  • Page 70, xsl:value-of: Line two omits the "-of" suffix of the element name. Also, a complete example here would be good.
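
To make the page 55 and page 70 comments concrete, here is a small analogy in plain Python. It is not XSL and makes no claim about what an XSL processor actually does; the function names are hypothetical. It spells out the two possible readings of "whitespace is insignificant" for "True     Lies", and what "a comma between each group of three digits" means.

```python
def strip_all_whitespace(text: str) -> str:
    """Reading 1: every whitespace character is dropped."""
    return "".join(text.split())


def collapse_whitespace(text: str) -> str:
    """Reading 2: each run of whitespace becomes a single space."""
    return " ".join(text.split())


def group_thousands(number: int, separator: str = ",") -> str:
    """Insert the separator between each group of three digits."""
    return format(number, ",").replace(",", separator)


if __name__ == "__main__":
    print(strip_all_whitespace("True     Lies"))
    print(collapse_whitespace("True     Lies"))
    print(group_thousands(1234567))
```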

XLink and XPointer (pages 72-97)

  • Page 72, paragraph 1: "that the standard" should be "that the not-yet-standard".
  • Page 74: Where it says "browser" here, read "user agent", the more generic W3C term.
  • Page 74, paragraph 2: "For now it's best to simply make a link as meaningful as you can.": What does this really mean? Perhaps substituting "specific" for "meaningful" works.
  • Page 74, last code: ".../XLink#XPtr": This looks just like an HTML reference, so it doesn't really add to the discussion.
  • Page 75, paragraph 6: "On the other hand, if the element is deleted," should be "Additionally, if the element is deleted," since we aren't making a negative-positive transition. They are both positive.
  • Page 76, paragraph 3: Allowing a dash in simple-doc.xml#root() isn't good.
  • Page 76, "html()": Mention XHTML.
  • Page 76, "origin()": It is an intra-document location term.
  • Page 78, paragraph 2: Using "#text" in the example is a misleading correspondence between the text of the document and the meaning of #text. Plain-looking text would be better.
  • Page 80, paragraph -2: "which count with positive numbers" should be "which are counted with positive numbers". Also, "ends after the <emph>" should be "ends after the </emph>".
  • Page 80, paragraph -1: "shows a set of postive-numbered descendants" should be "shows the positive and negative descendants".
  • Page 81, table 7 caption: Should be "Positive and negative XPointers with descendant()".
  • Page 89, paragraph 5: "any-one who's used the Web before" should be "any-one who's written or read HTML before".
  • Page 94, paragraph -2: "Links on the extended" should be "Attributes on the extended".
  • Page 94, paragraph -1: "ftrue" should be "true".
