Tuesday, October 12, 2010

WYSINWYE - Part 1

Yes, you read that correctly. It is not what one is normally used to seeing in the context of "what you see", is it? The abbreviation stands for "What You See Is Not What You Expect".

I guess that this blog entry could be considered a wee bit "nerdy", but I have written it because it may answer some questions that others have run into when using the Bike Pack data.

I have been testing an S1000D product, and we wish to make the results available to demonstrate some of its functionality. Because a lot of the data that we have for this purpose belongs to our clients, I have been using (what else?) the various S1000D Bike Pack data sets available for download from the S1000D.org site.

In particular, two problems came to light, both of the in-your-face type, and I would like to share them with you because both are a result of the XML markup.

I have split this WYSINWYE subject into two parts: this one deals with ISO 8879, and the other (Part 2) deals with graphics.

An ISO 8879 problem

Having built the Issue 3 Bike Pack (the SGML version, actually) into a publication, I noticed some odd formatting associated with random lists (it may also affect ordered lists, of course). I have included a screen shot of the output below.


From this you can probably see that the 'bullet' mark for each list item sits well above the text. In fact, the text is located on the next paragraph line.

Naturally, since we are in a testing frame of mind, the first thought is that there is something wrong with the application's formatting rules. A closer examination showed that this was not the case.

The next step is to look at the SGML and, as you can see from the screen shot below, at first glance it looks very good: nicely formatted to make it more readable to the human eye (I have switched on the display of tabs, spaces, CRs, etc.).


And this is actually where the problem originates. The structure for the item contains mixed content as this fragment of the DTD (in Near&Far) shows.

So things are not quite what they seem when it comes to interpretation of the SGML itself. The basic problem here is that ISO 8879 says that if there is a record end between one element's markup and another element's markup, then the data will be displayed on the next line. (This is a simplified version of what is contained in ISO 8879 B.3.3.1.)

Removing the 'record end' after the <item> tag restores the formatting to what is expected in this case.
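For anyone wanting to apply that fix across many files, here is a minimal Python sketch of the idea. It assumes the record end is a plain LF or CR LF directly after the <item> start tag. The sample fragment and the function name are made up for illustration and are not taken from the Bike Pack. A regular expression is only a rough tool here (it will be fooled by a ">" inside an attribute value, for instance), so treat it as a starting point and not as a proper SGML-aware fix.

```python
import re

# An <item> start tag (with or without attributes) followed immediately by a
# record end (LF or CR LF). Matching ignores case because SGML tag names are
# commonly written in either case.
ITEM_THEN_RECORD_END = re.compile(r"(<item\b[^>]*>)\r?\n", re.IGNORECASE)


def strip_record_end_after_item(sgml_text):
    """Remove the line break that directly follows each <item> start tag.

    Only the record end itself is removed; any indentation on the next
    line is left untouched.
    """
    return ITEM_THEN_RECORD_END.sub(r"\1", sgml_text)


# Hypothetical fragment, invented for this example.
SAMPLE = (
    "<randlist>\n"
    "<item>\n"
    "<para>Check the tyre pressure.</para></item>\n"
    "</randlist>\n"
)

if __name__ == "__main__":
    print(strip_record_end_after_item(SAMPLE))
```

Record ends elsewhere in the file (for example after the <randlist> tag) are left alone, since the post only identifies the one after <item> as the cause.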

I guess that this problem will also affect XML, since XML is an ISO 8879 application, but I have not gone looking for it.
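For readers who do want to go looking, the following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It reports whether a parsed element carries whitespace-only text between its start tag and its first child element, which is the XML counterpart of the record end described above. The sample fragment and the function name are hypothetical. What a given viewer or formatter then does with that whitespace is a separate question that this sketch does not answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (not taken from the Bike Pack): a line break and
# indentation sit between the <item> start tag and its first child element.
SAMPLE = "<item>\n  <para>Check the tyre pressure.</para>\n</item>"


def leading_whitespace(xml_text):
    """Return the whitespace-only text that precedes the first child of the
    root element, or None if there is no such text."""
    root = ET.fromstring(xml_text)
    text = root.text
    # root.text holds whatever character data appears before the first child.
    if text and not text.strip() and len(root) > 0:
        return text
    return None


if __name__ == "__main__":
    print(repr(leading_whitespace(SAMPLE)))
```

ElementTree keeps that whitespace as ordinary text on the element, so any application built on it has to decide for itself whether to honour or discard it.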

2 comments:

GreatWhiteDork said...

It depends on the XML parser. Some will consider that a text node (with no text) and others will consider it non-important white space. I don't remember which do and which don't, but I've been bitten by it before.

martyn said...

Many thanks for your comment.

And I guess that that comment is relevant for most bits of SGML/XML software. The problem is that, if we are to be completely ISO 8879 compliant, the software should react like the example given. The fact that some applications (XML ones particularly) do not exactly follow the ISO standard is, I think, a symptom of how some XML developers work, in my experience: having got a tagged-up document, they make the output look right instead of making it behave as the standard says it should.

The problem with this particular fault is that the document IS compliant, but the true result is not what is expected. So, unless we write a parser with some code to detect this particular problem, there are going to be a number of files that give different results according to the software being used to view them. Not a good scenario, as the generator of the document will probably not know whether their application is doing it right or wrong.

Again, thanks for your input.
Martyn