Other XML Idiocy

Quite often I see XML schema designs that repeat the same mistake over and over.

The people making the decisions seem to have forgotten that XML supports attributes and that you can express data in them.

Compare these two fragments:

<Company>
    <Name>Umbrella Corporation</Name>
    <City>Ottawa</City>
    <Province>Ontario</Province>
</Company>

<Company
    Name="Umbrella Corporation"
    City="Ottawa"
    Province="Ontario" />

The former wastes space. Why some software favors the former over the latter makes no sense to me.
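If you want to poke at the two shapes yourself, here is a minimal sketch using Python's standard-library ElementTree (my choice of tool, nothing the fragments above require):

import xml.etree.ElementTree as ET

# Element-style: the data lives in child elements
element_style = ET.fromstring(
    "<Company><Name>Umbrella Corporation</Name>"
    "<City>Ottawa</City><Province>Ontario</Province></Company>")

# Attribute-style: the data lives in attributes on one element
attribute_style = ET.fromstring(
    '<Company Name="Umbrella Corporation" City="Ottawa" Province="Ontario" />')

print(element_style.findtext("Name"))   # child element text
print(attribute_style.get("Name"))      # attribute value

Either way it is one call to read a field; the difference is purely in how the document is shaped.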

No need to reply. I just like to complain. :mad:
 
So you are putting the field data for each company in attributes on the <Company> element? That might work, but I'd be concerned that the attributes might lock you in somehow later on.
 
The former wastes space. Why some software favors the former over the latter makes no sense to me.
There are many reasons to prefer the first style over the second, and you probably don't know (or remember) the way of thinking from the time when XML was "modern". As for the "wasted space", programmers answer: "disk space and RAM are cheap today; wasting a few KB is not a problem." I agree with the title of this topic, and I can add that XSLT was a much bigger idiocy.
 
That might work, but I'd be concerned that the attributes might lock you in somehow later on.
It most definitely does work, and has for decades. Understand that it was a toy example.

Your concern about schema lock-in is valid, but a loose schema can also be a problem. Enforcing a schema is the first line of defense against malformed data. Someone's "well-formed" XML might be crafted purposely to break the parser.

The usual fix is a reference to a specific DTD or XSD.
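For what it's worth, here is a minimal validation sketch, assuming lxml and a hypothetical company.xsd / company.xml pair (neither file is from this thread):

from lxml import etree

# Load the schema and the instance document (hypothetical file names)
schema = etree.XMLSchema(etree.parse("company.xsd"))
doc = etree.parse("company.xml")

# Reject bad documents up front instead of letting them reach downstream code
if not schema.validate(doc):
    for error in schema.error_log:
        print(error.message)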
 
the way of thinking from the time when XML was "modern". As for the "wasted space", programmers answer: "disk space and RAM are cheap today; wasting a few KB is not a problem."
Fans of JSON like to pick on XML for this very reason. "Oh, the XML takes more space than the same data in JSON." Often it is, in my opinion, a poor schema choice that makes XML look worse than it really is.
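A rough illustration of that point, not a benchmark; the byte counts are just len() over the toy record from the first post:

import json

element_xml = ("<Company><Name>Umbrella Corporation</Name>"
               "<City>Ottawa</City><Province>Ontario</Province></Company>")
attribute_xml = ('<Company Name="Umbrella Corporation" '
                 'City="Ottawa" Province="Ontario"/>')
as_json = json.dumps({"Name": "Umbrella Corporation",
                      "City": "Ottawa", "Province": "Ontario"})

# Compare the encoded sizes of the three shapes
for label, text in [("elements", element_xml),
                    ("attributes", attribute_xml),
                    ("JSON", as_json)]:
    print(label, len(text.encode("utf-8")), "bytes")

The attribute form and the JSON land in the same ballpark; it is the element-per-field shape that carries the overhead.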
 
Fans of JSON like to pick on XML for this very reason. "Oh, the XML takes more space than the same data in JSON." Often it is, in my opinion, a poor schema choice that makes XML look worse than it really is.
That's why I don't like JSON. It's not that there's anything inherently wrong with it; it's just not intended to be used the way I was using it, and something else would likely make more sense. I'll probably eventually go back and redo things using pkl.
 
XSLT must be the most screwed-up programming language that isn't intended as satire.
One of our college elders was a fan of XML and XML Schema, and thus XSLT. We once called it "cactus-fucking Haskell" (because of all the pointy brackets) to his face, and he thought that was hilarious.
 
Didn't have the patience to read all the responses, but from a programmatic POV the second form is much easier to write a parsing engine for... assuming you hand-code your parsing engines like I do. LOL

But in the name of full disclosure... I hate XML with a passion. I would rather base my stuff on JSON. XML is too wordy, and DTDs give me a headache.
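To illustrate the hand-coded-parser point, here is a toy sketch (deliberately naive: a regex is not a real XML parser, and this ignores escaping, nesting, and everything else that makes XML hard):

import re

fragment = '<Company Name="Umbrella Corporation" City="Ottawa" Province="Ontario" />'

# One pass over a single tag pulls out every field as name/value pairs
fields = dict(re.findall(r'(\w+)="([^"]*)"', fragment))
print(fields)   # {'Name': 'Umbrella Corporation', 'City': 'Ottawa', 'Province': 'Ontario'}

With the element-per-field form you are tracking open and close tags before you can pair anything up.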
 
I think it's more a matter of taste or habit. Furthermore, comparisons should be made not only for the web and related documents but also for other applications, and not only with JavaScript as the yardstick (by the way, vulpine's link doesn't mention the methods JavaScript now implements natively, JSON.parse/JSON.stringify).
 
XSLT must be the most screwed-up programming language that isn't intended as satire.
It's not really a programming language, I don't think. That's why I've always preferred XQuery (though lately the Zorba website seems to have disappeared). XQuery makes a lot more sense to programmers. I always have to bend my brain to work with XSLT, but I think it handles mixed content better than a programming language would.

I could be wrong, but I look at it kind of like I look at regular expressions: it is very hard to look at a complicated regex and quickly figure out what it is doing. But I don't know how I would improve the syntax and still keep the power it has.
 
Quite often I see XML schema designs that repeat the same mistake over and over.

The people making the decisions seem to have forgotten that XML supports attributes and that you can express data in them.

Compare these two fragments:

<Company>
    <Name>Umbrella Corporation</Name>
    <City>Ottawa</City>
    <Province>Ontario</Province>
</Company>

<Company
    Name="Umbrella Corporation"
    City="Ottawa"
    Province="Ontario" />

The former wastes space. Why some software favors the former over the latter makes no sense to me.

No need to reply. I just like to complain. :mad:
The common wisdom seems to be that attributes are for metadata. For example, suppose you have a mixed-content document, the digitization of an old church manuscript from the Middle Ages. You might want to record things like which scribe you think copied that page, or where the original document is stored. These are not part of the document itself, so there is something to be said for having them as attributes. If you use an XSLT stylesheet to convert the document to a webpage, for example, you do not have to worry about filtering out the metadata.

With your example, a pure-XML document, one consideration is that attributes do not have hierarchy. So you might decide to have City underneath Province, since the same city name may occur in more than one province. Then you could have:

<Company>
    <Name>Umbrella Corporation</Name>
    <Province>
        <Pname>Ontario</Pname>
        <City>Ottawa</City>
    </Province>
</Company>

(The new <Pname> tag is optional. You could just put "Ontario" after <Province>, but this keeps it from becoming a mixed-content document.)

If you have already committed to using attributes, then you end up with something like this:

<Company
    Name="Umbrella Corporation">
    <Province>
        <Pname>Ontario</Pname>
        <City>Ottawa</City>
    </Province>
</Company>

This is kind of unhandy to work with using XPath expressions.
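To make that concrete, here is a small sketch (lxml assumed; the XPath expressions themselves are the point) of the two kinds of expression the mixed form forces on you:

from lxml import etree

mixed = etree.fromstring(
    '<Company Name="Umbrella Corporation">'
    '<Province><Pname>Ontario</Pname><City>Ottawa</City></Province>'
    '</Company>')

print(mixed.xpath("string(@Name)"))           # attribute axis for the name
print(mixed.xpath("string(Province/City)"))   # child-element axis for the address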

Size is not a main consideration for XML files. If necessary, you can compress them, or store them as JSON and convert them back when you need to process them, in order to take advantage of XML's rich tool set. If you need to transmit them over low-bandwidth channels, you can use EXI, Efficient XML Interchange (https://www.w3.org/TR/exi/).
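As a quick illustration of the "just compress it" point (sizes will vary with the document, of course):

import gzip

# Repetitive tag names squeeze down very well
xml_bytes = ("<Company><Name>Umbrella Corporation</Name>"
             "<City>Ottawa</City><Province>Ontario</Province></Company>" * 100).encode()
print(len(xml_bytes), "->", len(gzip.compress(xml_bytes)), "bytes")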

An example of emphasizing processing ease over size is MathML, the XML language for mathematical equations. Most people use LaTeX to write equations. If you wanted to specify the quadratic formula in LaTeX, it would look like this:

x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}

If you converted it to MathML, it would look like this:

<math display="block" class="tml-display" style="display:block math;">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo lspace="0em" rspace="0em">−</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>−</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>

That is impossible to work with by hand. But it is easy to process with XSLT. Different countries have different notational conventions. For example, IIRC, in German, rather than writing the tangent of x as tan x, they write tg x. If you are preparing a document for different countries, MathML lets you make these changes automatically, by having different XSLT stylesheets for different countries. But nobody is going to enter equations manually using MathML, because it will give you a headache.
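Here is a minimal sketch of the per-country stylesheet idea, using lxml and an identity transform plus one override. The MathML namespace is left off to keep the toy short, and none of this comes from a real localization stylesheet:

from lxml import etree

XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- German convention: rewrite tan as tg -->
  <xsl:template match="mi[.='tan']">
    <mi>tg</mi>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(XSL))
doc = etree.fromstring(b"<math><mrow><mi>tan</mi><mi>x</mi></mrow></math>")
print(etree.tostring(transform(doc)))   # ...<mi>tg</mi><mi>x</mi>...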

The point of this (unfortunately long-winded) example is that it is not always about size, but sometimes about processing convenience. Similarly, when I have to work with a JSON file, I usually convert it to XML (with yq -x . | tidy5 -xml -indent -quiet), because then I can work with it using the XML tools that I am comfortable with.
 