Quite often I see decisions around creating XML schemas that repeat the same mistake over and over.
The people making those decisions seem to have forgotten that XML supports attributes, and that you can express data in those attributes.
Compare these two fragments:
<Company>
<Name>Umbrella Corporation</Name>
<City>Ottawa</City>
<Province>Ontario</Province>
</Company>
<Company
Name="Umbrella Corporation"
City="Ottawa"
Province="Ontario" />
The former wastes space. Why some software favors the former over the latter makes no sense to me.
No need to reply. I just like to complain.
The common wisdom seems to be that attributes are for metadata. For example, suppose you have a mixed-content document, the digitization of an old church manuscript from the Middle Ages. You might want to record things like which scribe you think copied a given page, or where the original document is stored. These are not part of the document's text itself, so there is something to be said for keeping them as attributes. If you use an XSLT stylesheet to convert the document to a web page, for example, you do not have to worry about filtering out the metadata.
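For instance, a transcribed page might look something like this (the element and attribute names here are invented purely for illustration):

<page scribe="second hand, possibly Brother Anselm" repository="parish archive, Uppsala">
In principio erat Verbum, et Verbum erat apud Deum...
</page>

An XSLT template that processes only the element's child nodes carries the text through to the web page and silently drops the scribal metadata, with no extra filtering needed, since attributes are not child nodes.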
With your example, a pure data document rather than mixed content, one consideration is that attributes cannot be nested. So you might decide to put City underneath Province, since the same city name may occur in more than one province. Then you could have:
<Company>
<Name>Umbrella Corporation</Name>
<Province>
<Pname>Ontario</Pname>
<City>Ottawa</City>
</Province>
</Company>
(The new <Pname> tag is optional. You could just put "Ontario" directly after <Province>, but having <Pname> keeps this from becoming a mixed-content document.)
If you have already committed to using attributes, then you end up with something like this:
<Company
Name="Umbrella Corporation">
<Province>
<Pname>Ontario</Pname>
<City>Ottawa</City>
</Province>
</Company>
This is kind of unhandy to work with using XPath expressions.
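For example, to find the city for a given company, the all-element version lets you address everything the same way, while the attribute version makes you keep track of which pieces are attributes. XPath sketches against the two fragments above:

/Company[Name = 'Umbrella Corporation']/Province/City      (everything is an element)
/Company[@Name = 'Umbrella Corporation']/Province/City     (Name is an attribute, hence the @)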
Size is not usually a major consideration for XML files. If necessary, you can compress them, or convert them to JSON for storage and transmission and convert them back to XML when you need to process them, so you can still take advantage of XML's rich tool set. If you need to transmit them over low-bandwidth channels, you can use EXI, Efficient XML Interchange (https://www.w3.org/TR/exi/).
An example of emphasizing ease of processing over size is MathML, the XML language for mathematical notation. Most people use LaTeX to write equations. If you wanted to write the quadratic formula in LaTeX, it would look like this:
x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}
If you converted it to MathML, it would look like this:
<math display="block" class="tml-display" style="display:block math;">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo lspace="0em" rspace="0em">−</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>−</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
That is practically impossible to write by hand, but it is easy to process with XSLT. Different countries have different notational conventions. For example, IIRC, in German the tangent of x is written tg x rather than tan x. If you are preparing a document for different countries, MathML lets you make these changes automatically, by having a different XSLT stylesheet for each country (a sketch follows below). But nobody is going to enter equations manually in MathML, because it will give you a headache.
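For instance, a country-specific stylesheet could rewrite the function name on the fly. Here is a minimal sketch, assuming the MathML is in the standard MathML namespace and the function name appears as an <mi>tan</mi> token (the stylesheet is illustrative, not from any real project):

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:m="http://www.w3.org/1998/Math/MathML">
  <!-- Identity transform: copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- German edition: write the tangent function as "tg" instead of "tan". -->
  <xsl:template match="m:mi[. = 'tan']">
    <m:mi>tg</m:mi>
  </xsl:template>
</xsl:stylesheet>

Swapping in a different stylesheet per country gives you each local convention from the same MathML source.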
The point of this (unfortunately long-winded) example is that it is not always about size; sometimes it is about processing convenience. Similarly, when I have to work with a JSON file, I usually convert it to XML (with yq -x . | tidy5 -xml -indent -quiet), because then I can work with it using the XML tools I am comfortable with.
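As a rough illustration of that round trip (the file name and the exact output formatting are invented here; this assumes the Python yq wrapper around jq, whose -x flag transcodes the result to XML):

$ cat company.json
{"Company": {"Name": "Umbrella Corporation", "City": "Ottawa", "Province": "Ontario"}}
$ yq -x . company.json | tidy5 -xml -indent -quiet
<Company>
  <Name>Umbrella Corporation</Name>
  <City>Ottawa</City>
  <Province>Ontario</Province>
</Company>

From there, XPath, XSLT, and the rest of the XML tool chain apply as usual.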