XML vs. JSON

I hope some of you find this interesting.

One feature that JSON offers is a list syntax: items placed inside square brackets. XML feels a little flat by comparison. In XML you can separate values with spaces, tabs, or commas inside CDATA or PCDATA, but the application has to know to pull out the text chunk and parse it into something more meaningful. I just like the JSON list syntax.
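The difference shows up as soon as you parse. A minimal Python sketch (the `sizes` field and the `<sizes>` element are made up for illustration):

```python
import json
from xml.etree import ElementTree as ET

# JSON gives you a real list straight from the parser...
values = json.loads('{"sizes": [640, 480, 32]}')["sizes"]

# ...while in XML the text chunk must be split and converted by hand.
elem = ET.fromstring("<sizes>640 480 32</sizes>")
parsed = [int(tok) for tok in elem.text.split()]

print(values)   # [640, 480, 32]
print(parsed)   # [640, 480, 32]
```

The XML side isn't hard, but every application has to agree on (and reimplement) the delimiter convention.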

On the other hand, XML does distinguish attributes from elements. For nested content, an element's attributes arrive before its children. This is a bonus because you can designate an attribute to carry type information that provides clues about the content that follows.

Here's a contrived example where you have binary image content that could be JPEG, PNG, or TIFF.

JSON:
{"Type": "TIFF", "Content":"binary-content-coded-as-base64"}

XML:
<Image type="TIFF">binary-content-coded-as-base64</Image>

The two examples look roughly equivalent. The trouble is that JSON makes no guarantee of field order: the "Type" field might not appear before the "Content" field. That content field can be huge, so your parser must buffer it while waiting to see what type it is. Now consider more elaborate nested content involving more than just two fields.
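You don't even need a hostile producer to see this; any serializer is free to reorder object members (RFC 8259 says they're unordered). A small sketch where alphabetical sorting alone puts the huge field first:

```python
import json

# Sorting keys alphabetically - which some serializers do by default -
# emits the (potentially enormous) "Content" field before "Type".
doc = {"Type": "TIFF", "Content": "binary-content-coded-as-base64"}
print(json.dumps(doc, sort_keys=True))
# {"Content": "binary-content-coded-as-base64", "Type": "TIFF"}
```

A consumer that assumed "Type" comes first would now have to buffer the whole payload before knowing what it is.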

On the XML side, if you're using SAX, Expat, or StAX, you'll see the "type" attribute before you get to the binary blob.
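Here's a sketch with Python's stdlib SAX parser: the attributes arrive with the start tag, so the handler knows the type before the first byte of the payload streams in.

```python
import xml.sax

class ImageHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        # Attributes are delivered with the start tag, before any
        # character data for the element.
        if name == "Image":
            print("type seen first:", attrs["type"])

    def characters(self, content):
        # The (potentially huge) payload arrives afterwards, possibly
        # in several chunks - and we already know how to handle it.
        if content.strip():
            print("payload chunk:", content.strip())

xml.sax.parseString(
    b'<Image type="TIFF">binary-content-coded-as-base64</Image>',
    ImageHandler(),
)
```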

This is one particular case where I think XML is easier to use than JSON.

Here's an example of someone fixing the order of JSON output.

In case anyone is wondering, when I do use JSON - it is almost always with nlohmann.

Happy Friday!
 
I like that you can write powerful external validators for XML. You can give them to people who want to make XML files to feed into your software and they get (reasonably) good error messages before they hit the main system.
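For the contrived Image example earlier in the thread, such a validator schema might look like this (a minimal XSD sketch; tools like `xmllint --schema` will check documents against it and report reasonably specific errors):

```
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Image">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:base64Binary">
          <xs:attribute name="type" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="JPEG"/>
                <xs:enumeration value="PNG"/>
                <xs:enumeration value="TIFF"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A document with `type="GIF"` or non-base64 content gets rejected before it ever reaches the main system.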
 
Do not forget the ecosystem XML brings with it: XML stylesheets (XSLT), XQuery, XLink, XPointer, SOAP (though a lot of people seem to think SOAP is a bad thing), XSL-FO. Also, it is easy to integrate other XML languages. For example, if you need to include some math notation, you can embed MathML. (Even if you enter your math text as LaTeX, there are programs that will convert it to MathML.) And if you need to extract data from web pages, you can use HTML Tidy to convert them to XHTML and then treat them as regular XML documents.
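Once a page has been tidied into well-formed XHTML, ordinary XML tooling applies. A sketch using Python's stdlib ElementTree, which supports a useful subset of XPath (the table content here is invented):

```python
from xml.etree import ElementTree as ET

xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><table>
    <tr><td>alpha</td><td>1</td></tr>
    <tr><td>beta</td><td>2</td></tr>
  </table></body>
</html>"""

# XHTML elements live in a namespace; map a prefix for the queries.
ns = {"h": "http://www.w3.org/1999/xhtml"}
root = ET.fromstring(xhtml)
rows = [[td.text for td in tr.findall("h:td", ns)]
        for tr in root.findall(".//h:tr", ns)]
print(rows)  # [['alpha', '1'], ['beta', '2']]
```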

I am sure JSON has some ecosystem, too, though I am not familiar with it. My point is that it would be wise to look at the whole ecosystem and how it can help you, rather than the raw syntax for a particular piece of data.

Also remember that whichever you use, there are programs like yq that will convert from JSON to XML, and other programs (whose names escape me right now) that will convert from XML to JSON. So, for example, when I am dealing with JSON data, I frequently convert it to XML, because I am more comfortable with the XML ecosystem.
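The core of such a converter is small. A hypothetical minimal sketch (real tools handle attributes, namespaces, and many edge cases this ignores):

```python
import json
from xml.etree import ElementTree as ET

def json_to_xml(value, tag="root"):
    """Toy converter: dicts become child elements, lists repeat an
    <item> tag, scalars become text. Illustrative only."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for item in value:
            elem.append(json_to_xml(item, "item"))
    else:
        elem.text = str(value)
    return elem

doc = json.loads('{"Image": {"Type": "TIFF", "Content": "abc"}}')
print(ET.tostring(json_to_xml(doc), encoding="unicode"))
# <root><Image><Type>TIFF</Type><Content>abc</Content></Image></root>
```

After the conversion, XSLT, XPath, and the rest of the XML toolbox are available.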
 
SOAP looks complicated - like it was designed by committee. However, there are code generators that turn a SOAP schema into classes in your language of choice.

REST started as one person's idea that became refined over time. It was easier to explain and therefore easier to adopt - even after SOAP was established. It too has its share of tools - like Swagger - for testing your web API.
 
JSON's advantage is that it works with JavaScript, which means you don't need anything if you already have JavaScript. It's also great for people who don't know what kind of object they're going to be making, don't have to deal with versioning objects, don't really need a lot of object validation (it either works or it doesn't), or ignore all those problems because some framework does all that work for them and interloping rubes shouldn't be sniffing around.

Other than that, XML is looking pretty good... except for the fact that the open source tooling around it _was_ seat-of-the-pants and is _now_ falling apart!
 
Both are bloated and wasteful. An array of N objects duplicates the name of each struct member N times. I've seen tools get OOM-killed because of this.

Now that RAM prices are high, we should come up with something better.
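The duplication is easy to measure. A small sketch comparing a row-oriented encoding (name repeated per record) with a column-oriented one (name appears once); the field names are made up:

```python
import json

# Row-oriented: every record repeats every field name.
rows = [{"x": i, "y": i * i} for i in range(1000)]

# Column-oriented: each field name appears exactly once.
cols = {"x": [r["x"] for r in rows], "y": [r["y"] for r in rows]}

print(len(json.dumps(rows)), "bytes row-oriented")
print(len(json.dumps(cols)), "bytes column-oriented")
```

Binary formats (and compression) attack the same redundancy, but the columnar layout already shrinks the text noticeably.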
 
There's a lot of code out there - for XML and JSON - that parses the data into one DOM or DOM-like object. Everything is now in memory. Unless you impose an upper bound on the size of that thing, you'll get an unpleasant surprise. Libraries that provide an event or object stream let you process the data in chunks instead of an all-or-nothing DOM representation.
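A sketch of the streaming alternative using stdlib `iterparse` (the synthetic document and record count are invented):

```python
import io
from xml.etree.ElementTree import iterparse

# A document with many <Image> records, streamed rather than loaded
# as one DOM.
data = "<Images>" + "".join(
    f'<Image type="PNG">payload-{i}</Image>' for i in range(10000)
) + "</Images>"

count = 0
for event, elem in iterparse(io.StringIO(data), events=("end",)):
    if elem.tag == "Image":
        count += 1
        # Clearing drops the element's text and children; for strict
        # memory bounds you would also detach it from its parent.
        elem.clear()
print(count)  # 10000
```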
To anyone reading this: find a bit of wasteful code, fix it to reduce the memory footprint, and document the before-and-after performance. Be sure to mention it at your next performance review.

Hey Boss - we now need fewer servers, or we can run on cheaper servers with smaller disks.
 