Quite often I see decisions around creating XML schemas that repeat the same mistake over and over.
The people making those decisions seem to have forgotten that XML supports attributes, and that you can express data in those attributes.
Compare these two fragments:
<Company>
<Name>Umbrella Corporation</Name>
<City>Ottawa</City>
<Province>Ontario</Province>
</Company>
<Company
Name="Umbrella Corporation"
City="Ottawa"
Province="Ontario" />
The former wastes space. Why some software favors the former over the latter makes no sense to me.
No need to reply. I just like to complain.
The common wisdom seems to be that attributes are for metadata. For example, suppose you have a mixed-content document, the digitization of an old church manuscript from the Middle Ages. You might want to record things like which scribe you think copied a given page, or where the original document is stored. These are not part of the document's text itself, so there is something to be said for keeping them as attributes. If you use an XSLT stylesheet to convert the document to a web page, for example, you do not have to worry about filtering out the metadata.
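For instance, a transcribed page might look something like this (the element and attribute names here are invented purely for illustration):

<page scribe="second hand, possibly Brother Anselm" repository="parish archive, Uppsala">
In principio erat Verbum, et Verbum erat apud Deum...
</page>

An XSLT template that processes only the element's child nodes carries the text through to the web page and silently drops the scribal metadata, with no extra filtering needed, since attributes are not child nodes.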
With your example, a pure data document rather than mixed content, one consideration is that attributes cannot be nested. So you might decide to put City underneath Province, since the same city name may occur in more than one province. Then you could have:
<Company>
<Name>Umbrella Corporation</Name>
<Province>
<Pname>Ontario</Pname>
<City>Ottawa</City>
</Province>
</Company>
(The new <Pname> tag is optional. You could just put "Ontario" directly after <Province>, but having <Pname> keeps this from becoming a mixed-content document.)
If you have already committed to using attributes, then you end up with something like this:
<Company
Name="Umbrella Corporation">
<Province>
<Pname>Ontario</Pname>
<City>Ottawa</City>
</Province>
</Company>
This is kind of unhandy to work with using XPath expressions.
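For example, to find the city for a given company, the all-element version lets you address everything the same way, while the attribute version makes you keep track of which pieces are attributes. XPath sketches against the two fragments above:

/Company[Name = 'Umbrella Corporation']/Province/City      (everything is an element)
/Company[@Name = 'Umbrella Corporation']/Province/City     (Name is an attribute, hence the @)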
Size is not usually a major consideration for XML files. If necessary, you can compress them, or convert them to JSON for storage and transmission and convert them back to XML when you need to process them, so you can still take advantage of XML's rich tool set. If you need to transmit them over low-bandwidth channels, you can use EXI, Efficient XML Interchange (https://www.w3.org/TR/exi/).
An example of emphasizing ease of processing over size is MathML, the XML language for mathematical notation. Most people use LaTeX to write equations. If you wanted to write the quadratic formula in LaTeX, it would look like this:
x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}
If you converted it to MathML, it would look like this:
<math display="block" class="tml-display" style="display:block math;">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo lspace="0em" rspace="0em">−</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>−</mo>
            <mn>4</mn>
            <mi>a</mi>
            <mi>c</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
That is practically impossible to write by hand, but it is easy to process with XSLT. Different countries have different notational conventions. For example, IIRC, in German the tangent of x is written tg x rather than tan x. If you are preparing a document for different countries, MathML lets you make these changes automatically, by having a different XSLT stylesheet for each country (a sketch follows below). But nobody is going to enter equations manually in MathML, because it will give you a headache.
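For instance, a country-specific stylesheet could rewrite the function name on the fly. Here is a minimal sketch, assuming the MathML is in the standard MathML namespace and the function name appears as an <mi>tan</mi> token (the stylesheet is illustrative, not from any real project):

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:m="http://www.w3.org/1998/Math/MathML">
  <!-- Identity transform: copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- German edition: write the tangent function as "tg" instead of "tan". -->
  <xsl:template match="m:mi[. = 'tan']">
    <m:mi>tg</m:mi>
  </xsl:template>
</xsl:stylesheet>

Swapping in a different stylesheet per country gives you each local convention from the same MathML source.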
The point of this (unfortunately long-winded) example is that it is not always about size; sometimes it is about processing convenience. Similarly, when I have to work with a JSON file, I usually convert it to XML (with yq -x . | tidy5 -xml -indent -quiet), because then I can work with it using the XML tools I am comfortable with.
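As a rough illustration of that round trip (the file name and the exact output formatting are invented here; this assumes the Python yq wrapper around jq, whose -x flag transcodes the result to XML):

$ cat company.json
{"Company": {"Name": "Umbrella Corporation", "City": "Ottawa", "Province": "Ontario"}}
$ yq -x . company.json | tidy5 -xml -indent -quiet
<Company>
  <Name>Umbrella Corporation</Name>
  <City>Ottawa</City>
  <Province>Ontario</Province>
</Company>

From there, XPath, XSLT, and the rest of the XML tool chain apply as usual.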