Jun 22, 2012
 

This morning I was reading a post titled: JSON versus XML: Is JSON Really Better than XML?. I already have an opinion on this issue, so why do I read this stuff? Anyway, I found myself getting a little annoyed. This afternoon, a wonderful warm sunny Friday afternoon, I started wondering why I was annoyed. Maybe there’s stuff to criticise in the article but, aside from one of his comments, there’s nothing egregiously wrong in it. But annoyed I am.

I think I know what the problem is. The author (Isaac Tayor) is trying to examine the suitability of JSON and XML for use as a data format. He’s considering these criteria:

  • readability
  • conciseness
  • parsing time

Fair enough. Good idea even.

First we are presented with some XML:

If I recall, I’m already annoyed. Then we are presented with what is purported to be equivalent JSON, which he writes as:

From here on, his analysis assumes the equivalence of these representations.

Well. They aren’t equivalent. XML is a significantly more powerful format than JSON, and the power has a cost. There are alternative representations that are, in my opinion at least, more suitable. I believe the question should be whether XML’s power pays off as a data representation, especially when compared to JSON. But let’s not use the heaviest, ugliest form of XML that we can imagine (the only thing to make it worse would be adding in some namespaces, or maybe an embedded DTD). This ugly format is what everyone seems to use, so I don’t want to single out Isaac here.

Let’s try something a little nicer (but still not equivalent):

The difference is in using attributes, rather than elements with content, for, well, attribute data. Certainly fair in a data format. The description element is worth paying attention to since it’s not an attribute as I’ve written it. In XML the values of an attribute are subject to Attribute-Value Normalization, so I’ve found that where whitespace matters it’s best to write text as the content of an element rather than as an attribute value. I’m assuming that whitespace matters in the description but not in the other attributes.
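To make the attribute-versus-element split concrete, here is a minimal sketch in Go. The `Book` struct, its field names, and the sample document are all made up for illustration; they are not the article's data or the XML from this post. The short values are read from attributes and the whitespace-sensitive description from element content.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Book is a hypothetical record, not the schema used in the article.
// Short values are mapped to attributes; the description is element content.
type Book struct {
	XMLName     xml.Name `xml:"book"`
	Title       string   `xml:"title,attr"`
	Author      string   `xml:"author,attr"`
	Year        int      `xml:"year,attr"`
	Description string   `xml:"description"`
}

// doc is an invented sample in the attribute-heavy style.
const doc = `<book title="An Example" author="A. Writer" year="2012">
  <description>Line one.
    Line two, indented.</description>
</book>`

// parseBook unmarshals an attribute-style document into a Book.
func parseBook(data string) (Book, error) {
	var b Book
	err := xml.Unmarshal([]byte(data), &b)
	return b, err
}

func main() {
	b, err := parseBook(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// %q makes the preserved line break and indentation visible.
	fmt.Printf("%s by %s (%d)\n%q\n", b.Title, b.Author, b.Year, b.Description)
}
```

The line break and indentation inside the description arrive in the string field untouched, which is the reason for keeping that one value as element content.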

So what have we got here?

Readability? Highly subjective but I’d say pretty comparable. I happen to prefer the XML, but maybe that’s because I’m used to it.

Conciseness? If we get rid of unnecessary whitespace (the indentation) then we’ve got:

File                 Bytes
book.json              358
book-nicer.xml         352
book.json.gz           269
book-nicer.xml.gz      269

The identical compressed sizes are purely a coincidence. But what can I say? And you’ll notice that the uncompressed XML is shorter, and it’d be shorter still if we didn’t care about whitespace in the description.

Speed of parsing. Isaac’s benchmarking used Java with the built-in XML parser and GSON from Google for the JSON parsing. I’ve got a feeling they aren’t quite doing the same thing. On top of that, micro-benchmarks on the JVM are really hard. See these related pages for more on some of the issues and a handy library:

I’m doing this largely on a whim, and at the moment I’m somewhat interested in Google’s Go, especially, as it happens, its XML and JSON parsing. So I’ve written some code: main.go and bmark_test.go. Go provides libraries to handle JSON unmarshalling and XML unmarshalling and decoding. I’ve put a bit of code into main.go to illustrate.

Update: I simplified the main.go a bit, the JSON and XML are now populating the same BookInfo struct.
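As a rough idea of what populating one struct from both formats can look like, here is a sketch. Only the name `BookInfo` comes from this post; the fields, the struct tags, and the two sample documents are assumptions made up for illustration and are not the contents of main.go.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// BookInfo carries both json and xml struct tags so either decoder can fill it.
// The fields here are hypothetical.
type BookInfo struct {
	Title  string `json:"title" xml:"title,attr"`
	Author string `json:"author" xml:"author,attr"`
	Pages  int    `json:"pages" xml:"pages,attr"`
}

// Invented sample inputs carrying the same data.
const jsonDoc = `{"title":"An Example","author":"A. Writer","pages":123}`
const xmlDoc = `<book title="An Example" author="A. Writer" pages="123"/>`

// fromJSON fills a BookInfo using encoding/json.
func fromJSON(s string) (BookInfo, error) {
	var b BookInfo
	err := json.Unmarshal([]byte(s), &b)
	return b, err
}

// fromXML fills a BookInfo using encoding/xml.
func fromXML(s string) (BookInfo, error) {
	var b BookInfo
	err := xml.Unmarshal([]byte(s), &b)
	return b, err
}

func main() {
	j, errJ := fromJSON(jsonDoc)
	x, errX := fromXML(xmlDoc)
	if errJ != nil || errX != nil {
		fmt.Println("decode errors:", errJ, errX)
		return
	}
	fmt.Println("same result from both formats:", j == x)
}
```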

XML decoding means more or less handling raw XML events. In Go, the decoder is similar to a pull parser (you ask for the next event) rather than a SAX parser that pushes events at a handler you provide. For both JSON and XML, the unmarshaller will actually stuff values into the fields of structures. That’s convenient, but for the kind of thing I do it is relatively rarely of interest. This is almost certainly not the usual preference among Go programmers, just mine.
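The pull style looks something like the following sketch, which uses `xml.NewDecoder` and `Decoder.Token` from the standard library. The `countElements` helper and the sample input are hypothetical, chosen only to show the loop; this is not the code from main.go.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// countElements pulls tokens one at a time and counts start tags by local name.
// It is a made-up helper that illustrates the "ask for the next event" loop.
func countElements(doc string) (map[string]int, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	counts := make(map[string]int)
	for {
		tok, err := dec.Token() // ask the decoder for the next event
		if err == io.EOF {
			return counts, nil // end of input
		}
		if err != nil {
			return nil, err
		}
		// Only start tags are counted; other token types are ignored.
		if start, ok := tok.(xml.StartElement); ok {
			counts[start.Name.Local]++
		}
	}
}

func main() {
	counts, err := countElements(`<shelf><book/><book/></shelf>`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(counts)
}
```

The caller drives the loop and decides what to do with each token, which is where the contrast with a SAX-style handler comes from.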

Go provides a benchmarking capability as part of its testing facility, so I’ve added a benchmark using that. The outcome…

Benchmark    Iterations   Performance
JSON              50000   52142 ns/op
XML               20000   84950 ns/op
XML Decode       100000   29083 ns/op

They are all pretty fast.

So, at least in Go, XML is fastest if you Decode, slowest if you unmarshal, and JSON is in the middle. I also make all the caveats that have to be made with quickly constructed benchmarks. I hope that it’s at least indicative of something useful.

So in summary:

  • readability, we have our opinions, we can disagree here
  • conciseness: it’s not so obvious that JSON is more concise
  • speed of parsing: depends on how much help you want from your tools. XML’s unmarshalling is slower than JSON’s, as I’d expect given that it has to deal with a much more complex data model, but raw XML decoding is the fastest of the three.

What I really wanted to get across here is that XML doesn’t have to be totally disgustingly ugly. And thanks Isaac for the excuse for a pleasant afternoon of mucking about :-)

