Transforming JavaScript JSON

Colt McAnlis posted a very interesting blog post (http://mainroach.blogspot.com/2013/08/json-compression-transpose-binary.html) this evening on using transposition to reduce JSON data size, and his post was right on the money.

We have been using a similar technique for a couple of years now, although we use a different compression method over WebSocket, since gzip is too expensive to run in pure JavaScript.

However, as I pointed out in a comment on his post, he stopped at step one. Taking a second step gives even better results, because it actually improves the compression.

I created my own "original dataset" for this example. The datasets below include spaces and line breaks purely to make them easier to read in the blog; all of my character counts exclude that formatting, since raw JSON wouldn't contain it. (A space inside a value, such as 'Not Sure', is data and does count.)

The original data (267 characters):

[{Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555'},
 {Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444'},
 {Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333'}]
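Counts like these are easy to get wrong by hand, so here is a small sketch for checking them. `JSON.stringify` with no indent argument emits no formatting whitespace, which matches the counting rule above. One caveat: strict JSON requires every key to be quoted, so the real JSON payload is larger than the JavaScript literal shown above. The regular expression that strips key quotes is only an illustration for recovering the literal's length.

```javascript
// The three sample records from above, as plain JavaScript objects.
const rows = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' },
  { Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333' },
];

// No indent argument: no spaces or line breaks are emitted,
// but every key is wrapped in double quotes, as strict JSON requires.
const strictJson = JSON.stringify(rows);

// Dropping the quotes around keys approximates the unquoted literal above
// (double vs. single quotes around values doesn't change the length).
const literalStyle = strictJson.replace(/"(\w+)":/g, '$1:');

console.log(strictJson.length);   // 309
console.log(literalStyle.length); // 267
```

The 42-character gap is the price of quoting 7 keys in each of 3 records.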

Colt's transposition (211 characters):
{'Id':['1,2,3'].length ? [1,2,3] : [],
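Colt's column-wise layout is straightforward to produce from an array of records. The sketch below is our own illustration, not code from either post; `toColumns` and `fromColumns` are hypothetical names, and it assumes every record has the same keys in the same order, as in the sample data.

```javascript
// Transpose an array of same-shaped records into one array per field.
function toColumns(records) {
  if (records.length === 0) return {};
  const columns = {};
  for (const key of Object.keys(records[0])) {
    columns[key] = records.map((record) => record[key]);
  }
  return columns;
}

// Undo the transposition: rebuild one object per array index.
function fromColumns(columns) {
  const keys = Object.keys(columns);
  const count = keys.length > 0 ? columns[keys[0]].length : 0;
  return Array.from({ length: count }, (_, i) =>
    Object.fromEntries(keys.map((key) => [key, columns[key][i]])));
}

const rows = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' },
  { Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333' },
];

const columnar = JSON.stringify(toColumns(rows));
console.log(columnar.length); // 211
```

Because strict JSON already quotes the keys here, the serialized length matches the 211 characters counted above.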

Instead, we convert it into what is basically CSV expressed as JSON arrays (205 characters):
[['Id','Name','Address','Country','City','State','Zip'],
 [1,'Nathan','Somewhere','USA','Here','OK','55555'],
 [2,'Colt','Elsewhere','USA','There','CA','44444'],
 [3,'You','Not Sure','USA','Where','AZ','33333']]
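A minimal sketch of this header-plus-rows layout follows. `toTable` and `fromTable` are hypothetical names, and, like the column version, the sketch assumes all records share the same keys in the same order.

```javascript
// Convert same-shaped records into [header, ...rows]: CSV expressed as JSON arrays.
function toTable(records) {
  if (records.length === 0) return [];
  const header = Object.keys(records[0]);
  return [header, ...records.map((record) => header.map((key) => record[key]))];
}

// Rebuild records by pairing each data row's values with the header names.
function fromTable(table) {
  const [header, ...dataRows] = table;
  return dataRows.map((values) =>
    Object.fromEntries(header.map((key, i) => [key, values[i]])));
}

const rows = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' },
  { Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333' },
];

const table = JSON.stringify(toTable(rows));
console.log(table.length); // 205
```

Unlike Colt's layout, this keeps records row-major, so each record still arrives as one contiguous array.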

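Now, for every additional row of data in this dataset, each format adds the following characters on top of the values themselves:

Original: 48 characters of static, unchanging field definitions (braces, key names, colons, and commas). (Ouch!)
Colt's: 7 characters (one comma per column)
Ours: 9 characters (a pair of brackets plus seven commas)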
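One way to double-check these per-row figures is to serialize the records with and without one more record, then subtract the length of the new values. This sketch assumes strict JSON output from `JSON.stringify`. Because strict JSON quotes every key, the original layout's overhead comes out at 62 characters rather than the 48 counted for the unquoted literal, while the two array layouts are unaffected. The extra record and the helper names are made up purely for the measurement.

```javascript
// Hypothetical encoders for the three layouts.
const asObjects = (records) => records;
const asColumns = (records) =>
  Object.fromEntries(Object.keys(records[0]).map((k) => [k, records.map((r) => r[k])]));
const asTable = (records) => {
  const header = Object.keys(records[0]);
  return [header, ...records.map((r) => header.map((k) => r[k]))];
};

// How many characters a layout grows by when one record is appended,
// minus the characters of the values themselves (as JSON.stringify emits them).
function overheadPerRow(encode, records, extra) {
  const before = JSON.stringify(encode(records)).length;
  const after = JSON.stringify(encode([...records, extra])).length;
  const valueChars = Object.values(extra)
    .reduce((sum, value) => sum + JSON.stringify(value).length, 0);
  return after - before - valueChars;
}

const rows = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' },
  { Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333' },
];

// A made-up fourth record, used only to measure growth.
const extra = { Id: 4, Name: 'Sample', Address: 'Anywhere', Country: 'USA', City: 'Someplace', State: 'TX', Zip: '22222' };

console.log(overheadPerRow(asObjects, rows, extra)); // 62
console.log(overheadPerRow(asColumns, rows, extra)); // 7
console.log(overheadPerRow(asTable, rows, extra));   // 9
```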

So how do we end up with better compression when, after only a handful of additional records, our raw size is actually larger than Colt's? We use only brackets and commas. Colt's stream adds braces and colons on top of the brackets and commas. Having more redundancy in our stream lets it compress better.

Wait, there is another easy saving if you think about the data. Why send the header row at all? If the client already knows the layout of what it is requesting, you can eliminate the header row entirely, which shrinks the raw data by another 55 characters (the header row plus its trailing comma). That means we start out at only about 150 characters.
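When both sides already agree on the field order, the header can live in code instead of in the payload. A minimal sketch, assuming a hypothetical shared `FIELDS` constant that both client and server ship with. Nothing in the payload says which column is which, so the two copies must stay in sync.

```javascript
// Hypothetical layout agreed on ahead of time by client and server.
const FIELDS = ['Id', 'Name', 'Address', 'Country', 'City', 'State', 'Zip'];

// Server side: emit only the values, in FIELDS order, with no header row.
function encodeRows(records, fields = FIELDS) {
  return records.map((record) => fields.map((key) => record[key]));
}

// Client side: reattach the field names using the same FIELDS list.
function decodeRows(table, fields = FIELDS) {
  return table.map((values) =>
    Object.fromEntries(fields.map((key, i) => [key, values[i]])));
}

const rows = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' },
  { Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333' },
];

const payload = JSON.stringify(encodeRows(rows));
console.log(payload.length); // 150

const decoded = decodeRows(JSON.parse(payload));
```

Encoding by an explicit field list, rather than by `Object.keys` of the first record, also keeps the column order stable even if a record's keys were inserted in a different order.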

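So if you are dealing strictly with raw characters, Colt's method does end up smaller. Even our headerless version, which starts out about 60 characters ahead, grows 2 characters per row faster, so Colt's overtakes it after about 30 additional rows. However, if you are going to compress the stream, the additional redundancy in our transformation appears to be better suited to producing smaller compressed files.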
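Whether the extra redundancy pays off depends on the data and the compressor, so it is worth measuring on your own payloads. The sketch below uses Node's built-in `zlib.gzipSync` purely as a stand-in compressor (the compression method we use over WebSocket is a different one), and `makeRows` generates synthetic records that are only an assumption about realistic data. It reports raw and gzip sizes per layout; it makes no claim about which layout wins on your data.

```javascript
const zlib = require('zlib');

// Synthetic, made-up records with the same seven fields as the sample.
function makeRows(n) {
  const states = ['OK', 'CA', 'AZ', 'TX'];
  return Array.from({ length: n }, (_, i) => ({
    Id: i + 1,
    Name: `Name${i}`,
    Address: `${100 + i} Main St`,
    Country: 'USA',
    City: `City${i % 50}`,
    State: states[i % states.length],
    Zip: String(10000 + ((i * 7919) % 90000)),
  }));
}

// The four layouts discussed in the post.
const layouts = {
  objects: (records) => records,
  columns: (records) =>
    Object.fromEntries(Object.keys(records[0]).map((k) => [k, records.map((r) => r[k])])),
  table: (records) => {
    const header = Object.keys(records[0]);
    return [header, ...records.map((r) => header.map((k) => r[k]))];
  },
  headerless: (records) => {
    const header = Object.keys(records[0]);
    return records.map((r) => header.map((k) => r[k]));
  },
};

// Raw (compact JSON) and gzip-compressed byte counts for each layout.
function measure(records) {
  const sizes = {};
  for (const [name, encode] of Object.entries(layouts)) {
    const json = JSON.stringify(encode(records));
    sizes[name] = {
      raw: Buffer.byteLength(json),
      gzip: zlib.gzipSync(Buffer.from(json)).length,
    };
  }
  return sizes;
}

const n = 500;
const sizes = measure(makeRows(n));
console.table(sizes);
```

For ASCII data like this, the raw byte counts equal character counts. With seven fields, the header-plus-rows layout's raw size exceeds the column layout's by 2n - 12 characters, which is the raw-size crossover described above; the gzip column is the part worth comparing on real data.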

Measure everything, and think about how you actually use your data. How you send it can make all the difference in how fast your app responds to requests, because Performance Matters.
