Update: There is a relevant update for this in a new post.
Over a year ago, one of my co-workers benchmarked several compression libraries, and since then we have been using a library called jslzjb by Bear. This is in an unreleased product, and we currently use it almost constantly on a wide variety of devices and browsers to reduce the amount of data going over WebSockets.
Interestingly enough, a couple of months ago Colt "mainroach" McAnlis wrote a very interesting blog post, "State Of Web Compression", in which he ran quite a few tests on different compression methods. In that post he referenced compressjs by Dr. C. Scott Ananian (CSA). CompressJS is a fairly comprehensive JavaScript compression test library with several implementations of different compression algorithms (and their results). So I made a note in our project tracker that someone on our team should eventually compare CSA's version of LZJB against the original we were running, since LZJB still showed up as the fastest of the bunch in his tests.
Then, mid-last week, we discovered a bug caused by the compression library: if we turned compression off, everything worked; if it was on, a couple of specific characters caused issues. I was assigned the bug report, so I also took the opportunity to check out CSA's newer rewrite of LZJB, since I was already working in that area of the system and his version might fix everything and be a fairly drop-in replacement.
But before we switched, we needed to see what speed increase or hit we would take. To build real-world tests, I opened Chrome, connected to my local instance of our product, turned off compression, and saved a couple of "HAR with content" captures from the Network tab's WebSocket view, generating about 32 MB of real transmission data while doing a variety of things in our system. Then I wrote a simple JS program to extract all the actual data packets from the HAR files into separate files. I created a couple of additional files containing the characters that were actually causing the problems, and then added CSA's test data, which brought it to over 37 MB of test data across 526 different files.
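The extraction step is only a few lines; here's a rough sketch of the idea (not the exact script -- the `_webSocketMessages` field name and message shape are assumptions about the HAR layout, so adjust to whatever your capture actually contains):

```js
// har-extract.js -- rough sketch of pulling websocket frames out of a HAR
// capture and writing each one to its own file for the benchmark harness.
// The _webSocketMessages field and its shape are assumptions, not guaranteed.
var fs = require('fs');
var path = require('path');

var har = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
var outDir = process.argv[3] || 'packets';
if (!fs.existsSync(outDir)) fs.mkdirSync(outDir);

var count = 0;
har.log.entries.forEach(function (entry) {
  (entry._webSocketMessages || []).forEach(function (msg) {
    // One file per frame so each real-world packet gets timed separately.
    fs.writeFileSync(path.join(outDir, 'packet-' + (count++) + '.bin'), msg.data);
  });
});
console.log('Wrote ' + count + ' packets to ' + outDir);
```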
From there, I wrote a very simplistic Node test framework that reads every packet into memory, runs each through a compression function (timed with the nanosecond-precision timer), and then runs the result through the decompression function with the same timing. To verify that compression and decompression worked, it compares the output buffer with the original and records the stats. (For consistency, I load ALL the data first, run the tests on ONE compression library, and exit with the results for that library -- this keeps the memory footprint the same for every library and eliminates any GC hits beyond what the library itself causes.)
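The framework itself is nothing fancy; here's a stripped-down sketch of the approach (the library require path and its compress/decompress entry points are placeholders for whichever implementation is being tested):

```js
// bench.js -- simplified sketch of the benchmark harness described above.
var fs = require('fs');
var path = require('path');

// Placeholder: point this at whichever compression library is under test;
// compress()/decompress() stand in for that library's actual entry points.
var lib = require('./library-under-test');

// 1. Load ALL packets into memory up front so disk I/O never pollutes the
//    timings and every library sees the same memory footprint.
var dir = process.argv[2] || 'packets';
var packets = fs.readdirSync(dir).map(function (f) {
  return fs.readFileSync(path.join(dir, f));
});

function seconds(diff) { return diff[0] + diff[1] / 1e9; } // hrtime -> seconds

function sameBytes(a, b) {
  if (a.length !== b.length) return false;
  for (var i = 0; i < a.length; i++) { if (a[i] !== b[i]) return false; }
  return true;
}

var compressTime = 0, decompressTime = 0, failures = 0;
var originalBytes = 0, compressedBytes = 0;

packets.forEach(function (buf) {
  originalBytes += buf.length;

  var t = process.hrtime();                  // nanosecond-precision timer
  var packed = lib.compress(buf);
  compressTime += seconds(process.hrtime(t));
  compressedBytes += packed.length;

  t = process.hrtime();
  var unpacked = lib.decompress(packed);
  decompressTime += seconds(process.hrtime(t));

  // 2. Verify the round trip byte-for-byte against the original.
  if (!sameBytes(unpacked, buf)) { failures++; }
});

console.log('Compression:   ' + compressTime.toFixed(6) + ' s');
console.log('Decompression: ' + decompressTime.toFixed(6) + ' s');
console.log('Compressed:    ' + compressedBytes + ' / ' + originalBytes + ' bytes');
console.log('Failed files:  ' + failures);
```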
My first attempt failed because Node reads files in as Buffers, and Bear's LZJB only works with Arrays and Strings. After adding a quick toString() (outside of the timing), I had my first timings -- and a slew of failed files. For 37,345,189 bytes of data, compression took ~2.75 seconds and decompression ~1.29 seconds. Not bad speed-wise, but a tad over 50 of the files failed, and that isn't good.
Next up, I grabbed CSA's version, copied my test suite over to it, and ran it. It failed -- it didn't like strings; it wanted Buffers or TypedArrays. So I removed the .toString() and re-ran it, getting ~2.78 / ~0.63 with no failures. Not too bad: a tiny hit on compression, but twice as fast at decompression. However, I knew this test wasn't fair; Bear's does a String -> Array conversion that CSA's doesn't, and I knew from profiling a while back that the String -> Array converter is one of the slowest parts. So to make the test a bit fairer, I removed the .toString() from Bear's harness and modified the code slightly so that it treats Buffers like Arrays. My new output was ~0.62 / ~1.25, but the same 50-odd files failed.
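To give an idea of why that matters: the compressor works off numeric indexing, so the change is essentially about skipping the string conversion when the input already behaves like a byte array. This is an illustration of the idea, not Bear's actual code:

```js
// Illustration only -- not Bear's actual conversion code.
// Strings have to be expanded from UTF-16 characters into UTF-8 bytes
// (slow, per-character JS work); Buffers, Arrays, and typed arrays already
// expose bytes via numeric indexing, so they can be walked directly.
function stringToByteArray(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i); // surrogate pairs ignored for brevity
    if (c < 0x80) {
      out.push(c);
    } else if (c < 0x800) {
      out.push(0xc0 | (c >> 6), 0x80 | (c & 0x3f));
    } else {
      out.push(0xe0 | (c >> 12), 0x80 | ((c >> 6) & 0x3f), 0x80 | (c & 0x3f));
    }
  }
  return out;
}

function normalizeInput(input) {
  // Only strings pay the conversion cost; everything indexable passes through.
  return typeof input === 'string' ? stringToByteArray(input) : input;
}
```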
Wow: ~0.62 seconds for compression. We now knew the conversion hit was really killing us, so allowing it to use Buffers made it considerably faster -- a nice win. But 50 files still failed, which is not good at all, and the decompression was still twice as slow as CSA's version. At this point, since I barely understood the routine but do understand optimization, I decided to attempt to speed something up rather than "fix" something I didn't fully understand.
I grabbed CSA's version, duplicated the "compress" routine, and started messing with it. I saw a lot of what I considered low-hanging fruit, and a couple of hours later my "new" version of CSA's was clocking in at ~2.47 vs. the original ~2.78 -- much better, but still a far cry from the ~0.62 of Bear's compress. Disappointing, to say the least.
However, I now had a much better grasp of how the routine works, and realized I would have to rewrite CSA's version to get any major speedup. So I decided to go back to Bear's routine and see if I could fix it. By looking at the original C source, I could see a couple of issues in Bear's conversion and correct them, and with that ALL my files were passing with Bear's routine. I also noticed that Bear's version is based on an older LZJB version, so I upgraded the routine to use the newer hash (and made a couple of other tweaks). Then, to make an already long story much shorter, I spent the time to figure out why CSA's decompress is so blazingly fast and applied those techniques to Bear's decompression routine.
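For reference, the hash change looks roughly like this in JavaScript -- a translation of my reading of the newer reference lzjb.c, shown only to illustrate the kind of tweak involved, not the exact code that shipped:

```js
// Illustrative translation of the newer-style LZJB match hash: fold the next
// three bytes together, then mix the bits before masking into the Lempel table.
// Constants and naming are illustrative, not the exact shipped code.
var LEMPEL_SIZE = 1024;

function lempelIndex(src, i) {
  var hash = (src[i] << 16) + (src[i + 1] << 8) + src[i + 2];
  hash += hash >> 9;
  hash += hash >> 5;
  return hash & (LEMPEL_SIZE - 1);
}
```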
So at the end of a couple of days, here are the results using the ENWIK8 file (100,000,000 bytes):
| ENWIK8 (100 MB) | Compression (s) | Decompression (s) | Compressed Size (bytes) |
|---|---|---|---|
| Bear's Original* | OUT OF MEMORY DURING COMPRESSION | | |
| Bear's Modified | 3.092157 | 0.556772 | 68551699 |
| CSA's Original | 10.96466 | 1.975028 | 67820737 |
| CSA's Modified | 9.848296 | 1.975028 | 67820737 |
I then took the ENWIK8 file and split it roughly in half (so at least I could get a benchmark with Bear's original):
| ENWIK8 half (~50 MB) | Compression (s) | Decompression (s) | Compressed Size (bytes) | Original Size (bytes) |
|---|---|---|---|---|
| Bear's Original* | 1.963242 | 1.904763 | 38204603 | 50,000,896 |
| Bear's Modified | 1.849097 | 0.280587 | 34332875 | 50,000,896 |
| CSA's Original | 5.875302 | 0.992718 | 33966678 | 50,000,896 |
| CSA's Modified | 5.285134 | 0.992718 | 33966678 | 50,000,896 |
All 526 Files:
| 526 Files (1k to 917k) | Compression (s) | Decompression (s) | Compressed Total Size (bytes) | Original Total (bytes) |
|---|---|---|---|---|
| Bear's Original* | 0.626847 | 1.258599 | 17250904 | 37,345,189 |
| Bear's Modified | 0.486729 | 0.177391 | 15890401 | 37,345,189 |
| CSA's Original | 2.782399 | 0.636517 | 15709537 | 37,345,189 |
| CSA's Modified | 2.413777 | 0.636517 | 15709537 | 37,345,189 |
* Not technically Bear's original: this version supports Buffers and has the bug fix that allows it to compress all the files properly, but no other bug fixes, enhancements, or changes.
By using the new hash, our compressed sizes are now similar to CSA's (he also uses the new hash). In addition, Bear's Modified now uses the same decompression idea as CSA's, so it is blazing fast at decompression.
So at the end of a couple of days' work, we went from ~2.75 / ~1.29 down to ~0.49 / ~0.18 -- a major win!
The funny thing is, this didn't actually fix the original bug we uncovered; it did, however, fix some other bugs we had patched around in our code (so now we can remove those patches). The original bug was actually caused by converting from an Array back to a String on the decompression side. On the conversion from a string to an array, the code converts UCS-2/UTF-16 into UTF-8 encoded bytes. However, Bear's code never had anything to convert from UTF-8 back into the UCS-2/UTF-16 that JavaScript expects. So all my tests passed if you went Buffer -> compress -> decompress -> Buffer, but the minute you went String/Buffer -> compress -> decompress -> String, your data would be wrong if it contained any multi-byte UTF-8 characters. By adding a UTF-8 -> UCS-2/UTF-16 conversion on the toString path of the decompression side, we now have flawless (and much faster) compression and decompression.
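The fix on the decode path amounts to walking the UTF-8 bytes and rebuilding the UTF-16 code units. Here's a simplified sketch of that direction of the conversion (BMP characters only; surrogate-pair handling left out for brevity, and not the exact code in LZJBn.js):

```js
// Sketch of the missing decode step: turning UTF-8 bytes back into the
// UCS-2/UTF-16 string JavaScript expects. Illustrative only.
function utf8ToString(bytes) {
  var out = [], i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    if (b < 0x80) {                              // 1-byte (ASCII)
      out.push(String.fromCharCode(b));
      i += 1;
    } else if (b < 0xe0) {                       // 2-byte sequence
      out.push(String.fromCharCode(((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f)));
      i += 2;
    } else {                                     // 3-byte sequence
      out.push(String.fromCharCode(((b & 0x0f) << 12) |
                                   ((bytes[i + 1] & 0x3f) << 6) |
                                   (bytes[i + 2] & 0x3f)));
      i += 3;
    }
  }
  return out.join('');
}
```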
Final Stats:
| | MB/s (50m) | MB/s (100m) | MB/s (37m) | Compression % (50m) | Compression % (100m) | Compression % (37m) |
|---|---|---|---|---|---|---|
| Bear's Original | 12.92679 | N/A | 19.80708 | 23.6% | N/A | 53.8% |
| Bear's Modified | 23.47808 | 27.4053 | 56.23253 | 31.3% | 31.4% | 57.4% |
| CSA's Original | 7.280249 | 7.728159 | 10.92311 | 32.1% | 32.2% | 57.9% |
| CSA's Modified | 7.96465 | 8.457858 | 12.24314 | 32.1% | 32.2% | 57.9% |
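To tie these back to the tables above (50m is the split ENWIK8 half, 100m the full ENWIK8 file, and 37m the 526-file packet set): the throughput column works out to original bytes divided by the combined compress + decompress time, and the percentage column is how much smaller the compressed output is than the input. A quick sanity check (the helper below is just for illustration):

```js
// How the final-stats numbers relate to the raw timings above (illustrative).
function finalStats(originalBytes, compressedBytes, compressSec, decompressSec) {
  return {
    // millions of bytes processed per second of total compress + decompress time
    megabytesPerSecond: originalBytes / (compressSec + decompressSec) / 1e6,
    // how much smaller the output is than the input
    percentCompression: (1 - compressedBytes / originalBytes) * 100
  };
}

// e.g. the modified routine over the 526-file (37 MB) packet set:
console.log(finalStats(37345189, 15890401, 0.486729, 0.177391));
// -> { megabytesPerSecond: ~56.2, percentCompression: ~57.4 }
```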
So as of this moment, LZJB is still winning on speed, and it is now considerably faster than the numbers on Colt's or CSA's sites show... CSA's still has slightly better compression (even at the default level 1), but I would much, much rather have the extra 46 seconds than the minute 0.5% reduction in file size on our real data.
All these tests and numbers were done using Node.js (10.2). Running a set of tests in the browser (not as comprehensive, but with several file sizes) showed similarly improved speed and compression results under Chrome and Firefox.
I want to thank Colt McAnlis for posting his article -- it led me to Dr. Ananian's compressjs and started the ball rolling on what has turned into a version of LZJB that runs 22% faster during compression and 86% faster during decompression, with an 8% reduction in payload size! Now our data moves faster to all of our devices, meaning the customer gets his screens up sooner -- and that is why Performance Matters!
Updated Compression: LZJBn.js