Category Archives: Node.js

fluentReports gets an actual web site.

Hey,  I finally got a few spare minutes and spent those few minutes building a quick and simple web site for fluentReports.   However, I couldn't just do something totally "static" -- I had to do at least one page that was awesome.

The fluentReports web site now has a fully working demo page (with 3 of the sample reports pre-programmed in) that runs entirely in your browser. You can make any changes you want in the editor, and it will attempt to run your changes and generate a new report if possible.

I had suspected that fluentReports would run fine in a browser, and I can now confirm that it is possible with no modifications needed. You just need to use browserify.
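If you want to try the same thing yourself, the bundling step is roughly this (the entry file below is a hypothetical example, not the actual demo page code):

// app.js -- hypothetical browserify entry point
var fr = require('fluentreports');

// Prove the library loads inside the bundle; from here you can build reports
// client side just as you would in Node.
console.log('fluentReports loaded:', typeof fr.Report === 'function');

// Bundle it for the browser with:
//   browserify app.js -o bundle.js
// then include bundle.js in the page with a <script> tag.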

Check out the main site and let me know what you think...  http://fluentReports.com

fluentReports v1.0.0 - Released

It has been a couple years since the first version of fluentReports was released.   Over that time the engine has grown and some very complex bugs appeared that were difficult to fix.

I was proud of the work at that point, but I have finally had the time to fix the complicated remaining issues in fluentReports, and I am very proud to call this version 1.0.0.

Sizing issues with headers/footers and with overly long text being printed are all squashed! This release also has a ton of new features: "link", always printing group headers on new pages, "fill" on prints, and tons of other little goodies throughout the code base. The engine is now fully asynchronous, so you can use async code in your header, footer, or detail callbacks without worrying about the report being corrupted. It is also smart enough that it can continue to be used just like it was in synchronous mode, with no changes to your code. I have attempted to maintain backwards compatibility as much as possible, and only a couple of minor breaking changes have occurred (some of which can even be disabled if you don't like the new default). Please see the changelog file for information on any breaking changes.
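To give you an idea of what the asynchronous support means in practice, here is a minimal sketch. The argument names and order below are illustrative assumptions; the commands.md file described below documents the exact callback signatures.

var Report = require('fluentreports').Report;

// Synchronous style still works exactly as it did before:
var detailSync = function (rpt, data) {
  rpt.print(data.name);
};

// Asynchronous style: do your async work (database lookup, etc.) and then
// signal completion via the extra callback so the engine knows to wait.
// (The signature here is an assumption for illustration; see commands.md.)
var detailAsync = function (rpt, data, state, done) {
  setTimeout(function () {     // stand-in for any asynchronous operation
    rpt.print(data.name);
    done();
  }, 10);
};

new Report('demo.pdf')
  .data([{ name: 'Elijah' }, { name: 'Abraham' }])
  .detail(detailAsync)
  .render(function (err, name) {
    if (err) { console.error(err); }
  });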

I have also created a commands.md file where you can view an outline of ALL the commands, what they do, and their parameters; this should fill the documentation gap that I never had covered before.


Announcing v8-Natives v0.0.1

What are v8-natives, you might ask?    

Well, they are the mostly undocumented JavaScript commands that control the V8 engine in Google Chrome, Opera, and Joyent's Node.js. Some of the commands are %CollectGarbage(), %GetV8Version(), and %GetOptimizationStatus(), which pairs with my other favorite, %OptimizeFunctionOnNextCall().


What can I do with them?

You can tell the engine to optimize a routine, un-optimize a routine, or never optimize a routine; ask it about the internal data structures of a variable/object; and, one of the most important items, ask whether a routine is optimizable.
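For example, if you launch Node with the --allow-natives-syntax flag, you can call these commands directly. A quick sketch (the numeric codes returned by %GetOptimizationStatus() vary between V8 versions):

// Run with: node --allow-natives-syntax optimize-check.js
function add(a, b) {
  return a + b;
}

add(1, 2);                          // warm it up so V8 gathers type feedback
add(3, 4);
%OptimizeFunctionOnNextCall(add);   // request optimization on the next call
add(5, 6);

// Prints a small numeric status code (e.g. 1 = optimized, 2 = not optimized
// on the Crankshaft-era engines this post is about)
console.log('status:', %GetOptimizationStatus(add));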


Why is this important?

Well, the V8 engine has several compilers built in; the lowest tier is just a full-featured JavaScript interpreter -- it is fast, but compared to one of the actual optimizing compilers it is so slow that molasses moves faster. Do you want to figure out which of your code can be promoted to the faster compilers? Do you want to see which code is a bottleneck even though, at a glance, it actually looks good?


So, which of these routines is optimizable?

function sum1(a, b) {
  // wraps the work in a try/catch block
  try {
    var c = a + b;
  } catch (err) {
    return -1;
  }
  return c;
}

// sum2 and sum3 both hand the arguments object off to another function
// (a sum() helper assumed to be defined elsewhere)
function sum2(a, b) {
  return sum.call(this, arguments);
}

function sum3(a, b) {
  return sum(arguments);
}


Look no further:

Available on aisle 15, at the deep discount of totally free, we now have all the tools you need to answer the above questions: a fully working support library that wraps over 20 of the internal V8 native commands in a simple-to-use library that will not crash your script whether you have the V8 native support turned on or off. It can be left in your app and deployed, and it supports both Node and browsers.

Simple things like v8.helpers.testOptimization(sum1); will tell you right away whether sum1 can be optimized... Or v8.collectGarbage() will do a full GC before you run timings on some performance-critical code... Lots of little things to help you surface your app's inner performance.

You can get it from your local npm repository: npm install v8-natives, or check out the GitHub page @ https://github.com/Nathanaela/v8-Natives
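A quick sketch of what using the wrapper looks like, based on the calls mentioned above (see the GitHub README for the full API):

// Run with: node --allow-natives-syntax check.js
// (without the flag, the library safely does nothing instead of crashing)
var v8 = require('v8-natives');

function sum1(a, b) {
  try {
    var c = a + b;
  } catch (err) {
    return -1;
  }
  return c;
}

v8.helpers.testOptimization(sum1);   // reports whether V8 can optimize sum1

v8.collectGarbage();                 // full GC before timing performance-critical code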


fluentReports v0.0.6 Release

We are happy to publish the latest version of fluentReports.

Major Features:

This version brings fluentReports back up to working with the latest and current version of pdfKit. It also maintains backwards compatibility with older pdfKit versions. As I move forward, this compatibility may be removed unless you let me know on the http://github.com/Nathanaela/fluentreports/issues page why you believe I should maintain it.

This release not only retains compatibility with buffer and write-to-file output, it also adds pipe support. It has not been tested, but I believe fluentReports should work client side, as the only code that would be "broken" is the "write to server file" type, and you can't use that client side anyway. If you test it client side and it doesn't work, let me know in the issues.

Minor Bugs Squashed:

  • Count could throw an error in the paged support.
  • Added outputType (supports buffer, pipe, and defaults to file); a usage sketch follows below.
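Here is a rough sketch of what the new pipe support makes possible; the exact way to select buffer/pipe/file output is best confirmed against the examples in the repository, so treat the constructor usage below as an assumption:

var http = require('http');
var Report = require('fluentreports').Report;

var data = [{ name: 'Elijah', age: 18 }, { name: 'Abraham', age: 22 }];

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'application/pdf' });

  // Assumption: handing the report a writable stream selects the pipe output
  // (rather than writing a file on the server); see the bundled examples.
  new Report(res)
    .data(data)
    .detail([['name', 200], ['age', 50]])
    .render(function (err) {
      if (err) { console.error(err); }
    });
}).listen(8080);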


Node & Browser Javascript Compression Update

I wrote a post on data compression back in October (http://fluentreports.com/blog/?p=18), discussing how I sped up a data compression library that we have been using internally for all web socket traffic, and how, by combining techniques from different comparable libraries, LZJBn.js was born.

Well, fast forward several months -- I ran across another library that, well, professional curiosity compelled me to benchmark, to see just how badly my cool LZJBn would trounce it.

Using 526 different sized files from real packets that we send:

            Compression (sec)   Decompression (sec)   Compressed Size   Original Size
LZJBn.js    0.503752017         0.1777535             15,890,401        37,345,189
node-lz4*   0.363441773         0.1109069             11,364,620        37,345,189

* - node-lz4 does not compress files under a certain (really small) size, so there were 8 files, comprising a total of 182 bytes of data, that were not compressed on the lz4 side of this test. Because of this, when sending any data packets you will have to tag each packet as compressed or uncompressed.
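Since a handful of packets will come through uncompressed, the receiving side has to be able to tell the two apart; a single flag byte in front of each packet is enough. A simple sketch (this wrapper is my own illustration, not part of the node-lz4 API):

var lz4 = require('lz4');

var COMPRESSED = 1;
var RAW = 0;

// Prefix each packet with a flag byte so the receiver knows how to unpack it.
function packPacket(buf) {
  var compressed = lz4.encode(buf);
  if (compressed.length < buf.length) {
    return Buffer.concat([new Buffer([COMPRESSED]), compressed]);
  }
  return Buffer.concat([new Buffer([RAW]), buf]);
}

function unpackPacket(packet) {
  var body = packet.slice(1);
  return packet[0] === COMPRESSED ? lz4.decode(body) : body;
}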

And at an even larger size, using the ENWIKI file from the prior blog post:

            Compression (sec)   Decompression (sec)   Compressed Size   Original Size
LZJBn.js    1.87325003          0.279163174           34,332,875        50,000,896
node-lz4    0.713367233         0.207553456           27,591,715        50,000,896

Now, if you look at the numbers, it not only compressed and decompressed faster; it also had an even better compression ratio.

After many tests, and a very timely fix from the author of node-lz4 for a bug I reported, I have to shamefully say that node-lz4 totally skunks my LZJBn.js module in these tests. node-lz4 also has a native module for the Node side, but those numbers aren't relevant here, as this was purely a test of JavaScript library speeds.

So for those wanting to implement as close to real-time compression as possible using JavaScript, there is now a new King of the Hill, and sadly (for me) it is node-lz4.

Congrats, Pierre, on a job well done porting LZ4 to JavaScript -- and I know which library I will be using in the future!

For those who are interested, the primary LZ4 site is here, and the original author (Yann Collet), who created the LZ4 compression format, has a blog here: http://fastcompression.blogspot.fr/


Data Compression Revisited

Update: There is a relevant update for this in a new post.

Over a year ago, one of my co-workers benchmarked several compression libraries, and since then we have been using a library called jslzjb by Bear. This is in an unreleased product, and we currently use it almost constantly on a wide variety of devices and browsers to reduce the amount of data going over websockets.

Interestingly enough, a couple of months ago Colt "mainroach" McAnlis wrote a very interesting blog post, "State Of Web Compression", where he ran quite a few tests on different compression methods. In that post he referenced compressjs by Dr. C. Scott Ananian (CSA). compressjs is a fairly comprehensive JavaScript compression test library with several implementations of different JavaScript compression libraries (and results). So I made a note in our project tracker that someone on our team should, at some point, check out CSA's version of LZJB versus the original we are running, since LZJB still showed up as the fastest of the bunch in his tests.

Then mid-last week we discovered a bug caused by the compression library: if we turned compression off, everything worked; if it was on, it caused issues with (apparently) just a couple of characters. I was tasked with the bug report, so I also took the opportunity to check out the newer rewrite of LZJB, since I was already working in that area of the system and CSA's version might fix everything and be a fairly drop-in replacement.

But before we did so, we needed to see what speed increase or hit we would take. To make real-world tests, I took Chrome, connected to my local instance of our product, turned off compression, and promptly saved a couple of "HAR with content" captures from the Network tab -> WebSockets, generating about 32 megs of real transmission data while doing a variety of things in our system. Then I wrote a simple JS program to extract all the actual data packets from the HAR files into separate packets, created a couple of additional files with the characters that were actually causing the problems, and then added CSA's test data, which altogether made over 37 megs of test data across 526 different files.

From there, I wrote a very simplistic Node test framework that read every packet into memory, ran each one through a compression function (using the nanosecond-precision timer), and then ran the result through the decompression function with the same timing. Then, just to verify, it compared the output buffer with the original to confirm that compression/decompression worked successfully, and recorded the stats. (For consistency, I load ALL the data first, run the tests on ONE compression library, and exit with the results for that library -- this should keep the memory footprint the same for every library and eliminate any GC hits beyond what the library itself causes.)
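Conceptually the harness is nothing more than this (a simplified sketch with a generic compress/decompress interface, not the exact code I ran):

var fs = require('fs');
var path = require('path');

// Each library under test is wrapped to expose compress()/decompress(buffer)
function benchmark(lib, dir) {
  // Load ALL of the test data up front so GC behavior stays comparable
  var files = fs.readdirSync(dir).map(function (f) {
    return fs.readFileSync(path.join(dir, f));
  });

  var compressTime = 0, decompressTime = 0;
  var originalBytes = 0, compressedBytes = 0, failures = 0;

  files.forEach(function (buf) {
    var start = process.hrtime();              // nanosecond-precision timer
    var packed = lib.compress(buf);
    var c = process.hrtime(start);
    compressTime += c[0] + c[1] / 1e9;

    start = process.hrtime();
    var unpacked = lib.decompress(packed);
    var d = process.hrtime(start);
    decompressTime += d[0] + d[1] / 1e9;

    originalBytes += buf.length;
    compressedBytes += packed.length;

    // Verify the round trip actually reproduced the original data
    if (Buffer.compare(new Buffer(unpacked), buf) !== 0) { failures++; }
  });

  console.log(compressTime, decompressTime, compressedBytes, originalBytes, failures);
}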

My first attempt failed, as Node reads things in as Buffers and Bear's LZJB only works with Arrays and Strings. So after adding a quick toString() (outside of the timing), I had my first numbers -- and a slew of failed files. On 37,345,189 bytes of data, compression was ~2.75 seconds and decompression was ~1.29 seconds. Not bad speed-wise, but a tad over 50 of the files failed, and that isn't good.

Next up was grabbing CSA's version; I directly copied and pasted my test suite for it and ran it. It failed -- it didn't like Strings; it wanted Buffers or TypedArrays. So I removed the .toString() and re-ran it, and got ~2.78 / ~0.63 with no failures. Not too bad: a tiny hit on compression, but twice as fast at decompression. However, I knew this test wasn't fair; Bear's does a String -> Array conversion that CSA's doesn't do, and I knew that String -> Array converter is one of the slowest parts (from profiling it a while back). So to make the test a bit fairer, I removed the .toString() from Bear's harness and modified the code slightly so that it would treat Buffers like Arrays. My new output was ~0.62 / ~1.25, but the same 50-odd files failed.

I was like, WOW -- ~0.62 seconds for compression. We now knew the conversion hit was really killing us, so allowing it to use Buffers made it considerably faster; a nice win. But 50 files failed, which is not good at all, and the decompression was still twice as slow as CSA's version. So at this point, since I barely understood the routine but I do understand optimization, I decided to attempt to speed something up rather than "fix" something I didn't fully understand.

I grabbed CSA's version, duplicated the "compress" routine, and started messing with it. I saw a lot of what I considered low-hanging fruit, and a couple of hours later my "new" version of CSA's was clocking in at ~2.47 vs. the original ~2.78; much better, but still a far cry from the ~0.62 of Bear's compress. Disappointing, to say the least.

However, I now had a much better grasp of how the routine works, and I realized that I would have to rewrite CSA's version to get any major speed-up. So I decided to go back to Bear's routine and see if I could fix it. By looking at the original C source, I could see a couple of issues in Bear's conversion and correct them, and with that ALL my files passed with Bear's routine. I also noticed that Bear's version is based on an older LZJB version, so I upgraded the routine to use the newer hash (and made a couple of other tweaks). Then, to make an already long story much shorter, I spent the time to figure out why CSA's version of the decompress is so blasted fast and applied those techniques to Bear's decompression routine.

So, at the end of a couple of days, here are the results using the ENWIKI8 file (100,000,000 bytes):

                   Compression (sec)   Decompression (sec)   Compressed Size
Bears' Original*   OUT OF MEMORY DURING COMPRESSION
Bears' Modified    3.092157            0.556772              68,551,699
CSA's Original     10.96466            1.975028              67,820,737
CSA's Modified     9.848296            1.975028              67,820,737


I then took the ENWIKI8 file and split it roughly in half (so that I could at least get a benchmark with Bears' Original):

                   Compression (sec)   Decompression (sec)   Compressed Size   Original Size
Bears' Original*   1.963242            1.904763              38,204,603        50,000,896
Bears' Modified    1.849097            0.280587              34,332,875        50,000,896
CSA's Original     5.875302            0.992718              33,966,678        50,000,896
CSA's Modified     5.285134            0.992718              33,966,678        50,000,896


All 526 Files:

526 Files (1k to 917k)   Compression (sec)   Decompression (sec)   Compressed Total Size   Original Total
Bears' Original*         0.626847            1.258599              17,250,904              37,345,189
Bears' Modified          0.486729            0.177391              15,890,401              37,345,189
CSA's Original           2.782399            0.636517              15,709,537              37,345,189
CSA's Modified           2.413777            0.636517              15,709,537              37,345,189

* - Not technically Bears' original; this version supports Buffers and has the bug fix that allows it to compress all the files properly; no other bug fixes, enhancements or changes.

By using the new hash, our compression results are now similar to CSA's (which also uses the new hash). In addition, Bears' Modified now uses the same decompression idea as CSA's, so it is blazing fast at decompression.

So at the end of a couple of days' work, we went from ~2.75 / ~1.29 down to ~0.49 / ~0.18; a major win!

The funny thing is, this didn't actually fix the original bug that we uncovered; it did, however, fix some other bugs we had patched around in our code (so now we can remove those patches). The original bug was actually caused by converting from an Array back to a String on the decompression side. On the conversion from a String to an Array, we convert UCS-2/UTF-16 to UTF-8 encoding. However, Bear's code never had any code to convert the UTF-8 back into UCS-2/UTF-16, which is what JavaScript expects. So all my tests passed if you went Buffer -> Compress -> Decompress -> Buffer, but the minute you went String -> Compress -> Decompress -> String, your data would be wrong if it contained any UTF-8 encoded characters. By adding a UTF-8 -> UCS-2/UTF-16 conversion on the toString path on the other side, we now have flawless (and much faster) compression and decompression.
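For reference, the missing piece was essentially the reverse of the encode step; something along these lines (a simplified sketch handling the 1-3 byte sequences the encoder produces, not the exact code now in LZJBn.js):

// String (UCS-2/UTF-16) -> array of UTF-8 encoded byte values
function stringToUtf8Array(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    if (c < 0x80) {
      out.push(c);
    } else if (c < 0x800) {
      out.push(0xC0 | (c >> 6), 0x80 | (c & 0x3F));
    } else {
      out.push(0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
    }
  }
  return out;
}

// UTF-8 encoded bytes -> String (UCS-2/UTF-16) -- the direction that was missing
function utf8ArrayToString(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; ) {
    var b = bytes[i++];
    if (b < 0x80) {
      out += String.fromCharCode(b);
    } else if (b < 0xE0) {
      out += String.fromCharCode(((b & 0x1F) << 6) | (bytes[i++] & 0x3F));
    } else {
      out += String.fromCharCode(((b & 0x0F) << 12) |
                                 ((bytes[i++] & 0x3F) << 6) |
                                  (bytes[i++] & 0x3F));
    }
  }
  return out;
}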

Final Stats:

                   Megabytes Per Second              Percentage of Compression
                   50m        100m       37m         50m      100m     37m
Bears' Original    12.92679   N/A        19.80708    23.6%    N/A      53.8%
Bears' Modified    23.47808   27.4053    56.23253    31.3%    31.4%    57.4%
CSA's Original     7.280249   7.728159   10.92311    32.1%    32.2%    57.9%
CSA's Modified     7.96465    8.457858   12.24314    32.1%    32.2%    57.9%

(Megabytes Per Second is the original size divided by the combined compression + decompression time; Percentage of Compression is the size reduction from the original.)

So as of this moment, LZJB is still winning on speed, and it is now considerably faster than Colt's or CSA's website numbers show... CSA's still has slightly better compression (even at the default level 1), but I would much, much rather have the extra 46 seconds than the minute 0.5% reduction in file size on our real data.

All these tests and numbers were done using Node.js (0.10.2). Running a set of tests in the browser (not as comprehensive, but with several sized files) showed similar improved speed/compression results under Chrome and Firefox.

I want to say thanks to Colt McAnlis for posting his article -- it led me to Dr. Ananian's compressjs and started the ball rolling on what has turned into a copy of LZJB that runs 22% faster during compression and 86% faster during decompression, with an 8% reduction in payload size! Now our data moves faster to all of our devices, meaning the customer gets his screens up sooner, and that is why Performance Matters!

Updated Compression: LZJBn.js

Update: There is a relevant update for this in a new post.

Transforming JavaScript JSON

Colt McAnlis posted a very interesting blog post (http://mainroach.blogspot.com/2013/08/json-compression-transpose-binary.html) this evening on using transposition to reduce JSON data size, and his post is right on the money.

We have been using a similar technique for a couple of years now. (Although we use a different compression method over websockets, as gzip is too expensive in pure JavaScript.)

However, one thing I commented on is that he stopped at step one; step two gives even better results -- it actually improves the compression.

I created my own "original dataset" to show this. The dataset is shown here in the blog with spaces and line breaks for formatting purposes, to make it easier to read, but all my numbers exclude spaces and returns, as raw JSON wouldn't have those in it.

The original Data (265 Characters):

[{Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555'},
 {Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444'},
 {Id: 3, Name: 'You', Address: 'Not Sure', Country: 'USA', City: 'Where', State: 'AZ', Zip: '33333'}]

Colt's Transposing (211 Characters):
{'id':[1,2,3],
'Name':['Nathan','Colt','You'],
'Address':['Somewhere','Elsewhere','Not Sure'],
'Country':['USA','USA','USA'],
'City':['Here','There','Where'],
'State':['OK','CA','AZ'],
'Zip': ['55555','44444','33333']}

We transpose it into basically a JSON CSV (206 Characters):
[['Id','Name','Address','Country','City','State','Zip'],
 [1,'Nathan','Somewhere','USA','Here','OK','55555'],
 [2,'Colt','Elsewhere','USA','There','CA','44444'],
 [3,'You','Not Sure','USA','Where','AZ','33333']]
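The transform itself is only a few lines; something like this (hypothetical helper names, assuming every row has the same fields):

// Array of objects -> header row + value rows (the "JSON CSV" shown above)
function transpose(rows) {
  var keys = Object.keys(rows[0]);
  return [keys].concat(rows.map(function (row) {
    return keys.map(function (k) { return row[k]; });
  }));
}

// Header row + value rows -> array of objects (for the receiving side)
function untranspose(packed) {
  var keys = packed[0];
  return packed.slice(1).map(function (values) {
    var obj = {};
    keys.forEach(function (k, i) { obj[k] = values[i]; });
    return obj;
  });
}

var records = [
  { Id: 1, Name: 'Nathan', Address: 'Somewhere', Country: 'USA', City: 'Here', State: 'OK', Zip: '55555' },
  { Id: 2, Name: 'Colt', Address: 'Elsewhere', Country: 'USA', City: 'There', State: 'CA', Zip: '44444' }
];

JSON.stringify(transpose(records));   // produces the transposed form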

Now, for every additional row of data we add to this dataset, you add:

Original: 48 Characters of Static unchanging field definitions. (Ouch!)
Colt's: 7 Characters
Ours: 9 Characters

So how do we end up with better compression when, after a dozen or so records, our raw size is actually larger than Colt's? Well, we only use brackets and commas. He has added more variety to his data stream: in addition to the [] and commas, he has the {} and the colons. By having more redundancy in our stream, we compress better.

Wait, there is another easy savings if you think about the data... Why send the header row at all? If you already know the layout of what you are requesting, you can eliminate the header row entirely, which shrinks your "raw" data down by another 55 characters, meaning we start out at a small 151 characters.

So if you are dealing with straight raw characters, Colt's method actually is smaller (after about 30 rows). However, if you are going to compress the stream, the additional redundancy in our transformation appears to be better suited to producing smaller compressed files.

Measure everything, and think about how you actually use your data; how you transform and send your data can make all the difference in how fast your app responds to requests, because Performance Matters.

Announcing fluentReports

https://github.com/Nathanaela/fluentreports

Fluent Reports is a reporting engine that was written for a project that should see widespread public use toward the end of the year. But beyond that, mum is the word. Kellpro management has given me permission to discuss certain technologies we are using and to open source some of the modules we have developed, to give back to the community just as we have used several open source libraries in our project. You can find several modules that we have enhanced and/or submitted bug reports for in our GitHub account, but this is our first module that is completely 100% home grown by the developers at Kellpro, Inc. Internally it is called the "REPORTAPI". Since that is just so, well, lame, I am giving it a new name for the world at large: fluentReports (fR)!

There are a couple of "minor" things in fR that are very specific to our project; they will remain in this code base for now, as it is easier for us to maintain our project and keep this repository in sync if I can do a diff/copy/paste from our internal system to this GitHub repository. So if you see weird things like the function "lowerprototypes" that seem out of place; well, they are, and maybe, just maybe, someone will create a minor build script that strips them out of the minified version.

Features:

  • Completely Data Driven.  You pass in the data; you tell it easily how to print the data, and it generates the PDF report.
  • Headers, Footers, Title Headers, Summary Footers
  • Grouping, nested grouping, and even more nested grouping, ...
  • Auto-Summing (and other automatic totals like max/min/count)
  • Sane defaults, and the ability to easily override not only the defaults but pretty much every aspect of the report generation.
  • Images, Gradients, Text, Fonts, Lines, and many other PDF features supported.
  • Page-able data loading
  • Sub-Reports, Sub-Sub-Reports, etc...
  • Bands (Tables) & Suppressed Bands (w/ column wrapping or column clipping)
  • Free Flow Text
  • Ability to override each part of the report for total customization of your report
  • Fluent API
  • Ability to put data over images; gradients, etc.
  • Quickly generate complex reports with minimal lines of code.

We are using PDFKit as the PDF generation library, and as such there is currently only one bug that we know about and can't work around. It should hopefully be rare, and an open bug ticket has been submitted to PDFKit with the fix, so hopefully it will be resolved before you even get to play with the library.
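To give a feel for the fluent API, here is roughly what a minimal report looks like. This is a sketch modeled on the bundled examples (option details may differ slightly; see the examples in the repository for the real thing):

var Report = require('fluentreports').Report;

var data = [
  { name: 'Elijah', age: 18 },
  { name: 'Abraham', age: 22 },
  { name: 'Gregory', age: 35 }
];

// Completely data driven: hand it the data, describe how to print it,
// and it generates the PDF report.
new Report('simple.pdf')
  .data(data)
  .pageHeader(['Employee Ages'])            // title header
  .detail([['name', 200], ['age', 50]])     // band: [field, column width]
  .render(function (err, name) {
    if (err) { return console.error(err); }
    console.log('Report written to', name);
  });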

Please note that the examples are very simple; I've had the report engine working for about a year now and kept meaning to release it. I finally got some "spare" time to polish up the examples a bit, get the domain name running, and actually get it committed to GitHub.