April 17, 2017

QViewer ver.3.3 is out

QViewer version 3.3 is now available for download. Here is what's new and exciting in the new version:

Support for QVD files larger than 2GB
Now you can open QVD files of practically any size. It was tested on QVD files up to 20 GB. Technically, at this point the only limitation is 2 billion rows per file. QViewer aggressively parallelizes calculations, so you may want to give it some time to perform the necessary counts after opening a large file: during that time CPU utilization can be near 100%, which makes the application less responsive.

New Partial Load workflow
In the new version the partial load workflow has changed significantly. The Partial Load button has been removed. Instead, when a file larger than a certain threshold is opened, QViewer suggests performing a partial load (see the screenshot below). The threshold can be set to 512MB, 1GB, 2GB, 4GB or 8GB.

Indication of metrics affected by partial load
When a partial load is performed, some metrics calculated by QViewer can be distorted because the loaded dataset is incomplete. To avoid confusion, QViewer now shows which metrics are affected by partial load (shown in red) and which are not.

Comments in generated script
When generating a LOAD statement, QViewer can now insert field comments obtained from the XML header. These comments are typically created using COMMENT FIELD or COMMENT TABLE statements in QlikView / Qlik Sense before exporting a table into a QVD file.

Download QViewer

UPDATE 4/23/2017
I'm considering further improvements to QViewer. The ideas are floating around a few topics: table inspection for Qlik Sense Server, [shareable] selection bookmarks, support for other file formats (e.g. QVX, CSV, or XLSX), aggregation with built-in pivot tables (with selections applied). However, I'm not sure what would be the most useful for the Qlik dev community. I would appreciate hearing it from you -- what would you like to see in future versions of QViewer? Feel free to send me your suggestions (see my email in the upper right corner of this blog), or just leave a comment below.

April 9, 2017

Websites should offer downloading micromodels instead of CSV dumps

Many websites such as Google Analytics or PayPal allow users to download records (transactions) as CSV files or, sometimes, Excel spreadsheets. While it's more convenient than copying, pasting and parsing HTML from web pages, it's still not optimal because such dumps usually lack many details. For instance, when I download PayPal transactions I would also want to see product items, so that I could make a breakdown of sales by product. Or see customer addresses, so that I could analyze them from a geospatial perspective. Since the CSV dumps are denormalized, adding all the details and attributes would dramatically increase file sizes and also clutter them.

What I'm suggesting is that instead of CSV dumps websites should allow downloading micromodels -- user-related subsets of a bigger data model used by the web-service itself. Such micromodels would contain several linked normalized tables with only data relevant to the user who requested it. From a technical standpoint it can be one SQLite file generated on the fly. The SQLite file format allows packing multiple tables into one file which can hold millions of records.
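To make the idea more concrete, here is a minimal Python sketch of how a service might generate such a file on the fly using the standard sqlite3 module. The three-table schema, the merchant_id column and the function names are all made up for illustration; they are not the schema of PayPal or of any real service.

```python
import os
import sqlite3
import tempfile

# Hypothetical normalized schema (an assumption, not a real service's model).
SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY,
    merchant_id INTEGER,
    customer_id INTEGER REFERENCES customers(customer_id),
    txn_date TEXT
);
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    txn_id INTEGER REFERENCES transactions(txn_id),
    product TEXT,
    amount REAL
);
"""

def make_service_db():
    """Build a tiny in-memory stand-in for the service's full database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, "Ann", "Toronto"), (2, "Bob", "Boston")])
    db.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                   [(10, 100, 1, "2017-04-01"), (11, 200, 2, "2017-04-02")])
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                   [(1, 10, "Widget", 9.5), (2, 10, "Gadget", 20.0),
                    (3, 11, "Widget", 9.5)])
    db.commit()
    return db

def export_micromodel(service_db, merchant_id, path):
    """Copy only the rows related to one merchant into a new SQLite file."""
    out = sqlite3.connect(path)
    out.executescript(SCHEMA)
    key = (merchant_id,)
    # Each table is filtered separately; no denormalized view is built.
    txns = service_db.execute(
        "SELECT * FROM transactions WHERE merchant_id = ?", key).fetchall()
    items = service_db.execute(
        "SELECT i.* FROM items i JOIN transactions t ON t.txn_id = i.txn_id "
        "WHERE t.merchant_id = ?", key).fetchall()
    customers = service_db.execute(
        "SELECT DISTINCT c.* FROM customers c "
        "JOIN transactions t ON t.customer_id = c.customer_id "
        "WHERE t.merchant_id = ?", key).fetchall()
    out.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    out.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", txns)
    out.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", items)
    out.commit()
    out.close()

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "micromodel.sqlite")
    export_micromodel(make_service_db(), 100, target)
    print("micromodel written to", target)
```

The resulting file contains the same linked tables as the source, only restricted to the rows that belong to the requesting user.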

Having a relational micromodel would allow more meaningful and interesting data analysis. It would play well with popular data analysis tools (except Excel, which by design is poorly suited to working with relational data). Support for SQL queries would immediately make it compatible with a vast number of systems.
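As a rough illustration of the kind of analysis this enables, the Python sketch below builds a tiny micromodel in memory and runs an ordinary SQL query for sales by product. The tables, columns and sample rows are hypothetical and chosen only to keep the example short.

```python
import sqlite3

def build_sample_micromodel():
    """Create a small in-memory micromodel with made-up tables and rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, txn_date TEXT);
        CREATE TABLE items (
            item_id INTEGER PRIMARY KEY,
            txn_id INTEGER REFERENCES transactions(txn_id),
            product TEXT,
            amount REAL
        );
    """)
    db.executemany("INSERT INTO transactions VALUES (?, ?)",
                   [(10, "2017-04-01"), (11, "2017-04-02")])
    db.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                   [(1, 10, "Widget", 9.5), (2, 10, "Gadget", 20.0),
                    (3, 11, "Widget", 9.5)])
    db.commit()
    return db

def sales_by_product(db):
    """Return (product, total amount) pairs, largest total first."""
    return db.execute("""
        SELECT i.product, SUM(i.amount) AS total
        FROM items i
        JOIN transactions t ON t.txn_id = i.txn_id
        GROUP BY i.product
        ORDER BY total DESC
    """).fetchall()

if __name__ == "__main__":
    for product, total in sales_by_product(build_sample_micromodel()):
        print(product, total)
```

A flat CSV dump with one row per transaction could not answer this question at all unless the product items were already squeezed into it.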

For information providers, an SQLite file with a micromodel would be similar in size to current CSV dumps, so it won't increase workload or traffic. Generating micromodels can also be even faster than generating CSV dumps, since it won't require joining multiple tables in order to obtain a denormalized view.

Below is an example of a possible micromodel schema for PayPal.