June 22, 2012

Is Big Data a big hype or not?

Don't tell me you haven't heard anything about Big Data. You definitely have -- it's such a trendy term today. Are we facing one more hype or not? I always thought that data is worth something only when someone knows how to extract value from it. So if you know how to do it, then the more data you have, the more value you get. And vice versa -- if you don't know how to extract value from small amounts of data, having 10 times more data won't help much. No magic here.

I personally know only 2 widely popular disciplines for extracting knowledge from data -- Data Mining (for nerds) and Business Intelligence (for more or less normal people). Let's check whether their popularity correlates with the popularity of Big Data. Google Trends will help us:

For Big Data everything is obvious -- popularity skyrockets. In the start-up world people call this "hockey stick growth". Well, actually we didn't have to look at Google Trends to find that out -- the internet is full of talk about Big Data nowadays.

What about Data Mining?

Hmmm... not that impressive. Obviously it doesn't correlate with Big Data. Okay, these are nerds; perhaps they don't talk much to each other or to other people. What about Business Intelligence?

Like it or not, the popularity of Business Intelligence, according to the most popular search engine, is steadily going down. Does it correlate with Big Data? No.
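The comparison above is done by eye, but the same question could be checked numerically. Below is a minimal sketch that computes the Pearson correlation coefficient between two popularity series. The function name `pearson` and the sample numbers are made up for illustration; they are not real Google Trends data, which you would have to export yourself.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (un-normalized; the common factor cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers only, NOT real Google Trends data.
rising = [1, 2, 4, 8, 16]      # a "hockey stick" shaped series
falling = [10, 9, 8, 7, 6]     # a steadily declining series
print(pearson(rising, falling))
```

A value close to 1 would mean the two terms rise and fall together, a value close to -1 that one grows while the other declines, and a value near 0 that there is no linear relationship.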

So, is Big Data just one more hype invented by salespeople or not? Am I missing something?

June 20, 2012

QViewer: What's next

Two weeks ago I made the initial version of QViewer (my standalone QVD file viewer) available for public download. Since then it has been downloaded more than 300 times. As I am a practicing QlikView developer myself, I tried to build a tool that I would use in my daily work. Therefore, a few more features were added, e.g.:
  • Partial load (for dealing with large files)
  • Search
  • Pre-calculated statistical data (various counts for each field and each unique value)
  • Query tool for calculating simple aggregates (counts, sum, avg)
They make QViewer not simply a file viewer but actually a data profiling tool, which should help find data anomalies (unexpected nulls, text instead of numbers, duplicate rows, etc.) faster. I tried to automate some frequent analysis operations which usually require some time and effort in regular QlikView. For instance, besides viewing QVD row-level data, QViewer lets you do the following a little bit faster:
  • Search a field for nulls
  • Filter values by type (text/number)
  • Count unique values
  • Count occurrences of each unique value
  • Immediately see the type of each value (numeric/text/null)
These operations are pretty basic and pose no problem for a regular QlikView developer, but they take time because of their frequency. If QViewer can save you 15 minutes a day, over roughly 260 working days it saves you 65 hours of working time a year (or more than 8 full working days). Multiply this number by your hourly rate and see the benefit in cash equivalent :)
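To make the profiling operations listed above more concrete, here is a rough sketch of the same checks on a single column of values. This is not QViewer's code (QViewer is a .NET application) and it does not read QVD files; the function names `value_type` and `profile` and the sample column are hypothetical, and a field is assumed to be already loaded as a plain Python list.

```python
from collections import Counter


def value_type(value):
    """Classify a single value as 'null', 'number' or 'text'."""
    if value is None:
        return "null"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    return "text"


def profile(values):
    """Collect basic profiling counts for one field."""
    types = Counter(value_type(v) for v in values)
    # Occurrences of each unique non-null value.
    occurrences = Counter(v for v in values if v is not None)
    return {
        "nulls": types["null"],
        "numbers": types["number"],
        "texts": types["text"],
        "unique": len(occurrences),
        "occurrences": occurrences,
    }


# Made-up sample column with two nulls and one text value among numbers.
column = [10, 20, None, "n/a", 20, None, 10, 20]
report = profile(column)
print("nulls:", report["nulls"])
print("unique values:", report["unique"])
print("most frequent:", report["occurrences"].most_common(1))
```

A text value in a field that should be numeric, or an unexpected number of nulls, shows up in such a report immediately, which is the kind of anomaly these checks are meant to catch.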

Digging into the QVD format was an interesting exercise -- since it's a native QlikView format, it gives some understanding of how QlikView works under the hood, what it started from and how it has evolved into what we know today. I now better understand the technical challenges that QlikTech's developers faced, and I have to admit that they've done an excellent job resolving them. QlikView's exceptional performance, which we all know, is the result of smart optimization and hard work. It also looks like there was a portion of luck, as some of QlikView's key features were not designed as such from the beginning but were the result of evolution.

What's next for QViewer? As of today, I've implemented all the essential features I planned in order to make it useful for daily work. It will remain free until the end of this year. Before the end of this year a paid version will be introduced along with a free limited version.

There are still some things to do, mainly because QViewer still doesn't perform as I want it to on large files due to limitations of a standard .NET control (specifically DataGridView). I'm thinking about writing my own custom control to replace it, but this task may be very time-consuming. It would also be interesting to implement partial load with a WHERE condition, as well as to make Query a bit more advanced so that, for instance, it supports conditions like "where field is null".

I would appreciate hearing your ideas on how to make QViewer better for your daily work. Please leave your comments here. Thanks.

June 12, 2012

QlikTech acquired ETL vendor Expressor: first impressions

Today QlikTech announced that they are acquiring Expressor -- an ETL tool producer which introduced an extension for connecting to QlikView files a few months ago. What does it mean and how does it work?

First, I'd like to congratulate QlikTech on an extremely smart move -- acquiring a decent ETL tool was a long-anticipated step and it has finally happened. Expressor itself is nothing extraordinary -- there are probably a dozen similar tools on the market, but the point is not Expressor itself -- QlikView has long been missing a more visual, reusable and semantic way of performing ETL. Script-based ETL was one of the major complaints of new users.

This acquisition also positions QlikView one level higher and closer to the large enterprise market, though not very significantly -- big companies already possess enterprise ETL platforms like Informatica, Ab Initio, IBM Information Server (aka DataStage), so they would prefer integration of QlikView with these tools rather than having one more. But at least they will feel more comfortable dealing with "normal" visual ETL than writing loading scripts in a proprietary language.

Now, let's talk about Expressor and how it works with QlikView.

Expressor is a rather typical ETL tool with a classic approach -- graph-like representation of ETL procedures where nodes are operators and links are data flows, drag-n-drop field mappings, etc. However, I liked 2 things about Expressor:

  • It looks like it compiles ETL jobs into binary code, which is very good from a performance standpoint (this is similar to the major ETL platforms mentioned above)
  • Its scripting language (Datascript) is an extension of Lua -- a very popular and well-documented open-source scripting language, which is better than proprietary languages
Currently QlikView files are not "native" data sources for Expressor -- instead there is a "QlikView Extension" which adds Read QlikView and Write QlikView operators. However, we can expect that this will change soon.

The extension allows the following:
  • Extract field metadata (names, types) from QVW, QVD and QVX files, which can be used for mappings
  • Read data from QVX files
  • Write data into QVX files

As you see, neither QVW nor QVD is currently supported as a data source. They can only be used for extracting metadata. When I tried to load a QVD in Expressor, it failed with the error message "Qvd header was found instead of Qvx header". That leaves an open question -- will QlikTech add support for QVD files, which are so popular among QlikView developers because of their fast loading, or will they stick to QVX only, which has a reputation for being slow?
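The error message suggests that the extension looks at the file header before reading any data. A similar check can be sketched in a few lines. This is only an illustration: it assumes that both formats start with an XML header whose root element is named QvdTableHeader or QvxTableHeader respectively, which should be verified against real files. The function names `sniff_qlik_header` and `sniff_file` are hypothetical and have nothing to do with Expressor's actual implementation.

```python
def sniff_qlik_header(head: bytes) -> str:
    """Guess the file kind from its first bytes: 'qvd', 'qvx' or 'unknown'.

    Assumption: the file begins with an XML header whose root element
    is <QvdTableHeader> (QVD) or <QvxTableHeader> (QVX).
    """
    text = head.decode("utf-8", errors="ignore")
    if "<QvdTableHeader" in text:
        return "qvd"
    if "<QvxTableHeader" in text:
        return "qvx"
    return "unknown"


def sniff_file(path, probe_size=4096):
    """Read only the beginning of a file and classify it."""
    with open(path, "rb") as f:
        return sniff_qlik_header(f.read(probe_size))


# Hand-written sample bytes, not taken from a real file.
sample = b'<?xml version="1.0" encoding="UTF-8"?>\r\n<QvdTableHeader>'
kind = sniff_qlik_header(sample)
if kind != "qvx":
    print(f"{kind} header was found instead of qvx header")
```

Checking the header first is cheap because only the beginning of the file is read, so a wrong file type can be rejected before any row data is processed.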

Also, the current level of error logging is not developer-friendly. If something fails, one has to examine poorly readable extension logs. I hope this will change as QlikView migrates into Expressor's standard data sources list.

And a final remark -- Expressor Studio will be renamed to QlikView Expressor Desktop and will be available for free, including support for QlikView data sources. Free QlikView Personal Edition + free Expressor Desktop -- not a bad combination for departments and small businesses, right?

June 4, 2012

Explainum QViewer - viewer for QVD files

I've developed a viewer for QlikView's QVD files -- Explainum QViewer. Associate it with the .qvd file extension and view QVD files in one click.

It can be downloaded here for free.

The tool has been rebranded as EasyQlik QViewer.