May 15, 2016

A peek into future Business Intelligence with AI and stuff

What could future Business Intelligence look like? Usually, I'm skeptical about "disruptive" ideas like natural language queries or automatically generated analytical narrations (although I respect the research effort), but recently I saw something that, for the first time, looked really interesting when applied to data analysis. I will tell you what it is shortly, but first I have to explain my skepticism.


Typing in natural language queries won't work because it's no better than writing SQL queries. The syntax is surely different, but it still has to be learned. It doesn't provide the expected freedom, just as SQL didn't. Besides unexpected syntax restrictions (which the user has to learn), queries quickly become long and complicated. I played a bit with NLP (natural language processing) queries written in Prolog in my school years, so I have some understanding of the complexities involved.

This can be somewhat mitigated by voice input; however, virtual assistants like Siri/Alexa/Cortana are built around canned responses, so they won't work either, because analytical ad hoc queries tend to be very different, and they always have a context.

Now, here is the promising technology. It's called Viv and I highly recommend watching its demo (it's about 30 minutes):



Two things make Viv different: self-generating queries and the ability to use context. This can potentially make voice-based interactive data analysis finally possible. Not only can a service like Viv answer queries, e.g. "How many new customers did we get since January?", you should also be able to make it actionable. How about setting up alerts, like this: "Let me know next time monthly sales in the West region drop below 1mln. Do it until the end of this year"? Or sharing: "Send this report to Peter and Jane in the Corporate Finance department". Such a virtual data analyst could participate in meetings, answer spontaneous questions, send out meeting results -- all done by voice. Quite attractive, isn't it?

Data analysis is a favorable area for artificial intelligence because it has a relatively small "universe" where entities (customers, transactions, products, etc.) are not so numerous, and their relationships are well understood. If you have ever tried to design or analyze a conceptual data warehouse model, then most probably you have a good picture of that "universe".

And it seems like the right technology for operating in this "universe" might arrive soon.

May 8, 2016

Hints and tips on using QViewer for inspecting resident tables in QlikView

Three years ago I wrote "How to look inside resident tables at any point of loading script". This technique has proved quite successful and efficient, and has been praised by many prominent QlikView developers since then.

This post is a round-up of some best practices for using QViewer to inspect resident tables in QlikView, collected over the last 3 years.

Here is the most recent variant of the INSPECT subroutine:

SUB Inspect (T)
    // let's add some fault tolerance
    LET NR = NoOfRows('$(T)');
    IF len('$(NR)')>0 THEN
        // Table exists, let's view it
        STORE $(T) into [$(QvWorkPath)\~$(T).qvd] (qvd);
        EXECUTE "C:\<pathToQViewer>\QViewer.exe" "$(QvWorkPath)\~$(T).qvd";
        EXECUTE cmd.exe /c del /q "$(QvWorkPath)\~$(T).qvd";
    ELSE
        //Table doesn't exist. Let's display a messagebox with a warning
        _MsgBox:
        LOAD MsgBox('Table $(T) doesn' & chr(39) & 't exist. Nothing to inspect.', 'Warning', 'OK', 'ICONEXCLAMATION') as X AutoGenerate 1;
        Drop Table _MsgBox;
    ENDIF
    // Namespace cleanup
    SET NR=;
ENDSUB
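
For example, calling it at any point in the load script takes a single line (the table name below is just an illustration):

CALL Inspect('Transactions');  // opens the resident table Transactions in QViewer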


The installer of the next version of QViewer will create a registry key with the path to QViewer, so the subroutine will be able to read the location of qviewer.exe from the registry instead of using a hardcoded file path (kudos to Matthew Fryer for the suggestion).
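
Until then, if you'd rather not hardcode the path, a possible workaround is QlikView's GetRegistryString() function. Here is a minimal sketch -- the registry key and value names below are purely hypothetical and only show the idea:

// Hypothetical key -- replace with the actual one created by your QViewer installation
LET vQViewerExe = GetRegistryString('HKEY_CURRENT_USER\Software\EasyQlik\QViewer', 'InstallPath');
EXECUTE "$(vQViewerExe)\QViewer.exe" "$(QvWorkPath)\~$(T).qvd";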

INSPECT is quite helpful for verifying joins for correctness. For this, insert CALL Inspect twice -- once before a join, and once after it. This lets you see whether the resulting table has more rows after the join than before, and check whether the join actually appended anything, i.e. whether the appended columns actually contain data.
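
A minimal sketch of this pattern (table and field names are made up for illustration):

Orders:
LOAD OrderID, CustomerID, Amount FROM Orders.qvd (qvd);

CALL Inspect('Orders');   // row count and columns before the join

LEFT JOIN (Orders)
LOAD CustomerID, Region FROM Customers.qvd (qvd);

CALL Inspect('Orders');   // did the row count grow? is Region actually populated?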

To find duplicates in a column -- double-click the column header to open a listbox with the unique values in that column, and then click Count in that list. On the first click, QViewer sorts values by count in descending order, thus showing duplicate entries (those with counts > 1) at the top of the list. Checking a primary key for duplicates after a join can help detect incorrect joins.

To find duplicate rows in a table -- click "Morph It" to open the table in EasyMorph, and then apply the "Keep Duplicates" transformation. You can also filter rows by applying the "Filter" or "Filter by expression" transformation.

When you deal with wide tables that have many columns, you might need to find a specific column. Press F5 to open Table Metadata, and then sort the field names in alphabetical order. Another common use case for Table Metadata is checking whether columns have values of the expected type. For instance, if a column is expected to contain only numeric values, its "Text count" should be 0.

To find a value in a column -- double-click the column header to open a list of unique values, then use the search field above the list. To locate the found value in the main table, simply double-click the value in the list. Press F3 to find the next match in the main table.

Currently, the search feature is somewhat obscured (as rightfully pointed out by some users). We will be introducing a more convenient full-table search in QViewer v2.3, coming out in June. Subscribe to our mailing list on easyqlik.com to get a notification when it happens.

May 1, 2016

Why I prototype Qlik apps in EasyMorph before creating them

If you want to create a Qlik app, why not just create it -- why would anyone build a prototype in another tool first? Isn't it just a waste of time? For simple cases -- probably yes, but for complex apps, prototyping them first lets you design them faster and more reliably. Here is why:

When developing Qlik apps with complex transformation logic, one of the main challenges is dealing with the data quality and data structure of poorly documented source systems. Therefore, the most time-consuming phase is figuring out how to process the data correctly and what can potentially go wrong. There are many questions to answer during this phase, for instance:
  • How do we merge data -- what are the link fields, what fields are appended? 
  • Does any combination of the link fields have duplicates in one table or the other? 
  • Do the link fields have nulls?
  • Are nulls actually nulls or empty text strings?
  • Are numbers actually numbers, not text?
  • Do text values have trailing spaces?
  • After we join tables, does the result pass a sanity check?
  • How can we detect if the join goes wrong on another set of data (e.g. for another time period)?
  • Are dates and amounts within expected ranges?
  • Do dimensions have a complete set of values, or is anything missing?
  • When dealing with data in spreadsheets:
    • Are text and numbers mixed in the same column? If yes, what is the rule to clean things up?
    • Are column names and their positions consistent across spreadsheets? If not, how do we handle the inconsistency?
    • Are sheet names consistent across spreadsheets?
These are the kinds of questions you want answered before believing that a transformation works correctly and reliably. Unfortunately, scripting in Qlik, like any other scripting, is poorly suited to interactive data profiling. And that's where you start wasting your time. To answer questions like those above in Qlik, you would need to modify and re-run the script many times, and every time Qlik will reload the same data again and again. Even if it's a small test dataset in a QVD, it still takes time, or you risk profiling too little data. Oh, and if you mistakenly omit a comma, your app is dropped and you need to re-open it and re-run the script again. Then after each reload you would have to create some one-off layout objects and write some one-off expressions, just to get one question answered. Tricks like inspecting resident tables make life a bit easier, but just a bit.

Overall, it's an inconvenient and cumbersome process. It effectively discourages data exploration, which sooner or later leads to errors that are discovered at late stages, where the cost of redesigning and fixing them is much higher than at earlier stages.
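
For illustration, here is roughly what just one such one-off check could look like in a Qlik script (table and field names are hypothetical) -- and it answers only a single question before being thrown away:

// One-off profiling check: nulls/empty strings in the link field, numbers stored as text
_Profile:
LOAD
    Count(If(IsNull(CustomerID) or Len(Trim(CustomerID)) = 0, 1)) as EmptyOrNullKeys,
    Count(If(Not IsNum(Amount), 1)) as NonNumericAmounts
RESIDENT Transactions;
// inspect the result, drop the table, then re-run the whole script for the next question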

This is where EasyMorph comes in handy. First, it loads data once and then keeps it in memory, so it doesn't have to be reloaded every time. And if you load sample data only once, why not use a bigger data set, which is usually better for data profiling? Not only does EasyMorph load data only once, it also keeps in memory the results of all successful transformations. So if an error occurs, you continue from where it stopped, not from the beginning -- another time-saving feature.

Second, EasyMorph re-runs transformations automatically in the background after any change. It's as if, while you were writing a Qlik script, Qlik ran the script proactively after every change or new statement, without requiring you to press Reload. Except Qlik doesn't do that. Basically, transformations are to EasyMorph what formulas are to Excel -- you change one and immediately see the new result of the calculations, regardless of how many formulas/transformations it took.

Third, designing a transformation process visually is much faster than scripting. Some Qlik developers are exceptionally good at writing scripts, but even they can't compete when a whole transformation like an aggregation is created in literally two clicks. If one knew from the beginning exactly what a script should do, writing it quickly would not be a problem. It's the numerous iterative edits, corrections and reloads that make writing Qlik scripts slow. Once I have designed and debugged the transformation logic in EasyMorph, scripting it in Qlik is usually a matter of a couple of hours, and it typically works reliably and as expected from the first run.

Another important advantage of prototyping Qlik apps in EasyMorph is that it gives you a reference result. When you design a Qlik application off an existing Excel or BI report, the task is usually easier because the numbers in the old report serve as a reference you can compare against. However, if you design a brand new report, there might be no reference at all. How can you be sure that your Qlik script, expressions and sets work correctly? There is a whole lot of things that can go wrong. Building a prototype in EasyMorph gives you that reference point, and not just for the script but also for expressions, including set analysis. In airplanes, crucial indicators like altitude and velocity must be measured using at least two probes (for each metric) that rely on different physical principles, so that pilots can be sure the measurement is correct. The same principle applies here -- "get another reference point".

I have also found that designing apps in close cooperation with business users is more productive when the users have a good understanding of how the transformation logic works. It's better explained by letting them explore a visual process in EasyMorph than by showing them totally cryptic (for them) Qlik scripts.

To sum up: EasyMorph is a professional tool that QlikView / Qlik Sense developers can use to create robust and reliable applications faster by prototyping them first. I do it myself, and so far it works pretty well.