December 13, 2018

Cloud data warehouses on the rise as corporate users are increasingly underserved by internal IT departments


When I was at the Tableau Conference 2018 in New Orleans (which we sponsored, btw), I had a chance to talk to other Tableau technology partners and finally started to understand the driving cause behind the proliferation of cloud-based data warehouses offered by various vendors (such as Panoply, Snowflake, or 1010data). Previously, I had assumed that the main value proposition of cloud warehouses was cost reduction and that the main target audience for such solutions was IT departments. However, it turns out that the main target audience is actually business (i.e. non-IT) departments, and the main selling point is manageability and independence from internal IT. Why? Because internal IT is becoming increasingly inefficient at enabling business departments to achieve their goals. Business users are not happy with the pace of delivery, the responsiveness, or the price tag of internal IT -- up to the point where they start looking around.

This wasn't news to me, as I run a company that helps business users gain a degree of independence from IT. Rather, it looked like yet another confirmation of a tectonic shift slowly happening in organizations.

In the vast majority of non-technology organizations, internal IT departments have a supporting role and are typically viewed as cost centers rather than revenue generators. Therefore, cost minimization becomes the driving force behind the centralization of IT administration, development, and procurement. On one hand, centralization allows achieving economies of scale. On the other hand, it effectively creates internal monopolies. The larger the organization, the more noticeable the downsides of such a monopoly. Imagine that in a remote town of 50,000 people (the size of a large corporation) there is only one computer shop (let's assume it's still the 90s and there is no Amazon) and no way to purchase anything computer-related from another town or city. What would happen to the level of service provided by that shop? With no competition, prices will go up and service quality will go down, because a monopoly has no incentive to do otherwise. The same happens with corporate IT departments -- they accumulate too much bureaucracy, with no efficient feedback loop in place to ensure constant improvement.

I believe this trend has nothing to do with the personal and professional qualities of the people working in internal IT departments. Many of them have both the skills and the goodwill to do their best for their internal customers. I suspect it's some kind of institutional problem that has its roots in traditional management thinking.

That said, there is some experimentation going on. I've seen an interesting attempt by a large bank to build its IT division in the form of a service supermarket for business users, where business departments have the final word on what kind of business applications they need and when, and IT only takes care of security, maintenance, and administration. Ideally, internal IT should probably be the preferred vendor, but not the only option available to internal customers.

Analytical systems in general, and data warehouses in particular, are subject to this conflict of interest between the business side and the IT side probably more than any other kind of enterprise system. They are predominantly demanded by business users, yet they are very complex from a technical perspective. To make things worse, data warehouses are frequently viewed as important, but not strategically important, to the organization, which lets implementation deadlines slip by months and sometimes years.

The idea of cloud data warehouses that target business users makes sense to me and signals a bigger trend in the industry aimed at overcoming the inefficiencies created by the over-centralization of IT.

September 14, 2018

How I came up with the idea of EasyMorph


One of the most frequent questions I'm asked is "How did you come up with the idea of EasyMorph?".

Throughout my career as a Business Intelligence consultant, I was able to observe how non-technical people work with [structured] data. Even 15 years ago, their setup was pretty much the same as what is commonly found these days: an IT team generates various extracts from transactional and accounting systems, which are then loaded and processed in Excel by business users without a technical background.

All in all, it worked rather efficiently. The IT part of this setup was, apparently, automated as much as possible, while the business part remained entirely manual, with the exception of a few occasional VBA macros here and there. While this approach had some drawbacks, it was considered a reasonable compromise between those who are proficient in data manipulation but have no actual need for it, and those who need it but have limited technical skills.

However, eventually I started noticing that the complexity of calculations was growing. Loading a text extract and calculating a few simple metrics from it in Excel is one thing. But loading several files, fixing data quality issues, applying various business rules, and calculating complex metrics is another. Complexity (as well as the probability of human error) increases particularly sharply when there is a need to merge (join) two or more tables, because Excel doesn't do joins natively. Joins have to be emulated, which is a non-trivial task in Excel, especially when you deal with datasets of varying length and potential problems with the uniqueness of primary keys.
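To illustrate the point: a minimal sketch (with made-up sample tables) of what a real join does that Excel's VLOOKUP emulation silently hides -- unmatched rows and duplicate keys:

```python
import pandas as pd

# Two hypothetical extracts of different lengths; "region" is the join key.
sales = pd.DataFrame({
    "region": ["East", "West", "North", "South"],
    "amount": [120, 80, 95, 60],
})
targets = pd.DataFrame({
    "region": ["East", "West", "West"],  # note the duplicate key
    "target": [100, 90, 85],
})

# A left join keeps unmatched rows (target = NaN) and expands duplicate
# keys into extra rows. VLOOKUP would return only the first match and
# give no hint that "West" appears twice.
merged = sales.merge(targets, on="region", how="left", indicator=True)
print(merged)

# Asking pandas to validate key uniqueness would raise an error here:
# sales.merge(targets, on="region", validate="one_to_one")  # MergeError
```

The `indicator=True` flag adds a `_merge` column showing which side each row came from -- exactly the kind of audit trail that manual Excel emulation lacks.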

I guess this growth of complexity can be explained by two simultaneous trends:

1) A slow explosion of data sources as more and more processes became automated. As software keeps "eating the world", more data is collected, more metrics are tracked, and more things are monitored.

2) Centralized, IT-led initiatives such as data warehouses couldn't keep up with this explosion. The task of consolidating data from various sources has to a great extent been shifting from IT to business users, effectively becoming decentralized instead of centralized.

For me, the turning point was a project at a large North American bank (which would later become our first customer). In that project, I had to deal with an Excel report produced on a daily basis, with three text extracts merged, various business rules applied, and multiple metrics calculated. Due to the complexity of the calculations, the VBA macro that did all the processing was very obscure and ran to more than 1,000 lines. The financial analyst who had created it had moved to another country and was no longer reachable. There was no documentation. It took me a few weeks to reverse-engineer the business logic behind the script. A few errors were found that caused some metrics to be calculated incorrectly.

There is no point in blaming the author of the report for the errors, the obscurity, or the lack of documentation. After all, financial analysts are supposed to do financial analysis, not high-quality software development.

For me, it became clear that there should be a better way of doing things. And then EasyMorph was born.

July 12, 2018

Data Preparation is the new Business Intelligence




What's now happening with Data Preparation bears a strong resemblance to the state of Business Intelligence in the late 90s. There are a number of similarities between the data prep market now and the BI market back then:

  • Emerging technology
  • Designed for the convenience of business users (which is very atypical for enterprise software, btw)
  • Introduces a degree of independence from IT departments (which also means a lower burden for the latter)
  • Low general awareness of this type of tool in the market
  • General underestimation of the proposed value ("why can't we just use Excel for this")
  • A rare "system-to-human" kind of enterprise application, rather than the "human-to-system" (front-end) or "system-to-system" (back-end) software commonly found in the enterprise.
  • Designed and suited for mass user adoption, rather than for use by 1-2 people in a department or even an entire company.
  • From a licensing perspective, the total cost of ownership is within $40-100 per month per user.
Given this analogy, it may be possible to predict certain future trends for the Data Preparation industry:
  • User adoption will broaden more slowly than anticipated by Data Prep vendors. It will probably take 10-15 years to reach the "late adopter" stage and market saturation.
  • User adoption will be wider than the first customers (early adopters) envision it.
  • There will be attempts at "Data Prep standardization" across organizations, but they will fail, just as BI standardization failed.
  • Enterprise administration/governance features will become necessary sooner rather than later.
  • Authoring will shift from desktop clients to web-browsers.
  • Expensive software (>$100/mo/user) will be squeezed out of the market by competition.
  • There will be a wave of consolidation in which major Data Prep companies are acquired by big enterprise vendors.
For me, a confirmation of this similarity is that most multi-user installations of EasyMorph tend to grow over time, doubling in about two years on average. We've recently seen a customer request 30 user licenses just for one business line. This tells me that although data prep tools are close to ETL systems from a technical standpoint, from a user adoption perspective they clearly resemble Business Intelligence and Data Discovery applications.

May 6, 2018

Why enterprise software is switching to subscription-based pricing



Business Intelligence and other enterprise software vendors are switching to subscription pricing en masse. Microsoft probably championed the shift when it introduced Office 365 and Power BI. Tableau recently announced a switch to a subscription-based pricing model; at the time of writing this article, the monthly fee for Tableau Desktop Professional was $70. Qlik and some other BI vendors have introduced subscriptions as well.

While software vendors are clearly pushing the trend, customers sometimes have mixed feelings about it. On one hand, the significant reduction in upfront licensing costs makes rolling out new software deployments faster and less risky -- you can start by purchasing only a few licenses and see how it goes. In the worst case, you simply cancel the subscription instead of turning expensive licenses into shelfware.

On the other hand, in the long run subscription-based pricing appears to be more expensive. Under the previous pricing model, Tableau Desktop Professional cost $1,999 paid once. I don't remember what the maintenance fee was, but the typical rate for the industry is 20-25% per year. If we assume 25% maintenance, over 5 years the total cost of ownership would be $3,998 per user, while under the new subscription model the cost would be $4,200. Over a 10-year term, the difference becomes even more significant: $6,500 vs. $8,400 per user, at least on paper.
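The arithmetic behind those figures can be sketched in a few lines (assuming, as above, a 25% annual maintenance fee charged from the second year onward):

```python
# Back-of-the-envelope TCO comparison for the figures in the paragraph above.
PERPETUAL_LICENSE = 1999    # former one-time Tableau Desktop Pro price, USD
MAINTENANCE_RATE = 0.25     # assumed annual maintenance, % of license
SUBSCRIPTION_MONTHLY = 70   # subscription price at the time of writing, USD

def perpetual_tco(years: int) -> float:
    # License paid once in year 1, maintenance for each subsequent year.
    return PERPETUAL_LICENSE + PERPETUAL_LICENSE * MAINTENANCE_RATE * (years - 1)

def subscription_tco(years: int) -> int:
    return SUBSCRIPTION_MONTHLY * 12 * years

print(perpetual_tco(5), subscription_tco(5))    # 3998.0 vs 4200
print(perpetual_tco(10), subscription_tco(10))  # 6496.75 (~$6,500) vs 8400
```

The crossover happens early: by year 4 the subscription already costs more on paper, and the gap only widens from there.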

Is switching to subscription-based pricing just a marketing gimmick to squeeze more money out of customers? I don't think so, and here is why:

First of all, in a highly competitive market vendors can't squeeze more money out of customers, simply because a) the competition won't miss a chance to undercut pricing, and b) the amount of money (the market size) remains the same no matter what pricing model is applied.

If so, why the change? As someone who runs a company that also employs a subscription-based model, I believe the answer is sustainability.

The problem with the one-time pricing model is that it came from the age of industrialization, when the economy was based on physical, tangible goods. It's not the only business pattern inherited from that epoch. The 8-hour workday from 9 to 5 and the need to commute to the workplace every workday also originated in those times, because, you know, it was kind of problematic to do industrial-scale cast-iron melting while working remotely from home. Everybody needed to be at the factory, working hard with their hands.

Producing physical, tangible goods required lots of materials and some labor, too. In the cost structure, the share of materials was typically much bigger than the cost of labor. Therefore, one-time pricing in such an economy was logical, because the cost was mostly driven by materials.

In the modern, post-industrial economy, some of the old models don't work well anymore. For many knowledge-based professions, such as software development, working fixed hours from 9 to 5 or commuting to the office every day is becoming increasingly irrelevant, if not counter-productive.

A similar thing is happening with pricing models. Labor is by its nature subscription-based: an employee isn't paid a lump sum of money upfront and then expected to work forever without additional pay. Instead, s/he is paid a salary, which is basically a monthly or weekly subscription to the worker's services.

In software development, labor comprises the biggest share of cost, so the expense structure of a software company is predominantly subscription-like. At the same time, a revenue structure based on one-time payments introduces financial instability and risk that need to be offset by higher pricing and/or a more conservative product development strategy.

With that in mind, switching to a subscription model makes total sense for software vendors, as it allows offsetting subscription-based expenses with subscription-based revenue and achieving better financial sustainability. It also works well for customers: vendors can now be less conservative in R&D spending, which means users will receive better products sooner.

January 27, 2018

Automation server for Qlik Sense

Qlik Sense is in many ways a more advanced platform than its predecessor, QlikView. Scalability, rich APIs, enterprise-level administration -- there are many features of good architecture in it. However, what can be challenging for Qlik Sense customers (besides dealing with rudimentary data visualization) is automation. QlikView had an embedded VBA scripting engine, which allowed designing automation scenarios initiated by users, but Qlik Sense doesn't have one. The EXECUTE statement, disabled in standard mode, only aggravates the situation.

In this article, I propose extending Qlik Sense's capabilities with an automation server based on EasyMorph Server. Such an extension significantly simplifies a whole range of automation scenarios initiated by Qlik Sense users that are difficult or non-trivial to implement otherwise. For instance:
  • Database writeback based on current selection in Qlik Sense.
  • One-click export of a subset of data from a Qlik Sense app into an external system or a disk folder.
  • Sending personalized emails with attached customized data extracts from Qlik Sense.
  • Downloading a particular file from a web-site and dynamically adding its content to a Qlik Sense app.
  • Automated data quality checks of incoming source data with rule-based email notifications.
The integration mechanism between Qlik Sense and EasyMorph Server is based on the REST API and WebSockets (see the diagram below):


Actions are initiated by a Qlik Sense user clicking a dynamically generated hyperlink or an extension button. This triggers an EasyMorph Server task, which runs an EasyMorph project with the specified parameters (passed through the hyperlink or extension). The project performs the required actions with external files and systems. Finally, the task status is reported back into the Qlik Sense app that initiated it. Alternatively, the task initiates a full or partial reload of the app using the REST API.
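To make the first step concrete, here is a minimal sketch of how such a dynamically generated hyperlink could be built. The endpoint path, host, task id, and parameter name below are all hypothetical placeholders -- consult the EasyMorph Server API documentation for the real ones:

```python
# Sketch: building the URL that a Qlik Sense button or hyperlink would
# invoke to start an EasyMorph Server task with a parameter taken from
# the current selection. All names below are illustrative assumptions.
from urllib.parse import urlencode
from urllib.request import Request

EM_SERVER = "https://em-server.example.com"   # assumed server host
TASK_ID = "writeback-task"                    # assumed task identifier

def build_trigger_request(selection_value: str) -> Request:
    # In Qlik Sense, this URL would be assembled from a variable or
    # expression, so the user's current selection becomes a task parameter.
    params = urlencode({"region": selection_value})
    url = f"{EM_SERVER}/api/tasks/{TASK_ID}/start?{params}"
    return Request(url, method="POST")

req = build_trigger_request("East")
print(req.full_url)  # the hyperlink a Qlik Sense button would point at
```

The task status coming back over WebSockets (the return half of the loop) is what the planned Qlik Sense extension handles, so it isn't sketched here.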

A few benefits of this integration scheme:
  • In one Qlik Sense application there can be multiple action buttons that initiate different actions.
  • It works well with tiered Qlik Sense apps, where one app is for ETL, another for building a data model, and another one for the UI objects.
  • Closed feedback loop: the task status and result are reported back to the user. If the task fails, the errors are reported to the user as well.
  • Task parameters can be assigned dynamically using Qlik Sense variables and expressions.
  • The action server can be hosted on a different machine, thus reducing exposure of the Qlik Sense server.
At this point, we're halfway through implementing the integration scheme described above: some of its elements are already in place as of version 3.7.1, while others are in active development and will be released soon.
The tool set available today is already suitable for adding automation capabilities to Qlik Sense apps. For instance, database writebacks, extract generation, and email sendouts are already possible using dynamic hyperlinks in Qlik Sense applications. For a better user experience and more advanced automation capabilities, a few more features are being developed and planned for release in version 3.8:
  • An interactive Qlik Sense app extension for triggering EM Server tasks and monitoring task status and errors in real-time.
  • The Qlik Sense Command transformation for triggering Qlik Sense app reloads and QMC tasks right from EasyMorph projects.
  • Fetching emails and processing attachments.
With the addition of these features, the full integration scenario described in this article becomes possible. Besides that, EasyMorph will be able to work as a visual data transformation tool for Qlik Sense:

In this case, a user triggers (through a link or extension) an EasyMorph task that generates QVD files (one file per table in the data model) and initiates a reload of the Qlik Sense app that called it. The app then loads the generated QVDs.

If you would like to talk about automation for Qlik Sense, send me an email (you can find my address in the upper right area of this blog or here).

To receive future updates on EasyMorph and its integrations with Qlik and other systems, subscribe to our newsletter on the download page.

January 4, 2018

EasyQlik QViewer acquired by Rob Wunderlich



Effective January 1st, 2018 EasyQlik QViewer has been acquired by Rob Wunderlich. 

I believe it's a great outcome for the product, its users, and its customers. It was a bit challenging for me to keep my focus on QViewer and EasyMorph simultaneously, which resulted in a slower development pace for QViewer. It's hard to imagine a better new owner than Rob, who is well known in the Qlik community and who surely has a great vision of what would make QViewer even more useful.

From now on, existing licensed QViewer customers should contact support@panalyticsinc.com for all questions related to QViewer. The website http://easyqlik.com keeps operating as usual.

From now on, I will focus solely on EasyMorph.

Read also Rob's statement on the acquisition.

October 19, 2017

Tableau Maestro vs 3rd Party Data Prep Tools

If you haven't seen Tableau Maestro -- you should. I saw the demo shown at the Tableau Conference 2017, and it's pretty cool (sorry, can't find a publicly available video). It's obvious that someone on the product management team has done a good job of addressing common challenges of data preparation (such as incorrect joins) in a visual way. Of course, Maestro is still in its infancy, but its introduction raises interesting questions. First of all, what does it mean for 3rd party data preparation tools that target Tableau users?

Tableau Maestro. This screenshot belongs to tableau.com

Before I go further let me classify the existing data transformation offerings:

Personal Data Preparation Tools
These are rather simple applications that allow performing basic operations such as cleansing, filtering, and merging in a linear, non-parameterized way. While they're visual and target a non-technical audience, their applicability is usually pretty limited: they don't support non-linear workflows (a must-have for anything non-trivial), have no means of automation or integration (e.g. running external applications), and offer a limited set of available transforms. On the positive side, they're usually reasonably priced and easy to start with.

Departmental Data Transformation (ETL) Applications
Applications in this category are full-featured, rather capable ETL programs that allow designing non-linear workflows (which typically look like a block diagram where the inputs and outputs of blocks are connected with arrows), integrate with external applications, and run parameterized tasks on a schedule. They are far more capable than the personal data prep tools described above, while still remaining rather affordable. However, the vast majority of them have one big flaw that renders them barely useful for the Tableau audience -- they are too IT/DBA/SQL-centric and therefore simply not suitable for the average Tableau user. Unless s/he wants to dive into topics such as the nuances of the differences between CHAR, VARCHAR, and NVARCHAR data types on a daily basis (hint: it's not much fun).

EasyMorph, the data transformation tool I've designed and produced, technically also belongs to the Departmental ETL category. However, unlike most ETL tools, it's designed from the ground up for non-technical users first of all, which required walking away from the traditional approach to ETL and re-thinking data transformation from scratch.

Enterprise ETL platforms
These are the mastodons: in terms of features, scale, and, of course, price. Big guns with a big price tag. Most of them are also heavily IT-centric, although some Enterprise ETL platforms (e.g. Alteryx and Lavastorm) have managed to get closer to non-technical users than the rest of the group. The exorbitant cost of licenses in this category severely restricts the number of people who can use them for self-service data transformation within an organization. Especially taking into account that in many cases they are used for departmental (and even personal) ETL, not enterprise ETL, which is clearly overkill. After all, having 75-80% of revenue reinvested into sales and marketing allows hiring very skilled salespeople :)

Now, where does Maestro fit in this classification? While it's still in beta and no final product has been demonstrated yet, I probably wouldn't be terribly off base if I assumed that Maestro is a personal data preparation tool (probably with a non-linear workflow). Which means that Maestro, once released, will leave very little room for 3rd party software vendors in this category, especially if it's offered for free. Many will simply have to leave the market.

OK, what about EasyMorph then? I believe Maestro is a good thing for EasyMorph. While some of our potential users might not realize at first that the two tools are in different categories, the introduction of Maestro actually does EasyMorph a big favor:

1. It proves that good data analysis requires good data preparation. Tableau developers are incredibly creative people. It never ceases to amaze me what kinds of hacks and workarounds they use in Tableau in order to bring the source data into the necessary shape. However, in many cases a decent non-linear data transformation tool would make this task straightforward, maintainable, and debuggable.

2. It introduces the idea of a dedicated data transformation tool to a wide audience. When pitching EasyMorph to various organizations, I noticed that the idea of a specialized data transformation tool is unfamiliar to a non-technical audience. Developers understand the idea of a dedicated ETL tool and the benefits such a tool can provide. But business users (who comprise the biggest part of the Tableau user base) usually have a hard time grasping the whole idea of visual data transformation. Maestro solves this task for us. With the power of Tableau's marketing :)

Someone said that it was Apple and Steve Jobs who taught smartphone users to buy apps and music instead of pirating them. Apple's App Store and iTunes changed the mindset. I believe that Maestro will reveal to many Tableau fans the convenience and power of visual self-service data preparation.

3. It makes it easier for us to explain to the Tableau audience what EasyMorph is. Now it's plain and simple: "EasyMorph is Maestro on steroids". The more people use Maestro, the more will buy into the benefits and convenience of visual programming (yes, ladies and gentlemen, it's visual programming), so that EasyMorph becomes a logical next step for Maestro users once the complexity of required calculations grows beyond trivial.

P.S. It's interesting to see that the "data kitchen" concept I wrote about almost two years ago has been materializing more and more.