Playing To Win
Singing the Data Analytics Blues
I get a seemingly endless flow of complaints about Data Analytics (DA for short), even though it is the hottest thing in business these days. Why isn’t DA giving me the insights that it is supposed to provide? Why do the little startups keep figuring stuff out before us and beating us, even though we have a DA team now? Why are the results of our DA so uncompelling? To address these baffling disappointments, I have decided to dedicate my 5th Year II Playing to Win/Practitioner Insights (PTW/PI) piece to Singing the Data Analytics Blues: It Just Ain’t What It’s Cracked Up to Be. You can find all previous PTW/PI here.
The Fascination with Data Analytics in Business
Businesspeople love to have the definitive feel that DA gives them: “The data say that X is the right answer.” Then they can take the decision in question with total confidence. This is strongly reinforced by their time at business school, where they were taught that the only business decisions that are legitimate and properly made are ones based on DA. They are taught very explicitly that decisions made on ‘gut feel’ are only for old-fashioned losers!
Sometimes DA does indeed provide a definitive and valid answer. For golfers, it is like stepping up to the tee box and hitting a long drive straight down the middle of the fairway. It causes us to believe that it will happen more often — and we are sorely disappointed when it doesn’t. Just as the perfect drive convinces a golfer to keep trying, a great DA outcome keeps aficionados analyzing away, despite producing more disappointments than wins.
There are three main reasons for the disappointing results.
1) Lots of DA is Hypothesis-Free Data Mining
In much of business analysis, we don’t know what we are looking for. Much of that is caused by doing DA at the wrong time — before we have a clear hypothesis to test analytically. But we analyze anyway. I have made this point before with respect to SWOT, one of the most beloved analytical tools in the world of business. The hypothesis — if we can call it that — embedded in the use of SWOT is that knowing our strengths, weaknesses, opportunities, and threats is important to the strategy of the company. That is assumed to be the case even though, in the particular context of the company, SWOT fails to provide intelligent definitions of strength, weakness, opportunity, or threat. That is particularly the case because SWOT is performed before we know what our strategy will be. That is why nobody can give me a great example of an insight from a SWOT analysis that has been really helpful to them. Nonetheless, millions of person-hours a year continue to be expended in hypothesis-free SWOT analyses.
Analysis is useful only to the extent there is a theory that is being tested. And that theory must be stated with sufficient precision that we can declare in advance, not after-the-fact, that the theory holds if we observe a particular and specific pattern of data. Instead, lots of data mining occurs and as they say, if you torture the data enough, it will give you something — just not anything useful.
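To make the torture-the-data point concrete, here is a small illustrative sketch of my own (not from the piece): fill a spreadsheet with pure random noise, then mine every pair of columns for the strongest correlation.

```python
import random

# Illustrative sketch: a "spreadsheet" of pure random noise, mined for the
# strongest pairwise correlation. With enough columns, an impressive-looking
# relationship always turns up, by chance alone.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(42)
n_rows, n_cols = 20, 50  # 50 noise columns -> 1,225 column pairs to mine
columns = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]

best = max(
    abs(pearson(columns[i], columns[j]))
    for i in range(n_cols) for j in range(i + 1, n_cols)
)
print(f"Strongest 'relationship' found in pure noise: r = {best:.2f}")
```

With fifty columns of noise there are 1,225 pairs to test, so a correlation strong enough to excite a steering committee shows up reliably, and none of it means anything.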
2) Complexity of Business Overwhelms the DA Techniques that We Employ
The business world is full of ambiguous causal relationships involving multiple variables and non-linear or circular effects. Better treatment of customers produces higher customer loyalty, which produces more data that helps figure out how to treat customers better, and so on. But which way does causality flow? In most business contexts, we can’t tease the variables or the directionality apart with any certainty.
The most popular DA way to deal with this challenge is to ignore it. Assume that relationships are uni-directional and linear so that the DA is easier — we can get ‘clarity’ on one causal relationship using the statistical techniques that we all learned to come up with the magical R-squared. [As an aside, it is amusing to me to see the takeover by linear regression during my business lifetime. At the start of my career in 1981, if you showed a board of directors an R-squared, they would scold you for using high-falutin jargon. Now if you don’t show them an R-squared, they scold you for not having done thorough analysis. It is an example of the acquisition of a group of humans by a tool — i.e., the tool owns them at this point.]
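As an illustrative aside of my own on that magical R-squared: a straight-line fit to an obviously curved relationship still reports a number that looks like thorough analysis.

```python
# Illustrative sketch: fit a straight line to a plainly non-linear
# relationship (y = x squared) and compute the R-squared it reports.

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # clearly curved, not linear

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
print(f"R-squared of the linear fit: {r_squared:.3f}")  # about 0.95
```

An R-squared of roughly 0.95 would sail through most boardrooms, even though the straight-line model is simply the wrong shape for the data.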
The patron saint of DA is A/B testing. There is a whole infrastructure of experts within the tech giants and in A/B testing consulting firms who show the awesome ability of A/B testing to improve your business results. The secret to its success is that it simplifies the context down to one single variable — e.g., the color of the font, the placement of the ad, the use of triggering words. Either the A/B test produces more of what you want on the dependent variable, or not. It is beautiful! It can get customers to react more positively to a given stimulus in order to get them to do more of what is in our interests. Their interests are of no consequence in this use of DA.
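The single-variable logic described above can be sketched in a few lines (the counts here are hypothetical): two variants, one measured metric, declare a winner. Notice what never appears in the calculation.

```python
# Hypothetical sketch of single-variable A/B logic: two variants, one
# measured metric, pick the winner. Note what is absent: there is no
# variable anywhere for how the customer feels about being experimented on.

clicks = {"A": 120, "B": 150}   # made-up counts for illustration
views = {"A": 1000, "B": 1000}

rates = {variant: clicks[variant] / views[variant] for variant in clicks}
winner = max(rates, key=rates.get)
print(f"Winner: {winner} with rate {rates[winner]:.1%}")
```

Variant B wins on the one metric that is measured; everything that is hard to measure is, by construction, not part of the answer.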
But here is the question for all those A/B testers: does their DA take into account the causal relationship between A/B testing and how customers start to feel about you when they come to understand how you are treating them? No. That is excluded from the analysis because both that variable (trust) and that relationship (between customer and company) are hard to measure. As a consequence, the trust variable isn’t in the hypothesis — there is no row or column for it in the spreadsheet, no R-squared for it.
But being manipulated is precisely why the world simply doesn’t trust Facebook anymore. In the end, I predict that A/B testing will actually cost Facebook orders of magnitude more than the benefits it gets from this particular form of DA. The complexity of business frequently overwhelms the DA techniques that we employ — at our peril.
3) In Business, DA Can’t Predict the Future
As of the time of its analysis, all data is from the past. The goal of DA is to know something about the future. Thus 100% of DA is an act of extrapolating the past into the future. As explained by the father of DA — 4th century BC Greek philosopher Aristotle, who created the scientific method of analysis — DA works only when the past data is representative of future data. That only happens when the future is guaranteed to be identical to the past, which is the case with many attributes of our physical world — e.g., with respect to gravity, or the boiling point of water, or the rotation of the earth. But as I have pointed out before, the vast majority of the business world in no way features this characteristic. It is always changing. Thus, any sample of data fails to be representative of the universe that includes the future. Yet the DA aficionados keep analyzing the past with techniques that assume — utterly without reason — that the sample of past data will enable valid inferences about the future. Aristotle figured out that this was a dumb idea nearly 2,400 years ago. One would have thought that the business world would have figured it out by now.
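Here is a toy numerical sketch of the extrapolation trap (entirely made-up figures): fit a trend to a steadily growing past, forecast the next period, and watch the model be confidently wrong when the future stops resembling the past.

```python
# Hypothetical sketch: a trend fitted to past data extrapolates confidently,
# right up until the future stops resembling the past.

past = [100 + 10 * t for t in range(8)]  # eight periods of steady growth

n = len(past)
ts = list(range(n))
mean_t, mean_y = sum(ts) / n, sum(past) / n
slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, past))
         / sum((t - mean_t) ** 2 for t in ts))
intercept = mean_y - slope * mean_t

forecast = slope * 8 + intercept  # the model's confident prediction: 180
actual = 120                      # an assumed regime shift the past data could not contain
print(f"Forecast: {forecast:.0f}, actual: {actual}, error: {forecast - actual:.0f}")
```

The fit is perfect on the past, which is exactly what makes the forecast so persuasive and so wrong the moment the world changes.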
Worse still, doing DA will convince you that the future will be identical to the past — because that is the only thing DA can do. DA can’t tell you the ways in which the future will be different than the past, because it crunches data entirely from the past. This is a reason why all of those startups invent the future long before the powerful incumbents. It isn’t despite your DA: it is because of your DA. They imagine a future while you analyze the past. As American pragmatist philosopher Charles Sanders Peirce pointed out, no new idea in the history of the world has been proven in advance analytically.
That having been said, imagining the future is no free lunch. The overwhelming majority of those startups will fail completely and expire. However, the problem for incumbents is the same problem that opponents faced when defending against the Mongol hordes (as I have pointed out before). All it took was a few Mongols to survive long enough to breach your walls and your city was doomed. So while your DA reinforces in your mind that the future is identical to the past, the hordes keep coming.
What To Do About It
Use DA sparingly. Only if you are completely confident that the future will be identical to the past will the techniques that you use for DA be fit for purpose. And that is a heroic assumption that you should make infrequently.
Stop analyzing without a theory. Useful answers don’t just pop out of DA automatically. Correlations always do, because in a big enough spreadsheet something will always appear related to something else. But any such correlation is as likely to be incidental as it is to be causal.
Recognize complexity. Don’t wish it away in an attempt to produce a causal relationship — all else being equal. All else is rarely equal. It is an inherent part of the complexity of whatever you are analyzing. Simplification for the purposes of DA is too high a price to pay.
Work on your judgment. Consider all forms of information — both qualitative and quantitative, whether statistically significant or otherwise. Don’t analyze information — consider it. Roll it around. Attempt to develop understanding, not the answer.
When you analyze, pay significant attention to the outliers, not to the datapoints that fit the pattern. DA causes us to obsess about the regression line. The outliers are the key to a better model. Your job is to figure out a model of the world in which the outliers are right on the regression line.
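One mechanical way to act on this advice, sketched here with hypothetical datapoints: fit the regression line, then surface the worst-fitting point for study rather than discarding it.

```python
# Sketch: fit a line, then surface the outlier (largest residual) for
# study, instead of celebrating the points that sit neatly on the line.

points = [(1, 2.1), (2, 3.9), (3, 15.0), (4, 8.1), (5, 9.9)]  # third point is "off"

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_y - slope * mean_x

# Rank points by how badly the line explains them; the worst one is the
# place where your model of the world is most obviously incomplete.
outlier = max(points, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
print(f"Study this point first: {outlier}")
```

The datapoint the regression explains worst is the one your current theory understands least, which is precisely why it deserves the most attention.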
Invest much more in genuine experimentation. If you leave territory open in which disruptors can experiment unopposed because you don’t have the data to justify investing there, you are getting down on your hands and knees, begging to be disrupted. Incumbents simply must invest more and not be cowed by their own DA.
Finally, never believe that you have the answer perfectly right because the DA tells you so. All that we can do in the complex world of business is to place bets with as much thoughtful consideration as possible. And it is always prudent to be ready to adjust as future data fills in the picture.