Lately I’ve been thinking about probability and predictive analytics, and how they relate to the value of the data sitting in an organization’s data center.
Probability theory became an established discipline through the humble coin flip. If you flip a balanced coin 10 times, you can expect something near a 5-5 split. But if you were to get an 8-2 split, would that fall outside the realm of predictability? Maybe, but only slightly. Flip that same coin 500 times, however, and the results will land very close to a 50/50 split. The point is that predictability really shows itself in larger samples. In the mid-1600s, Fermat and Pascal confronted a version of this question: if a game of chance must end before its completion, how do you determine who would have been the winner? That scenario forced them to think through the likely outcomes of the remaining tosses, and thereby to predict the result. The logic they built around that idea became the foundation of modern probability theory.
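That convergence is easy to see for yourself. Here is a minimal simulation sketch in Python (my own illustration, not tied to any particular platform) showing how the fraction of heads settles toward 0.5 as the sample grows:

```python
import random

def heads_ratio(flips: int) -> float:
    """Simulate `flips` fair coin tosses and return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips

# Small samples swing wildly; large samples settle near 0.5.
for n in (10, 100, 500, 10_000):
    print(f"{n:>6} flips -> {heads_ratio(n):.3f} heads")
```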
Let’s say a basketball player shoots 70% from the field. At the beginning of a season, their percentage may sit much lower, or even higher. But over the span of a season the number of shots taken grows large, and a reasonable metric emerges. Another good example is a coast guard search-and-rescue mission. Say a small craft goes missing: the search takes into account tidal patterns, elapsed time, wind, and the craft’s propulsion. By weighing these variables, the SAR team can focus its efforts on the areas where the craft is most likely to be, as the sketch below illustrates.
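To make that idea concrete, here is a toy sketch of weighting a search grid. This is my own illustration, not an actual SAR algorithm, and every cell and factor value is made up; the point is simply that multiplying out the variables and normalizing gives the team a ranked list of where to look first.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    prior: float        # baseline chance the craft drifted into this cell
    tide_factor: float  # hypothetical adjustment for tidal patterns
    wind_factor: float  # hypothetical adjustment for wind

# Made-up grid cells and factors, purely for illustration.
grid = [
    Cell("A1", 0.20, 1.2, 0.9),
    Cell("A2", 0.50, 1.5, 1.3),
    Cell("B1", 0.30, 0.8, 1.1),
]

# Combine the factors into an unnormalized score, then normalize so the
# scores form a probability distribution over the grid.
scores = {c.name: c.prior * c.tide_factor * c.wind_factor for c in grid}
total = sum(scores.values())
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"search {name} first: P = {score / total:.2f}")
```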
To me, it seems obvious that marketing data can leverage the same algorithmic approach: the predictability of ad response, click-through rates, focused links, and so on can be achieved by extrapolating from the historical data those campaigns have already generated.
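As a sketch of what that extrapolation might look like, here is a minimal click-prediction example using scikit-learn’s logistic regression. The features (hour of day, a user’s past clicks) and the tiny history are entirely invented for illustration; a real model would draw on far richer historical data.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical impression history: [hour_of_day, user_past_clicks] -> clicked?
X = [[9, 0], [12, 3], [18, 1], [22, 5], [10, 0], [20, 4]]
y = [0, 1, 0, 1, 0, 1]

model = LogisticRegression().fit(X, y)

# Estimate the probability that a new impression gets clicked.
prob = model.predict_proba([[14, 2]])[0][1]
print(f"predicted click probability: {prob:.2f}")
```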
My thinking is that the data already flowing into an organization’s acquisition framework is not being leveraged the way it could be; in so many architectures, its potential contribution to market strategy sits untapped. What tools are out there to help predict financial futures, sales trajectories, performance, and the like from a given set of data? I like many database analytics platforms. In the real world, however, these datasets are disparate: some live as machine data, some in various databases that don’t share a format, or even a platform. But there are tools from Oracle, Splunk, and others that, given the correct mappings, can take these datasets, no matter how disparate, and produce the kind of predictive analytics that gives that knowledge back to the user.
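The “correct mappings” part is the crux. Here is a sketch of the idea, with source and field names I’ve made up purely for illustration: normalize records from two different sources into one common schema so a single analytics pass can run across both.

```python
# Hypothetical raw records from two disparate sources.
machine_records = [{"ts": "2015-01-02", "site": "east", "val": 120.0}]
sales_records = [{"sold_at": "2015-01-03", "territory": "east", "total": 75.5}]

def from_machine_log(rec: dict) -> dict:
    """Map a machine-data record onto the common schema."""
    return {"timestamp": rec["ts"], "region": rec["site"], "amount": rec["val"]}

def from_sales_db(rec: dict) -> dict:
    """Map a sales-database record onto the same common schema."""
    return {"timestamp": rec["sold_at"], "region": rec["territory"], "amount": rec["total"]}

# Once unified, one predictive model can consume everything.
unified = [from_machine_log(r) for r in machine_records] + \
          [from_sales_db(r) for r in sales_records]
print(unified)
```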
I find myself in these conversations relatively often. Ultimately, the conversation needs to be one in which the customer realizes just how much data is there and becomes willing to invest in managing it and extracting useful information from it. To be clear, the investment is not just in the technology, but in the time required to teach an administrator how to perform these tasks. I will be exploring these extraction methods in future postings.