A reasonable probability is the only certainty – E. W. Howe
These days companies are much more likely to use statistics to help with planning than they were in the past, and I’m not talking about simple statistics. Complex time series statistics for forecasting can be used quite easily without having a statistician in attendance.
The reason we use statistics with far more abandon is that the combination of large amounts of data and greater computational capacity makes data analysis quicker, cheaper and easier. You no longer need a SAS consultant to do your number crunching; you can do it yourself on a PC.
The problem with statistics becoming so widespread is that a user may not understand the assumptions that go into the statistics – and all statistical calculations rely on assumptions. Any statistical result has to be interpreted together with the confidence attached to it, that is, how probable it is that the result holds.
When using statistics for forecasting, the important thing is to understand how likely that forecast is to be right. A forecast is based on data from the past, and relies on the assumption that the patterns of the past can be used to predict the future with some certainty.
A local meteorological event can be used as an example. The cumulative mean daily rainfall at my house in Johannesburg, for the summer and winter seasons, is shown – the line of concern is the red summer (wet) season.
This is the cumulative mean daily rainfall for each month – the focus here is on December.
According to these two graphs, the rainfall for mid-December should be around 225mm for the season, and 60mm for the month.
However, if we look at the month-by-month actuals versus the monthly mean and median, we can see how certain months vary widely about the mean and median; in statistical terms, the standard deviation around the mean changes from month to month. That indicates that in some periods the mean and/or median are less useful for prediction than in others.
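To make the point about spread concrete, here is a minimal sketch in Python that compares the mean, the median and the standard deviation for one month across several years. The rainfall figures are hypothetical numbers made up for illustration, not my actual records, and the names `december_totals_mm` and `summarise` are assumptions of this example.

```python
import statistics

# Hypothetical December totals in mm, one value per year.
# These are illustrative numbers only, NOT the recorded data.
december_totals_mm = [95, 110, 80, 130, 105, 90, 120, 100, 85, 260]


def summarise(values):
    """Return the mean, median and sample standard deviation of values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


summary = summarise(december_totals_mm)

# Coefficient of variation: spread relative to the mean.
# The larger it is, the less the mean tells you about any single year.
cv = summary["stdev"] / summary["mean"]

print(summary)
print(f"coefficient of variation: {cv:.2f}")
```

In this made-up series a single very wet year pulls the mean well above the median, which is the kind of month where the median may be the safer guide.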
When we look at the seasonal and monthly cumulative actual graphs, we can see how the real data is distributed.
Wet season cumulative actual:
December cumulative actual:
The interesting event mentioned above occurred this month. In the early part of the month, it looked like December was going to be a dry month. But unusual atmospheric conditions led to very heavy downpours on the 16th and 17th. As the monthly graph shows, we now have the wettest December since I started recording rainfall at my house in 1997.
For the seasonal graph, the drier than usual conditions this summer can be seen by the arrow at the bottom. But all the heavy rain has done is to lift the seasonal rainfall to around the average for this time of the summer.
So while December is exceptionally wet, the season overall is as would be expected from the mean.
What do we learn from this in terms of analysis and prediction?
- be careful how you use the mean (sometimes use the median instead) and understand the variability around that point as described by the standard deviation;
- the nature and length of the data record are important: in this case December looks wet (for a 30-day period), but for a season (over 200 days) the rainfall is normal;
- analyse your data in different ways so you see alternative perspectives;
- discuss any forecasts in terms of probability of accuracy.
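As one possible way of stating a forecast in terms of probability, the sketch below takes a historical record and reports how often a threshold was exceeded, along with the range that held the middle 80% of past years. The data are hypothetical illustrative values, and the helper names `exceedance_probability` and `central_range` are inventions of this example rather than part of any particular tool. It also leans on the same assumption as any forecast: that past patterns say something about the future.

```python
import statistics

# Hypothetical historical December totals in mm (illustrative only,
# NOT the recorded data).
history_mm = [95, 110, 80, 130, 105, 90, 120, 100, 85, 260]


def exceedance_probability(history, threshold):
    """Fraction of past years whose total was above the threshold."""
    return sum(1 for value in history if value > threshold) / len(history)


def central_range(history):
    """Return the 10th and 90th percentile cut points of the history."""
    # quantiles(n=10) returns nine cut points: 10%, 20%, ..., 90%.
    cuts = statistics.quantiles(history, n=10, method="inclusive")
    return cuts[0], cuts[-1]


p_wet = exceedance_probability(history_mm, 125)
low, high = central_range(history_mm)

print(f"Share of past Decembers above 125mm: {p_wet:.0%}")
print(f"Middle 80% of past Decembers: {low:.1f}mm to {high:.1f}mm")
```

A statement such as "two Decembers in ten were wetter than 125mm" tells the reader far more about the risk than a single mean figure does.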
In the ERP world, the amount of data being stored is giving rise to more business intelligence (BI) and analytical tools. But because these tools can be applied by people without the requisite understanding of data analysis, the results and predictions from the analysis can be faulty.
If you are using a forecasting or optimisation tool for business planning, who is doing the analysis and do they understand the ramifications of how they use the tool?