A blog criticising the hype around big data is something I have been thinking about for some time. There has been a huge amount of discussion about big data: just about every major publication has covered it (the Harvard Business Review blog, Forbes, MIT Sloan Management Review, McKinsey Quarterly), and companies like SAP promote it in conjunction with HANA, as does IBM, among others. Consequently I was reluctant to commit my thoughts to writing until I saw Dennis Howlett’s blog on ‘Big Data BS’ and Stephen Few’s site, including ‘Big data, big ruse’. Since then I have been assembling sources to underscore R ‘Ray’ Wang’s comment that the hype around big data was reaching the levels of SOA in the early 2000s, cloud in the late 2000s, and social in the past few years.
Traditionally, big data describes data that’s too large for existing systems to process.
He goes on to describe the three common characteristics of big data.
Volume. This original characteristic describes the relative size of data to the processing capability … Overcoming the volume issue requires technologies that store vast amounts of data in a scalable fashion and provide distributed approaches to querying or finding that data.
Velocity. Velocity describes the frequency at which data is generated, captured, and shared. The growth in sensor data from devices and in web-based click-stream analysis now creates requirements for more real-time use cases.
Variety. A proliferation of data types from social, machine-to-machine, and mobile sources adds new data types to traditional transactional data. Data no longer fits into neat, easy-to-consume structures.
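As a toy illustration of how the three Vs work as a checklist (the thresholds here are invented for illustration, not taken from any vendor or analyst):

```python
# Toy checklist for the three Vs; the thresholds are invented for illustration only.
def looks_like_big_data(gb_per_day, events_per_second, unstructured_share):
    volume = gb_per_day > 1_000            # too large for a single machine to store
    velocity = events_per_second > 10_000  # arrives faster than batch jobs can run
    variety = unstructured_share > 0.5     # mostly logs, text, clicks, sensor feeds
    return volume and velocity and variety

# A typical mid-sized firm's transactional workload:
print(looks_like_big_data(5, 50, 0.1))            # False
# Something like a telco's network event feed:
print(looks_like_big_data(20_000, 500_000, 0.8))  # True
```

On a definition like this, most ordinary transactional workloads fail all three tests, which is the point made later in this post.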
I have put together my own definition from various others I have found.
- Big data are extremely large volumes of mainly unstructured data, which are streamed (not batched) and which require real-time analysis
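The streamed-not-batched distinction in that definition can be made concrete with a minimal sketch (my own illustration, not any product’s API): a batch analysis collects all the data first, while a streaming analysis updates its answer as each value arrives, in constant memory.

```python
# Minimal sketch: batch vs streaming analysis of the same feed.
# All names are illustrative, not taken from any big data product.

def batch_average(readings):
    """Batch approach: collect everything first, then analyse."""
    return sum(readings) / len(readings)

class StreamingAverage:
    """Streaming approach: update the answer as each value arrives,
    keeping only a running count and total (constant memory)."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # current average, available in real time

feed = [12.0, 15.0, 11.0, 14.0]
stream = StreamingAverage()
for value in feed:
    latest = stream.update(value)

assert latest == batch_average(feed)  # both reach 13.0 on this feed
```

The streaming version never holds the whole feed in memory, which is why this style of processing matters once volumes outgrow what a batch job can load.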
My issue is not that big data is incorrect, but that at the moment it is relevant to only a section of the business community. A Forrester Research blog shows that the businesses using big data are the ones that have always handled lots of data – e.g., banks and insurance companies, telcos, oil companies, large retailers, and the international CPG companies. An analysis by McKinsey shows the industries where big data has high value vs. those where its value is low; for me the interesting observation was that while big data is relatively easy to capture in manufacturing industries, it has low value for them.
The same McKinsey article mentions that the analysis was done for businesses with over 1,000 people. If you look at the US Census Bureau’s 2010 business statistics, you will see that businesses of that size number fewer than 9,000 (0.15%) of the more than 5.7 million businesses in the US. Why is there so much hype about something that only a small proportion of the market can use, or is interested in using?
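That proportion is easy to check from the round numbers quoted above (these are the approximate figures from the text, not exact Census values):

```python
# Rough check of the proportion quoted from the 2010 US Census figures.
# Inputs are the round numbers from the text, not exact Census values.
large_firms = 9_000        # US businesses with over 1,000 employees (approx.)
all_firms = 5_700_000      # total US businesses (approx.)

share = large_firms / all_firms * 100
print(f"{share:.2f}%")     # 0.16% to two places, in line with the ~0.15% quoted
```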
How relevant is big data now for most organisations?
- the majority of organisations don’t have the data sources for it – whether from customers, manufacturing plant, operations, or inventory;
- most manufacturers and distributors are risk averse and cost conscious; the new and ‘bleeding edge’ technology of big data, and its cost of implementation, wouldn’t appeal to them;
- those organisations outside the McKinsey study that have large data volumes can easily handle them using existing relational database (RDBMS) software. This was borne out during an interactive survey session at the ITWeb 2013 BI conference I attended, where even though 63% of attendees were from companies with over 1,000 employees, 67% were still using standard RDBMS technology.
As the technology for big data matures – e.g., Microsoft’s in-memory/big data product codenamed Hekaton – it may find a larger market, but the fact is that for most businesses, existing technologies provide the capability to manage and analyse large volumes of data without having to adopt new ones.
What could make big data more important to the majority of businesses? Three conditions stand out.
1. The information content gets above a certain critical threshold
2. Operations are instrumented to create enough of a data deluge
3. Senior management learn how to work with the new issues created
Conditions 1 and 2 could arise with the growth of the Internet of Things. This refers to the way more objects are becoming embedded with sensors and linked through wired and wireless networks, often communicating via the Internet. As these objects can both sense the environment and communicate, they become tools for understanding complexity and responding to it. This could create huge volumes of data flowing to computers for analysis, and could have an impact on even small manufacturers and distributors.
But until that starts becoming a reality, the big data phenomenon is really something that only a few people need to be concerned about.