We have all read and heard the stories in the media, including mainstream business publications such as The Wall Street Journal and The New York Times, about how “big data” is transforming business, saving money for companies, and finding them new revenue opportunities, all through insights gleaned from data. So we have all heard the term big data, but what does it actually mean?
No longer just the realm of Google, Facebook, and Amazon, big data is the new norm for enterprise analytics and is pervasive across many industries: drug discoveries enabled by genomic research, real-time consumer sentiment, and social interaction for retail represent a small sample of business innovation derived from big data technologies and analytics. Whether it is fine-tuning supply chains, monitoring shop floor operations, gauging consumer sentiment, or any number of other large-scale analytic challenges, big data promises to make a tremendous impact on the manufacturing enterprise.
A simple definition would be that data becomes big data, or rather a big data problem, when the volume, velocity, and/or variety of the data exceed the ability of your current IT systems to ingest, store, analyze, or otherwise process it.
Volume: There are many examples of companies dealing with very large volumes of data. Consider, for example, Google’s support of 1.2 billion searches a day; the huge number of users registered and posting on Facebook; oil field seismic surveys that run to terabytes in size; or the large number of financial transactions a big bank system might process every day. In the future, these data volumes are bound to grow exponentially as the “Internet of Things” becomes a reality, with some analysts predicting that the number of Internet-connected devices will reach 24 billion by the year 2020.
Variety: Traditionally, data managed by enterprises was mainly limited to structured data, i.e., data whose format was well defined, making it relatively simple to categorize, search, and analyze. However, what happens when the data is messy, such as data from a blog, or Twitter feeds, or data collected from a seismic survey of an oil field, or event data collected from millions of sensors embedded in an electrical network, or even text descriptions in emails sent to customer service? Would it not be valuable to be able to mine these unstructured data stores to identify patterns and search for meaningful themes?
Velocity: As embedded sensors in all sorts of equipment become ubiquitous, and the cost of mobile connectivity continues to drop, we can expect the speed at which data is collected to go up exponentially. For example, imagine every car sold by a car company sending frequent updates about the health of various sub-systems in the car, and millions of cars on the road continuously sending in updates; or an oil-drilling platform receiving continuous streams of data from the well; or the click-stream data for users of an application like Facebook.
It is easy to see how this simple definition of big data hints at some not-so-simple challenges. Certainly, there are solutions for handling large quantities of data. Networking and bus technologies provide a transport mechanism for moving data rapidly. But what happens when that data is a mix of structured and unstructured data that does not fit neatly into rows and columns, AND it is high volume, AND it needs to be processed quickly? Think of data from millions of sensors on electricity networks or manufacturing lines or oil rigs. Identifying deviations from past trends in this data (and whether the deviations are “safe” or “unsafe”) in real time can help avoid a power outage, reduce waste and defects, or even avoid a catastrophic oil spill. This type of problem can be found in almost all industries today. The volume of data is growing too fast for traditional analytics because the data is becoming richer (each element is larger or more varied) or more granular (time intervals decreasing from months to days or days to minutes), or because it simply needs to be processed much faster than it used to be.
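To make the idea of identifying deviations from past trends a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any particular product: the function name flag_deviations, the window size, and the threshold are all hypothetical choices, and the technique shown (comparing each new reading with the mean and standard deviation of a short rolling window) is just one simple way to approach the problem.

```python
from collections import deque
from statistics import mean, stdev


def flag_deviations(readings, window=5, threshold=3.0):
    """Yield (index, value, is_deviation) for each reading in a stream.

    A reading is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` readings.
    Both parameters are illustrative assumptions, not recommended values.
    """
    history = deque(maxlen=window)  # holds only the most recent readings
    for index, value in enumerate(readings):
        is_deviation = False
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat window has sigma == 0; skip the check then.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                is_deviation = True
        history.append(value)
        yield index, value, is_deviation


if __name__ == "__main__":
    # Made-up sensor readings with one obvious spike.
    sample = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0]
    for index, value, is_deviation in flag_deviations(sample):
        if is_deviation:
            print(f"reading {index} ({value}) deviates from the recent trend")
```

Because the function is a generator that keeps only a small fixed-size window in memory, it can consume readings one at a time as they arrive rather than waiting for a complete data set. Deciding whether a flagged deviation is “safe” or “unsafe” is a separate step that depends on domain knowledge, and a real deployment would also have to cope with the volume and variety described above, which this sketch does not attempt.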
So we need tools to collect and store these massive volumes of data, analyze the data to find meaningful patterns and relationships, make predictions, and thus make the enterprise smarter through business insights mined from this deluge of data. That is the challenge of big data.