Big Data is one of those buzz-terms that is difficult to avoid. But where are we when it comes to actual adoption of those technologies?
To many, data becomes Big when you’re talking volumes ending in terabytes, petabytes, exabytes and beyond.
Others apply criteria such as the number of records, transactions, or files. If the number is mind-bogglingly large, then surely it must be Big Data.
But that misses a crucial point: it's not just about quantity, but also about the nature of the data.
Just scaling up your existing systems to accommodate ever larger data volumes doesn’t necessarily mean you’re dealing with Big Data.
The Big comes into play when you’re looking at all the data that resides outside your existing, typically well-structured, systems and wondering how on earth you can harness it for business benefit.
Another definitional issue is structured versus unstructured data.
Categorising structured data tends to be less contentious because this is data that exists in tabular form, typically in relational database management systems (RDBMS). Then there's the rest.
Calling it unstructured is accurate only in the sense that this data doesn't reside in traditional RDBMSs.
But there are many shades of grey. For example, the data in system or web logs is usually very well structured, and each data element has a known definition and set of characteristics.
Similarly, data from a social media stream such as Twitter is well-structured in some ways (for instance, it defines the maximum message length and the use of operators such as @ or #), yet totally unstructured in others (the content of the message could be anything).
Email, documents, spreadsheets and presentations also fall into this category to a certain degree; it all depends on the context in which they are stored.
Then there are blogs, pictures, videos and all kinds of other data elements which your organisation may well wish to understand better but doesn’t yet capture within existing systems.
There's certainly no getting away from data growth. A recent survey* conducted by Freeform Dynamics shows that most organisations are seeing data volumes increase, and for many, unstructured data looks set to grow even faster than structured data, as illustrated in Figure 1.
But are you Big?
This is the first checkpoint for an organisation wondering whether Big Data applies to it, and how important it might be to unlock value from the unstructured data it holds.
This applies regardless of whether such data is created or collected inside the organisation or arrives through external feeds.
Most organisations have a strong gut feeling that there must be something in there but haven’t got the means to get at it.
It’s difficult to justify investing in technology and setting aside resources when you don’t even know what you’re looking for.
It may not even be clear which technology might be best suited to this data equivalent of panning for gold.
Lack of clear return on investment is one of the key reasons why few companies are extracting value from information held outside systems designed for handling structured data, as shown in Figure 2.
Developments in advanced storage, access and analytics may make the decision easier.
Together, these developments help to address Big Data problems, not just by being able to crunch lots of data, but also by doing so in innovative ways.
In many instances, they use commodity hardware and open source software, which can reduce the cost of entry, provided the relevant skills are available within the organisation.
But before any technology decisions are even considered, business and IT executives need to identify whether they’re really in Big Data territory in the first place, by applying the Three Vs test: volume, variety, and velocity.
As described in Figure 3 below, the Three Vs together capture the essence of Big Data.
This is, of course, just an artificial construct, albeit a very useful one that is becoming increasingly accepted.
How an organisation applies the definitions and criteria is a matter for its own discretion.
What's most important is having, or gaining, a clear understanding of the organisation's data landscape, how it is used today and whether there is a reasonable chance that further useful information might be extracted, above and beyond what's already being done.
* Online survey conducted during the first half of November 2011; 122 responses. Organisation sizes ranged from under 50 employees to 50,000 and above. Respondents were mainly IT professionals, with 55 per cent based in the UK, 17 per cent in the rest of Europe, 16 per cent in the USA, and 12 per cent in the rest of the world.
Martha Bennett is VP and head of strategy at Freeform Dynamics.
Pic: Erik Charlton (CC 2.0)