by David Clarke

Big data: messy, difficult and valuable

Opinion
Nov 09, 2012
4 mins
IT Strategy

People have been talking about ‘big data’ for years, but the buzz has now intensified as more have begun to understand its potential and look for ways to exploit it for their organisations.

One of the big challenges for CIOs is to make use of the relevant tools, and to create a culture in which others appreciate how they can support the effort.

Big data is a collection of data sets too large and complex for regular database management tools, in volumes of petabytes (1m gigabytes), even exabytes (1bn gigabytes).

It makes it possible to measure human, business or scientific patterns in fine detail, and can provide highly valuable insights to support the development of products and services.

The potential to exploit big data is increasing along with the overall volume.

It can take in streams of data from sources such as digital sensors and cameras, which can track industrial activity and environmental change, and social media, which can provide evidence of people’s attitudes and preferences.

But it is very messy. As the volume of data grows it becomes expensive to store and requires multiple servers for processing, and there is no ‘silver bullet’ IT solution.

There are software frameworks that can be used in managing big data, such as MapReduce, which supports developers in writing relevant programs; but it still requires a lot of work to establish how the data should be split, valued, then pulled together.
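To give a flavour of the split, value and pull-together pattern, here is a minimal single-machine sketch of the MapReduce idea in Python. It is an illustration only, not the API of any real framework: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample records are assumptions made up for this example, and a production system would spread each phase across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: split one record into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: pull the grouped values together into one result per key."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on a list of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample data, standing in for a far larger data set.
records = ["big data is messy", "big data is valuable"]
counts = word_count(records)
print(counts)
```

Even in this toy form, the real work lies in deciding what the keys and values should be, which is the part no framework can do for the organisation.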

In addition, the process is aimed at extracting packets of information that have a high value for the organisation. These are unlikely to align cleanly with the original structure of the dataset, and are likely to come from only a small proportion of the total.

Some experts have pointed out that as more data is used much of it is duplicated – just think about data back-ups or tweets that are retweeted – and this reduces the proportion of extracted information against the total.
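The effect of duplication can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the function name unique_fraction, the sample tweets, and the choice to treat records as duplicates when they match after lower-casing and collapsing whitespace are all invented for the example. Real retweets and back-ups would need more careful matching.

```python
import hashlib


def unique_fraction(records):
    """Return the share of records that are distinct after simple normalisation."""
    if not records:
        return 0.0
    seen = set()
    for record in records:
        # Assumption: case and spacing differences do not make a record new.
        normalised = " ".join(record.lower().split())
        # Hashing keeps the memory per record fixed, however long the text.
        seen.add(hashlib.sha256(normalised.encode("utf-8")).hexdigest())
    return len(seen) / len(records)


# Hypothetical sample: one original tweet, two copies of it, one distinct tweet.
tweets = [
    "Big data is messy",
    "big data is  messy",
    "Big data is valuable",
    "Big data is messy",
]
fraction = unique_fraction(tweets)
print(f"{fraction:.0%} of the records are unique")
```

The lower that fraction falls, the more storage and processing is being spent on material that adds nothing to the extracted information.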

It is also necessary to convey the results in terms that make sense to business leaders. Data has to be presented in terms that are clearly relevant to the challenges and opportunities facing an organisation, and this requires the specialists to tell a story that others can understand.

To have any chance of exploiting big data successfully, you need people with the programming skills to identify the valuable information amid the mountain of data.

A recent report by the Saïd Business School at the University of Oxford, Analytics: the real world use of big data, suggests that organisations have been acquiring some of these skills: a worldwide survey of businesses and IT professionals showed that about three-quarters now have big data projects either in development or under way.

But it also suggests there are limits to their ambitions: fewer than a quarter had the necessary skills and resources to deal with unstructured data.

This is restraining them from trying to harness big sets of data from outside their organisations, especially from the more chaotic world of social media.

In the shorter term, CIOs will have to limit their ambitions to match the resources their organisations can afford; but it’s worth thinking about training programmes that could equip their programmers with the skills to extract information from data they obtain from outside.

At the same time they need to ensure that any big data project begins with a clear view of what the organisation wants from the process.

This is where it becomes important to have a clear understanding and a shared sense of purpose between the techies and the business managers.

A big step towards achieving this would be to ensure that business directors and managers are data literate, with an understanding of where the insights can be found in data relevant to their activities, and what patterns would be valuable.

CIOs can make a difference in encouraging that data literacy, so that colleagues grasp the importance of existing data sets, then begin to think about what other sets could provide benefits to the organisation.

Senior executives need to think about their business issues and how they relate to data available from any source, even if it may be very difficult to collate and classify, then see what they can do about obtaining and using it to their benefit.

That can provide the driving force for the programmers to prove their worth.

This may be a daunting prospect, but unstructured data is often more likely to yield the genuinely fresh insights that can give an organisation an edge in its business.

CIOs and their colleagues need to think about how they can harness big data to obtain big advantages.

David Clarke MBE, Group Chief Executive Officer, BCS, The Chartered Institute for IT

Pic: avlxyz, CC 2.0