by CIO Staff

Braving the big data maze

Oct 11, 2013
7 mins
Big Data

Everyone’s talking about big data – structured and unstructured information that can be pulled together and shared to help organisations gain better insights and make more informed business decisions.

From sharing research and course information to analysing the online habits of hundreds of thousands of students, big data techniques have the potential to transform the way the education sector processes and analyses information from many sources.

IT leaders – predominantly from the higher education and research sector – gathered in Sydney recently to discuss the challenges around making sense of large amounts of data across their organisations to improve insights and create better products and services. The event was sponsored by Amcom and EMC.

Attendees were at various stages of big data projects and agreed that collecting large quantities of structured – and sometimes unstructured – data onto a single platform in one location for analysis is a key challenge.

The University of Western Sydney’s (UWS) director of IT services, Kerry Holling, says the university’s research profile has improved significantly in recent years and big data, in terms of volume and unstructured information, is on his mind.

“We are recording every lecture and making them available for students to replay in their own time,” he says. “That’s 100TB of data on an annual basis, which is a challenge for us in terms of managing the capacity.”

But what’s important is gaining insights from that data. UWS has a particular focus at the moment on using predictive analytics tools to help improve student retention across the university.

“We have plenty of information about students who come in, where they come from, what sort of ATAR they have, the courses they do, the campus they are on and what might happen to them after a year or two should they not complete their degree at UWS,” he says.

“But what we are trying to do now is predict which students are at most risk of leaving university and putting in place some intervention strategies to retain those students and provide the support necessary to complete their degree.”
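UWS hasn't published how its predictive model works, but the idea Holling describes can be sketched as a simple risk score over the kinds of attributes he mentions (ATAR, engagement, academic progress). The field names, weights and threshold below are illustrative assumptions only, not the university's actual model:

```python
# Hypothetical sketch of a student retention risk score.
# All feature names, weights and thresholds are illustrative assumptions,
# not UWS's actual predictive analytics model.

def retention_risk(student: dict) -> float:
    """Return a 0..1 risk score; higher means more likely to leave."""
    risk = 0.0
    if student.get("atar", 99.0) < 65:               # lower entry score
        risk += 0.3
    if student.get("logins_last_30_days", 0) == 0:   # disengaged online
        risk += 0.4
    if student.get("failed_units", 0) >= 2:          # academic difficulty
        risk += 0.3
    return min(risk, 1.0)

students = [
    {"id": "s001", "atar": 62.0, "logins_last_30_days": 0, "failed_units": 2},
    {"id": "s002", "atar": 85.5, "logins_last_30_days": 14, "failed_units": 0},
]

# Students above the threshold would be offered intervention support.
at_risk = [s["id"] for s in students if retention_risk(s) >= 0.5]
```

In practice a model like this would be trained on historical completion data rather than hand-weighted, but the output is the same: a ranked list of students to target with intervention strategies.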

Holling says there are some potential privacy concerns, which the university is working through, but mostly students are happy to provide information to the university.

Study Group is in the midst of a student management system rollout across its colleges worldwide, says the organisation’s Asia Pacific IT services director, Will Calvert.

“We now need to harness the information generated by this system, and sources like our other internal systems, to do marketing and lead analysis, for example,” says Calvert.

“For example, we may want to know that a student who landed on a specific web page or was part of a particular marketing program ended up being an excellent student,” he says.

Mat Myers, IT director at the University of Sydney, says the university is halfway through an initial three-year business intelligence program to aggregate data from multiple enterprise systems.

This is the first in a series of intended programs and this specific initiative will significantly improve access to information about the university’s research, students, staff and overall performance.

“The first year was about getting the technology platform in place and now we are looking at sourcing information from our legacy systems to do descriptive analysis of our research performance, student and staff diversity, financial health, student demand… those sorts of things,” he says.

Future phases of the program will see the university do more predictive analysis, asking more “what if” questions of its data around, for example, how specific actions would impact the number of students dropping out in the first year of study, Myers says.

The Australian Red Cross Society is currently undergoing a business transformation program with the first phase involving the deployment of a finance and retail system.

“As part of this, we have created an information management strategy around structured and unstructured data, which includes looking at how we optimise our data migration and ensure data quality is maintained into the future,” says the organisation’s head of IT operations, Veronica Frost.

When undertaking big data projects, organisations should take incremental steps from cleaning and storing data on one platform right through to completing meaningful predictive analysis, according to Michael Knee, chief operating officer at Amcom.

“The incremental first step, particularly in the university environment, is getting all the data in one place; a platform that is accessible and enables you to take the next step.”

Amcom group executive, Richard Whiting, added that creating a systemised way of grooming and backing up data should not be viewed as a technology program but rather a change management program with buy-in from the necessary departments.

“The challenge is how to get the change management to happen; taking a lot of structured data from disparate locations and placing it on a single platform for high-level analysis that benefits the organisation,” he says.

Ensuring data quality and accuracy is key

Access to quality data is important, says UWS’ Holling. “For example, a really good predictor of our student retention risk is how many times students access our online student management system.

“So if a student hasn’t logged into an online course for six weeks then maybe a red flag should go up.”

However, there is a concern that UWS could make incorrect assumptions about what the data is showing and wrongly embark on an intervention program that isn’t required.

“A student may turn up for every lecture, so they don’t need to log onto the learning management system as often as others who download course and lecture information online,” he says.
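The red-flag heuristic Holling describes, together with his caveat about students who attend in person, can be sketched in a few lines. The six-week threshold comes from his quote; the data shape and the attendance check are illustrative assumptions:

```python
from datetime import date, timedelta

# Hedged sketch of the red-flag rule described above, including Holling's
# caveat: a student who attends lectures in person shouldn't be flagged
# just because they rarely log into the LMS. Field names are assumptions.

SIX_WEEKS = timedelta(weeks=6)

def red_flag(last_lms_login: date, lectures_attended_recently: int,
             today: date) -> bool:
    """Flag only students who are inactive both online and on campus."""
    online_silent = (today - last_lms_login) > SIX_WEEKS
    return online_silent and lectures_attended_recently == 0
```

Combining the two signals is one way to avoid the false positives Holling warns about: online silence alone triggers nothing if the student is visibly engaged on campus.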

Keeping the data hoarders at bay

There’s also a cultural challenge around getting a handle on “data hoarding” – users such as researchers and staff at universities storing important data on USB devices and other external drives.

Researchers in particular want to hang onto their data and aren’t prepared to put it onto another system, according to some attendees.

One attendee also highlighted a privacy aspect to centralising vast amounts of personal data – and, in the case of educational institutions, research data.

“Running globally, we’ve had to comply with a whole bunch of national and state jurisdictions across the different countries we operate in, so sometimes we simply can’t centralise research and personal student data, even if we wanted to,” the attendee says.

“It certainly is a commercial discussion; changing the culture and dealing with different privacy concerns can sometimes put the handbrake on plans to centralise vast amounts of information for analysis.”

The University of New South Wales (UNSW) is currently working on providing a metadata layer that will enable easier searching for researchers and staff and increase data value and re-use.

Luc Betbeder-Matibet, director, faculty IT services at UNSW, says the goal is to improve the university’s research practice and output and provide researchers with long-term storage for their projects.

“It is important that this metadata layer sits on top of a big, safe, reliable store that can be accessed from anywhere,” he says. “The combination of providing a metadata tool to make the data ‘smarter’ and a location for storing it safely is what should encourage researchers to use a centralised store and reduce issues related to data hoarding.”

He says basic project metadata is collected through a data plan when storage is requested.

“This project-level metadata is associated with the research and provides a macro-level of metadata that we can use at the organisation level,” he says. “This gives us some information on the number of projects, which research areas they cover, how much storage is being used etc.

“More interesting, however, is that we also provide the researchers with a metadata tool for tagging up their own data at any point in the research cycle.”
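The kind of project-level record a data plan might capture, and the macro-level reporting Betbeder-Matibet describes (project counts, research areas, storage used), can be sketched as follows. The fields and figures are hypothetical, based only on the examples he gives:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a project-level metadata record collected via a
# data plan when storage is requested. Fields are illustrative assumptions
# based on the examples given (research area, storage used, tags).

@dataclass
class ProjectMetadata:
    project_id: str
    research_area: str
    storage_gb: float
    tags: list[str] = field(default_factory=list)

    def tag(self, label: str) -> None:
        """Researchers can tag their data at any point in the research cycle."""
        if label not in self.tags:
            self.tags.append(label)

projects = [
    ProjectMetadata("p-001", "genomics", 1200.0),
    ProjectMetadata("p-002", "astronomy", 800.0),
    ProjectMetadata("p-003", "genomics", 300.0),
]

# Macro-level, organisation-wide reporting: storage used per research area.
storage_by_area: dict[str, float] = {}
for p in projects:
    storage_by_area[p.research_area] = (
        storage_by_area.get(p.research_area, 0.0) + p.storage_gb
    )
```

The same records support both levels of use Betbeder-Matibet describes: aggregate reporting for the organisation, and researcher-driven tagging that makes individual datasets “smarter” and easier to find.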

UNSW is also creating a collection of metadata models which can be re-used within disciplines and across projects, says Betbeder-Matibet.

“The goal is to mature this into a service capability where the metadata tools are embedded more deeply into the data management practice of our research projects,” he says.