Providing information security for a university is no easy task. Universities must serve large, ever-evolving distributed populations relying mostly on a bring-your-own-device (BYOD) model. Faced with such a daunting challenge, a number of universities are turning to Big Data analytics to tackle the problem.
The University of Texas at Austin, the flagship of the University of Texas System, is a prime example of the scope of the challenge. Its 350-acre campus features nearly 200 buildings, all linked by a 10-gigabit fiber optic backbone. At any one time, up to 120,000 individual devices—ranging from servers to switches, wireless access points, desktops, laptops, tablets, smart phones and security cameras—may be connected to its network.
"As with other universities, we have tens of thousands of users representing an even larger population of networked devices," says Cam Beasley, chief information security officer (CISO) of the University of Texas at Austin. "We have a constant need to identify anomalous user account behavior, detect, locate and quarantine compromised systems in real-time, and correlate events across multiple logging environments to more fully understand potential problems or threats."
UT Austin's Information Security Office (ISO) analysts used to rely primarily on intrusion detection/prevention system (IDS/IPS) appliances and custom-developed software tools to monitor for threats. But that approach was slow and unwieldy, and it didn't fully exploit the goldmine ISO already had in its log data.
"We wanted to plug into the many different servers and devices downstream that were coming under attack to correlate our network information with actual system log data," Beasley explains. "We didn't want a big, heavy SIEM [security information and event management] product because we hadn't had much luck with them in the past. We needed a more flexible system that we could adapt to our unique needs."
Jason Pufahl, CISO of the University of Connecticut, faced a similar problem.
"Ultimately, every time we needed to do any kind of data mining, it was half a dozen sources using a variety of different tools," he says. "It could only be done by one or two different people [who had the skills to do it]."
Big Data Analytics Helps Universities Mine Log Data
Like more than 275 universities around the world, UT Austin and UConn turned to Splunk.
"Universities have some of the most complex IT infrastructures in the world, and this makes them extremely vulnerable," says Mark Seward, senior director of security and compliance marketing at Splunk. "It's the ultimate BYOD situation. Security threats are constantly evolving. Splunk collects massive amounts of data and helps users detect unknown and persistent threats."
Splunk bills itself as a provider of real-time operational intelligence software. Essentially, Splunk is a Big Data indexing engine that collects, indexes and harnesses machine data generated by Web sites, applications, servers, networks and mobile devices. Splunk is the biggest player in an evolving field that includes competitors such as Sumo Logic, Loggly and LogLogic.
The idea behind Splunk came as Splunk co-founders Rob Das and Erik Swan were struggling with a Java application they were writing in 2003, Seward explains.
"They were finding a lot of errors in the application," Seward says. "They were looking at Java stack traces that were 100 lines long and fairly unstructured. It took a lot to go through these logs and figure out what errors there were and how to deal with them. Then one of them turned to the other and said, 'Hey, I wish we could Google this.' That's how it got started."
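The "Google this" idea boils down to building a searchable index over unstructured log text. The Python sketch below shows the general technique, a simple inverted index that maps each word to the log lines containing it. It is only an illustration and makes no claim about how Splunk itself is built; the tokenization rule, the function names and the sample log lines are all assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(line):
    # Assumed rule: lowercase, then split on anything that is not a letter, digit or underscore
    return re.findall(r"[a-z0-9_]+", line.lower())

def build_index(lines):
    """Inverted index: map each token to the set of line numbers that contain it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in tokenize(line):
            index[token].add(lineno)
    return index

def search(index, lines, query):
    """Return the lines containing every term in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [lines[i] for i in sorted(hits)]

# Hypothetical sample log lines
logs = [
    "ERROR java.lang.NullPointerException at Foo.bar(Foo.java:42)",
    "INFO request served in 12ms",
    "ERROR java.io.IOException: connection reset",
]
idx = build_index(logs)
print(search(idx, logs, "error nullpointerexception"))
```

Because the index is built once, each query becomes a handful of set lookups and intersections instead of a scan through every log file, which is what makes searching large volumes of machine data feel interactive.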
The initial use case was application troubleshooting, but security professionals soon saw that Splunk could give them the capability to make use of the reams of logs constantly generated by the sites, servers, applications and devices they had to monitor.
With Machine Data, the Only Limit Is the Imagination
Once an indexing engine like Splunk has access to that data, Seward says the only real limitation is the imagination of the user. He points to one CISO at a financial services firm who wanted to curtail tailgating (people following an authorized person into a facility without swiping in with their own badges).
By correlating badge swipe data, Active Directory login data and VPN use data, the CISO was able to determine whether users were working remotely or in the facility when they logged in, and then discover whether those who logged in from the facility had swiped a badge to enter. As an added bonus, if a user had not swiped into the facility but logged in locally, the CISO could now ask his staff to make a visual check of the user's desk. If the user was not actually in the facility, that was a good sign the user's machine was infected.
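A minimal Python sketch of that kind of correlation follows. It assumes simplified event records with hypothetical fields (a login's user, time and source IP; a VPN session's start and end; a badge swipe's time), a made-up facility address range and an arbitrary 12-hour swipe window. A login counts as on-site when it comes from the facility range and is not covered by an active VPN session; an on-site login with no recent badge swipe is flagged for a desk check.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical address range used by workstations inside the facility
FACILITY_NET = ipaddress.ip_network("10.20.0.0/16")
# Hypothetical window: a swipe counts if it happened up to 12 hours before the login
SWIPE_WINDOW = timedelta(hours=12)

@dataclass
class Login:
    user: str
    time: datetime
    src_ip: str

@dataclass
class VpnSession:
    user: str
    start: datetime
    end: datetime

@dataclass
class Swipe:
    user: str
    time: datetime

def on_vpn(login, vpn_sessions):
    """True if the user had a VPN session active at the time of the login."""
    return any(v.user == login.user and v.start <= login.time <= v.end
               for v in vpn_sessions)

def flag_unbadged_logins(logins, vpn_sessions, swipes):
    """Return on-site logins that have no badge swipe in the preceding window."""
    flagged = []
    for login in logins:
        on_site = ipaddress.ip_address(login.src_ip) in FACILITY_NET
        if not on_site or on_vpn(login, vpn_sessions):
            continue  # remote work is not expected to have a badge swipe
        swiped = any(s.user == login.user
                     and timedelta(0) <= login.time - s.time <= SWIPE_WINDOW
                     for s in swipes)
        if not swiped:
            flagged.append(login)  # candidate for a visual desk check
    return flagged

t = datetime(2024, 1, 8, 9, 0)
logins = [Login("alice", t, "10.20.1.5"), Login("bob", t, "10.20.1.6"),
          Login("carol", t, "10.20.1.7"), Login("dave", t, "203.0.113.9")]
vpn = [VpnSession("carol", t - timedelta(hours=1), t + timedelta(hours=8))]
swipes = [Swipe("alice", t - timedelta(minutes=15))]
print([x.user for x in flag_unbadged_logins(logins, vpn, swipes)])  # ['bob']
```

In this made-up example, alice swiped in before logging in, carol was on the VPN, and dave logged in from outside the facility range, so only bob is flagged.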
Power companies have also begun using their log data to gain better intelligence, Seward says. Smart meters now have remote shutoff capabilities, which could lead to illegitimate shutoffs.
"An insider may want to shut off someone's electricity for whatever reason," he explains. "I can pull in information from the billing system and compare the address where the shutoff occurred to the billing information. After all, it could be a billing error or someone who's looking to do harm to someone else. I can then add the GPS information from local utility trucks and maybe note that one of my trucks happens to be parked outside that particular house."
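The same kind of cross-check can be sketched in Python. Everything here is hypothetical: the record layouts, a set of addresses with delinquent accounts standing in for the billing system, the 100-meter "parked outside" radius and the 30-minute window. Distances use the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

NEAR_METERS = 100                    # hypothetical "parked outside" radius
TIME_WINDOW = timedelta(minutes=30)  # hypothetical window around the shutoff

@dataclass
class Shutoff:
    address: str
    lat: float
    lon: float
    time: datetime

@dataclass
class TruckFix:
    truck_id: str
    lat: float
    lon: float
    time: datetime

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def review_shutoff(shutoff, delinquent_addresses, truck_fixes):
    """Return reasons a remote shutoff deserves a closer look (empty if none)."""
    reasons = []
    if shutoff.address not in delinquent_addresses:
        reasons.append("no billing reason for shutoff")
    for fix in truck_fixes:
        dist = haversine_m(shutoff.lat, shutoff.lon, fix.lat, fix.lon)
        if abs(fix.time - shutoff.time) <= TIME_WINDOW and dist <= NEAR_METERS:
            reasons.append(f"truck {fix.truck_id} was {dist:.0f} m away")
    return reasons

t = datetime(2024, 3, 1, 22, 15)
event = Shutoff("12 Elm St", 40.0000, -75.0000, t)
fixes = [TruckFix("T7", 40.0001, -75.0000, t - timedelta(minutes=5)),
         TruckFix("T9", 40.0500, -75.0000, t)]
print(review_shutoff(event, {"9 Oak Ave"}, fixes))
# ['no billing reason for shutoff', 'truck T7 was 11 m away']
```

Here the shutoff hits an address with no billing problem, and one truck was about 11 meters away five minutes earlier, so both facts are surfaced; the second truck, several kilometers away, is ignored.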
At UConn, Pufahl says the capability to organize disparate data sets in a central location and analyze them rapidly proved its value almost immediately. Near the beginning of the semester, there was an issue with a primary course-related server that led to an outage.
"Splunk made troubleshooting it and visually describing what the problem was transparent immediately," Pufahl explains. "It stopped any amount of finger pointing. It was obvious who had to handle the problem and it was instantly apparent exactly when the problem occurred."
UConn Leverages Data to Improve Security Posture
Pufahl notes that the technology has helped his office make strides in implementing anti-virus capabilities on a university-wide basis.
"This sounds like a security best practice," he says. "In a corporate environment where you can manage it centrally, it's probably trivial. Here we have a transient population. It's very difficult to do."
But by using log data, Pufahl's staff is able to audit the environment and see where the trouble spots are, then generate reports and push them to the appropriate administrators to help them communicate with users who need to upgrade or install an anti-virus solution. Pufahl's staff has also used the capability to score every school, college and department on its security.
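A simplified Python sketch of such an audit follows. It assumes a host inventory with hypothetical fields (hostname, department, whether anti-virus is installed, age of its signatures in days) and a lookup from department to responsible administrator; the seven-day signature threshold is an arbitrary example value, not UConn's policy.

```python
from collections import defaultdict

MAX_SIG_AGE_DAYS = 7  # hypothetical: signatures older than this count as out of date

def needs_attention(host):
    """True if anti-virus is missing or its signatures are stale."""
    return (not host["av_installed"]
            or host["av_signature_age_days"] > MAX_SIG_AGE_DAYS)

def av_gap_report(hosts, admin_for_department):
    """Group problem hosts under the administrator responsible for them."""
    report = defaultdict(list)
    for host in hosts:
        if needs_attention(host):
            admin = admin_for_department.get(host["department"], "unassigned")
            report[admin].append(host["hostname"])
    return dict(report)

# Hypothetical inventory and administrator lookup
hosts = [
    {"hostname": "eng-01", "department": "Engineering", "av_installed": True, "av_signature_age_days": 2},
    {"hostname": "eng-02", "department": "Engineering", "av_installed": False, "av_signature_age_days": None},
    {"hostname": "lib-01", "department": "Library", "av_installed": True, "av_signature_age_days": 30},
    {"hostname": "lab-99", "department": "Chemistry", "av_installed": False, "av_signature_age_days": None},
]
admins = {"Engineering": "eng-it-admin", "Library": "lib-it-admin"}

for admin, names in av_gap_report(hosts, admins).items():
    print(f"{admin}: {', '.join(names)}")
```

Hosts whose department has no known administrator land in an "unassigned" bucket, so they still show up in the report instead of silently dropping out.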
"We've developed what we're calling the University of Connecticut Security Score," he says. "We measure eight or so different security metrics with weighted values and produce that as a score: anti-virus, OS patches, a few other products that we expect to see running. Depending on the state of those, they'll be given a score and a corresponding report on how to improve that score."
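The arithmetic behind a weighted score like this is straightforward. In the Python sketch below, each metric is the fraction of a unit's hosts that pass a check, and the score is the weighted average scaled to 100. The four metric names and their weights are hypothetical stand-ins; the article does not list UConn's actual metrics or weights. The second function ranks metrics by how many points fixing them could recover, a rough analogue of the improvement report Pufahl describes.

```python
# Hypothetical metrics and weights; not UConn's actual scoring scheme
WEIGHTS = {
    "antivirus": 3.0,
    "os_patches": 3.0,
    "firewall": 2.0,
    "disk_encryption": 2.0,
}

def security_score(compliance):
    """compliance maps metric -> fraction of hosts that pass (0.0 to 1.0)."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(w * compliance.get(m, 0.0) for m, w in WEIGHTS.items())
    return round(100 * weighted / total_weight, 1)

def improvement_hints(compliance):
    """Rank metrics by the score points still available if fully fixed."""
    total_weight = sum(WEIGHTS.values())
    gains = {m: 100 * w * (1 - compliance.get(m, 0.0)) / total_weight
             for m, w in WEIGHTS.items()}
    ranked = [(m, round(g, 1)) for m, g in gains.items() if g > 0]
    return sorted(ranked, key=lambda pair: -pair[1])

unit = {"antivirus": 0.9, "os_patches": 0.6, "firewall": 1.0, "disk_encryption": 0.5}
print(security_score(unit))     # 75.0
print(improvement_hints(unit))  # [('os_patches', 12.0), ('disk_encryption', 10.0), ('antivirus', 3.0)]
```

Missing metrics are treated as zero compliance, which penalizes a unit that reports nothing rather than rewarding it; that is a design choice for the sketch, and a real scheme might handle missing data differently.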
"I think that, quite honestly, every organization is going to have to deal with making use of the valuable data that they've actually got in their institution," Pufahl adds. "It's not just a matter of disparate data on 300 systems. The minute you can take advantage of that data as a central collection, the questions you're able to ask of it really change. We've found tremendous institutional benefit from being able to place all of this data in a single repository and use it to make IT decisions."
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com.