The University of California San Diego has a Cobol problem.
Its three big applications for tracking finance, payroll and students have been running on mainframes since the 1990s. Their Cobol cores are getting old, and the accretion of bolt-on features is fragmenting data and processes, making the applications brittle and hard to maintain.
Cobol developers aren’t getting any younger, either: Five years ago, their average age was 55. UCSD has nine Cobol developers left, with retirement being the biggest cause of attrition.
So UCSD CIO Vince Kellen is moving those applications, and a dozen smaller ones, to the cloud, and at the same time building a new data warehouse on SAP S/4HANA to allow blended analysis of the information they generate.
He won’t be ripping the Cobol apps out and replacing them all in one go, though. Instead, he and his team are taking a more modular, incremental approach. Doing so will help UCSD take advantage of recent improvements in integration technologies, which ERP vendors themselves use behind the scenes when they make an acquisition, Kellen says. UCSD’s integration stack includes WSO2 API Manager, Apache Kafka and tools from Informatica.
Moreover, universities like UCSD have so many specialized systems for tasks such as course registration, degree auditing, or research grant tracking that a modular approach puts more point solutions into play. “While the dominant providers, SAP, Workday, Oracle can potentially do all of those things, they don’t do all of those things equally well, so many institutions adopt others,” he says. “Modern integration technology is such that we can integrate it well, so we don’t have to force ourselves to do everything at once and get everything from one provider.”
The need for speed
But what about the problem of data fragmentation, which prompted the initial move away from Cobol?
“If you’re going to have a flotilla of products, of software-as-a-service solutions, you need a common analytical environment that’s above and outside of them,” he says.
For that, UCSD has turned to SAP S/4HANA, which the university runs on Amazon Web Services. One of the first elements that UCSD tapped, SAP Student Activity Hub, provides a window into learning across the university through analytical dashboards, tracing historical data on courses and degrees that students have completed over the past 15 years.
The Student Activity Hub was originally developed by the University of Kentucky, an early user of SAP’s HANA in-memory database, then transferred to SAP for commercialization. Kellen knows the product well, as he was CIO at Kentucky at the time.
Activity Hubs for employees and finance will follow in the next 18 months or so, with hubs for analyzing research and facilities management on the way as well.
“The whole data warehouse environment has one purpose and one purpose only: To provide for analysts a delightful experience,” says Kellen.
One way it does that is through sheer speed. “It’s an in-memory high-speed system, so a billion rows are not a problem for it. It can do a top-to-bottom aggregation of a billion-row dataset in probably less than a second if I have an appropriately sized environment,” he says. “When I did this at the University of Kentucky, we saw that the need for downloading data evaporated quickly. A lot of people even wanted to move datasets up to the main analytical environment where they could get all the advantages of speed and the joining of data to many other pieces, and that’s what we’re hoping will happen here as well.”
The threat of defection
Rather than replicate data from one database to another, Kellen prefers to stream data into the warehouse. “The nice thing about ingesting by stream is that in the future, we can inject machine learning or analytics right on the stream itself,” Kellen says. That processing could happen pre-ingestion, mid-stream or post-ingestion, which is useful for model building.
Another advantage of streaming, for Kellen, is that “a little bit of queueing helps reduce the cost dramatically. It’s way more horizontally scalable than a pull API.”
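The idea of injecting analytics "right on the stream" can be sketched in a few lines. This is a minimal illustration, not UCSD's actual pipeline: the event fields, the enrichment rule, and the in-memory sink are all hypothetical stand-ins for a real Kafka consumer writing into the warehouse.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

@dataclass
class WarehouseSink:
    """Stand-in for the analytical warehouse: just collects rows."""
    rows: List[dict] = field(default_factory=list)

    def write(self, row: dict) -> None:
        self.rows.append(row)

def ingest(stream: Iterable[dict],
           transforms: List[Callable[[dict], dict]],
           sink: WarehouseSink) -> int:
    """Apply each transform to every event mid-stream, then land it.

    New transforms (e.g. a scoring model) can be slotted into the
    list without touching the source systems or the warehouse.
    """
    count = 0
    for event in stream:
        for transform in transforms:
            event = transform(event)
        sink.write(event)
        count += 1
    return count

# Hypothetical mid-stream enrichment: label high-value transactions
# so a downstream model can train on the added field.
def flag_high_value(event: dict) -> dict:
    return {**event, "high_value": event["amount"] > 10_000}

events = [{"id": 1, "amount": 500}, {"id": 2, "amount": 25_000}]
sink = WarehouseSink()
n = ingest(events, [flag_high_value], sink)
```

Because the transforms are just a list of functions applied in order, the same structure accommodates pre-ingestion filtering, mid-stream enrichment, or post-ingestion scoring by moving where in the pipeline the function runs.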
Maintaining a copy of the data outside of the applications is also part of Kellen’s strategy for keeping costs down and vendors on their toes: It provides what he calls a credible threat of defection.
“It’s not that I can run the entire business out of the data warehouse, but the vendor knows that I’ve got our data from their system in another environment. I have an easier path to migration away from them,” he says.
You don’t have to have an actual defection capability, just a credible threat, he says. “I think it’s important because it keeps all the suppliers honest and competitive.”
While UCSD is working with SAP to build its data warehouse, it’s turning to Oracle for ERP. To replace its finance information system, it recently purchased Oracle Cloud Financials, and expects the new system to go live in 2020.
“We’ve gone cloud-first on all our software purchases for the last three years, so nothing new has gone in other than cloud. We’ve retired one data center, and we’ve got a second one that we’re retiring in another three years,” he says. “The only thing that will remain will be really old software that just won’t work in the cloud, and some forms of telephony which, if you take it to the cloud, it just gets worse from our experience.”
The move to the cloud is partly driven by cost. “People say cloud is more expensive but no, it’s not,” he says. “People don’t fully capture all the costs in their on-premises systems.”
Reducing costs — and risks
In a previous role, Kellen looked at those hidden costs. “When you add in the mortgage cost of your data center, the fire suppression cost, the UPS cost, the electrical infrastructure change over time and then the human upkeep, cloud is cheaper.”
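Kellen's point is essentially an accounting one: the visible license bill understates on-premises cost until the facility overheads are summed in. The sketch below makes that arithmetic explicit; every figure is hypothetical and illustrative, not UCSD data.

```python
def on_prem_tco(annual_license: float, datacenter_mortgage: float,
                fire_suppression: float, ups: float,
                electrical: float, staff_upkeep: float) -> float:
    """Annual on-premises cost once the 'hidden' facility costs
    Kellen lists are added to the visible license spend."""
    return (annual_license + datacenter_mortgage + fire_suppression
            + ups + electrical + staff_upkeep)

# Hypothetical figures for illustration only.
visible = on_prem_tco(200_000, 0, 0, 0, 0, 0)    # license alone
full = on_prem_tco(200_000, 120_000, 15_000,
                   25_000, 40_000, 180_000)      # with hidden costs
cloud = 420_000                                  # hypothetical cloud bill
```

Compared against the license bill alone, the cloud quote looks expensive; compared against the full on-premises total, it comes in cheaper, which is the comparison Kellen argues people fail to make.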
Running applications in the cloud can reduce risk, as they typically benefit from a degree of fault tolerance thanks to the availability zones in the underlying cloud infrastructure. Providing the same level of fault tolerance on premises could double costs, he says.
One way to reduce costs further, he says, is to use a cloud broker that buys capacity at reserved-instance prices and resells it on demand. UCSD works with London-based Strategic Blue.
There are some areas where Kellen is unhappy with the level of risk in the cloud. One is the ease with which cloud infrastructure providers can kick customers out.
“We’re trying to architect things to use containers so, if we need to, we can switch our cloud providers,” he says. That provides him with another credible threat of defection in case of price rises although, as he notes, “Competition now between Google, Microsoft and Amazon among others is keeping pricing reasonably stable.”
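What makes containers a credible exit path is that the image carries no provider-specific dependencies. The fragment below is a minimal sketch of that discipline for a hypothetical Java service; the base image, file names, and port are assumptions, not anything UCSD has published.

```dockerfile
# Hypothetical service image: nothing here is tied to one cloud provider.
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY target/app.jar app.jar
# All configuration arrives via the environment, so the same image
# runs unchanged on AWS, Google Cloud, or Azure container services.
ENV DB_URL=""
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
```

The portability, and with it the threat of defection, comes from keeping provider-specific concerns (managed databases, secrets, load balancing) outside the image and injecting them at deploy time.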
Another area of risk is the degree of liability cloud service providers will accept in the case of security breaches. “They’ve got all of our data, they’ve got all of our processing, but we still have a lot of the legal risk,” Kellen says. In his view, though, that’s mitigated by the generally higher level of security that they can provide compared to on-premises systems.
Kellen would like to see cloud infrastructure providers work with brokers and other third parties to rebalance liability in ways more favorable to the client.
He predicts that the issue of liability will be a focus for contract competition in future: “I’m hopeful that third parties will be able to handle that contract complexity.”