by Clint Boulton

How Hadoop helps Experian crunch credit reports

Jan 05, 2017
Analytics, Big Data, CIO

Experian is quickly crunching massive amounts of data and making it available to customers thanks to open source Hadoop software, microservices and API technologies.

Experian has implemented a new data analytics system designed to shrink from months to hours the time it takes to process petabytes of data from hundreds of millions of customers worldwide. The information services company is deploying the software, a data fabric layer based on the Hadoop distributed processing framework, in tandem with microservices and an API platform that enables both corporate customers and consumers to access credit reports and information more quickly.

“We believe it’s a really big game-changer for customers because it gives them real-time access to information that they would normally have to wait for as it was ingested,” says Experian CIO Barry Libenson.

Once an open source tool designated for piloting big data projects, Hadoop has become a necessary component of many analytics strategies as CIOs seek to make information-based products and services available to customers. The technology uses parallel processing techniques to help software engineers churn through large amounts of data more quickly than SQL-based data management tools.
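To make the parallel processing idea concrete, here is a minimal sketch of the split, map and reduce pattern that Hadoop-style systems rely on. It is an illustration only, not Hadoop's API or Experian's code: the function names and the sample records are hypothetical, and a local thread pool stands in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Divide the records into roughly equal partitions."""
    return [records[i::parts] for i in range(parts)]


def map_partition(partition):
    """Map step: count records per region within one partition."""
    return Counter(region for region, _score in partition)


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_by_region(records, parts=4):
    """Process every partition concurrently, then combine the results."""
    # Assumption: threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(map_partition, split(records, parts)))
    return reduce_counts(partials)


# Hypothetical sample data: (region, credit score) pairs.
records = [("US", 710), ("UK", 640), ("US", 680), ("CO", 590), ("UK", 720)]
print(count_by_region(records))
```

Because each partition is handled independently, adding workers lets the same job cover more data in the same time, which is the property the article attributes to Hadoop.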

Hadoop speeds up data processing

When Libenson arrived at Experian in 2015, he learned that the company was still processing data queries with mainframe systems. While enterprise data was growing at an exponential rate, software engineers were ingesting and processing data files piecemeal, normalizing and cleaning the information before turning it over to the business. They tackled new data management requirements by adding more MIPS. In an era where customers can order anything from shoes to computational power with a few mouse clicks, Libenson knew that Experian required a data management strategy that was decidedly more frictionless and could parse data in real time.

As in many enterprises experimenting with new data tools, Experian business lines were toying with various shades of Hadoop, including Cloudera, Hortonworks and MapR, both in on-premises sandboxes and in Amazon Web Services (AWS). However, Libenson knew that if Experian was going to efficiently wrangle insights from data and deliver new products for millions of customers, the company needed to pick one platform on which to standardize.

After some bake-offs, Libenson chose Cloudera as the company’s primary platform. The multitenant system runs on-premises in Experian’s hybrid cloud, though Libenson says the company has the capability to burst compute capacity using AWS as needed.

One early customer to benefit from Experian’s Hadoop data fabric is the Colombian credit bureau in South America. Thanks to Hadoop’s real-time processing capabilities, Experian processed 1,000 records in less than six hours, compared with the six months Libenson says it would have taken to normalize and clean the data using the company’s mainframe system, which processes only one record at a time. “The big deal for customers is that they know the data we have is as close to real-time as it can get instead of it being stale,” Libenson says.

With results like these, you may wonder why more companies aren’t standardizing on Hadoop. The software constitutes a modest but growing portion of the market for big data and business analytics technologies, which IDC says will top $187 billion in 2019.

The reality is that the software can be challenging to implement because it has been difficult to find engineers capable of working with the technology, whose parallel processing nature and knack for crunching unstructured information require different ways of thinking about how to manipulate data.

“It’s a totally different way of writing and thinking about applications … you have to think of the fact that each node can fail,” Libenson says. “Most software developers who came up writing SQL code don’t think that way. Finding ones who know how to build stuff and architect it appropriately is the biggest challenge.”
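Libenson’s point about node failure can be illustrated with a short sketch. It assumes a hypothetical scheduler that re-runs a task on another node when the first one dies; the node names and helper functions are invented for this example and are not part of Hadoop or of Experian’s system.

```python
class NodeFailure(Exception):
    """Raised when a (simulated) worker node dies mid-task."""


def make_node(name, fail_times=0):
    """Return a simulated node that fails its first `fail_times` calls."""
    state = {"remaining_failures": fail_times}

    def run(task, partition):
        if state["remaining_failures"] > 0:
            state["remaining_failures"] -= 1
            raise NodeFailure(f"{name} went down")
        return task(partition)

    run.node_name = name
    return run


def run_with_failover(task, partition, nodes):
    """Try the task on each node in turn until one succeeds."""
    errors = []
    for node in nodes:
        try:
            return node(task, partition), node.node_name
        except NodeFailure as exc:
            errors.append(str(exc))
    raise RuntimeError(f"all nodes failed: {errors}")


# Hypothetical cluster: node-a fails once, node-b is healthy.
nodes = [make_node("node-a", fail_times=1), make_node("node-b")]
result, served_by = run_with_failover(sum, [1, 2, 3], nodes)
print(result, served_by)
```

The sketch only works because the task can be repeated safely: running `sum` twice on the same partition gives the same answer. Designing work so it can be re-run without side effects is one way of thinking that developers used to a single reliable database server may need to learn.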

Libenson says that college hires, recent grads, statisticians and data scientists, rather than seasoned database engineers steeped in the SQL world, tend to be conditioned to work with Hadoop. But given the cutthroat war for such talent, he says he pairs recent college grads and data scientists with SQL engineers to produce the best results from Hadoop.  

Now serving: microservices and API calls

As a result of Experian’s migration to Hadoop, Experian engineers can remove bottlenecks associated with preparing data to populate the company’s digital products. Banks, financial services firms and other corporate clients can also access credit reports and other products via Experian’s new API platform and microservices architecture, in which application functionality is decoupled and loosely dependent. For instance, a financial services firm requesting a customer’s credit score or checking payment history on a credit card can make an API call through Experian to retrieve the data, rather than downloading and consuming entire applications to access the information.
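The difference between calling a narrow API and consuming a whole application can be sketched in a few lines. Everything here is hypothetical: the paths, field names and sample data are invented for illustration, and an in-process dispatcher stands in for a real HTTP API. It does not describe Experian’s actual interface.

```python
import json

# Hypothetical in-memory data; a real service would query its own store.
_SCORES = {"cust-001": 712}
_PAYMENTS = {"cust-001": ["2016-11: on time", "2016-12: on time"]}


def credit_score_service(customer_id):
    """One small service: returns only the credit score."""
    return {"customer_id": customer_id, "credit_score": _SCORES[customer_id]}


def payment_history_service(customer_id):
    """A separate service: returns only the payment history."""
    return {"customer_id": customer_id, "payments": _PAYMENTS[customer_id]}


# Each route maps to an independent function, so one can change
# without touching the other.
ROUTES = {
    "credit-score": credit_score_service,
    "payment-history": payment_history_service,
}


def handle_request(path):
    """Dispatch '/v1/<service>/<customer_id>' and return (status, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "v1" or parts[1] not in ROUTES:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        return 200, json.dumps(ROUTES[parts[1]](parts[2]))
    except KeyError:
        return 404, json.dumps({"error": "unknown customer"})


status, body = handle_request("/v1/credit-score/cust-001")
print(status, body)
```

A client that needs only a credit score asks for exactly that and receives a small JSON document, while the payment history service remains a separate piece that can be deployed and scaled on its own.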

“We’re hearing huge demand that they want microservices to consume information as opposed to traditional on-premise apps,” Libenson says. “Every financial institution is moving to a microservices model, and an API mechanism is consistent with how they want to consume information.”

Experian’s shift to more modern and modular architectures (Hadoop, microservices and APIs) has also required an overhaul in software development, from rigorously documented projects built in stages over several months to functionality delivered in bits and pieces. Libenson says his IT department has embraced agile and DevOps methodologies to build minimum viable products, test them and refine them as needed.

Moving to a hybrid cloud model, microservices architecture and an API platform constitutes “a big shift that helps Experian reduce errors, drive out costs and accelerate innovation,” Libenson says.