When it comes to big data, it's easy to get excited about technologies like predictive and prescriptive analytics, which hold out the promise of radically changing the way organizations do business. But the core of any production big data project—even the most bleeding-edge prescriptive analytics—is data integration.
"Data integration projects take too long," says Todd Goldman, vice president and general manager of Enterprise Data Integration & Data Quality at data integration specialist Informatica. "Part of the reason they take too long is because data integration projects start with a specification from the business, usually in the form of an Excel spreadsheet."
Manual Data Integration Is Like a Game of Telephone
That spreadsheet gets passed to developers, who then do a great deal of work on the project before any validation happens, Goldman says. It's like the children's game of Telephone: the spec gets passed to someone who does a little work, then to someone else who does more, and so on. Only at the very end does anyone check the output. It's a manual process rife with inefficiencies.
"Every time you request information to make a decision, your employees manually go gather data and roll it up," says Jared Hillam, EIM Practice Director with Intricity, an IT services company that specializes in data management.
"So, every time a request for data is made, you are reworking the data gathering and roll-up logic. The unfortunate part is that the process is error prone and slow—with the business complaining that it does not get the data it asked for," Hillam says. "Having specialized in the implementation of data warehouses, we know first-hand that there is a better way. What is needed is an agile data integration platform that involves the business user throughout the process, from requirements gathering, to requirements validation, to rapidly prototyping the solution, to analysis and profiling and then to testing. IT then can do the final deployment and ongoing monitoring of the environment. With the business and IT now happy, it's a win-win for all."
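Hillam's point about reworking the roll-up logic on every request can be made concrete with a minimal sketch. The data, function names and records below are all hypothetical, and Python is used purely for illustration: once the gathering-and-aggregation logic is captured in one reusable place, each new business request becomes a new call rather than a fresh, error-prone rebuild.

```python
from collections import defaultdict

# Hypothetical sales records, standing in for data scattered across sources.
RECORDS = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 250},
    {"region": "West", "product": "A", "revenue": 175},
]

def roll_up(records, group_by, measure):
    """Reusable roll-up: total `measure` for each value of `group_by`.

    Because the logic lives in one tested function, a new request is a
    new call with different parameters, not a rewrite of the pipeline.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_by]] += rec[measure]
    return dict(totals)

# Two different business requests reuse the same, already-validated logic.
by_region = roll_up(RECORDS, "region", "revenue")    # {'East': 350, 'West': 175}
by_product = roll_up(RECORDS, "product", "revenue")  # {'A': 275, 'B': 250}
```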
Agile Meets Data Integration
In a nutshell, that's the idea behind the new Informatica 9.6 platform, which borrows a page from agile development and applies it to data integration, allowing organizations to accelerate data integration projects by a factor of five compared with traditional processes, according to Goldman.
"Instead of using Excel and waiting to validate after the developer does a whole bunch of work, we're going to give the business the ability to do their own validation before it goes to the developer," Goldman says. "They can turn their specification into a real, working prototype. Then the developer adds scalability, reliability and performance."
The Informatica platform is already flexible enough to meet wide-ranging needs, from modest departmental data projects to large multi-node cluster Hadoop projects. With 9.6, Informatica has added a PowerCenter Premium Edition to its existing Standard and Advanced Editions. Recognizing that data integration is now supporting critical business processes, the Premium Edition adds the ability to monitor processes, get alerts and validate data integration jobs.
The entire package is powered by Informatica's Vibe Virtual Data Machine architecture, which is designed to make risk-free big data integration experimentation possible. Vibe is an embeddable data management engine that can access, aggregate and manage any type of data. Vibe gives developers the power to map data once and deploy anywhere—in any application and on any appliance or device—regardless of the data's format or whether it resides on-premises or in the cloud.
Vibe separates the development environment from the underlying execution technology by modifying the way that logic is executed to accommodate different data types, computing platforms and consumption models.
Vibe's capabilities are designed to allow users to achieve the following:
- Automatically connect virtualized data integration prototypes to the physical world
- Embed data quality directly into an application
- Deploy on Hadoop without knowing Hadoop
- Create reusable integration workflows that can be built on-premises and deployed to the cloud, or vice versa
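The "map once, deploy anywhere" separation described above—mapping logic defined independently of the engine that executes it—can be sketched roughly as follows. All class and function names here are hypothetical illustrations of the pattern, not Informatica's Vibe API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Mapping:
    """A data-flow definition: transform logic, independent of any engine."""
    name: str
    transform: Callable[[dict], dict]

class LocalEngine:
    """Runs a mapping in-process, e.g. for a modest departmental project."""
    def run(self, mapping: Mapping, rows: Iterable[dict]) -> list:
        return [mapping.transform(row) for row in rows]

class ClusterEngine:
    """Stand-in for a distributed backend (e.g. a Hadoop cluster)."""
    def run(self, mapping: Mapping, rows: Iterable[dict]) -> list:
        # A real engine would partition `rows` and ship `transform` to
        # worker nodes; here we only show that the mapping is unchanged
        # and only the execution strategy differs.
        return [mapping.transform(row) for row in rows]

# Define the mapping once...
clean_names = Mapping(
    "clean_names",
    lambda r: {**r, "name": r["name"].strip().title()},
)

rows = [{"name": "  ada lovelace "}]
# ...then run it on either engine without touching the logic.
local_result = LocalEngine().run(clean_names, rows)
cluster_result = ClusterEngine().run(clean_names, rows)
assert local_result == cluster_result  # same logic, different deployment target
```

The design point is that the `Mapping` never references an engine, so moving a workflow from a laptop prototype to a cluster (or from on-premises to cloud) swaps the engine, not the integration logic.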
"Clean, safe and connected data is neither a large data nor a large enterprise luxury," says Anil Chakravarthy, executive vice president and chief product officer of Informatica. "It is an every-enterprise imperative. We've built rapid prototyping and basic profiling into our platform. In today's data world, these capabilities are not a luxury, they're a necessity."
Thor Olavsrud covers IT Security, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud. Follow everything from CIO.com on Twitter @CIOonline and on Facebook. Email Thor at email@example.com