“It’s really all about future-proofing big data environments,” says Chuck Yarbrough, director of big data marketing at Pentaho. “As people continue to invest in being a data-driven enterprise and building out big data infrastructure, we pride ourselves on being able to future-proof these investments.”
Yarbrough says that the Pentaho 5.4 release focuses on three themes:
Deploying big data in the cloud
Being able to blend all data across the enterprise
Giving customers comfort and confidence as they grow and scale their Hadoop environments
Pentaho 5.4 allows customers to natively transform and orchestrate data on Amazon EMR, and to design and run Hadoop MapReduce jobs in-cluster on EMR. That, in turn, gives organizations new options for operationalizing a cloud-based data refinery architecture that delivers governed data sets on demand.
“We’ve supported cloud deployment of Hadoop in the past,” Yarbrough says. “But now we’ve opened up the full ability to support an entire Amazon AWS instance. You can now push your data into EMR and then process that data at scale inside Hadoop with Pentaho Data Integration.”
Pentaho 5.4 also adds an interface from Pentaho Data Integration (PDI) into SAP HANA, built at the request of Hitachi and a number of Pentaho's larger customers. The integration enables governed data delivery across multiple structured and unstructured sources.
Big data at scale
Along with integration with Amazon EMR and SAP HANA, the Pentaho 5.4 release adds capabilities for big data orchestration and analytics at scale, all based on Pentaho's Big Data Blueprints use case designs. The new capabilities include the following:
Integration of PDI with Apache Spark, enabling orchestration of Spark jobs
New APIs to simplify embedding of analytics into business applications and processes
The capability to localize Pentaho in French, German and Japanese
Pentaho 5.4 is available immediately. Yarbrough says he expects Pentaho 6.0 to arrive near the end of the year.