by Thor Olavsrud

Native Data Analysis Comes to MongoDB

Jun 24, 2014 | 4 mins
Big Data, Business Intelligence, Data Management

With the latest release of its business analytics and data integration platform, Pentaho claims to make it easier for organizations to apply analytics to their big data.

Seeking to make it easier for you to apply analytics to your big data stores, Pentaho today announced the general availability of the latest version of its business analytics and data integration platform.

The Pentaho 5.1 release is intended to bridge the “data-to-analytics divide” for the whole spectrum of Pentaho users, from developers to data scientists to business analysts. Pentaho 5.1 adds the capability to run code-free analytics directly on MongoDB data stores, incorporates a new data science pack that acts as a data science “personal assistant,” and adds full support for the Apache Hadoop 2.0 YARN architecture for resource management.


“The new capabilities in Pentaho 5.1 support our ongoing strategy to make the hardest aspects of big data analytics faster, easier and more accessible to all,” says Christopher Dziekan, executive vice president and chief product officer at Pentaho. “With the launch of 5.1, Pentaho continues to power big analytics at scale, responding not only to the demands of the big data-driven enterprise but also provides companies big and small a more level playing field so emerging companies without large, specialist development teams can also enter the big data arena.”

Data Integration Platform Enables Native Analysis of MongoDB Data

Previous versions of the Pentaho platform could integrate with MongoDB as a data source and report on MongoDB data. Now Pentaho is going a step further by enabling native analysis of data in MongoDB, with no ETL process and no hand coding required. MongoDB data collections can be analyzed directly at the source, reducing time-to-insight as well as the need for specialist skills.

As an example, Dziekan points to MultiPlan, a provider of healthcare cost management solutions that has nearly 900,000 healthcare providers under contract and processes more than 40 million claims every year. Dziekan says MultiPlan takes the JSON source files from its portal and stores them in MongoDB. It then uses the Pentaho Analyzer plugin, a drag-and-drop OLAP viewer, on top of MongoDB to slice and dice the data and create dashboards and reports.
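
To make the idea of analyzing data "at the source" more concrete, here is a minimal Python sketch of the kind of OLAP-style rollup a drag-and-drop viewer might push down to MongoDB as an aggregation pipeline. It is an illustration only, not Pentaho's implementation: the function names, the sample documents and the field names (`state`, `provider_type`, `billed_amount`) are all hypothetical. The `$match`, `$group` and `$sort` stages are standard MongoDB aggregation operators. A small in-memory equivalent is included so the logic can be checked without a database server.

```python
from collections import defaultdict

# Hypothetical claim documents, standing in for JSON files stored in MongoDB.
SAMPLE_CLAIMS = [
    {"state": "NY", "provider_type": "hospital", "billed_amount": 1200},
    {"state": "NY", "provider_type": "clinic", "billed_amount": 300},
    {"state": "CA", "provider_type": "hospital", "billed_amount": 800},
    {"state": "CA", "provider_type": "hospital", "billed_amount": 700},
]


def build_rollup_pipeline(dimension, measure, filters=None):
    """Build an aggregation pipeline that totals `measure` per `dimension`.

    `filters` is an optional dict of field -> value equality conditions
    (the "slice"); the grouping field is the "dice".
    """
    pipeline = []
    if filters:
        pipeline.append({"$match": dict(filters)})
    pipeline.append({
        "$group": {
            "_id": f"${dimension}",
            "total": {"$sum": f"${measure}"},
            "count": {"$sum": 1},
        }
    })
    # Largest totals first, as a report would typically show them.
    pipeline.append({"$sort": {"total": -1}})
    return pipeline


def rollup_in_memory(docs, dimension, measure, filters=None):
    """Pure-Python stand-in for the pipeline above.

    Handles only scalar equality filters; it does not reproduce every
    MongoDB matching rule (for example, matching inside arrays).
    """
    filters = filters or {}
    totals = defaultdict(lambda: {"total": 0, "count": 0})
    for doc in docs:
        if all(doc.get(field) == value for field, value in filters.items()):
            bucket = totals[doc.get(dimension)]
            bucket["total"] += doc.get(measure, 0)
            bucket["count"] += 1
    rows = [{"_id": key, **values} for key, values in totals.items()]
    return sorted(rows, key=lambda row: row["total"], reverse=True)
```

Against a live database, the list returned by `build_rollup_pipeline("state", "billed_amount", {"provider_type": "hospital"})` could be passed to PyMongo's `collection.aggregate(...)`, so the grouping runs inside MongoDB and no data has to be extracted into a separate store first.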

“Traditional RDBMS analytics can get very complicated and, quite frankly, ugly, when working with semi or unstructured data,” says Chris Palm, lead software architecture engineer at MultiPlan. “The Pentaho 5.1 platform is meeting market needs, allowing users to directly analyze data in MongoDB. We have seen more accurate results with new analyses and are no longer constrained by having to pull only part of our data. We can now look across a more full set of data and govern our system of record to gain greater insights.”

Data Scientists Get Personal Assistant

Pentaho has also added a new Data Science Pack to Pentaho 5.1 with an eye to making it simpler for data analysts and data scientists to rapidly build a 360-degree customer view that blends data sources such as social media and MongoDB. The pack adds an R script executor for Pentaho Data Integration (PDI) that allows an R script to be run as part of a PDI transformation, easing the burden of data preparation. It also adds a Weka scoring tool that allows users to apply classification, clustering and regression models constructed in Weka. And it adds Weka forecasting to help users leverage forecasting models created in Weka’s time series analysis and forecasting environment.

“The data scientist just got a personal assistant,” Dziekan says. “This Data Science Pack features tools data scientists are familiar with already and we’re now operationalizing them.”

The Pentaho 5.1 platform also adds full YARN integration, making it much simpler for developers working with Pentaho Data Integration to exploit the computational power of Hadoop without having to write complex MapReduce code. Dziekan says the YARN support allows PDI jobs to make elastic use of Hadoop resources, expanding and contracting as data volumes and processing requirements change. He notes that YARN’s advanced resource management capabilities support mixed workload scenarios where continuous data transformation and analysis are required.

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers. Follow Thor on Twitter @ThorOlavsrud.