by Thor Olavsrud

Pentaho Addresses Data Blending with Updated Business Analytics Platform

Sep 13, 2013 | 5 mins
Big Data, Business Intelligence, Data Management

Once you've addressed storing and visualizing your data assets, the next step in harnessing the power of big data is blending operational data with data from other sources. Pentaho is seeking to smooth the path for organizations looking to do so with its new Business Analytics 5.0 platform.

When the big data rubber hits the road, it’s about more than just storing massive amounts of data or even analyzing and visualizing a single stream. Gaining true insight from your data assets generally requires blending operational data and data from other big data sources together. Business analytics platform vendor Pentaho is striving to make that process easier than ever.

“True ‘big picture’ insights happen when operational data sources are blended with big data sources,” says Quentin Gallivan, CEO of Pentaho. “Companies that compete largely on service, in industries like telecommunications and financial services, see big data blending’s potential to help them gain market-share by providing the most personalized and interactive customer experience.”

This week, Pentaho unveiled Pentaho Business Analytics 5.0, a complete redesign and overhaul of its data integration and analytics platform that addresses data blending from the ground up and offers a new interface intended to simplify the user experience.

“What we’re seeing from our base is the need to make data more valuable by blending it with other data sources to provide insight,” says Rosanne Saccone, CMO of Pentaho. “Customers want to blend data not just at the glass and the desktop, but at the source.”

For instance, a telco might want to blend machine data about dropped calls with data from its data warehouse identifying its most valuable customers and their service level agreements (SLAs). The telco could then proactively target valuable customers who are not receiving agreed-upon service levels with promotions and discounts.
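The telco scenario can be sketched in a few lines of Python. This is only an illustration of the blending idea, not Pentaho's implementation: the record layouts, the field names (`annual_value`, `sla_max_dropped_calls`) and the `find_sla_breaches` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical machine data: one record per dropped call.
dropped_calls = [
    {"customer_id": "C1"},
    {"customer_id": "C1"},
    {"customer_id": "C1"},
    {"customer_id": "C2"},
    {"customer_id": "C3"},
    {"customer_id": "C3"},
    {"customer_id": "C3"},
    {"customer_id": "C3"},
]

# Hypothetical data warehouse rows: customer value and SLA terms.
customers = {
    "C1": {"annual_value": 120000, "sla_max_dropped_calls": 2},
    "C2": {"annual_value": 90000, "sla_max_dropped_calls": 2},
    "C3": {"annual_value": 10000, "sla_max_dropped_calls": 1},
}


def find_sla_breaches(dropped_calls, customers, min_value):
    """Blend both sources and return valuable customers whose SLA was missed."""
    # Aggregate the machine data per customer.
    drops = Counter(call["customer_id"] for call in dropped_calls)
    breaches = []
    for customer_id, profile in customers.items():
        # Keep only customers at or above the value threshold.
        if profile["annual_value"] < min_value:
            continue
        # Compare observed dropped calls against the agreed limit.
        if drops[customer_id] > profile["sla_max_dropped_calls"]:
            breaches.append(
                {"customer_id": customer_id, "dropped_calls": drops[customer_id]}
            )
    return sorted(breaches, key=lambda breach: breach["customer_id"])


breaches = find_sla_breaches(dropped_calls, customers, min_value=50000)
```

Here only C1 ends up on the promotion list: C2 is within its SLA, and C3 misses its SLA but falls below the value threshold.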

Blending Can Be a Significant Data Integration Challenge

As Matt Casters, Pentaho’s chief of data integration, notes, data blending allows a data integration user to create a transformation capable of delivering data directly to other business analytics tools. Traditionally, data is delivered to these tools via a relational database. But that becomes challenging when dealing with massive volumes of data or when you just don’t have the time to wait until database tables are updated.

Addressing this issue often leads to hugely complex big data architectures with many moving parts: Hadoop clusters, NoSQL and traditional RDBMS technologies, ETL tools, data marts, traditional BI tools and more.

Bringing it all together and giving users the capability to blend data with varying levels of data quality and granularity can be a significant challenge.

“The main problem we faced early on was that the default language used under the covers, in just about any business intelligence user facing tool, is SQL,” Casters explains. “At first glance, it seems that the worlds of data integration and SQL are not compatible.”

Casters says that DI requires reading from a multitude of data sources, such as databases, spreadsheets, NoSQL and big data sources, XML and JSON files, web services and more.

“However, SQL itself is a mini-ETL environment on its own as it selects, filters, counts and aggregates data,” he says. “So we figured that it might be easiest if we would translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, not directed by manually designated transformations but by SQL.”

“In other words: We made it possible for you to create a virtual ‘database’ with ‘tables’ where the data actually comes from a transformation step,” he adds.
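To make the idea concrete, here is a minimal Python sketch of a virtual table backed by a transformation step. It is a toy under stated assumptions, not how Pentaho Data Integration works internally: the `VIRTUAL_TABLES` registry, the `blended_calls` step and the `run_query` function are hypothetical, and the "SQL" support is limited to a single query shape matched with a regular expression.

```python
import re
from collections import Counter


def blended_calls():
    """Hypothetical transformation step: produces rows on demand."""
    calls = [("C1", "north"), ("C2", "south"), ("C1", "north")]
    for customer_id, region in calls:
        yield {"customer_id": customer_id, "region": region}


# Virtual "database": table names map to transformation steps, not stored rows.
VIRTUAL_TABLES = {"dropped_calls": blended_calls}

# Only one query shape is understood: SELECT col, COUNT(*) FROM t GROUP BY col
QUERY_PATTERN = re.compile(
    r"SELECT\s+(\w+)\s*,\s*COUNT\(\*\)\s+FROM\s+(\w+)\s+GROUP\s+BY\s+(\w+)\s*$",
    re.IGNORECASE,
)


def run_query(sql):
    """Translate the supported SQL shape into steps over a virtual table."""
    match = QUERY_PATTERN.match(sql.strip())
    if match is None:
        raise ValueError("unsupported query shape")
    column, table, group_column = match.groups()
    if column != group_column:
        raise ValueError("selected column must match the GROUP BY column")
    # The transformation runs at query time; nothing is staged beforehand.
    rows = VIRTUAL_TABLES[table]()
    # GROUP BY plus COUNT(*) becomes a counting step over the row stream.
    counts = Counter(row[column] for row in rows)
    return sorted(counts.items())


result = run_query("SELECT region, COUNT(*) FROM dropped_calls GROUP BY region")
```

The BI tool on the other end only sees a table name and a SQL statement; the rows themselves are generated by the transformation each time the query runs.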

Pentaho Business Analytics 5.0 blends data “at the source,” which Saccone says maintains the appropriate level of data governance and security necessary for accurate and reliable analysis. The more commonly used method of end user blending “away from the source” lacks the ability to audit and cannot ensure correct inferences from the data, she says.

Pentaho’s platform also avoids the need to stage the data before blending, which often leads to out-of-date data sets.

New Features of Pentaho Business Analytics 5.0

Other features of the new platform include the following:

  • A new user console and streamlined user interface. Pentaho redesigned its user console to improve the user experience, allowing users to “easily browse files, create new content, quickly access recent documents, mark ‘favorites’ and more.” “We did a lot of work in simplifying the experience for all the key players involved in bringing the value of data to light,” Saccone says.
  • A redesigned experience for administrators. The new user console also integrates an administrator perspective that gives administrators the ability to configure and manage security levels, licensing and servers.
  • Operational reporting for MongoDB. Pentaho has expanded the level of native integration between Pentaho Business Analytics 5.0 and the MongoDB NoSQL database, providing full support for MongoDB Replica Sets, Tag Sets and Read and Write Preferences.
  • Custom dashboards. Pentaho has added new custom-designed, insight-driven dashboards intended to give executives and managers a view of top-line metrics delivered directly to desktops and mobile devices.

Pentaho has also added to the ease of data integration with a host of features, including these:

  • Up-to-date integrations and certifications for a large number of popular big data stores, including integrations with Splunk, Amazon Redshift and Cloudera Impala, and certifications that include MongoDB, Cassandra, DataStax, Cloudera, Intel, Hortonworks and MapR.
  • New capabilities to help IT manage huge data volumes efficiently, including job restart, rollback and load balancing.
  • REST services for third-party applications developers, giving them the ability to embed analytics and reporting into web-deployed applications delivered ‘as-a-service.’

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers. Follow Thor on Twitter @ThorOlavsrud.