Pivotal releases a new version of its Pivotal HD Hadoop distribution built on the open source ODP big data kernel, along with a new cost-based query optimizer that promises up to 100x performance upgrades for Greenplum and HAWQ.

At EMC World in Las Vegas this week, Pivotal rolled out enhancements to its big data suite, including major component updates to its Pivotal HD Hadoop distribution and up to 100x performance upgrades to Pivotal Greenplum Database and Pivotal HAWQ.

Just a few months ago, Pivotal announced that it would open source its entire big data stack: the Pivotal HD distribution, Pivotal Greenplum Database, Pivotal GemFire real-time distributed data store, Pivotal SQLFire (a SQL layer for the real-time distributed data store), Pivotal GemFire XD (in-memory SQL over HDFS) and the Pivotal HAWQ parallel query engine over HDFS. These updates, says Michael Cucchi, senior director of Outbound Product at Pivotal, underscore Pivotal’s continued commitment to supporting that open source strategy.

At the heart of the performance upgrades is the new Pivotal Query Optimizer, an advanced cost-based optimizer for big data, previously codenamed “Orca.” The optimizer allows users to make use of full ANSI SQL compliant queries against Hadoop. While some basic queries will execute faster in a standard planner, Cucchi says, the Pivotal Query Optimizer is the world’s most advanced cost-based analyzer when it comes to complex big data optimizations and will help customers manage ballooning data sets driven by mobile, cloud, social and the Internet of Things.

“We’ve advanced our analytical capabilities and our performance,” Cucchi says. “And also our configurability.
We need to be able to very granularly control that optimizer.” He notes that users can configure Pivotal Query Optimizer down to the query level. While it is currently available as part of the Pivotal Big Data Suite subscription, Cucchi notes the Pivotal Query Optimizer will also be released to open source, probably in the next year or two.

The new version of Pivotal HD is the first version of Pivotal’s Hadoop distribution that is based on an Open Data Platform (ODP) core. Pivotal, together with a host of other vendors, systems integrators and end users, shepherded ODP into existence in February of this year in an effort to reduce the complexity surrounding the Hadoop and big data environment. ODP is a big data kernel in the form of a tested reference core of Apache Hadoop, Apache Ambari and related Apache source artifacts. The idea was to create a “test once, use anywhere” core platform that would simplify upstream and downstream qualification efforts and eliminate growing fragmentation in the Hadoop market. Applications and tools built on the ODP kernel should integrate with and run on any compliant system. Since the launch of ODP, all major Hadoop distribution vendors have joined the effort.

The new version of Pivotal HD features the ODP core, which consists of Apache Hadoop 2.6 and Apache Ambari. It updates existing Hadoop components for scripting and query (Apache Pig and Apache Hive), non-relational database (Apache HBase) and basic coordination and workflow orchestration (Apache ZooKeeper and Apache Oozie). It adds the Apache Spark core and machine learning library, as well as additional Hadoop components for improved security (Apache Ranger, currently incubating, and Apache Knox), monitoring (Nagios and Ganglia, in addition to Apache Ambari) and data processing (Apache Tez).