Seeking to eliminate the need to manage schemas and perform time-consuming ETL tasks on incoming data before exploring it, MapR is adding the Apache Drill distributed ANSI SQL query engine to its Hadoop distribution.

Aiming to eliminate a number of onerous data engineering tasks, MapR today updated its distribution of Hadoop to include Apache Drill 0.5. Drill is an open source distributed ANSI SQL query engine for self-service data exploration. It is an open source version of Google's Dremel system for interactively querying large datasets, which powers Google's BigQuery service. The stated goal of the Apache Drill project is to scale to 10,000 servers or more while processing petabytes of data and trillions of records in seconds.

The Drill query engine provides the capability to:
- Explore data in its native format (including Parquet, JSON files and HBase tables) without intervention by a database administrator (DBA).
- Analyze evolving and semi-structured/nested data from NoSQL data stores like MongoDB and online REST APIs.
- Create queries that simultaneously combine different Hadoop data sources such as files, HBase tables and Hive tables.
- Reuse existing SQL skill sets, BI tools and Apache Hive deployments.

"We're excited about this because it really opens up a new era for SQL-on-Hadoop," says Jack Norris, chief marketing officer at MapR. "The focus is on self-service data exploration on Hadoop that doesn't require IT involvement."

Because Drill can run SQL queries directly against data in various formats, it can be used to explore live data as it arrives, without spending weeks preparing and managing schemas and setting up ETL tasks.
In this way, it provides instant, self-service data exploration across multiple data sources.

"Organizations want to provide access to data stored in Hadoop and NoSQL databases to a broader set of users with existing SQL analysis skills," says Matt Aslett, research director, data platforms and analytics, at 451 Research. "Apache Drill's ability to provide access to data in Hadoop without the need for centralized schemas and also NoSQL datasets with complex data structures including nested and repeated fields differentiates it from traditional approaches to SQL-on-Hadoop."

"Every other SQL-on-Hadoop solution, whether it's Hive or Tez or what have you, relies on a fixed schema," Norris adds. "Whether you're talking about MapReduce, Hive or some other SQL-on-Hadoop solution, there's this middleman required to do the modeling, the data transformations, the plumbing to support the analysis. Drill's ability to discover the data without having to wait for that process to take place gives you speed and agility advantages."

MapR is packaging Drill with MapR 4.0.1, also released today. The new version of its Hadoop distribution expands its real-time capabilities for use cases including operational applications, interactive queries and stream processing. It includes multiple batch processing frameworks, including MapReduce 1.x and 2.x (YARN-based), as well as Spark (0.9 and 1.0.2).

It also supports five SQL-on-Hadoop technologies: Hive (0.11, 0.12, 0.13), Drill (0.5), SparkSQL (1.0.2), Impala (1.3.1) and certified integration with HP Vertica. It adds support for the HBase (0.94.21, 0.98.4) and MapR-DB NoSQL technologies, as well as three machine learning and graph libraries: Mahout (0.8, 0.9), MLlib (0.9, 1.0.2) and GraphX.

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com. Follow Thor on Twitter @ThorOlavsrud.