by Thor Olavsrud

Jethro automates data engineering tasks for BI on Hadoop

Mar 28, 2017
Analytics, Big Data, Business Intelligence

By automating costly and time-consuming data engineering tasks associated with business intelligence on Hadoop, Jethro aims to accelerate BI queries.


Eli Singer, CEO of Jethro, provider of an acceleration solution for interactive business intelligence (BI) on big data, says that despite its promise, BI on big data is fundamentally broken because SQL-on-Hadoop remains too slow.

“Today’s approach to BI on big data is not working,” Singer said in a statement Tuesday. “Under the SQL-on-Hadoop hype lies monumental failure rates with existing approaches.”

Meet Jethro

Singer’s answer is Jethro 3.0, the newest version of Jethro’s enterprise solution, a SQL query engine for Hadoop that combines an indexing architecture with “auto-cubes,” aggregated cubes generated from usage patterns. Data engineering tasks such as pre-aggregating tables, manually building cubes and keeping up with new and changing applications tend to be costly and labor-intensive.

Singer says Jethro 3.0 automates such tasks by creating cubes based on actual user queries, fully indexing all table columns and managing an intelligent query result cache.

Jethro’s auto-cubes are micro-cubes generated automatically for repeatable query patterns, and they’re incrementally updated as new data arrives. Meanwhile, Jethro indexes every column automatically and appends new data to those indexes rather than updating them, which the company says ensures consistent query performance. Finally, the solution uses intelligent caching to reuse the results of common queries. The company says intelligent caching is especially effective when users share dashboards.
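To make the idea of usage-driven cubes concrete, here is a minimal Python sketch. It is a toy model built on assumptions, not Jethro's implementation: the `ToyEngine` class, its `cube_threshold` parameter and the `sum_by` method are hypothetical names invented for illustration. The engine counts how often a grouping pattern is requested, materializes an aggregate once the pattern repeats, and then keeps that aggregate current as rows are appended.

```python
from collections import Counter, defaultdict


class ToyEngine:
    """Toy model of usage-driven aggregate cubes (illustrative only)."""

    def __init__(self, cube_threshold=2):
        self.rows = []                    # raw fact rows, stored as dicts
        self.pattern_counts = Counter()   # times each (group_by, measure) was scanned
        self.cubes = {}                   # (group_by, measure) -> {key: running sum}
        self.cube_threshold = cube_threshold  # hypothetical tuning knob

    def append(self, row):
        """Add a row and incrementally update every existing cube."""
        self.rows.append(row)
        for (group_by, measure), cube in self.cubes.items():
            cube[row[group_by]] += row[measure]

    def sum_by(self, group_by, measure):
        """Answer SUM(measure) GROUP BY group_by; report how it was served."""
        pattern = (group_by, measure)
        if pattern in self.cubes:
            return dict(self.cubes[pattern]), "cube"

        # No cube yet: fall back to scanning the raw rows.
        result = defaultdict(int)
        for row in self.rows:
            result[row[group_by]] += row[measure]

        # Once the pattern has repeated often enough, keep the aggregate.
        self.pattern_counts[pattern] += 1
        if self.pattern_counts[pattern] >= self.cube_threshold:
            self.cubes[pattern] = result
        return dict(result), "scan"


engine = ToyEngine(cube_threshold=2)
engine.append({"region": "EU", "sales": 10})
engine.append({"region": "US", "sales": 5})
engine.sum_by("region", "sales")              # first request: served by scan
engine.sum_by("region", "sales")              # second request: scan, cube is built
engine.append({"region": "EU", "sales": 7})   # cube is updated incrementally
totals, source = engine.sum_by("region", "sales")
print(totals, source)
```

In this sketch the third request is answered from the stored aggregate without touching the raw rows. A result cache could be layered on in the same way, keyed on the full query text.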

SQL-on-Hadoop engines full-scan billions of rows of data for every query. Jethro avoids such time-consuming full scans across all the data by leveraging its indexes, cubes and cache to process queries (for BI tools like Tableau, Qlik and MicroStrategy) with less effort and more speed, regardless of the query, size of the dataset or number of concurrent users.
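The difference between a full scan and an index lookup can be sketched in a few lines of Python. This is a generic inverted-index illustration under stated assumptions, not a description of Jethro's actual index format: `ColumnIndex`, `build_indexes`, `full_scan` and `indexed_lookup` are hypothetical names, and only equality filters are handled.

```python
from collections import defaultdict


class ColumnIndex:
    """Append-only index for one column: value -> list of matching row ids."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, value, row_id):
        # New rows are appended; existing entries are never rewritten.
        self.postings[value].append(row_id)


def build_indexes(rows):
    """Index every column of every row."""
    indexes = defaultdict(ColumnIndex)
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            indexes[column].add(value, row_id)
    return indexes


def full_scan(rows, filters):
    """Check every row against every equality filter."""
    return [
        row_id
        for row_id, row in enumerate(rows)
        if all(row[col] == val for col, val in filters.items())
    ]


def indexed_lookup(indexes, filters):
    """Intersect the row-id lists for each filter; rows are never read."""
    if not filters:
        raise ValueError("indexed_lookup needs at least one filter")
    id_sets = [
        set(indexes[col].postings.get(val, []))
        for col, val in filters.items()
    ]
    return sorted(set.intersection(*id_sets))


rows = [
    {"country": "NO", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "NO", "device": "desktop"},
    {"country": "NO", "device": "mobile"},
]
indexes = build_indexes(rows)
filters = {"country": "NO", "device": "mobile"}
print(full_scan(rows, filters))
print(indexed_lookup(indexes, filters))
```

Both functions return the same row ids, but the indexed version touches only the entries for the requested values, so its work depends on how many rows match, not on the size of the table.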

The new version also features improvements to enterprise security features, including Lightweight Directory Access Protocol (LDAP) authentication and role-based permissions. The engine also provides the ability to directly load data from Hadoop tables and an improved management graphical user interface.

“Ever since we switched to the Jethro platform for our big data analytics needs, we’ve been able to generate consistently fast query results for our large, concurrent user base,” Samik Mukherjee, head of Engineering at Tata Communications CDN, said in a statement Tuesday. “Our data lake grows by the billions every day. Jethro takes the heavy lifting out of data re-engineering so we can focus on other business critical applications knowing our users are generating critical intelligence and KPIs through Jethro that enhance their business decision-making and ultimately grows our customer base.”