by Thor Olavsrud

MapR shows off enterprise-grade Spark distribution

Jun 07, 2016

With companies increasingly turning to Apache Spark to build their data pipelines and analytical applications, MapR Technologies has released an Apache Spark distribution that packages the complete Spark stack, patented features from the MapR Platform and key open source projects that complement Spark.

At Spark Summit in San Francisco, Calif., yesterday, Hadoop distribution vendor MapR Technologies announced a new enterprise-grade Apache Spark distribution.

The new distribution, available now in both MapR Converged Community Edition and MapR Converged Enterprise Edition, includes the complete Spark stack, patented features from MapR and key open source projects that complement Spark.

“We’ve built this new distribution to make it easier for customers that leverage the power of Spark for their big data initiatives,” Anoop Dawar, vice president, Product Management, MapR Technologies, said in a statement yesterday. “We’ve seen significant growth of customers deploying Spark as their primary compute engine. We believe this gives our customers a converged compute and storage engine for batch, analytics and real-time processing that helps build and deploy applications rapidly.”

Spark catching fire

“ESG research shows Apache Spark adoption is poised to grow quickly, with 16 percent of businesses already in production and another 47 percent very interested in implementing Spark,” Nik Rouda, senior analyst with Enterprise Strategy Group, added in a statement Monday. “As such, Spark will power the next wave of big data. Yet enterprises will demand a robust platform to meet their operational requirements. MapR is helping to accelerate Spark by addressing this need.”

MapR says the new distribution enables all advanced analytics, including batch processing, machine learning, procedural SQL and graph computation. The integration with the MapR Platform gives it access to the company’s patented enterprise-grade features, including web-scale storage, high availability, mirroring, snapshots, support for NFS, integrated security, global namespace and more. MapR says this native integration makes it a reliable and production-ready platform for Spark workloads both on-premises and in the cloud.

The distribution includes the latest version of Spark, which supports in-memory processing for big data and enables faster application development while allowing for code reuse across batch, interactive and streaming applications. MapR plans to leverage the distribution in its Quick Start Solution offerings, which include pre-built templates, configuration and installation. It will support the most popular use cases for Spark, including building data pipelines and developing analytical applications leveraging machine learning.

Bringing the power of Spark to the enterprise

The company says future product extensions of the distribution may include real-time streaming and operational analytic capabilities, with MapR-Streams, MapR-DB and Hadoop as add-ons.

“This is a great example of MapR’s continued commitment to open source Apache Spark,” John Tripier, senior director of Business Development at Databricks (founded by the creators of Apache Spark), said in a statement yesterday. “MapR was early to recognize the impact Spark would have on the big data landscape, and we are excited to see them extending the power of Spark for their enterprise customers with this announcement.”