Review: Databricks makes big data dreams come true


Cloud-based Spark machine learning and analytics platform is an excellent, full-featured product for data scientists

Editor's Choice

For those of you just tuning in, Spark is an open source cluster computing framework originally developed by Matei Zaharia at U.C. Berkeley's AMPLab in 2009, then open-sourced and later donated to the Apache Software Foundation. Part of the motivation for creating Spark was that MapReduce allows only a single pass through the data, while machine learning (ML) and graph algorithms generally need to perform multiple passes.
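To make the multiple-pass point concrete, here is a minimal sketch in plain Python. It uses neither Spark nor Hadoop; the function names, the toy dataset, and the `cache` flag are all assumptions made for illustration. Each gradient descent iteration is one full pass over the data, so an iterative algorithm either reloads the dataset on every pass or keeps it in memory and reuses it.

```python
# Toy illustration only: plain Python, no Spark or Hadoop involved.
# All names here are hypothetical and chosen for this sketch.

def load_dataset():
    """Stand-in for an expensive read from distributed storage."""
    load_dataset.calls += 1
    # Points on the line y = 3x, so the ideal weight is 3.0.
    return [(x, 3.0 * x) for x in range(1, 6)]

load_dataset.calls = 0  # counts how many times the data was (re)loaded


def gradient_pass(data, w):
    """One full pass over the data: mean squared-error gradient for y ~ w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fit(iterations=50, lr=0.01, cache=True):
    """Fit w by gradient descent; every iteration is one pass over the data."""
    w = 0.0
    data = load_dataset() if cache else None  # load once when caching
    for _ in range(iterations):
        batch = data if cache else load_dataset()  # otherwise reload each pass
        w -= lr * gradient_pass(batch, w)
    return w
```

With `cache=True` the data is loaded once and reused for all 50 passes. With `cache=False` it is loaded 50 times, which loosely mirrors the cost of running an iterative algorithm as a chain of independent single-pass jobs.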

Spark is billed as a “fast and general engine for large-scale data processing,” with a tagline of “Lightning-fast cluster computing.” In the world of big data, Spark has been attracting attention and investment because it provides a powerful in-memory data-processing component within Hadoop that handles both real-time and batch workloads. In addition to Databricks, Spark has been embraced by the likes of IBM, Microsoft, Amazon, Huawei, and Yahoo.
