10 Hot Hadoop Startups to Watch

As data volumes grow, figuring out how to unlock their value becomes increasingly important. Hadoop enables the processing of large data sets in a distributed environment and has become almost synonymous with big data. Here are 10 startups with solutions for unlocking big data value.

Now, with emerging database solutions, features that made RDBMS so popular for so long, such as ACID compliance, transactional integrity, and standard SQL, are available on top of the cost-effective and scalable Hadoop platform. Splice Machine believes that this enables developers to get the best of both worlds in one general-purpose database platform.

Splice Machine provides all the benefits of NoSQL databases, such as auto-sharding, scalability, fault tolerance, and high availability, while retaining SQL, which is still the industry standard. Splice Machine optimizes complex queries to power real-time OLTP and OLAP applications at scale without rewriting existing SQL-based apps and BI tool integrations. By leveraging distributed computing, Splice Machine can scale from terabytes to petabytes by simply adding more commodity servers. Splice Machine is able to provide this scalability without sacrificing the SQL functionality or the ACID compliance that are cornerstones of an RDBMS.

Competitive Landscape: Competitors include Cloudera, MemSQL, NuoDB, DataStax, and VoltDB.

Key Differentiator: Splice Machine claims to have the only transactional SQL-on-Hadoop database that powers real-time big data applications.

6. DataTorrent

What They Do: Provide a real-time stream processing platform built on Hadoop.

Headquarters: Santa Clara, Calif.

CEO: Phu Hoang, who was previously a founding member of the engineering team at Yahoo, where he served as executive vice president of engineering.

Founded: 2012

Funding: The company closed an $8 million Series A round in June 2013. August Capital led the round and was joined by AME Cloud Ventures. The company previously secured $750K in seed funding from Morado Ventures and Farzad Nazem.

Why They're on This List: DataTorrent argues that latency will soon be a central concern whenever we think about Big Data solutions. DataTorrent points out that "data is happening now, streaming-in from various sources -- in real-time, all the time." Many organizations struggle to process, analyze, and act on this never-ending, ever-growing stream of information at all.

For some insights, by the time data is stored to disk, analyzed, and responded to, it's already too late. For instance, if a hacker compromises a credit card account and manages to make a few purchases, plenty of damage has already been done, even if that account is cut off within minutes. DataTorrent contends that an organization's ability to recognize and react to events instantaneously isn't just a business advantage. In today's world, it is a necessity.

Unlike traditional batch processing, which can take hours, DataTorrent claims to be able to process hundreds of millions of data items per second. This enables organizations to process, monitor, and make decisions based on their data in real time.
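To make the contrast with batch processing concrete, here is a minimal Python sketch of the credit card scenario above. It is only an illustration of the general stream-processing idea, not DataTorrent's API or implementation; the function name, the event format, and the thresholds are all hypothetical. Each purchase is checked the moment it arrives, rather than after a stored data set is analyzed later.

```python
from collections import defaultdict, deque


def flag_suspicious(events, max_purchases=3, window_seconds=60):
    """Yield (card_id, timestamp) as soon as a card exceeds max_purchases
    within window_seconds.

    `events` is an iterable of (timestamp, card_id) pairs in arrival order.
    Both thresholds are illustrative assumptions, not real fraud rules.
    """
    recent = defaultdict(deque)  # card_id -> purchase timestamps still in the window
    for timestamp, card_id in events:
        window = recent[card_id]
        window.append(timestamp)
        # Drop purchases that have aged out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_purchases:
            yield card_id, timestamp


# Hypothetical stream: card-A makes four purchases in 30 seconds.
stream = [
    (0, "card-A"), (5, "card-B"), (10, "card-A"),
    (20, "card-A"), (30, "card-A"), (200, "card-B"),
]
alerts = list(flag_suspicious(stream))
```

Because the function is a generator, the alert for card-A is produced on its fourth purchase, without waiting for the rest of the stream to be collected.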

Competitive Landscape: DataTorrent's main competitors are IBM (InfoSphere Streams) and the open source Storm project.

Key Differentiator: DataTorrent points to performance as a key differentiator, claiming its platform is 100 to 1,000 times faster than Storm.

7. Qubole

What They Do: Offer Big Data-as-a-Service with a "true auto-scaling Hadoop cluster."

Headquarters: Mountain View, Calif.

CEO: Ashish Thusoo, who ran Facebook's data infrastructure team before co-founding Qubole. He also co-created Apache Hive.

Founded: 2011

Funding: The company is backed by $7 million in Series A funding from Lightspeed Ventures and Charles River Ventures.

Why They're on This List: Since Hadoop is a relatively new technology, finding someone with the expertise necessary to run and maintain it can be a tall order. By providing a managed solution, Qubole hopes to make Hadoop an easy-to-use technology.

Qubole handles the initial setup and then maintains the clusters. Qubole's auto-scaling feature automatically spins up users' clusters when a job is started and automatically expands or contracts them based on workload, cutting back on costs and management requirements.
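The general idea behind workload-based auto-scaling can be sketched in a few lines of Python. This is not Qubole's proprietary algorithm, which the company does not describe here; the function name, the tasks-per-node figure, and the node limits are hypothetical values chosen for illustration.

```python
import math


def target_nodes(pending_tasks, tasks_per_node=10, min_nodes=0, max_nodes=50):
    """Return how many worker nodes a cluster should run for the current workload.

    All parameters are illustrative assumptions, not Qubole settings.
    """
    if pending_tasks <= 0:
        return min_nodes  # no work queued: contract to the floor
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Clamp the estimate between the configured floor and ceiling.
    return max(min_nodes, min(needed, max_nodes))


idle = target_nodes(0)        # nothing running, so no nodes are needed
small = target_nodes(25)      # a small job spins up a few nodes
capped = target_nodes(1000)   # a large job is capped at max_nodes
```

A scheduler would call a function like this periodically and add or remove nodes until the cluster matches the returned size, which is where the cost savings come from: idle clusters shrink instead of running at full size.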

An intuitive UI expands the reach of this service beyond data analysts to entire lines of business. Qubole contends that some customers have more than 60 percent of their employees using Qubole.

Customers include Pinterest, MediaMath, Nextdoor and Saavn.

Competitive Landscape: Qubole will compete with Altiscale, Amazon EMR, Treasure Data, and others.

Key Differentiator: Qubole points to its proprietary technology that provides true auto-scaling and storage optimization.

8. Continuuity

What They Do: Provide a Hadoop-based big data application hosting platform.

Headquarters: Palo Alto, Calif.

CEO: Jonathan Gray, who was previously an HBase software engineer at Facebook.

Founded: 2011

Funding: $12.5 million from Battery Ventures, Ignition Partners, Andreessen Horowitz, Data Collective and Amplify Partners.

Why They're on This List: Continuuity has come up with a clever way to get around the dearth of Hadoop experts: they offer an application developer platform targeted at Java developers. The lower-level infrastructure is all abstracted away by the Continuuity platform.

The company's flagship product, Reactor, is a Java-based integrated data and application framework that layers on top of Apache Hadoop, HBase, and other Hadoop ecosystem components. It surfaces capabilities of the infrastructure through simple Java and REST APIs, shielding end users from unnecessary complexity.

In late March, Continuuity released its latest service, Loom, a cluster management solution. Clusters created with Continuuity Loom use templates for any hardware and software stack, from simple standalone LAMP-stack servers and traditional application servers like JBoss to full Apache Hadoop clusters composed of thousands of nodes. Clusters can be deployed across many cloud providers (Rackspace, Joyent, OpenStack) while using common SCM tools (Chef and scripts).

One thing to keep an eye on is the CEO situation. Founding CEO Todd Papaioannou, who was previously vice president and chief cloud architect at Yahoo, left the company this past summer. Co-founder and previous CTO Jonathan Gray has taken over the CEO role. This is Gray's first role as a business leader.

Competitive Landscape: As of now, Continuuity is uniquely positioned. Indirect competitors come from the Hadoop-as-a-Service (HaaS) camp (AWS EMR, Altiscale, Infochimps, Mortar Data, etc.).

Key Differentiator: Continuuity is targeted at Java developers, which is a unique approach.

9. Xplenty

What They Do: Provide HaaS.

Headquarters: Tel Aviv, Israel

CEO: Yaniv Mor, who previously managed the NSW SQL Services practice at Red Rock Consulting.

Founded: 2012

Funding: An undisclosed amount of seed funding from Magma Venture Capital.

Why They're on This List: Hadoop is being hyped like crazy these days, but it has also become the de facto infrastructure technology for big data. The trouble is that the development, implementation, and maintenance of Hadoop require a very specialized skill set.

Xplenty technology provides Hadoop processing on the cloud via a coding-free design environment, so businesses can quickly and easily benefit from the opportunities offered by Big Data without having to invest in hardware, software, or highly specialized personnel.

A drag-and-drop interface eliminates the need to write complex scripts or code of any kind. With its automatic server configuration feature, users can simply point to a data source, configure the data transformation tasks, and tell the platform where to write the results. Xplenty's platform uses SQL terminology, so the learning curve for data analysts should be minimal.

Customers include DealPly Technologies, Fiverr, Iron Source, and WalkMe.

Competitive Landscape: The main competition comes from Amazon's EMR. Other HaaS competitors include Altiscale, Mortar Data, Qubole, and recently Microsoft with Hadoop on Azure. Rackspace is about to launch its own HaaS offering based on Hortonworks' distribution.

Key Differentiator: According to Xplenty, competing services still target developers, whereas Xplenty targets the data and Business Intelligence (BI) users who do not know how to write code, but who need to move data to a big data platform.

10. Nuevora

What They Do: Provide Big Data analytics applications.

Headquarters: San Ramon, Calif.

CEO: Phani Nagarjuna, who most recently served as executive vice president of products and business development for OneCommand, which provides a SaaS-based CRM and Loyalty Automation Platform for the auto retail industry.

Founded: 2011

Funding: $3 million in early funding from Fortisure Ventures.

Why They're on This List: Nuevora has set its sights on one of big data's early growth areas: marketing and customer engagement. Nuevora's nBAAP (Big Data Analytics & Apps) Platform features purpose-built analytics apps based on best-practices-driven predictive algorithms. nBAAP is based on three key big data technologies: Hadoop (data processing), R (predictive analytics), and Tableau (visualizations).

On top of all of this, Nuevora's algorithms work on disparate sources of data (transactional, social media, mobile, campaigns) to quickly identify patterns and predictors in order to tie specific goals to individual marketing tactics.

The platform includes pre-built apps for the customer marketing business process -- acquisition, retention, up-sell, cross-sell, profitability, and customer lifetime value (LTV). With only "last-mile" configurations required for individual customer situations, Nuevora's apps empower organizations to anticipate their customers' behaviors.

Competitive Landscape: When Nuevora assesses the competitive landscape, it zeroes in on big consulting firms, such as Accenture, and other predictive analytics companies, such as Alpine Data Labs.

However, since pretty much every marketing platform under the sun now includes some sort of analytics engine, I also expect them to compete with the major marketing automation providers, such as ExactTarget (which uses Pentaho for its big data analytics).

Key Differentiator: Nuevora gives end users the ability to continually recalibrate their predictions using a "closed-loop recalibration engine," which helps organizations keep up with only the most pertinent insights based on the latest data.

Jeff Vance is a freelance writer based in Santa Monica, Calif. Connect with him on Twitter @JWVance or by email at jeff@sandstormmedia.net.

Copyright © 2014 IDG Communications, Inc.
