HPCC Takes on Hadoop's Big Data Dominance
Hadoop is everywhere, but a strong competitor is making headway in the market. It's all thanks to years of production use and a solid big data pedigree that powers the billion-dollar LexisNexis database.
Tue, February 12, 2013
CIO — When you hear the expression "big data," the next word you will often hear is "Hadoop." That's because the underlying technology that has made massive amounts of data accessible is based on the open source Apache Hadoop project.
From the outside looking in, you might assume that Hadoop is big data and vice versa; that one cannot exist without the other. But there is a Hadoop competitor that in many ways is more mature and enterprise-ready: HPCC (High Performance Computing Cluster) Systems.
Like Hadoop, HPCC Systems is open-sourced under the Apache 2.0 license and is free to use. Both likewise leverage commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying across the architectures. This is where most of the similarities end, says Flavio Villanustre, vice president of information security and the lead for the HPCC Systems initiative within LexisNexis.
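To make the shared idea concrete, here is a minimal Python sketch of the split-process-merge pattern that both platforms build on. It is an illustration only, not code from either Hadoop or HPCC Systems: threads stand in for cluster nodes, in-memory lists stand in for each node's local storage, and the function names (`partition`, `count_words`, `parallel_word_count`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records round-robin into one slice per (hypothetical) node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(local_records):
    """The work each node performs on its own locally stored slice."""
    counts = Counter()
    for line in local_records:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, num_nodes=4):
    """Process every slice concurrently, then merge the partial results."""
    slices = partition(records, num_nodes)
    # Threads are an assumption standing in for separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    sample = ["big data needs big clusters", "clusters process data in parallel"]
    print(parallel_word_count(sample, num_nodes=2))
```

On a real cluster the slices would live on different machines and the partial results would travel over the IP network before being merged, but the shape of the computation is the same.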
HPCC Systems Older, Wiser Than Hadoop?
HPCC Systems has been in production use for more than 12 years, though the HPCC Systems open source version has been available for only a little more than a year. Hadoop, on the other hand, began as part of the open source Nutch web search project, modeled on the MapReduce and Google File System designs that Google had published, and wasn't even its own Apache project until 2006. Since that time, though, it has become the de facto standard for big data projects, far outpacing HPCC Systems' 60 or so enterprise users. Hadoop is also supported by an open source community in the millions and an entire ecosystem of start-ups springing up to take advantage of this leadership position.
That said, HPCC Systems is a more mature, enterprise-ready package that uses a higher-level, declarative programming language called Enterprise Control Language (ECL), which is compiled to C++, as opposed to Hadoop's Java. This, says Villanustre, gives HPCC Systems advantages in terms of ease of use as well as backup and recovery in production. Speed is enhanced in HPCC Systems because the resulting C++ code runs natively on top of the operating system, while Java requires a Java virtual machine (JVM) to execute.
HPCC Systems also possesses more mission-critical functionality, says Boris Evelson, vice president and principal analyst for Application Development and Delivery at Forrester Research. Because it's been in use for much longer, HPCC Systems has layers (security, recovery, audit and compliance, for example) that Hadoop lacks. Lose data during a search and it's not gone forever, Evelson says. It can be recovered, just as it could be in a traditional data warehouse such as Teradata.