How open source makes the big data ecosystem richer

Credit: Swapnil Bhartiya

"The more big systems use Stacki, the more feedback we get from them, which makes the entire ecosystem richer. It benefits the entire community," says StackIQ co-founder Greg Bruno.

StackIQ is a California-based company that offers a server automation platform for clustered, scale-out IT infrastructure. I met up with Greg Bruno, VP of engineering and co-founder of StackIQ, to learn more about the company, its product, and the Stacki open source project. Here is an edited version of that interview.

What is StackIQ?

StackIQ is a company that was founded on doing full stack automation. Last year, in June, we open sourced the bottom layer of the stack. We call that project Stacki.

What does Stacki do?

Stacki's job is to get machines, either physical machines or virtual machines, which act like physical machines, up to a ping and a prompt. We divide the stack up into three layers.

Level 1: That’s where Stacki lives. It gets the base operating system up and running, and configures the network, the disk drives, and SSH.

Then we looked at what would be a perfect handoff to the next layer, which is the DevOps layer. We don't assume any particular DevOps tools, so you can go in and install Salt, Ansible, Chef, Puppet... whatever you prefer. You can do that by hand, after you bring all your nodes up to a ping and a prompt. Or you can extend the Stacki framework to automate whatever DevOps layer you want. From there, you can also automate the installation of the application, which rides on top of the DevOps layer.
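As a rough illustration of the handoff point described above, the following is a minimal sketch of how one might check that a node has reached "a prompt" before starting DevOps tooling. It assumes that "a prompt" can be approximated by the node's SSH port accepting a TCP connection. It does not use any Stacki API, and the function names and port default are illustrative only.

```python
import socket


def ssh_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open SSH port is treated as a proxy for "a prompt".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def nodes_ready(hosts, port=22, timeout=2.0):
    """Map each host name to whether its SSH port accepts connections."""
    return {host: ssh_port_open(host, port, timeout) for host in hosts}
```

A provisioning script could call `nodes_ready` on its list of hosts and only start the DevOps layer on the nodes that report `True`.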

Who are your customers?

One of our big lighthouse Stacki customers is PayPal. PayPal uses Stacki to get the nodes up to a ping and a prompt, and then they extended the framework to bring down Ansible. They also extended the framework to pull the Ansible playbooks, in order to automatically install Hortonworks, and get Hortonworks up and running. They currently have 3000 nodes in production using Stacki.

You told me there was an interesting story behind PayPal. Can you share that?

When PayPal and eBay divested, the tools to manage Hortonworks nodes stayed with eBay. PayPal was left with over 3000 Hadoop nodes and nothing to manage them. They had to come up with a solution to provision them. So when we learned about their problem we said, ‘we think we can help.’ Their team was cut in half, so they brought us in, we did a small proof of concept, and it worked.

In the PayPal case, this is strictly bare metal. There are no hypervisors involved, there's no virtualization. It's all physical gear.

How can Stacki be used with OpenStack?

We have some customers that are paying for that full stack automation. We use Stacki to get the nodes up to a ping and a prompt, and then we extend our own framework to bring OpenStack up and running. We fully configure OpenStack, and from there, you use the OpenStack dashboards to manage all the virtual machines inside of OpenStack. But we're going to handle all the complex networking that OpenStack requires, handle all the physical disk drives, and then just do that handoff to OpenStack.

Looking at Cloud Foundry and OpenStack, do you think that there is some kind of overlap there?

Yeah. There is overlap, but really, at the end of the day, you have to install physical nodes. Obviously I'm jaded, but we think we're the best solution to get you to cloud. In order to do that bare metal installation, whether you're going to ride a virtualization app on top of it, or if you just want to get full performance out of the node on Hadoop, or NoSQL, or if you're a high performance computing shop, you want to get every ounce of performance out of it. You want to be on bare metal.

Stacki, as you said, is open source. Is there any proprietary component in your solution?

It's proprietary in the way in which we configure. A customer will come to us and say, "This is the application that we want automated." We say, okay, whatever that application is, the only requirement is that it runs on CentOS or Red Hat. If it does that, we look at the configuration guide that comes with that software, and say, "Alright, we're now going to extend the Stacki framework in order to automate the configuration and installation of whatever that vertical is that they want."

How do you decide what to open source and what to keep as secret sauce?

We talk about that all the time. We had a lot of design and architecture meetings to figure out, where do we draw the line for Stacki? Once we landed on the phrase, ‘a ping and a prompt,’ then it became really easy.

Actually, we had a couple of arguments, because we do some really neat things with automating the configuration of hardware. Things like parallel formatting of disk drives really come into play when you're building big data nodes, because there you have 12 or 24 disk drives at one, two, or three terabytes each. If you're not doing a parallel format, you're going to wait hours just for all the file systems to be set up.
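The benefit of formatting drives in parallel can be sketched as follows. This is an illustration only, not how Stacki or the Red Hat installer implements it: `format_drive` is a hypothetical stand-in that merely sleeps rather than touching any device, and the device names are made up. The sketch shows that when drives are formatted one after another the total wait is roughly the sum of the individual times, while formatting them concurrently takes roughly as long as the slowest single drive.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def format_drive(device, seconds):
    """Hypothetical stand-in for a real format command; it only sleeps."""
    time.sleep(seconds)
    return device


def format_serial(devices, seconds):
    """Format drives one after another."""
    return [format_drive(d, seconds) for d in devices]


def format_parallel(devices, seconds):
    """Format all drives concurrently, one worker thread per drive."""
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        return list(pool.map(lambda d: format_drive(d, seconds), devices))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Twelve made-up device names, /dev/sdb through /dev/sdm.
    devices = [f"/dev/sd{chr(ord('b') + i)}" for i in range(12)]
    _, serial_s = timed(format_serial, devices, 0.05)
    _, parallel_s = timed(format_parallel, devices, 0.05)
    print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

With real drives the speedup also depends on the controller and bus, so the concurrent case will not always scale as cleanly as this simulation suggests.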

We've augmented the Red Hat installer to do parallel formatting of the disk drives. We have also extended Red Hat's installer to incorporate the tools that configure LSI controllers or HP Smart Array controllers. The reason we did that is, again, we wanted to hand off a system that we felt was a relevant system: if I'm building a big data system, I want all my disk drives formatted. So we're like, okay, we should add that into Stacki, because those are exactly the community members we want to have as part of the Stacki project, even if we don’t get a dime back. The big data guys build big systems. The more big systems use Stacki, the more feedback we get from them, which makes the entire ecosystem richer. It benefits the entire community.

We understand open source. We've built a community before. We were with the University of California at San Diego. Somewhere from 2000 to 2010, we were helping people build high performance computing systems. We built a large, robust community that way. We really understand the power and the value of a large community. Whether the guys are building two nodes or 2000 nodes, they all matter, and they all provide something to the community.

What are the risks of vendor lock-in when somebody chooses to use Stacki?

Stacki is open source. The bottom layer is open source. If they don't like it, it was free. There was no capital spent on it. If they ever want to tweak it, the code is available. Or, if they just want to fork it or replace it with another bare metal installer, they can do that as well, which is cool.

This article is published as part of the IDG Contributor Network.
