5 Ways to Prepare for Big Data With Scale-Out NAS
Excited by the prospect of turning your company's unstructured data into actionable business intelligence? Your first step is to create a storage architecture that can deal with petabytes of data. EMC Isilon's Nick Kirsch says scale-out NAS is the best solution, and he has five tenets he suggests CIOs use to judge it.
Mon, March 12, 2012
CIO — As enterprises seek to move into the big data world--digitizing paper documents and saving email communications, Word docs, Excel files and all sorts of other unstructured data with the hopes of mining them for actionable business intelligence--they need to address a big problem up front: storage.
"Enterprises have suddenly accumulated petabytes of information," says Nick Kirsch, director of product management for EMC Isilon. "They're faced with a similar challenge: They've got all this information, how do they make use of it and how do they store it in a scalable architecture?"
One possibility is to scale vertically (scale up). The idea is to make your existing storage nodes larger, faster or more powerful by replacing your existing storage devices with new, higher-capacity devices. Consolidating storage infrastructure in this way is attractive, since it simplifies management and reduces the amount of floor space and power consumed. But it's not without problems: It can't easily span multiple locations, it doesn't have much inherent resiliency, and large, high-performance storage devices can get expensive in a hurry. And when dealing with the ever-increasing flood of information, the biggest problem is that today's storage devices can get only so big.
"You can build a bigger and bigger single unit controller," says Kirsch. "But at some point you can't build that system any bigger; you have to add a second system. You could end up with hundreds of separate units you need to manage."
Instead, Kirsch says scaling horizontally (scale out) with NAS is the way to go. A scale-out NAS architecture forgoes expensive, high-capacity storage devices for commodity storage components combined into an aggregate storage pool. Instead of making nodes bigger, you add nodes as necessary. The downside is that you can very quickly wind up with a much more complex management environment. But it can span multiple locations and it has a great deal of inherent resiliency. And, perhaps most important from the perspective of managing big data, you can add storage rapidly and cheaply.
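To make the contrast concrete, here is a minimal Python sketch of the two growth models described above. It is an illustration only, not a model of any EMC Isilon product: the class and function names are hypothetical, and the capacity figures in the usage note are made-up round numbers chosen to show the arithmetic.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ScaleOutPool:
    """Hypothetical aggregate pool: capacity is the sum of its nodes."""
    node_capacities_tb: list = field(default_factory=list)

    def add_node(self, capacity_tb):
        # Growing the pool means adding a node, not replacing one.
        self.node_capacities_tb.append(capacity_tb)

    @property
    def total_capacity_tb(self):
        return sum(self.node_capacities_tb)


def scale_up_units_needed(data_tb, max_unit_tb):
    """Separate systems required once a single unit hits its size ceiling."""
    return math.ceil(data_tb / max_unit_tb)


def grow_pool_to(pool, data_tb, node_tb):
    """Add same-sized nodes until the pool can hold data_tb.

    Returns the number of nodes added.
    """
    added = 0
    while pool.total_capacity_tb < data_tb:
        pool.add_node(node_tb)
        added += 1
    return added
```

With assumed figures of 2,000 TB of data, a 500 TB ceiling per scale-up unit and 36 TB per commodity node, `scale_up_units_needed(2000, 500)` returns 4 separately managed systems, while `grow_pool_to(ScaleOutPool(), 2000, 36)` adds 56 nodes to what remains one storage pool. The sketch captures only capacity; it says nothing about the management overhead, resiliency or cost trade-offs discussed in the article.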
"I think the biggest thing that we see, the biggest complaint when it comes to storage is that it's really easy to manage a single unit, but when you have two or more units it becomes complicated," Kirsch says.
For big data, NAS is preferable to SAN, Kirsch says, because SAN is not built for unstructured data and file sharing. To use a SAN with network file protocols like NFS or CIFS/SMB, you would have to deploy file servers in front of it, which adds management complexity and limits scalability.