Will Fault Tolerant Servers Improve Your Virtualized Environment?

Companies are turning to fault tolerant servers as a way to improve uptime without experiencing the downtime associated with high availability solutions. But fault tolerant technology may not be right for every enterprise.

Enterprises are taking a variety of approaches to keep applications online as virtualization puts more apps on fewer servers. While experts in the virtualization field say many firms use high availability software associated with the widely used VMware hypervisor, fault tolerant servers—a throwback to the 1980s—also vie for a role in heavily virtualized environments. This is in large part because they promise an uptime improvement over high availability solutions.

Stratus Technologies, known for such fault tolerant gear as the IBM-branded System/88 decades ago, is positioning its current crop of products to support virtualization. NEC Corp. of America, meanwhile, targets virtualized settings as one market for its fault tolerant server line. Customers have a few choices.

VMware FT Picks Up Immediately But Requires More Hardware

One option is VMware vSphere Fault Tolerance. While VMware HA involves rebooting virtual machines (and the associated downtime), FT works around that problem by letting customers run a "shadow instance" of a production virtual machine that's maintained in lockstep with the main instance.

The shadow instance takes over if something happens to the production virtual machine. "The second one picks up immediately without having to restart," says Milton Lin, master cloud specialist at Force 3, a Crofton, Md.-based systems integrator.

However, Lin adds, doing so does consume additional resources. And Kris Lamberth, chief technology officer at Paranet Solutions, a Dallas-based CRM and IT outsourcing company that offers virtualization services, notes that FT is effectively a mirroring method, so enabling the technology involves doubling up on the hardware.

As for limitations, FT supports only one CPU at this time. "For multi-threaded applications, those that take advantage of multiple CPUs/cores, you are potentially limiting the performance or undersizing the CPU cores using fault tolerance," Lin explains.

Such considerations compel customers to carefully consider where they deploy FT, according to Lin. "[It's] not something a customer would implement across the board."

Meanwhile, Fault Tolerant Servers Work Well With Virtualization

Another take on fault tolerance comes from hardware vendors, who position their specialized servers as an alternative to clustering.

Stratus ftServer systems, equipped with redundant CPU/memory units, target Microsoft Windows Server, Red Hat Enterprise Linux and virtualization workloads, according to the company. Dessau says Stratus technology aims to prevent downtime and provides a simpler solution than clustering.

The challenge of clustering, he says, is that solutions have many parts—"and the more parts you have, the more likely you are to have something break." Stratus also offers software that provides platform availability for applications deployed on industry standard servers, Dessau notes.

Pinellas County Utilities in Clearwater, Fla., uses Stratus fault tolerant servers to support its mission of providing safe drinking water and wastewater treatment. The water utility's supervisory control and data acquisition (SCADA) system runs in a virtualized environment based on VMware. Plant operators use thin client devices.

Mike Skrzypek, SCADA system and security manager at Pinellas County Utilities, says fault tolerant servers meet the county-owned utility's objectives. "The technology is by far the most forgiving you have out there. The system is automatically backed up—[if] something goes down, you won't lose any information," he says.


The servers and related services offer several layers of protection. Each server records and stores information on two sets of drives, for example. Skrzypek says the county operation has a service contract with Stratus that provides 24/7 server monitoring.

NEC, meanwhile, offers a similar message of availability coupled with simplicity of operations. Steve Gilman, Express5800 product manager at NEC, says the company's previous fault tolerant servers, with up to 96 GB of memory, were geared toward single applications. But the company's Express5800 fault tolerant servers have a memory footprint of up to 256 GB, which Gilman says positions the server for use with virtualization.

"We can broaden that market a little bit, depending on the size of the customer," Gilman says.

At Latisys, Teeft hasn't encountered requests for fault tolerant boxes. He says HP's Converged Infrastructure lets customers scale across multiple blades for redundancy, and the customers he has spoken with are comfortable with the converged infrastructure for availability, performance and scale.

"We haven't had any customers specifically ask for a Tandem-esque kind of compute capacity today," Teeft says, referring to the fault tolerant server line that Compaq acquired back in 1997.

For SMBs, Price of Shared Storage May Be Right

Software and servers often spring to mind in availability discussions. However, Andrew Judge, president of Grove Networks, a Miami company that offers virtualization consulting among other services, notes that shared storage shouldn't be overlooked.

Judge cites storage as the toughest resource to deal with when it comes to availability in virtualized environments. A small or medium sized business (SMB) administrator's lack of knowledge and experience is one factor, he says, noting that virtualization often marks the first time an SMB has been required to have a storage-area network (SAN).

Cost also ranks as a critical consideration. Synchronous replication between two SANs lets organizations cluster storage to achieve high availability. But that benefit comes at the cost of redundant storage—and, Judge says, "That gets very expensive."

Judge's observation sums up the essence of availability and the trade-offs customers face: The price tag typically increases with the degree of uptime assurance. "Getting from four nines to five nines is exponentially more money," he says. "Just match your budget to how many 9s you want and be realistic about it."
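
To put the "nines" in concrete terms, the short Python sketch below converts an availability level into the downtime it permits per year. It is an illustration only: the function name and the 365-day year are assumptions made for this example and do not come from Judge or any vendor.

```python
# Illustrative only: convert "N nines" of availability into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Return the maximum yearly downtime, in minutes, for N nines of availability."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001 (99.99% available)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        availability_pct = 100 * (1 - 10 ** -n)
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines ({availability_pct:.3f}%): {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, four nines allows roughly 52.6 minutes of downtime a year, while five nines allows about 5.3 minutes. Each additional nine cuts the permitted downtime by a factor of ten, which helps explain why each step up costs so much more.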
