In early 2000, Cereon Genomics had a serious situation on its hands: It was running out of computing power.
A genomics company based in Cambridge, Mass., Cereon combines genomics research tools with high-speed computing to discover nature’s best genes for enhancing farmers’ crops. Historically, Cereon, established in 1997 as a subsidiary of St. Louis-based agriculture products provider Monsanto, ran gene-discovery applications on its largest mainframe. But advances in genomics tools and lab processes caused Cereon’s data production to expand at a fantastic rate, until it became too much for the mainframe to handle. “We were awash in terabytes,” says Mark Trusheim, Cereon’s copresident and COO. “We have to discover our products quickly and be first to market, and our ability to understand the raw data created by all our genomics tools became a huge bottleneck in our research pipeline.”
Cereon needed more computing power, and fast. So it tapped into its existing Unix server architecture, bought a bunch of new boxes and networked a grid of processors in Cambridge and St. Louis into a virtual supercomputer that company researchers could use to submit jobs from their desktops. Specialized software from Platform Computing, a Markham, Ontario, company, broke large jobs into smaller computing tasks, distributed them among the CPUs in the grid and reassembled the results into a finished product. The grid was up and running by mid-2000, and Trusheim says it’s been a huge benefit. “It’s helped us optimize the use of the hardware we have, and we see less need to add,” he says. “We’ve been saving millions of dollars of IT hardware cost over the last two years as we automatically load balance across processors and now physical data centers.”
Cereon’s solution is not completely out of left field. The idea behind grid computing (historically known as distributed computing; see “Grid Computing…Defined?” on this page) has been around for years. It simply means submitting massive jobs into a dispersed network of computing resources to harness idle processor cycles for additional computing power on demand. Until the past couple of years, however, distributed computing has been primarily the province of academia and nonprofit research. But its new form, grid computing, is starting to emerge in a commercial context as well. Early adopters, including biotech companies, pharmaceutical makers and chip manufacturers, are building their own grids to handle complex problems. And once the technology matures, and if CIOs find applicable uses, adoption of grid computing will be more widespread, both within and among enterprises. Grid proponents’ ultimate goal is a worldwide grid, similar to the electric power system, which users can access over the Internet through service providers on a pay-as-you-need basis.
How It Works
Grid computing uses networked clusters of CPUs connected over the Internet, a company intranet or a corporate WAN. The resulting network of CPUs acts as a foundation for a set of grid-enabling software tools. These tools let the grid accept a large computing job and break it down into tens, hundreds or thousands of independent tasks. The tools then search the grid for available resources, assign tasks to processors, aggregate the work and spit out one final result. Grid toolkits also contain middleware that enables a diverse, multivendor array of hardware to accept assignments and handle all the same applications.
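The break-down, distribute, aggregate cycle described above can be sketched in miniature. In this hedged Python sketch, a local process pool stands in for the grid's networked CPUs; the function and variable names (`analyze_chunk`, `run_on_grid`) are invented for illustration, and real grid middleware such as Platform's tools or the Globus Toolkit would schedule the tasks across machines rather than local cores.

```python
# Sketch of the grid scatter/gather pattern: split a large job into
# independent tasks, farm them out, and aggregate the partial results.
# A local multiprocessing pool substitutes for the grid's scheduler.
from multiprocessing import Pool

def analyze_chunk(chunk):
    """One independent task: here, just sum a slice of the data."""
    return sum(chunk)

def run_on_grid(data, n_tasks=4):
    # 1. Break the large job into independent tasks.
    size = max(1, len(data) // n_tasks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # 2. Distribute the tasks among available processors.
    with Pool() as pool:
        partials = pool.map(analyze_chunk, chunks)
    # 3. Aggregate the partial results into one final answer.
    return sum(partials)

if __name__ == "__main__":
    # Same answer as a single-processor run, computed in parallel pieces.
    print(run_on_grid(list(range(1000))))
```

The key property the middleware relies on is that each task needs no data from any other task, so the scheduler is free to run them anywhere in any order.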
CIOs can realize significant benefits from building a grid, or so say grid solutions providers. Ian Baird, Platform’s chief business architect and corporate grid strategist, says CIOs are under incredible pressure to increase ROI in IT because they’ve already spent an incredible amount of money, and, particularly during a recession, companies are reluctant to spend more on additional computing resources. “The grid is a way to get maximized utilization of existing resources without spending millions of dollars on hardware,” he says. For example, Baird claims that one Platform customer, a bioinformatics company, planned to spend approximately $3 million on new hardware to expand its computing resources. Instead it spent around $150,000 to install a grid, and it no longer needs to buy the new hardware.
Despite the potential benefits, grid-ready applications remain a rare bird. The technology best serves computationally intensive problems whose algorithms developers can break into discrete, independent units, such as genetic research, where scientists must mathematically analyze thousands of genes in combination to find matches. And that’s not a task most corporations face.
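A toy illustration of why gene matching decomposes so well: each pair of sequences can be scored without reference to any other pair, so every comparison is a self-contained task a grid scheduler could place on any processor. The sequences and the similarity function below are invented purely for illustration, not a real bioinformatics method.

```python
# Each gene pair is an independent scoring task; on a grid, the pairs
# would be distributed across processors and the scores gathered back.
from itertools import combinations

def match_score(a, b):
    # Toy similarity: fraction of aligned positions where sequences agree.
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))

genes = {"g1": "ACGT", "g2": "ACGA", "g3": "TCGT"}

# The task list: every unordered pair of genes.
tasks = list(combinations(genes, 2))
# Locally we just loop; a grid would run these tasks in parallel.
results = {(a, b): match_score(genes[a], genes[b]) for a, b in tasks}
```

By contrast, workloads where each step depends on the previous one (a long sequential simulation, say) gain little from a grid, which is why so few corporate applications qualify.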
Despite its seemingly limited applicability, grid computing has generated considerable buzz. A number of major hardware vendors, including Compaq, Hewlett-Packard and IBM, have announced commercial grid-computing initiatives in the past year. This is part of a big push for them to sell more hardware, according to Robert Batchelder, a research director at Gartner in Stamford, Conn. The vendors realize that when companies install their own grids, they’re taking advantage of resources they already have. But, like Cereon, they’ll probably still need to buy more hardware in the end. And with products available from companies such as Platform and Entropia as well as the government-funded Globus Project (which makes the Globus Toolkit, a free, open-source set of grid tools), it’s now easier for companies to create grids. So the big vendors aren’t about to miss out on the opportunity to sell hardware loaded with these tools. “A company will throw iron at a problem, and someone like IBM doesn’t care what you do as long as it’s IBM iron,” says Batchelder.
Grid bundles seem to be the favored approach. IBM, for example, announced in November that it will deploy the Globus Toolkit on its Linux and AIX servers. Platform announced around the same time that Compaq will package Platform’s Grid Suite software on its Tru64 Unix servers and its Linux servers. And in 2000, Sun Microsystems released Sun Grid Engine, free software that helps companies set up “cluster grids,” or grids that live in a single location. In November 2001, Sun announced the no-cost beta release of Sun Grid Engine, Enterprise Edition, which it will eventually sell to help companies build “campus grids” that link computer resources in several departments or locales.
Meanwhile, some vendors are striving for a utility model of grid computing. IBM announced in August that it plans to let companies, sometime in 2002, buy processing power instead of building their own grids. Though Dave Turek, vice president of emerging technologies for IBM, declined to describe the exact configuration of such a utility, he indicated at the time of the announcement that customers might tap into processing power at 50 IBM data centers around the globe. Meanwhile, HP is rolling out a service model called Utility Data Center, which it claims will be able to run enterprise and Web apps, such as those used by online retailers, as opposed to the more traditional grid computing tasks. Nick Gall, an analyst with the Meta Group based in Stamford, Conn., says HP’s plan is ambitious. “They’re trying to address many of the key challenges of grid computing in a very short time frame. There’s a significant chance they’re setting expectations too high,” he says.
Before grid computing moves into the commercial mainstream, CIOs need to learn more about the technology and its possibilities, and identify ways they can use it. But proponents claim that just about any sophisticated company can find a need for high-volume number crunching. For example, says Gall, companies that engage in weather forecasting, such as insurance companies and commodities futures brokerages, have supercomputing needs that grids can address. Similarly, financial services companies could use grid computing at the end of each day to make sure their portfolios are balanced with the appropriate risk. “How much would it have been worth to [defunct investment house] Barings to have been able to analyze its entire portfolio and find out what one rogue trader was doing?” says Peter Jeffcock, Sun’s group marketing manager for software products in the technical market products group, based in Palo Alto, Calif. “A grid could have helped because this is a huge, compute-intensive task.” Jeffcock adds that grid computing can help any company that does its own software development and testing. “You’re running weekly, nightly and sometimes daily regression tests. If you could run them over lunch and say, ‘Here’s what you need to fix in the afternoon,’ you could deploy the products much quicker.”
But other problems need to be solved before grid computing becomes truly widespread, particularly in the context of inter-enterprise grids, utility models and ultimately a global grid. The biggest issue is security. If you’re sharing the grid with other companies, as you share the power lines now, you need assurances that nobody else can get to the confidential information you’re throwing into the system. Standardization also remains a challenge. A grid involves sharing distributed heterogeneous resources and bringing together a number of operating systems, vendor platforms and applications. Getting them all to talk will require new protocols. A number of nonprofit groups such as the Global Grid Forum, the Globus Project and the New Productivity Initiative are working on security and standardization issues. But until these issues are worked out, grid computing will likely remain an internal corporate effort.