Many Working Together: Massively Parallel Projects
One of the early fantasies of MPP was that it would save money by using lots of off-the-shelf chips instead of a few specialized mainframe processors. However, given the performance hit imposed by the stamp, or "communications tax," MPP vendors were in no position to accept a speed reduction by using low-performance chips. That meant they had to rewrite their software whenever a new chip came out. "We had to reprogram and reprogram and reprogram," remembers Adam Kolawa. "I remember a sequential programmer shaking his head and saying, ’Adam, you’re wasting your life.’" (Eventually Kolawa agreed. He is now chairman and CEO of Parasoft, a software testing company in Monrovia, Calif.)
Still, the vision of MPP’s smooth scalability was too attractive to give up completely, and during the next several years paths began to appear through the programming jungle. One was to move the MPP concept to a new application, from computation to storage. The steadily declining cost of storage has meant that companies now routinely work with databases measured in terabytes. Many are anticipating petabyte-level stores. Given current technology, it is difficult for a system based on sequential processing to keep track of more than a hundred terabytes (for the same reason: The processor gets saturated).
In theory, parallelized storage, as offered by companies such as San Francisco-based Scale Eight, can manage an infinite number of bytes. Further, since writing to a single disk takes a thousand times longer than executing an instruction, the communications tax is relatively lower and the constraints on programming somewhat relaxed. (Josh Coates, founder, CTO and acting CEO of Scale Eight, points out, however, that managing a single disk image spanning many hundreds of terabytes raises issues almost as demanding, such as making sure that all the writes that follow from a single data entry get made at the same time.)
In the mid-1990s, database vendors such as IBM and Oracle began to introduce relational databases that ran on a small number of processors, such as four or eight. While not massively parallel in the original sense (those systems had contemplated thousands of processors, and sometimes tens of thousands), these "modestly parallel" systems presented many of the same development issues, but in a smaller and more manageable form. Hardware companies began selling systems optimized for these applications. As the software and hardware grew together, more companies began finding ways to implement parallel processing of commercial data.
Step by step, this evolution is hauling the IT world into more "moderately parallelized" environments. Four- and eight-processor boards are near commodities, allowing vendors to slap together fairly large machines for low prices. Torrent and Ab Initio have managed to encode the intelligence needed to write programs for dozens of processors into graphical tools, radically lowering the level of expertise needed to create multiprocessing applications. Rod Walker, president and COO of Knightsbridge Solutions, a Chicago-based systems integration company, says Knightsbridge finds itself using the Ab Initio software in more than half its implementations. "We have seen data processing times increase by 10 times, 20 times and sometimes 30 times," he notes. Given numbers like those, perhaps this time the cycle really has begun.





