The next generation of supercomputers could be crippled by hard drive failures every few minutes, the U.S. Department of Energy has warned, and so it is funding a Petascale Data Storage Institute to solve the problem.

The Los Alamos National Laboratory has commissioned RoadRunner, a 32,000-CPU supercomputer from IBM that will operate at petaflop levels, a sustained speed of 1,000 trillion calculations per second. Put another way, this is a quadrillion, or a million billion, operations per second.

Thousands of hard disks will be needed to keep the thousands of CPUs supplied with data. Garth Gibson, an associate professor of computer science at Carnegie Mellon University who will lead the new institute, has warned that this system “likely will require up to hundreds of thousands of magnetic hard disks to handle the data required to run simulations, provide checkpoint/restart fault tolerance and store the output of these modeling experiments. With such a large number of components, it is a given that some component will be failing at all times.”

Current teraflop-level supercomputers, operating at trillions of operations per second, have disk failures once or twice a day, according to Gary Grider, a co-principal investigator at the Los Alamos National Laboratory. Once supercomputers are built out to the scale of multiple petaflops, he said, the failure rate could jump to once every few minutes. Storage systems for them will need to tolerate many failures, mask their effects, and continue to operate reliably.

“It’s beyond daunting,” Grider said of the challenge facing the new institute.
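
The jump from a failure or two per day to one every few minutes can be illustrated with a rough calculation. The sketch below is not from the institute or the laboratory. It assumes that components fail independently at a constant rate, so the system-wide failure rate grows in proportion to the number of components. The function name and the scale factors are hypothetical, chosen only for illustration.

```python
def failure_interval_minutes(failures_per_day: float, scale_factor: float) -> float:
    """Estimate minutes between failures after scaling up a system.

    Assumption (not from the article): failures are independent and every
    component fails at the same constant rate, so multiplying the component
    count by scale_factor multiplies the failure rate by the same amount.
    """
    if failures_per_day <= 0 or scale_factor <= 0:
        raise ValueError("failures_per_day and scale_factor must be positive")
    scaled_failures_per_day = failures_per_day * scale_factor
    minutes_per_day = 24 * 60
    return minutes_per_day / scaled_failures_per_day


if __name__ == "__main__":
    # Start from the "once or twice a day" figure quoted for current systems
    # and apply a few hypothetical growth factors in component count.
    for factor in (1, 10, 100, 360):
        interval = failure_interval_minutes(2, factor)
        print(f"{factor:>4}x components: one failure every {interval:.1f} minutes")
```

Under these assumptions, a system that sees two failures a day today would see one roughly every two minutes if its component count grew 360-fold. Real disk populations do not fail at a perfectly constant rate, so this is only an order-of-magnitude guide.
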
“Imagine failures every minute or two in your PC, and you’ll have an idea of how a high-performance computer might be crippled.” He emphasized: “For simulations of phenomena such as global weather or nuclear stockpile safety, we’re talking about running for months and months and months to get meaningful results.”

-Chris Mellor, Techworld.com (London)