It's no secret that corporations are drowning in data. IDC estimates the volume of computer data worldwide will reach 1.2 million petabytes during 2011.

A November 2010 Gartner study found data growth was one of the top three challenges for data center managers at 47 percent of large enterprises.

That usually prompts worries about the cost of data storage, but the data itself poses dangers that most companies are not managing well, according to Katey Wood, information management and e-discovery analyst at Enterprise Strategy Group. Despite the additional costs that unstructured data like e-mails, spreadsheets and word processing documents add to compliance processes, most companies do little to hem in sprawling data, Wood says.

The annual cost of litigation (excluding settlements) was more than $1 million for half of U.S. companies in 2010, prompting far more to investigate alternate ways to pay for legal protection, and 40 percent of large companies to plan increases in spending on e-discovery during 2011, according to surveys from legal-information services firm Fulbright & Jaworski.

In the "fire drill" most companies go through following a legal request for information, some companies identify all the users with pertinent data and often just copy all the data on laptops, smartphones and other devices rather than spend time selecting what they want, Wood says.

Repeat Searches Waste Money

That's not only expensive, but also could mean paying repeatedly for the same set of data which, itself, is so inflated with the irrelevant that only one bit in 20 is relevant to a particular case, according to John Palumbo, senior litigation support manager for law firm Foley Hoag, LLP in Boston. Foley Hoag specializes in clients in technology, banking, pharmaceuticals and other highly regulated, data-heavy industries.
Palumbo, a records-management specialist with a knack for information technology despite the traditional gulf between IT and records managers, runs what amounts to an internal service bureau at Foley Hoag. Most initial internal searches net far more data than they should and repeat work that was almost certainly done for the last lawsuit, he says.

"Doing it the same way twice means just paying twice to collect the same non-relevant data," he says.

It's unusual for Palumbo to deal with a client that isn't coming in with huge files (100GB, 200GB) and paying Foley Hoag to filter through them.

"The most expensive part of e-discovery is attorney review," Palumbo says. "If you don't cull those data files down a long way you end up handing them off to associates to go through them, and at $200, $300, $400 an hour, that adds up pretty quickly."

Palumbo uses an on-premises version of software from Clearwell Systems that automates much of the process of filtering the data. First to go are the system files, .jpeg files, audio files and others that are clearly inappropriate. Next are those of the right type but the wrong date. Then he can contact the lawyers on the case to further identify which files are relevant. At that point the 100GB file may be down to 80GB, at no cost to the client. An outside firm would charge about $350 per GB for that service, he says.

More complex filtering, based on content, costs far more: Outside service bureaus typically charge about $1,000 per GB for full filtering and reduction, Palumbo says. Typically, Foley Hoag will do the simple level of filtering at no cost to clients, though it charges for more complex parsing and reduction.

With an updated list of data custodians, Palumbo uses the Clearwell system to filter documents by time and e-mails by sender or recipient, then by a combination of e-mail domains, keywords and Boolean conditional searches. Which is the most valuable function?
"e-mail threading, definitely," Palumbo says. "In a decision, that could be made in 20 or 30 e-mails going back and \n\nforth, and each person on the distribution list has a copy. If you have to have a lawyer open each of those instances, the cash register will be running a long \n\ntime."E-mail threading lets Palumbo get a single chain that contains all the relevant information, and delete the rest, not to mention the rest of the mailboxes \n\nand other detritus. Typically, using careful filtering and keywords, Palumbo can get a 100GB file down closer to 5GB. Lawyers still have go manually through the final 5GB, but the bill is a lot smaller than it would be otherwise. Who Needs E-Discovery AppsOn-premise e-discovery software isn't for everyone, of course. Companies that aren't "serial litigators" or that aren't involved in pharmaceuticals, \n\ntechnology or other areas in which lawsuits over intellectual property are both common and often fatal to one of the companies involved, will have a much \n\nharder time justifying $250,000 or $500,000 for a server and full-scale e-discovery application, Wood says.E-discovery software prices are coming down as competition rises, however, and most of the leading e-discovery apps are available as cloud-based or \n\nSaaS subscriptions, Wood says."The whole idea is to know what you have and where it is," Woods says. "It may behoove you to have a tool in-house to help weed out the \n\nnon-relevant documents so you can do what you need once, instead of producing those same documents over and over again."Follow everything from CIO.com on Twitter @CIOonline.