Hard-core data preservation: The best media and methods for archiving your data

Daily backup isn’t archiving. If you want your data to survive the decades, you need to use the right tools.

A lot is written about the importance of backing up data, but the media and methodologies proposed aren’t generally suitable for archiving. Securing your data for posterity, i.e., archiving, requires a different approach, where shelved media life and future file compatibility trump the speed and convenience that make backup palatable to the average user.

We’ll discuss methodology later, but here’s the low-down on the media types available for backup and archival purposes.

External hard drives

Portable hard drives are easy to use, faster than optical, but may need their data refreshed every few years.

By far the most common backup medium employed by consumers is the external hard drive. Fast compared to tape and optical, hard drives are generally reliable for the short term, and if removed from operation and safely stored, may last a decade or two before magnetic properties diminish to the point of producing unrecoverable errors. In constant use, mechanical stresses shorten a drive’s lifespan to three to five years. For the long term, hard drives on the shelf are workable but require periodic maintenance, so they are not ideal.

That decade-or-two longevity figure is based on published figures for coercivity and residual magnetism for current GMR (giant magnetoresistance) heads and SMR (shingled magnetic recording) techniques, as well as the latest platter coatings. It assumes a loss of magnetic signal strength of anywhere from 1 percent per year to 1 percent per decade.
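To put those two loss rates in perspective, here is a rough back-of-the-envelope sketch. It assumes the loss compounds annually, which is my simplification and not something the published figures state, and the function name is purely illustrative.

```python
def remaining_signal(loss_per_year: float, years: float) -> float:
    """Fraction of the original magnetic signal left, assuming compounding annual loss."""
    return (1.0 - loss_per_year) ** years


# The two ends of the quoted range, over a 20-year shelf life.
worst = remaining_signal(0.01, 20)    # 1 percent per year
best = remaining_signal(0.001, 20)    # 1 percent per decade, roughly 0.1 percent per year

print(f"1% per year, 20 years:   {worst:.1%} of signal left")
print(f"1% per decade, 20 years: {best:.1%} of signal left")
```

Under that assumption, two decades on the shelf leaves roughly 82 percent of the signal at the pessimistic rate and roughly 98 percent at the optimistic one. How much loss a given drive can tolerate before it throws errors is a separate question that this arithmetic doesn't answer.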

For non-operational drives, it’s industry practice to refresh, i.e., rewrite, the data every two or three years. Consumers can do this with a free utility called DiskFresh.
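If you’d rather see what a refresh amounts to, here is a minimal file-level sketch in Python. It is not how DiskFresh works (that tool operates on the disk itself); it simply rewrites each file as a fresh copy and checks the result against a SHA-256 hash. The function names and the temporary-file suffix are my own inventions for illustration.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(path: Path) -> None:
    """Rewrite one file: copy it to a temporary sibling, verify it, then swap it in."""
    before = sha256_of(path)
    temp = path.with_name(path.name + ".refresh-tmp")  # hypothetical suffix
    shutil.copy2(path, temp)  # writes a fresh copy and preserves timestamps
    if sha256_of(temp) != before:
        temp.unlink()
        raise OSError(f"verification failed for {path}")
    os.replace(temp, path)  # replaces the old copy with the fresh one


def refresh_tree(root: Path) -> int:
    """Refresh every regular file under root and return how many were rewritten."""
    count = 0
    # sorted() collects the file list up front, so temporary files are never revisited.
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            refresh_file(path)
            count += 1
    return count
```

Calling something like refresh_tree(Path("E:/archive")), where the path is a made-up example, would rewrite everything under that folder. The drive has to be writable while this runs and needs enough free space for a second copy of the largest file.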

Environment is also key: Heat, vibration, humidity, and magnetic fields (strong ones are used to erase hard drives) can dramatically shorten operational or shelf life. A hard drive is also a mechanical device that’s vulnerable to shocks. You can do everything right with your drive, but drop it on a hard floor as you pull it out of the safety deposit box, and like that, you’re off to the recovery service.

Advice: If you use hard drives for archiving, use them in pairs or trios—each containing a copy of the same data. Write-protect them (see the “Methodology” section) before storing them, and rewrite the data every couple of years.

External SSDs

Samsung’s Portable SSD T1 is super fast—up to 350MBps faster than USB 3.0—but most NAND-based storage is only good for about a decade or so.

External SSDs are rugged and virtually shock-proof, but the NAND they use won’t hold data forever. The cells, which are electron traps, leak over time. The technology is also relatively new, so no one is quite sure how long an SSD will retain data when stored unpowered, but you won’t find companies touting them for long-term backup. Figure 10 years as a best case scenario, but don’t rely on it.

Advice: If you use SSDs, refresh the data on them every year or two and replace them every 10. Better yet, use something else.
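The refresh intervals suggested so far for hard drives and SSDs are easy to lose track of once the media is sitting on a shelf. The following sketch works out the due dates. It uses the conservative end of each range given above (two years for hard drives, one year for SSDs), and the dictionary keys and function names are illustrative only.

```python
from datetime import date
from typing import Optional

# Intervals in years, taken from the advice above (conservative end of each range).
REFRESH_YEARS = {"hard drive": 2, "ssd": 1}
REPLACE_YEARS = {"ssd": 10}


def add_years(day: date, years: int) -> date:
    """Return the same calendar day some years later (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def next_refresh(media: str, last_written: date) -> date:
    """Date by which the data on this medium should be rewritten."""
    return add_years(last_written, REFRESH_YEARS[media])


def replace_by(media: str, purchased: date) -> Optional[date]:
    """Date by which the medium should be retired, or None if no figure is given."""
    years = REPLACE_YEARS.get(media)
    return add_years(purchased, years) if years is not None else None


print(next_refresh("hard drive", date(2016, 4, 15)))  # 2018-04-15
print(next_refresh("ssd", date(2016, 4, 15)))         # 2017-04-15
print(replace_by("ssd", date(2016, 4, 15)))           # 2026-04-15
```

Write the resulting dates on the label of the media itself, so the reminder travels with the drive.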

Tape

LTO 6 2.5TB tape cartridge.

Magnetic tape is still in the discussion for enterprise. It’s available in very large capacities: a new Sony tape can hold up to 185TB. It’s also removable media, so it’s easy to store and handle in bulk. But tape can stretch and break, as well as be erased by magnetic fields. It’s also expensive; the handling mechanisms are finicky; and because data is stored sequentially, random retrieval is quite slow. It also suffers magnetic and physical degradation over time, though the rate is greatly dependent upon the materials in use.

Advice: Consumers, don’t use tape. It’s expensive and there are easier alternatives.

Optical

Millenniata’s M-Disc DVD and Blu-ray recordable discs are rated for 1,000 years.

If you think of optical (CD/DVD/Blu-ray) solely as a means of movie or software delivery, it probably seems antiquated. You might also dismiss garden-variety CD, DVD, and BD-R (LTH) recordable and rewritable discs as unsuitable archival media. You’d be right about that—they use inherently unstable, organic dye-based data layers.

However, there are optical discs that are unquestionably the hardiest, handiest archival media available to consumers. Write-once BD-R HTL (High To Low) can last for 100 to 150 years given a relatively mild environment, i.e., not on your dashboard in Phoenix. Millenniata’s M-Disc BD-R and DVD+R write-once discs use an even more stable data layer that is rated for 10,000 years. Only the polycarbonate outer layers reduce that to a mere 1,000 years. Note that this is all theoretical, but the testing methods were rigorous and the tests were performed by the government of France (BD-R) and the U.S. Navy for the Department of Defense (M-Disc DVD).

Available in 25GB, 50GB, and 100GB (currently very expensive) flavors, BD-R also has enough capacity to handle long-term backup and archival chores. The downside is a relatively slow 21MBps writing at best—substantially slower than USB 3.0 hard drives and SSDs.
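To get a feel for what those capacities and that write speed mean in practice, here is a quick calculation. The 200GB archive size is a made-up example, the function names are mine, and the math ignores file system overhead, verification passes, and the time spent swapping discs.

```python
import math

DISC_CAPACITIES_GB = (25, 50, 100)  # BD-R sizes mentioned above
WRITE_SPEED_MBPS = 21               # best-case write speed quoted above, in MB per second


def discs_needed(archive_gb: float, disc_gb: int) -> int:
    """Number of discs required, ignoring overhead and files too big to split cleanly."""
    return math.ceil(archive_gb / disc_gb)


def burn_minutes(archive_gb: float, speed_mbps: float = WRITE_SPEED_MBPS) -> float:
    """Minimum total writing time in minutes, using decimal units (1GB = 1,000MB)."""
    return archive_gb * 1000 / speed_mbps / 60


archive_gb = 200  # hypothetical archive size
for capacity in DISC_CAPACITIES_GB:
    print(f"{capacity}GB discs: {discs_needed(archive_gb, capacity)}")
print(f"Writing time at {WRITE_SPEED_MBPS}MBps: about {burn_minutes(archive_gb):.0f} minutes")
```

For that 200GB example, you’d need eight 25GB discs, four 50GB discs, or two 100GB discs, and about 159 minutes of pure writing time per copy. Multiply by two or three if you follow the advice to keep multiple copies.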

If you’re worried about optical drives disappearing, know that optical retains a very strong presence in the archival community, as well as the enterprise, so that should give you some reassurance.

Advice: Despite its slow speed, optical is pretty perfect for archiving your most important data.

Online storage

CERN server farm

If I were big on blind faith, I’d just say opt for online storage and be done with it. It’s super easy and convenient, and there are some very cheap online storage services such as Amazon’s Glacier, Backblaze, Google Drive, and OneDrive. Glacier is extremely inexpensive, at least until you need to retrieve data.

However, there are drawbacks. First off, though the means of delivery may seem magical and your data is often referred to as being safely stored “in the cloud,” in reality, it’s stored on someone else’s hard drives or other media. It’s as safe as a given service has made it.

Then there’s the ongoing cost in the form of monthly fees, and in some cases transfer charges. Also, speed and availability are limited by your online connection (DSL often has very slow upload speeds) and when your service is down, your archive is unavailable. There are also privacy and security concerns. I consider these trivial, but just FYI—the NSA had a hand in funding just about every open-source encryption project out there.

Caveats aside, having an offsite copy of your data is one of the mantras recited by backup and archival gurus. If a flood, hurricane, fire, or ex-spouse ruins your local backup, you’ve got another to fall back on.

Advice: If you use online storage, use it as a partner to local backup. That said, it’s a lot better than not archiving at all.

Active Archiving

I wouldn’t mention “active archiving” at all if it weren’t bandied about in hard drive sales literature without explanation. Active archiving has nothing to do with hard drives, per se. It’s simply the act of shuttling data between media in a storage area network (SAN) with the goal of keeping the most frequently accessed data on the fastest media (RAM or SSDs) and the least frequently accessed data on slower tape or optical, with hard drives somewhere in the middle.

Methodology

There are myriad backup strategies used by pros and in enterprise environments. I’ll forgo complicated strategies that average users (like myself) don’t have the time or patience to implement and stick with the basics:

1. The rule of three dictates that you always keep three copies of your data: a working copy, a backup, and a backup of your backup—preferably in another location, or off-site as it’s known in the biz.

The reason you need an off-site backup: This fire was at the scanning facility of the Internet Archive.

In archiving, you may not work with the aforementioned first copy, but stick with the rule of three (or more) anyway.

2. Don’t bother with trivial or unfinished data. Archive only irreplaceable data that’s in its final state: legal or financial documents, important memorabilia, your creative efforts, etc. If you can download it again, reinstall it, or if you are still working on it, don’t bother—you’ll just waste time and space. Let your everyday backup take care of it. Also take the opportunity to de-duplicate and prune your data before you archive.

3. Use write-once media, or write-protect your rewritable media to mitigate the chance of accidental overwrites. You can write-protect hard drives using the Windows DiskPart utility and the command “att vol set readonly” (short for “attributes volume set readonly”) after selecting the proper drive and partition. Replace “set” with “clear” to make it writable again.

4. Don’t use the proprietary file containers (a large file containing smaller files) that many backup programs create, or compression if you can help it. Use a file system or format that you know will be readable in the future, and store the data as plain files. FAT, NTFS, HFS, EXT, ISO 9660, UDF, etc., or any of their variants, should be readable for some time. If you must use compression, make sure it’s something universal such as ZIP.

5. Stay away from proprietary file formats if possible. Use PDF/A, RTF, JPG, MPEG, etc., which are likely to be readable well into the future. It’s rare to find a program any more that doesn’t export data in some standard file format, but if you use one, archive the installation files for said program. If it’s dependent upon a specific operating system or version of such, create a virtual hard drive or virtual machine with the software installed on the operating system and archive the whole deal. You can save the installation files for the operating system, but drivers may become an issue as hardware advances.

6. Don’t use encryption except for truly sensitive data. Passwords can be lost or forgotten. Remember we’re talking long haul here.

7. Date and document the archive. Name the media as verbosely and specifically as you can. Use a piece of masking tape and a Sharpie if you have nothing else. There’s no point in archiving the same data again at the next juncture.

An IBM data center back in the day. Tape drives, data input technicians, IT types.

8. Respect changing technology. Just because the media lasts 100 years doesn’t mean the technology used to read it will. We’re only 50 years or so removed from punch cards, so the pace of obsolescence is rapid, but hard to predict. If you see the technology you used being replaced wholesale by another, re-archive. That said, you can still find the means to read ATA hard drives and they’ve been around a good 30 years. If you look hard enough, I’m sure you can even find a means to read punch cards, even if it’s in the computer museum in Mountain View, CA.
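Two of the steps above, pruning duplicates (step 2) and documenting the archive (step 7), lend themselves to a little automation. The sketch below is one way to do it under my own assumptions: it records each file’s path, size, and SHA-256 hash in a dated JSON manifest and flags files with identical contents. The manifest layout and all the function names are invented for illustration and aren’t any kind of standard.

```python
import hashlib
import json
from collections import defaultdict
from datetime import date
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, label: str) -> dict:
    """Describe every file under root: relative path, size in bytes, SHA-256 digest."""
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {"label": label, "created": date.today().isoformat(), "files": files}


def find_duplicates(manifest: dict) -> list:
    """Group the paths whose contents are byte-for-byte identical (same digest)."""
    by_digest = defaultdict(list)
    for entry in manifest["files"]:
        by_digest[entry["sha256"]].append(entry["path"])
    return [paths for paths in by_digest.values() if len(paths) > 1]


def write_manifest(manifest: dict, destination: Path) -> None:
    """Save the manifest as plain JSON text so it stays readable without special tools."""
    destination.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Build the manifest before you burn or copy, prune whatever find_duplicates reports, then rebuild it and store the final manifest alongside the data. Running the same build against each copy later and comparing the hashes is also a cheap way to confirm that all of your copies still match.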

Do it!

All the plans in the world don’t mean a thing if you don’t implement them.

You can do as much or as little as you desire in terms of archiving, and to be perfectly honest, a single archive copy will probably see you through. However, do you really want to have that “probably” on your mind?

For those who skipped the entire background discussion and would just like some quick and dirty advice…

Method one: Using a Blu-ray burner that supports BDXL and M-Disc, back up to BD-R HTL or M-Disc BD-R/DVD+R discs. Stay away from regular CD/DVD recordable/rewritable discs and BD-R LTH, i.e., organic-dye and phase-change media. Make two or three copies and send at least one to a family member you trust, or store it in a safe deposit box somewhere. If you add data relatively slowly, add to your existing archive using M-Disc DVDs.

Method two: Grab two or three USB 3.0 external hard drives, back up your important data to them, write-protect them, then store one locally and put one in a safe deposit box, or send it to a trustworthy relative. Every couple of years, refresh the archive, i.e., copy it off, then copy it back again. Or use the aforementioned DiskFresh.

Augment one or both of the methods above with online storage.

Do all that at least once a year (refreshing the hard drives every couple of years), upon completion of a new project, etc. Personally, I archive immediately after I’ve done my taxes. It’s that important, and unfortunately, about the same amount of fun. You know what’s less fun?

Losing your data.

This story, "Hard-core data preservation: The best media and methods for archiving your data" was originally published by PCWorld.
