by Esther Schindler

Everlasting Internet or Dissolving History?

Nov 05, 2007

During the Web’s hype-filled early days, the voices of Internet proponents rang with promise. The Internet would make e-learning possible worldwide, human collective wisdom would create a vast storehouse of never-disappearing knowledge, and we would end up with a huge knowledge base, with information surely at our fingertips. In my non-cynical moments I admit that some of these goals have been achieved… but I’m not very confident about that “never-disappearing” bit.

Initially, I thought my concern was a bit of “inside baseball.” We journalists, especially those of us who ever paid for our kitty’s kibble as freelance writers, are especially cognizant of the importance of our “clips” (samples of our published work shown to prospective employers). Many of those articles are published only online. So I was personally miffed (flummoxed, even, not to mention hot-and-bothered) when, after two magazines at which I worked closed down, the publishers decided to dump the content of the publications’ sites. Want to read my brilliant essays about women in computing, my stunning predictions about e-commerce (circa 1998) or my comparative review of online forum software? Sorry. They’re gone.

But it may not be only professional authors who need to be concerned about the brave new world in which rm * can wipe out the only copy of past creations. I was reminded of this by a mostly-in-passing comment in Managing Humans: Biting and Humorous Tales of a Software Engineering Manager (reviewed here), which suggested developers and managers google for code they’ve written just to see how long it’s been around. My own code predates the web (good thing, as I’m sure I’d be embarrassed by much of it), but my spouse is a professional developer and open-source committer… and I was surprised that I didn’t find much of his stuff. I suppose nobody cares anymore about OpenDoc and early speech recognition technologies (sniff!).

Then I read an essay by Robin Luckey, The World’s Oldest Source Code Repositories, on Ohloh, a site that tracks open source projects and people. Robin wrote:

Most software doesn’t survive very long. The hard truth is that more than 80% of the open source software being written today will be forgotten in a few years. … All of which means that most source control repositories are lucky to survive more than a couple of years.

In case you’re wondering: the three oldest source code repositories, of the 14,000 repositories Ohloh tracks, are GCC, the GNU Compiler Collection (started in November 1988); GNU Emacs (April 1985); and BRL-CAD (April 1983).

Code may not die. It may become irrelevant, to a greater or lesser degree (I doubt there are many who need to review Z80 application code), but optimization methods, performance strategies and old-fashioned algorithms remain useful, or at least more-or-less familiar. Indexing and developing search strategies become greater challenges, of course, but there’s still value in them thar code hills.

Similarly, most of the articles I wrote ten years ago are surely as useless as archives of “me and my dog” websites. But the stuff in the remainder — not the least of which is a sanity check of “what we’ll do ten years from now” written ten years ago — continues to have value.

What do we keep? What do we throw away? That applies to the data on your own intranet (“why keep the stuff from that old project?”) as much as to the whole of the Internet. I’m a packrat in the Real Universe so it’s no surprise I believe, “Keep everything!” But what about you?