Preservation is a branch of library science dedicated to the maintenance of physical artifacts. Digital preservation, its modern offspring, concerns itself with digital artifacts such as documents, movies, and audio recordings. But the challenges of digital preservation are complicated by the interactivity characteristic of many digital artifacts. It’s not enough to save the bits if the goal is to understand the experience of using something in its original form. I have in mind such things as interactive fiction, video and computer games, and other similar artifacts.
Cathy Marshall and I wrote a paper a few years ago about some pragmatic strategies to consider when thinking about the preservation of hypertext fiction. We focused on interaction and attempts to preserve the reading experience. Megan Winget has been looking at the preservation of video games, including the processes used to create them. Here, even more so than for hypertext, the issues of hardware emulation become important if one wishes to re-create the experience of playing one of the old arcade games.
Our conversations about her work led me to reflect, in a rather naive way, on what other digital artifacts being created these days might be interesting to preserve, so that in the future people would have a reasonable idea of what we were up to, rather than having to resort to inferences, as described here.
So here’s a short list. I am sure others have thought of these (and many other) things before, but it seems like a place to start:
- Blogs. When thinking about preserving blogs, I think it’s useful to think about comments and conversations, in particular conversations that span blogs. A social network analysis of inter-blog linking should readily identify blog cliques that represent ongoing conversations among the authors and their readers. Preserving one without the rest will likely miss out on the conversation and evolution of ideas.
- Twitter. Twitter captures certain ephemeral aspects of our pop culture that, while not necessarily useful in their entirety, can convey a certain affect more succinctly than other media, and can make great fodder for future anthropologists. Automated strategies such as TwapperKeeper may not work well, however. Consider the trace for #jcdl2009, which seems to have accrued a whole bunch of spam since the conference ended.
- Spam. Spam should be preserved for the same reasons that parasitic organisms are preserved in natural history museums. It features a rich fauna, including the following digital varieties currently enumerated in Wikipedia: Address munging, Bulk email software, Directory Harvest Attack, Joe job, DNSBL, DNSWL, Spambot, Pink contract, Keyword stuffing, Google bomb, Scraper site, Link farm, Cloaking, Doorway page, URL redirection, Spam blogs, Sping, Forum spam, Blog spam, Social networking spam, and Referrer spam, and that doesn’t include the various forms of internet fraud.
I might also add Twitter spam to this list. Why isn’t that in Wikipedia yet? Anyway, the point is that while antivirus companies surely have elaborate archives of various viruses, worms, and other forms of digital malfeasance, it would be instructive to catalog the end-user manifestations of these in a way that someone who is not an expert on computer security might find intelligible.
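Going back to the blog item in the list above, here is a rough Python sketch of what an inter-blog link analysis might look like. It is only an illustration under some assumptions of mine: that links between blogs have already been harvested as (source, target) pairs, that a "conversation" shows up as blogs linking to each other in both directions, and that a plain Bron-Kerbosch enumeration of maximal cliques is good enough at this scale. The blog names and the function names are made up for the example.

```python
from collections import defaultdict


def mutual_link_graph(links):
    """Build an undirected graph that keeps only reciprocated links."""
    directed = set(links)
    graph = defaultdict(set)
    for src, dst in directed:
        if src != dst and (dst, src) in directed:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def blog_cliques(links, min_size=3):
    """Return groups of mutually linking blogs, as sorted lists."""
    graph = mutual_link_graph(links)
    found = [c for c in maximal_cliques(graph) if len(c) >= min_size]
    return sorted(sorted(c) for c in found)


# Hypothetical link data: (linking blog, linked blog)
links = [
    ("a", "b"), ("b", "a"),
    ("b", "c"), ("c", "b"),
    ("a", "c"), ("c", "a"),
    ("c", "d"),              # one-way link, ignored
    ("d", "e"), ("e", "d"),
]

print(blog_cliques(links, min_size=2))
```

With this toy data, blogs a, b, and c form one group and d and e another; the one-way link from c to d does not join them. Whether reciprocal linking is the right signal for a conversation is a judgment call, and a real archive would probably want to weight links by date and by whether they appear in posts or comments.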
My experience with trying (and largely failing) to re-animate software created just a few years ago suggests that preservation should be thought of alongside artifact creation, rather than as a last-ditch effort when the artifact in question is about to cease to exist.