What were we thinking?


Preservation is a branch of library science dedicated to the maintenance of physical artifacts. Digital preservation, its modern offspring, concerns itself with the preservation of digital artifacts such as documents, movies, audio recordings, etc. But the challenges of digital preservation are complicated by the interactivity characteristic of many digital artifacts. It’s not enough to save the bits if the goal is to understand the experience of using something in its original form. I have in mind such things as interactive fiction, video and computer games, and other similar artifacts.

Cathy Marshall and I wrote a paper a few years ago about some pragmatic strategies to consider when thinking about the preservation of hypertext fiction. We focused on interaction and attempts to preserve the reading experience. Megan Winget has been looking at the preservation of video games, including the processes used to create them. Here, even more so than for hypertext, the issues of hardware emulation become important if one wishes to re-create the experience of playing one of the old arcade games.

Our conversations about her work led me to reflect in a rather naive way on what other digital artifacts are being created these days that might be interesting to preserve so that in the future, people would have a reasonable idea of what we were up to, rather than having to resort to inferences, as described here.

So here’s a short list. I am sure others have thought of these (and many other) things before, but it seems like a place to start:

  • Blogs. When thinking about preserving blogs, I think it’s useful to think about comments and conversations, in particular conversations that span blogs. A social network analysis of inter-blog linking should readily identify blog cliques that represent ongoing conversations among the authors and their readers. Preserving one without the rest will likely miss out on the conversation and evolution of ideas.
  • Twitter. Twitter captures certain ephemeral aspects of our pop culture that, while not necessarily useful in their entirety, can convey a certain affect more succinctly than other media, and can make great fodder for future anthropologists. Furthermore, automated strategies such as TwapperKeeper may not work well. Consider the trace for #jcdl2009, which seems to have accrued a whole bunch of spam since the conference ended.
  • Spam. Spam should be preserved for the same reasons that parasitic organisms are preserved in natural history museums. It features a rich fauna, including the following digital varieties currently enumerated in Wikipedia: Address munging, Bulk email software, Directory Harvest Attack, Joe job, DNSBL, DNSWL, Spambot, Pink contract, Keyword stuffing, Google bomb, Scraper site, Link farm, Cloaking, Doorway page, URL redirection, Spam blogs, Sping, Forum spam, Blog spam, Social networking spam, and Referrer spam, and that doesn’t include the various forms of internet fraud.
    I might also add Twitter spam to this list. Why isn’t that in Wikipedia yet? Anyway, the point is that while antivirus companies surely have elaborate archives of various viruses, worms, and other forms of digital malfeasance, it would be instructive to catalog the end-user manifestations of these in a way that someone who is not an expert on computer security might find intelligible.
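
To make the blog item above a little more concrete, here is a minimal sketch of the kind of link analysis I have in mind. It assumes the inter-blog links have already been harvested into (source, target) pairs; the function names, the sample data, and the inbound-link threshold are all made up for illustration. Note that it groups blogs by connected components over mutual links, which is a looser notion than true cliques.

```python
from collections import defaultdict


def mutual_link_groups(links):
    """Group blogs that are connected by bi-directional links.

    `links` is an iterable of (source_blog, target_blog) pairs.
    Returns a list of sorted groups (connected components over
    mutual links), a simpler stand-in for real clique detection.
    """
    link_set = set(links)
    neighbors = defaultdict(set)
    for src, dst in link_set:
        # Keep only pairs that link to each other in both directions.
        if src != dst and (dst, src) in link_set:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    groups = []
    seen = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk over the mutual-link graph.
        stack = [start]
        group = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(sorted(group))
    return groups


def hubs(links, min_inbound=3):
    """Return blogs linked to by at least `min_inbound` distinct other blogs.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    inbound = defaultdict(set)
    for src, dst in set(links):
        if src != dst:
            inbound[dst].add(src)
    return sorted(blog for blog, srcs in inbound.items() if len(srcs) >= min_inbound)


if __name__ == "__main__":
    # Hypothetical link data: blog names are placeholders.
    sample_links = [
        ("a", "b"), ("b", "a"),
        ("b", "c"), ("c", "b"),
        ("d", "e"), ("e", "d"),
        ("a", "h"), ("b", "h"), ("d", "h"),
        ("f", "a"),
    ]
    print(mutual_link_groups(sample_links))
    print(hubs(sample_links))
```

Groups found this way would be candidates for archiving together, while hubs might be candidates for archiving on their own. A real analysis would also have to fold in comment references, which this sketch ignores.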

My experience with trying (and largely failing) to re-animate software created just a few years ago suggests that preservation should be thought of alongside artifact creation, rather than as a last-ditch effort when the artifact in question is about to cease to exist.


  1. Maureen Pennock says:

    Hi Gene,

    On preservation of blogs – we’re tackling this in a new project called ArchivePress – more info at http://archivepress.ulcc.ac.uk/
    Thanks for the link to your 2004 paper too, I’ll check that out shortly…!


  2. Maureen, your project sounds interesting. Are you focusing on just archiving, or on making sense of the relationships among the blogs and comments? Also, are you interested in preserving the look and feel of the blogs, or are you just using WordPress as a container? I ask because some sites have custom plugins that can affect the reader’s navigational experience.

  3. Hi Gene,

    Thanks for talking about my work! So exciting. The videogame project is going along smoothly, and we have some tangential projects that I think you might find interesting, and which are not on the website. The first is specifically related to what you’re talking about in this post – because preservation of the games themselves is so fraught both from a technical and legal standpoint, we’re exploring the use of game surrogates to represent (or augment a representation) of games within a collection.

    Our basic research question is: how well do different types of game surrogates represent the actual game? The process:

    – created a taxonomy of surrogates which at its highest level includes surrogates made by players, by industry, and by cultural critics (video, image, text surrogates) (I’ll publish something soon about this taxonomy);
    – collected surrogates that matched the taxonomy about one specific game – World of Warcraft – (that process is also something which I’ll talk about somewhere – maybe I should put this on the project blog?);
    – created questions about games which were informed by current research;
    – got a bunch of people who said they had “no experience” with WoW to watch or interact with the different kinds of surrogates
    – Those people then answered questions and did some visual gisting exercises.
    – We also had a set of people who spent an hour (approximately the length of time it took the other group to interact with the surrogates) playing WoW and answering the same set of questions
    – One more group of people who interacted with nothing (used their pre-existing knowledge) and just answered the questions.
    – Finally, we talked to WoW players to get a “ground truth” for the answers.

    We’re just now finishing with content analysis of the interviews. But preliminary findings indicate that, for those players who had very little prior experience with WoW, interacting with the surrogates gave a better understanding of the game than did playing the game for the same amount of time. Even the people who interacted with nothing displayed more knowledge about the game than did those who played the game. Playing the game seemed to confuse people. We also found that even for those people who said they had “zero experience” with WoW, they actually knew quite a lot about the game, which means to me that the genre of MMOs is becoming so ingrained in the culture that we might not need to worry too too much about describing that element for preservation.

    The other tangential project is less related to this blog post – but I might as well mention it –

    Because there’s no good model for preserving and providing access to something as complex as videogames, we want to look at other, related archives to see what sorts of materials they collect, what their preservation mandate is, and what kinds of services they provide. We settled on a double front:

    1) examine Film Archives as the exemplar (we chose film archives because of the technology obsolescence and fragile materials similarities, as well as the cultural significance and early – negative – reception to collecting of film materials – they’re both popular culture, so there’s a hesitance to take the materials seriously; also similar because the preservation is dependent primarily on an industry that likes to keep secrets), and

    2) doing a case study on the archival materials related to the Marx Brothers – because they worked in radio, television, film, and stage – both vaudeville and Broadway – we want to see what kinds of materials are available for an entity that spans genres and formats (like videogames).

    Any thoughts on either of these projects?


  4. It is interesting that people who played the game (without prior experience with it) got less out of it than people who examined materials about the game. The old psychological theory of Semantic Elaboration would explain this because people learning the game were paying attention to low-level mechanics of interaction, rather than to higher-level aspects of the experience.

    So the lesson for archival and curation returns to the question of who the target audiences are for the archived materials. While it may be interesting to some in the future to play the games as they were played in the past, for many others it might be better to have a more coherent narrative about the media than the thing itself.

    Your question about film archives then is interesting because there the result may be different: will more people want to watch original old movies than play original old games? Probably so, because while movie technology has changed tremendously, people’s interaction with them is still pretty much the same. Whereas games separate the doing from the narrative (although there are narratives within games as well), with film the separation is much less pronounced. (Although some historical background may help people understand older movies better.)

  5. Maureen says:


    We are explicitly preserving the content: we are developing a lightweight solution that can preserve the information content, for the institutional record. We believe this is an important record-keeping activity overlooked by organisations, and want to provide an easy way to address it at the institutional level. Richer solutions will no doubt emerge, but we believe that the content is the sine qua non of blogs.

    Our primary focus is on archiving and we have mixed views about chasing relationships. It is context, but to what extent is it the archive’s responsibility to preserve it explicitly? (Shouldn’t we leave something for the researcher to do?!) Cf. the correspondence problem.

    It is true that social networking continues to evolve features to make preserving rich networks of data and metadata more viable: as this becomes readily available in the semantic metadata, attempts should be made to preserve it.

    But aren’t we always faced with deciding where scope ends (in a world of limited resources)?


  6. Maureen, no doubt the issue of where to leave off is a tricky one. Perhaps one way to decide on when to stop is to consider who the users of your archive are (will be?) and what their information needs are (will be). For example, if the goal is to understand the origins of opinions and of ideas, I could imagine doing a network analysis on blog linkage and comment references to understand which groups of blogs “go together.” Blogs that show a strong affinity to each other through bi-directional linking and cross-commenting may need to be archived together; blogs that are hubs may want to be preserved independently.

    Is this old news?
