Don't go there


The field of information retrieval is inherently (some might say pathologically) data-driven. We need datasets to test algorithms, to compare systems, etc. This is all good. It’s particularly good to have data that are meaningful and relevant, because it makes it easier to motivate users and to generalize findings to data that people care about.

I expect that in the next few cycles of conference submissions, we will see a number of papers analyze the “cable” data leaked by Bradley Manning to Wikileaks. It’s a large enough dataset with topical relevance that is sure to attract all sorts of analyses, much like the Enron email dataset did in 2004.

But there are some important differences.

Enron executives were charged with a number of crimes, including bank and securities fraud. The data was collected through a discovery process, and was used by prosecutors to build a case against the defendants.

The Wikileaks “cable” data consists of messages sent to the US State Department by ambassadors and other employees during the course of their regular, legal, duties. The data was stolen and then made public to expose alleged misdeeds by the US government in its implementation of its foreign policy.

Presumably the reason for exposing these documents is to affect the foreign policy of the US. Whereas a case for publicity could be made for the earlier leaks of military documents from Afghanistan and Iraq by people who disagreed with the conduct of those wars, no such logic applies to the documents in question. Here we find disclosures of diplomatic channels at work, of people trying to achieve mutual understanding rather than waging war. The release of these documents has not only harmed some of the people who were fostering that peaceful communication, but also makes it less likely that others will engage with diplomats in future.

If we believe that diplomacy is better than war at resolving many issues of foreign relations, thwarting that exercise by stealing and publishing secret documents is not only illegal but also immoral and stupid.

It seems to me, therefore, that using this ill-gotten information for one’s research condones and legitimizes such behavior. Thus I encourage other researchers to reject the temptation to analyze this data, and program committees to discourage submissions based on that data.

We as a research community should not stoop to trafficking in stolen property. Let’s not sully ourselves with it.



  1. It is deplorable that so much sensitive information was so poorly safeguarded that it became public. It is deplorable that some clerk took it upon himself to set it loose, and it is debatable whether the major publications that have been doing most of the dissemination are doing the public any favor.

    The origins of the data, however, do not turn its use for research into trafficking in stolen property any more than pretending that it hasn’t already been acquired by every state security agency in the world by now keeps it secret. The damage has been done and the value of the property, which was its secrecy, has been destroyed. Nothing that researchers do, or refrain from doing, is going to change that.

  2. While you’re absolutely right that the information has been set loose and that this cannot be undone, using this information legitimizes the effort to steal and disseminate it. I don’t think this is a good idea.

  3. I admire your courage for taking this position, but I think you’re wrong. State secrets, including diplomatic secrets, are antithetical to democracy. History has repeatedly shown, and recent events as much as any in history, that governments can’t be trusted. Diplomats are public servants and should report to the public. Wikileaks is advancing democracy and open government. Think of how much more we ordinary citizens have learnt about how the world works than we knew two weeks ago. Saudis asking the US to bomb Iran? The Italian Prime Minister receiving kick-backs from the Russian Government? An Australian government minister acting as an informant to the US embassy? Governments are being hypocritical and, frankly, anti-democratic in trying to shut Wikileaks down.

  4. Keeping your information secret and trying to discover secrets of other countries is the sine qua non of diplomacy. Likewise, off-the-record conversations are essential to reaching an understanding. Exposing these to the world doesn’t serve democracy, but rather inhibits the practice of diplomacy. As to the revelations, much of this has been discussed publicly already. Of course much of this wasn’t on the front page of the NYT because the minutiae of foreign policy communications don’t sustain readers’ interest for very long.

    Regardless of the value that disclosure of these messages had to inform the public of the communications of the State Department, the downside of disclosure is that it hurts people who were trying to help, and it undermines the effectiveness of peaceful foreign policy. That is why I consider the act of disclosing this information immoral and stupid. And that’s why I don’t want to see researchers using these documents.
