In search of data


Having seen the recent news of gun-toting protesters at health reform meetings, I got into a discussion with my wife about gun control, and you know where that can lead. Yes, that’s right, to exploratory search. I had some hypotheses about the relationship between gun control and crime, and wanted to find some data to test them. I needed to find crime statistics by state and to cross-reference them with various attributes of the states, including the degree of urbanization, population density, laws, etc. While I thought the odds of finding a canned analysis of my hypotheses were small given the amount of time I was willing to devote to the problem, I did try a few obvious queries. No luck.

It then occurred to me that perhaps this was exactly the sort of thing for which Wolfram|Alpha was designed: the ability to form complex queries over structured data. So I tried some queries there:

  • burglary rate: Wolfram|Alpha isn’t sure what to do with your input.
  • burglary: gave a thesaural entry, and suggested that it rhymes with ‘cursory’
  • crime: Assuming “crime” is a unit | Use as referring to physical quantities or a word instead

This was clearly going nowhere. Then I tried searching using Google Trends (which is conveniently located on the second page of the Google Labs list, which in turn is found in the more/even more/labs menu of iGoogle; I should have searched for it, but couldn’t remember the name). A search for ‘crime statistics’ yielded some miscellaneous charts, but the data was hard to understand and impossible to refine. It did, however, suggest that ‘egov’ might be a useful search term.

That search led me to the Office of E-Government and Information Technology, but that proved to be something of a dead end. I remembered seeing links in my Twitter feed to all sorts of data that the government now publishes, but had no good way to search for them. Instead, I tried ‘government statistics’ as a search, which showed, as the first result, a link to FedStats, with some additional links, including ‘crime’. (For the record, FedStats is “Celebrating over 10 years of making statistics from more than 100 agencies available to citizens everywhere.”) From here I followed the ‘firearms and crime’ link and found part of the data I needed for hypothesis testing. I found some (but not anywhere near all) of the other pieces of information on the NRA site, and was able to mess around with some rudimentary analyses in Excel. Of course, to answer my questions well (if not definitively), I would need more data and a more nuanced analysis, but it was getting late. I am sure this sort of analysis has been done before, and may make for interesting reading, but I am not sure how to go about finding it. Perhaps knowing where such research results are likely to be published would be a good start.

In the end, all I learned was that if you want to test hypotheses about complex data, you’re likely to need to build the dataset yourself. But it would be interesting to contemplate a world of journalism in which reporters back up their stories with references to statistical analyses that readers can explore for themselves. Did the story about topic X make claims that are true, only partially true, or demonstrably false, given some poking around with the data? It would be great if we could add statistics to the current set of reliable sources used by the media.