Every once in a while a Twitter query turns up something completely unexpected. I suppose that’s one reason for having them. My query on all things PubMed recently turned up the following gem: a blog entitled PubMed Search Strategies. What is it? A list of queries. What? PubMed queries, in all their Boolean glory. The latest pair of posts are pharmacoepidemiology — keywords, and its fraternal twin, pharmacoepidemiology — MeSH. The queries run for 39 and 13 terms, respectively. No average 2.3-word Web searches these.
So what are they for? They appear to be by-products of medical searches designed to characterize specific concepts in various ways. They are shared for the same reason that people share open-source code: to benefit others and to build on the work of others.
From an information-seeking perspective, I find these queries fascinating. The first one yielded 963,962 hits, of which 129,619 were review articles. By default, PubMed sorts search results by date of publication (latest first); you can also sort them alphabetically by first author, last author, journal, or title. None of these orderings helps much with nearly a million results: I cannot imagine beginning to make sense of this list. (For the record, the second query produces a mere 325,835 results, with 54,071 review articles.) Precision-recall tradeoff indeed.
So it appears that, like much open-source code, these queries are components rather than finished tools: they must be used in combination with other expressions to create useful queries. Each is like a complex facet that must be combined in some ruthless manner with other expressions to reduce these hundreds of thousands of results to a more human scale.
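To make the "facet" idea concrete, here is a minimal sketch of how such a query might be ANDed with a narrower topic. The helper names (quote, facet, combine) and the search terms are my own illustrations, not the blog's actual queries; the [tiab] tag is PubMed's Title/Abstract field.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def facet(terms, field=None):
    """OR a list of terms together, optionally tagging each with a field."""
    tag = f"[{field}]" if field else ""
    return "(" + " OR ".join(quote(t) + tag for t in terms) + ")"


def combine(*facets):
    """AND several facets together to narrow the result set."""
    return " AND ".join(facets)


# Illustrative terms only; a real shared strategy would have dozens of terms.
broad_facet = facet(["pharmacoepidemiology", "drug utilization"], field="tiab")
topic_facet = facet(["statins", "simvastatin"], field="tiab")

query = combine(broad_facet, topic_facet)
print(query)
```

Each shared strategy would play the role of broad_facet here; the narrowing has to come from whatever it is combined with.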
I wonder, though, if these expressions couldn’t be used for other purposes as well. There appear to be 49 posts on the site to date, mostly pairs of MeSH and keyword queries. It might be interesting to compare the lists to analyze which documents are retrieved by both queries, and which by only one. It seems that their ultimate effectiveness is predicated on combinations with other topics, but which variant — the keyword, the MeSH, or perhaps a hybrid — offers the best performance? What factors affect this performance? Would such queries be more useful with a best-match search engine (rather than a Boolean one), or would the number of terms make such computations prohibitively expensive for interactive use?
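The overlap comparison, at least, is easy to sketch. Assuming the identifiers returned by each query have already been collected into lists (the IDs below are made up, and compare_results is a hypothetical helper of my own), the analysis reduces to a few set operations:

```python
def compare_results(keyword_ids, mesh_ids):
    """Split two result lists into shared and query-specific documents."""
    kw, mesh = set(keyword_ids), set(mesh_ids)
    both = kw & mesh
    union = kw | mesh
    return {
        "both": both,
        "keyword_only": kw - mesh,
        "mesh_only": mesh - kw,
        # Jaccard overlap: shared documents as a fraction of all retrieved.
        "jaccard": len(both) / len(union) if union else 0.0,
    }


# Made-up identifiers standing in for the two result sets.
keyword_hits = ["101", "102", "103", "104"]
mesh_hits = ["103", "104", "105"]

report = compare_results(keyword_hits, mesh_hits)
for name in ("both", "keyword_only", "mesh_only"):
    print(name, sorted(report[name]))
print("jaccard", report["jaccard"])
```

With result sets in the hundreds of thousands this still fits comfortably in memory; the harder part is deciding what the documents found by only one query say about that query.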
At this point, I don’t have the domain expertise to construct or evaluate such queries, but it would be interesting to work with someone to perform these experiments. Perhaps some light could be shed on the effectiveness of MeSH queries for medical information seeking.