The number of third-party tools for searching PubMed data seems to be increasing recently. As the NLM is about to roll out a new search interface, companies are starting to offer alternative interfaces for searching this important collection. The attraction is obvious: a large, motivated group of searchers, an important information need, and a manageable collection size. A decade ago, over 20 million searches were done monthly through the NLM site, and the numbers are surely higher today; the collection is large but not huge — currently over 17 million entries (some with full text), occupying somewhat more than 60GB of disk space. Thus we see an increasing number of sites offering search over this collection, including PubGet, GoPubMed, TexMed, and HubMed. The offerings range from basic to flashy, and appear to be aiming at different groups of searchers.
PubGet, for example, attempts to simplify access to the full text of articles rather than showing just abstracts. (See this article for a good review of its strengths and weaknesses.) GoPubMed has a slick interface that offers a limited form of faceted search through which a query can be refined, but the interface is inconsistent (multiple aspects cannot be combined). TexMed offers a minimalistic interface with the hook that bibliographic references can be downloaded easily for the retrieved documents. HubMed is another interactive web site that can display abstracts inline with search results, can export search results like TexMed, and has links to a range of other search tools for each article. Because searches often retrieve many documents, most of these sites (including the NLM PubMed interface) offer a way to save documents persistently, although not all of them allow results from different searches to be kept separate.
While I will not attempt a thorough analysis of the search effectiveness of these tools, I found that they offer radically different result sets for the one query (“colon cancer survival rates”) I tried. As a reference, PubMed produced 4,093 matches after translating my query into a somewhat more complicated expression. HubMed also returned 4,093 hits, but did not give an explanation for its ordering of the results. GoPubMed seems to have returned the same results (in the same order) as PubMed, while PubGet offered 3,994 results with no obvious cues as to how it managed this feat. Finally, TexMed returned 250 documents. While these tools show considerable attention to interface design, it is not clear from casual inspection whether they will actually improve the effectiveness of serious medical search. Certainly none of them wins any awards for transparency in search, a characteristic that is important for the recall-oriented searching practiced by medical reference librarians.
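To put the counts above in recall terms, here is a rough sketch in Python. It treats PubMed's result set as the reference and assumes, optimistically, that every document a tool returned is also in PubMed's set; that assumption is mine and was not checked against the actual result lists. Under it, a tool's count divided by PubMed's count is an upper bound on the fraction of PubMed's matches the tool could have surfaced. The function and variable names are made up for illustration.

```python
# Result counts reported above for the query "colon cancer survival rates".
# GoPubMed is omitted because no explicit count was recorded for it.
REPORTED_COUNTS = {
    "PubMed": 4093,
    "HubMed": 4093,
    "PubGet": 3994,
    "TexMed": 250,
}


def recall_upper_bound(tool_count: int, reference_count: int) -> float:
    """Best-case fraction of the reference set that a tool could have retrieved.

    Assumption (not verified): every document the tool returned is also
    in the reference set, so this is an upper bound, not measured recall.
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    if tool_count < 0:
        raise ValueError("tool_count must be non-negative")
    # A tool cannot recover more of the reference set than the set contains.
    return min(tool_count, reference_count) / reference_count


def summarize(counts: dict, reference: str = "PubMed") -> dict:
    """Map each tool name to its recall upper bound relative to `reference`."""
    reference_count = counts[reference]
    return {
        name: recall_upper_bound(count, reference_count)
        for name, count in counts.items()
    }


if __name__ == "__main__":
    for name, bound in summarize(REPORTED_COUNTS).items():
        print(f"{name:8s} {bound:6.1%}")
```

Even in this best case, TexMed's 250 documents cover only about 6% of PubMed's matches, while PubGet's 3,994 could cover nearly 98%. Actual recall could be lower for either tool, since the overlap between result sets was not measured.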
Update: I forgot to mention Hakia, a web site that can search a range of collections, including PubMed. Each match to the query is shown with an inline snippet in which the matching phrase is highlighted, but abstracts are not shown inline, so the searcher must click on a link that loads the matching document entry in PubMed. Unfortunately, it does not report the total number of documents retrieved or explain the order in which they are presented. Counting manually, I identified 186 matching documents.