Blog Archive: 2010

Parsing patents

Comments (5)

Since Google announced its distribution of patents, I have been poking around the data trying to understand what’s in there and starting to index it for retrieval. The first challenge I’ve had to deal with is data formats. The second is how to display documents to users efficiently.

The full text of the patents is available in ZIP files, one file per week, based on the date patents were granted. The files cover patents issued from 1976 to (as of this writing) the first week of 2010. In addition to the text, they contain all manner of metadata such as when the patent was filed, who the inventors and assignees were, etc. Interestingly, the zipped up files are in two different formats: patents from 2001 on are in XML, while earlier ones are in a funky ad hoc text format.
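As a rough illustration of coping with the two formats, one could walk a weekly archive and guess each member's format from its first bytes. This is only a sketch: the function names and the "starts with `<` means XML" heuristic are assumptions made for this example, and nothing here describes the actual layout of the patent files.

```python
import io
import zipfile


def sniff_format(head: bytes) -> str:
    """Guess 'xml' or 'text' from the first bytes of a file (heuristic assumption)."""
    # Skip an optional UTF-8 byte-order mark and leading whitespace.
    stripped = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    return "xml" if stripped.startswith(b"<") else "text"


def classify_archive(zip_source) -> dict:
    """Map each file inside a ZIP archive to its guessed format.

    zip_source may be a path or a binary file-like object.
    """
    formats = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Read only a small prefix; the full documents can be large.
            with archive.open(info) as member:
                formats[info.filename] = sniff_format(member.read(64))
    return formats


if __name__ == "__main__":
    # Build a tiny in-memory archive with made-up member names and contents.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("newer.xml", '<?xml version="1.0"?><doc/>')
        demo.writestr("older.txt", "plain text record")
    buffer.seek(0)
    print(classify_archive(buffer))
```

Reading only a short prefix of each member keeps the pass cheap, so the result can be used to route each file to the appropriate parser before any full parse is attempted.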

Continue Reading

Intended to deceive

Comments (2)

The ’sphere is a-twitter about BP’s buying keywords (e.g., “oil spill”, “BP”, “gulf disaster”) to place links to its version of the story at the top of the search results. ABC News writes:

According to Kevin Ryan, the CEO of California-based Motivity Marketing, research shows that most people can’t tell the difference between a paid result pages, like the ones BP have, and actual news pages.

So we have two issues: one related to BP, and one related to the search engines.

Continue Reading

#Google #search for #Twitter? #fail!

Comments (9)

For a while now, Google has been serving up tweets related to searches as part of its real-time search effort. Now they are making it possible to search the Twitter stream in exactly the way Twitter doesn’t allow — that is, to search for tweets older than a few days. A query like


will return a bunch of tweets, formatted as Google search results. As of the time I ran this query, it identified 1,380 hits from Twitter. Twitter’s search yielded about 250 tweets, going back to no more than 10 days ago. So far, so good.

Continue Reading

The web browser evolution

Comments (3)

Just when you thought browser wars were a thing of the past, here comes Google Chrome. In a bid to increase its browser’s market penetration, Google announced Quick Scroll, a Chrome extension that enhances Google’s search results by highlighting matching passages that may not be easy to find otherwise.

Continue Reading

First squares, now circles

Comments (6)

A while ago, Google introduced Google Squared, an attempt to help people keep track of different aspects in their search results. I think that it’s an interesting HCIR idea that still lacks a good implementation, as I’ve written here and here. Recently, Google introduced a means of adding results informed by the searcher’s social network, which Google has dubbed “Social Circle.” I spent some time playing with it, and found it lacking.

Continue Reading

Google Squared: any sign of progress?

Comments (3)

At Daniel Tunkelang’s suggestion, I revisited Google Squared, having written about it when it was first released. At the time, I tried a couple of queries (not a formal evaluation), and found some useful results, and some bogus ones. This time, I re-ran the same queries as before, and compared the results with my saved queries. For the query ‘airplane accidents’, the new results were considerably worse. For the query ‘acts of terrorism’, there were no initial results, but when I put in some instances (WTC attack, Oklahoma City bombing, Khobar towers, marine barracks) I got back a similar list to the one I had constructed in June.

Continue Reading

Ode to Google Wave


OK, it’s a sonnet, not an ode, but still. Making Light is one of my favorite blogs, run by science fiction editors Teresa and Patrick Nielsen Hayden; it has a rich subject range and a great community of commenters. I also enjoy its commenters’ tendency to break into verse at the least provocation. Google Wave (which Jeremy discussed here) was the topic of a recent post titled “Panhandling for invites” in which Abi Sutherland offers this delight:

The sea has depths in which no net is cast,
With trackless kelpine forests where great squid,
Like Sasquatch in his mountains safely hid,
Dance dreaming with the fishes swimming past.
And human interaction is the same.
Beneath an email surface lies the deep:
Unmodeled work and social patterns creep
And spread in ways existing tools don’t frame.

Go here to see the whole sonnet.

Data Liberation: What do you Own?

Comments (3)

Recently Google announced a new initiative, the Data Liberation Front:

The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products. This is our mission statement: Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.

This is a fantastically worthy goal, and I whole-heartedly applaud it.  However, I am beginning to wonder: What data is yours to own, in the first place?

For example, consider web searching.

Continue Reading

The Library of Google

Comments (1)

In “The Library of Babel”, Jorge Luis Borges describes a library “…composed of an indefinite, perhaps an infinite, number of hexagonal galleries… ” lined with shelves of books. Unfortunately, the books are not organized in any predictable manner, causing librarians to travel “… in search of a book, perhaps of the catalogue of catalogues…” The searches, though, are in vain, given the improbability of finding what you seek in an infinite collection.

Continue Reading