Blog Archive: 2011

The curious case of the software patent

Comments (2)

Critiques of software patents are all the rage lately, from bloggers like Daniel Tunkelang to NPR. The list of problems with them includes that they stifle innovation, that they are tools to beat up small companies and startups, and that they are simply trading cards that big corporations use to protect each other at everyone else’s expense. So why are software patents different from other patents? Why aren’t people arguing about scrapping the patent system entirely?

Last week I had the opportunity to attend a debate-style talk about software patents, hosted by the Computer History Museum and featuring Bob Zeidman (pro) and Prof. Edward A. Lee (con). I found it quite helpful in understanding the issues. The motion under consideration was “Software patents encourage innovation.”


News from The USPTO

Comments (1)

I had an interesting and informative (if internet-free) day at the PaIR workshop at CIKM today. One of the highlights was a keynote by Marti Hearst, who is currently the Chief IT Strategist for the USPTO. She outlined many improvements to the USPTO IT infrastructure that are in the works, scheduled for rollout some time in 2013.

It was interesting to hear the details of the user-centered design process that she is orchestrating to understand the limitations of the existing tools and to guide the redesign with input from patent examiners and supervisors. Some of the planned improvements include a unified interface to various functions that are currently not well-integrated, automated suggestions for queries and terms of art for applications being reviewed, the ability to tag, annotate and share annotations on all sorts of documents, the ability to search over all material (including the annotations), etc.


A concept by any name

Comments (1)

Miles Efron wrote about a research project he is starting on statistical processing of 17th and 18th century English texts, with the goal of establishing similarities between passages written with different spelling and vocabulary. This is a problem that humanities scholars might have when applying modern information retrieval tools to historical texts, as accepted English spelling and vocabulary were considerably more varied than they are now. (For a fun read about some of the issues, see Bill Bryson’s The Mother Tongue on the history of the English language.)


Parsing patents, take 2

Comments (8)

Working on parsing and indexing the patent collection that Google made available has been an interesting education in just how noisy allegedly clean data really is, and in the scale of the collection. I am by no means done; in fact, I’ve had to start over a couple of times. I have learned a few things so far, in addition to my earlier observations.


Parsing patents

Comments (5)

Since Google announced its distribution of patents, I have been poking around the data trying to understand what’s in there and starting to index it for retrieval. The first challenge I’ve had to deal with is data formats. The second is how to display documents to users efficiently.

The full text of the patents is available in ZIP files, one file per week, based on the date patents were granted. The files cover patents issued from 1976 to (as of this writing) the first week of 2010. In addition to the text, they contain all manner of metadata such as when the patent was filed, who the inventors and assignees were, etc. Interestingly, the zipped up files are in two different formats: patents from 2001 on are in XML, while earlier ones are in a funky ad hoc text format.
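To make the two-format situation concrete, here is a minimal Python sketch of how one might check which format a weekly archive contains before choosing a parser. It is not the code I actually used. The function names (`sniff_format`, `classify_members`) are made up for illustration, and the rule "XML starts with `<`, anything else is the older text format" is an assumption about the data rather than a documented property of the files.

```python
import io
import zipfile


def sniff_format(first_bytes: bytes) -> str:
    """Guess the format of a file from its first few bytes.

    Assumption: XML files start with '<' (after an optional UTF-8 BOM
    and whitespace); everything else is treated as the older text format.
    """
    head = first_bytes.lstrip(b"\xef\xbb\xbf \t\r\n")
    return "xml" if head.startswith(b"<") else "text"


def classify_members(archive) -> dict:
    """Map each file inside a ZIP archive to 'xml' or 'text'.

    `archive` may be a path or a file-like object, as accepted by
    zipfile.ZipFile.
    """
    formats = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with zf.open(info) as member:
                # Only the first few bytes are needed to decide.
                formats[info.filename] = sniff_format(member.read(64))
    return formats
```

Sniffing the content rather than trusting the grant year or the file extension means a stray file in the wrong format gets routed to the right parser instead of failing halfway through a batch.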
