Blog Archive: 2013

Details, please

on Comments (3)

At a PARC Forum a few years ago, I heard Marissa Mayer mention the work they did at Google to pick just the right shade of blue for link anchors to maximize click-through rates. It was an interesting, if somewhat bizarre, finding that shed more light on Google’s cognitive processes than on human ones. I suppose this stuff really matters only when you’re operating at Google scale; normally the effect, even if statistically significant, is practically meaningless. But I digress.

I am writing a paper in which I would like to cite this work. Where do I find it? I tried a few obvious searches in the ACM DL and found nothing. I searched in Google Scholar, and I believe I found a book chapter that cited a Guardian article from 2009, which mentioned this work. But that was last night, and today I cannot re-find that book chapter, either by searching or by examining my browsing history. The Guardian article is still open in a tab, so I am pretty sure I didn’t dream up the episode, but it is somewhat disconcerting that I cannot retrace my steps.

Continue Reading

When is one>two and seven==eight?

on Comments (1)

So Google recently released the Google books N-gram viewer along with the datasets.

There’s been plenty of press about it, and the Science article based on this data is an interesting read.

I was trying to come up with a simple, yet insightful query. My initial trial was modernism,postmodernism which immediately had me wondering about hyphenation or the lack thereof…  In any case, the upshot seems to be that the use of the term postmodernism started 1978ish. Neat, though I think I won’t need to clear space for my Nobel Prize anytime soon.

I toyed a little bit with other terms like generation X, which has an odd sort of bump in the graph around 1970. Not sure what’s up with that, though perhaps there’s some data collection artifact as discussed in this article. I wasn’t inclined to go off the deep end on this and was happy enough to have my prior knowledge confirmed by noting that the use of “generation X” took off in the mid-1990s.

My final trial was a bit more on the minimal side: one,two,three,four,five,six,seven,eight,nine,ten. There shouldn’t be any surprise here that “one” is more common than “two,” which is more common than “three,” which is more common than “four.” It probably shouldn’t be a surprise that each succeeding number is less frequent by roughly a factor of 2.

Figure: Occurrence of numbers in the Google Books N-gram viewer

Less intuitive (to me anyway) is that “ten” squeezes in front of “seven” and “eight” (OK, so maybe it’s a round number) and that “seven” and “eight” are basically tied. Even more odd is that before 1790 or so, the putative occurrences of “six” and “seven” were virtually non-existent.

Detail on number occurrences

Turns out it appears to be the same issue with the “medial S” that Danny Sullivan describes in greater detail in his post. In other words, it’s an artifact of OCR and an indication of the evolution of typography rather than the evolution of language.
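
To make the artifact concrete, here is a minimal sketch in Python. It assumes that older books print an initial “s” as the long s (“ſ”, U+017F) and that the OCR engine reads that glyph as “f”; the function names are made up for illustration and have nothing to do with Google’s actual pipeline.

```python
import unicodedata

LONG_S = "\u017f"  # the long s glyph used in older typography


def simulate_ocr_misread(word: str) -> str:
    """Assumed OCR behaviour: the long s is mistaken for the letter f."""
    return word.replace(LONG_S, "f")


def normalize_long_s(word: str) -> str:
    """NFKC compatibility normalization folds the long s into a plain s."""
    return unicodedata.normalize("NFKC", word)


for printed in ("\u017fix", "\u017feven"):
    # Show the printed form, what the OCR would emit, and the intended word.
    print(printed, simulate_ocr_misread(printed), normalize_long_s(printed))
```

Under these assumptions “six” and “seven” would be counted as “fix” and “feven” in older books, which would explain why their curves are nearly flat before the typography changed.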

One mystery solved; now why are “seven” and “eight” tied in frequency?

Kudos to Google for releasing the viewer and data.

Google eBooks

on Comments (2)

So Google has unveiled its eBook store, setting itself up to compete with Amazon, Barnes & Noble, and everyone else selling books. Google offers its editions through the browser and on a range of devices such as Android phones and the iPad. The reading experience in the browser on my laptop was OK: not great, but the text was legible enough, and would even switch to a two-page layout in a wide window. On the iPad, Google offers two choices: the browser, and a free app. The browser interface implements a swipe gesture for page turning, although there is no visible indication that it’s possible, nor any visual feedback until the page flips. The iPad app sports an animated page-turning transition, but does not have a two-page mode.

Continue Reading

Revealing details

on Comments (4)

Thanks to Mor Naaman, I came across an interesting blog post by Justin O’Beirne that analyzed the graphic design of several different maps (Google, Bing, and Yahoo) to show why Google maps tend to appear easier to read and to use. The gist of the analysis is that legibility is improved through a number of graphical techniques that in combination produce a significant visual effect.

And of course knowing Google, this stuff was tested and tested and tested to get the right margins around text, the right gray scale for the labels, the right label density, etc.

So why did Justin have to reverse-engineer this work to understand it?

Continue Reading


on Comments (7)

Those of you who’ve followed this blog and Jeremy Pickens’ blog will recall his many comments about Google’s un-Googly behavior. Recently, Benjamin Edelman actually tested the hypothesis about Google injecting bias into organic results. His post details several kinds of queries that don’t produce organic results. Which ones? Ones that are related to Google properties such as finance, health, and travel. While it’s clear why Google pushes its own properties, it seems that this behavior is inconsistent with the image it tries to project.

Continue Reading

Instant success?

on Comments (10)

So I fired up IE-8 and I tried Google Instant. It’s fast: as fast as I can type, it’s showing me search results. Mind you, the results aren’t always sensible, but they are delivered quickly. It works great for short queries such as looking for a popular sense of some word. In this case, it saves me the trouble of hitting enter. Nice, but not earth-shattering.

When I am looking for something less obvious, it guesses wrong. For example, the query “information processing and management” (an academic journal) first produced a set of results for the partial string “inform” that match Nice, but not the journal. After I typed “information,” it showed me the Wikipedia page for “information” (oh the irony) and a bunch of other links highly associated with the term. But no journal. “information proc” produced a bunch of hits on “information processing.” Better, but not what I am after. Completing the second word and pressing the space bar yielded a number of links to “information processing theory,” which also happens to be the top query suggestion. But no journal. Only when I typed “information processing and” did I get the results I wanted.

So what are we to make of this new addition to Google’s bag of tricks?

Continue Reading

Boolean illogic

on Comments (5)

I am trying to understand how Google patent search works, and am encountering some quite odd behavior. I am not talking about the inventor search bug (which is still unfixed), but about Boolean logic.

If I run the query [“information retrieval”], the system retrieves 323 documents. Similarly, [“dynamic hypertext”] retrieves 368 documents. The combination, [“information retrieval” “dynamic hypertext”] yields 16. Putting a plus in front of either quoted phrase does not affect the results. So far, this seems reasonable.
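
The “seems reasonable” judgment above rests on a simple set-theoretic bound: a conjunction of two phrases cannot match more documents than either phrase alone. A minimal sketch of that sanity check follows, using the counts quoted above; the function names are hypothetical, and it assumes that listing two quoted phrases means AND.

```python
def and_count_is_plausible(count_a: int, count_b: int, count_both: int) -> bool:
    """An AND query can return at most as many hits as its smaller operand."""
    return 0 <= count_both <= min(count_a, count_b)


def implied_or_count(count_a: int, count_b: int, count_both: int) -> int:
    """By inclusion-exclusion, the size of (A OR B) if the three counts are consistent."""
    return count_a + count_b - count_both


# Counts reported above for the two phrases and their combination.
information_retrieval = 323
dynamic_hypertext = 368
both_phrases = 16

print(and_count_is_plausible(information_retrieval, dynamic_hypertext, both_phrases))
print(implied_or_count(information_retrieval, dynamic_hypertext, both_phrases))
```

Any variant of the query whose counts violate these bounds would point to something other than plain Boolean evaluation.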

Continue Reading

Parsing patents, take 2

on Comments (8)

Working on parsing and indexing the patent collection that Google made available has been an interesting education in just how noisy allegedly clean data really is, and in the scale of the collection. I am by no means done; in fact, I’ve had to start over a couple of times. I have learned a few things so far, in addition to my earlier observations.

Continue Reading

Google, Microsoft, Lunch

on Comments (1)

This post was co-authored with Jeremy Pickens

The RescueTime blog, in a piece titled Google is eating Microsoft’s lunch, one tasty bite at a time, showed a comparative usage analysis between Microsoft Office tools and various Google tools such as Gmail, Google Docs, etc. Based on an analysis of their several hundred thousand users, they claim that the use of Microsoft tools had declined whereas the use of Google tools had increased.

There are a bunch of problems with this analysis.

Continue Reading

Google’s Patent Search “feature”

on Comments (1)

While poking around on the USPTO and Google to try to figure out how to get single PDF documents for my indexing project, I discovered that the Google advanced search interface won’t retrieve any documents based on the inventor field. I ran the searches three ways: by typing an inventor’s name into the Google patent search box, by typing it into the advanced search form on Google, and by entering it into the USPTO’s advanced search form. I expected the first set of results to be the largest, as it may include hits where the inventor is referenced by some other patent, but the latter two should return the same number of hits. The results for a couple of searches are shown below; you can run your vanity search yourself.

Inventor             Google (search box)   Google (advanced search)   USPTO (advanced search)
Gene Golovchinsky    41                    0                          21
Andreas Girgensohn   52                    0                          29
Daniel Tunkelang     9                     0                          8
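
The two expectations stated above (the search box returns the most hits, and the two advanced forms agree) can be checked mechanically. This is a rough sketch that assumes the three numeric columns are, in order, the Google search box, the Google advanced search form, and the USPTO advanced search form; `check_row` is a hypothetical helper.

```python
# Counts copied from the table above:
# (inventor, Google search box, Google advanced search, USPTO advanced search)
ROWS = [
    ("Gene Golovchinsky", 41, 0, 21),
    ("Andreas Girgensohn", 52, 0, 29),
    ("Daniel Tunkelang", 9, 0, 8),
]


def check_row(box: int, google_advanced: int, uspto: int) -> tuple[bool, bool]:
    """Return (search box is largest, the two advanced forms agree)."""
    box_is_largest = box >= max(google_advanced, uspto)
    advanced_forms_agree = google_advanced == uspto
    return box_is_largest, advanced_forms_agree


for name, box, google_advanced, uspto in ROWS:
    largest, agree = check_row(box, google_advanced, uspto)
    print(f"{name}: search box largest={largest}, advanced forms agree={agree}")
```

For every row the first expectation holds and the second fails, which is the discrepancy described here.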

I don’t know if this is a metadata problem (along the lines of the metadata issues that came up in the context of Google Books), or if it is a UI/front-end issue. In any case, it seems odd that testing didn’t catch this bug.