When is one>two and seven==eight?

So Google recently released the Google Books Ngram Viewer along with the datasets.

There’s been plenty of press about it, and the Science article based on this data is an interesting read.

I was trying to come up with a simple yet insightful query. My initial trial was modernism,postmodernism, which immediately had me wondering about hyphenation or the lack thereof… In any case, the upshot seems to be that use of the term postmodernism started 1978ish. Neat, though I think I won’t need to clear space for my Nobel Prize anytime soon.

I toyed a little bit with other terms like generation X, which has an odd sort of bump in the graph around 1970. Not sure what’s up with that, though perhaps there’s some data collection artifact as discussed in this article. I wasn’t inclined to go off the deep end on this and was happy enough to have my prior knowledge confirmed by noting that the use of “generation X” took off in the mid-1990s.

My final trial was a bit more on the minimal side: one,two,three,four,five,six,seven,eight,nine,ten. There shouldn’t be any surprise here that “one” is more common than “two”, which is more common than “three”, which is more common than “four”. It probably shouldn’t be a surprise that each succeeding number is less frequent by roughly a factor of 2.

Less intuitive (to me anyway) is that “ten” squeezes in front of “seven” and “eight” (OK, so maybe it’s a round number) and that “seven” and “eight” are basically tied. Even more odd is that before 1790 or so, the putative occurrences of “six” and “seven” were virtually non-existent.

Turns out this appears to be the same “medial S” issue that Danny Sullivan describes in greater detail in his post. In other words, it’s an artifact of OCR and an indication of the evolution of typography rather than the evolution of language.
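
For the curious, here is a rough sketch of how that artifact would play out. It assumes (and this is a simplification of real typesetting practice) that older books print every “s” except a word-final one as a long s, and that the OCR reads each long s as an “f”. The function name is made up for illustration; none of this comes from Google’s pipeline.

```python
def long_s_ocr_variant(word: str) -> str:
    """Return the spelling OCR might produce if every long s is read as 'f'.

    Simplifying assumption: the long s replaces every lowercase 's'
    except one in word-final position.
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    # Only the non-final letters are candidates for the long s.
    return body.replace("s", "f") + last


for w in ["one", "six", "seven", "eight"]:
    print(w, "->", long_s_ocr_variant(w))
```

Under that assumption “six” and “seven” come out as “fix” and “feven”, while “one” and “eight” pass through untouched, so counts for the two s-words would land under different spellings in older scans.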

One mystery solved; now why are “seven” and “eight” tied in frequency?

Kudos to Google for releasing the viewer and data.

Active capture at ACM MM 2010


FXPAL has a few papers appearing at the upcoming ACM Multimedia Conference in Firenze, Italy.  Among them is NudgeCam, which was recently featured in an article on MIT’s Technology Review as noted previously on this very blog.

NudgeCam is an experiment in “active capture”. Media capture (in this case, photos and videos) is enhanced by providing a template of elements to capture and also real-time interactive tips to improve the quality of each shot or clip. The template allows the author to ensure that essential story components are captured, and the real-time feedback helps ensure that the parts are of high quality. Together, they streamline the creation of a high-quality result.

The author, Scott Carter, will be presenting this work on Tuesday, October 26th in Session S1 at ACM Multimedia in Firenze, Italy.

See you there!

FXPALer making the blogs

Andreas Girgensohn

Our very own Andreas Girgensohn recently returned from WWW 2009 in Madrid, where he presented in the developers’ track on efficient web-browser sharing and co-taught a tutorial with Alison Lee on developing mobile web applications.

In the aftermath of his appearance he’s popped up in the blogosphere, quoted on the significance of the iPhone as a mobile web platform in an article alongside luminaries such as Vinton Cerf and Tim Berners-Lee. (link)

Keeping good company, Andreas!

Creating an iAbbreviation: SFMI


Fairly recently I became one of those iPhone types. You know the ones – gaze ever downwards, fingers poised to pinch or pick or tap-tap. I love the thing, though I’m not sure I love what I’ve become with it.