Common nonsense


Pattie Maes from the MIT Media Lab gave a TED talk this year about technology developed by one of her students, Pranav Mistry. The basic idea is that a person wears a device around the neck consisting of a camera, a projector, and a small computer, all with wireless (cell phone) connectivity to the network. She argues that augmenting our perceptions with information can give us a “sixth sense” that will help us lead more productive and fulfilling lives.

The talk has prompted both positive and negative reactions in the blogosphere. The naive optimist view (e.g., here) is that this is a small step toward a more capable (and perhaps even enlightened) humanity. A less credulous response by Andy Rutledge can be found here. While I share Andy’s view for the most part, I would like to critique the work per se without examining the motivations of its creators.

The talk describes the following scenarios for the device.

  1. Several scenarios involve projecting information onto some arbitrary surface, and then using gestures to manipulate the display. There is some cleverness here, but as Occam would point out, there are easier ways to tell time or dial a cell phone. The technique doesn’t seem practical for sustained use, as your hands have to be extended in front of you to be visible to the camera.
  2. Taking pictures by framing is compelling, although it does require some extra hardware to preview the image.
  3. Getting additional information about a product (based on its bar code, for example) is useful. But one can just snap a picture of it with a cell phone to obtain the same kinds of information. There is nothing particularly useful about projecting it onto the product. It’s a clever hack, but the same information can be found through an iPhone app such as ScanLife, which has the added advantage of actually working in the real world. Furthermore, taking a picture of the item makes it possible to reuse that information later. A related example involves augmenting a book with online information about it, including its Amazon rating, reviews, and annotations.
  4. Watching video associated with a newspaper article is cool, but it is not clear how reliable the association is. There is also the issue of privacy: do you want people nearby seeing what you’re watching? Do they want to be distracted by your flickering images? And of course there are unsolved technical issues around contrast, brightness, and stability of the image, given that both you and the newspaper are moving and that the newspaper deforms in strange ways.
  5. There are more controversial applications that are in some sense (pun intended) more central to the vision of the device: when meeting a person, the system can allegedly show you some keywords that characterize the person to help you interact with him or her. This form of interaction is problematic for several reasons. First, the social: would you want to have stuff projected onto you by some stranger as that person gazes distractedly at your belly or your breasts? A conference badge is bad enough, but at least there isn’t much to read there. As Andy Rutledge points out, by looking at this data of dubious relevance, do you then miss out on more reliable cues that the person is providing, right there, in front of your nose? Then, the technical: how does the system know at whom you’re looking? How do you know how accurate the displayed information is? Rather than being a crutch to help conversation along, this kind of device is much more likely to kill the conversation before it even starts.

The remaining issue I would like to address is one of interaction, of expressing intent and receiving predictable and timely responses from the system. Assume for a minute that all the technological stuff works. Assume that the data the system can retrieve are correct and relevant. How does the system know what you want or need to know? How does it know you’re looking at the airplane ticket rather than at the taxi driver or the outside scene? How does it know you’re interested in watching the stupid video associated with a newspaper article when you’re not even reading that one? How does it know you want to dial a phone? Perhaps you want a calculator instead. How does it know you want to take a photograph, as opposed to, for example, waving your arms around during a conversation (as some of us are incapable of not doing)? How does it know you want to see the time as opposed to scratching an itch on your wrist?

While the display capabilities shown in this demo are potentially useful and practical, the associated modality switches and command sets (gestures, button presses, utterances, etc.) make error-free interaction with such a system virtually impossible. To be useful, augmentation has to be natural and effortless: false positive and false negative errors will make the system unpredictable and frustrating to use, and comical to watch. The problem isn’t a lack of sufficiently advanced technology. The problem is the inherent ambiguity of human existence. This is a difficult lesson for technologists to accept, and yet it must be accounted for if technology is to be empowering rather than dehumanizing.