Blog Category: Computer Science

Ego-Centric vs. Exo-Centric Tracking and Interaction in Smart Spaces

In a recent paper published at SUI 2014, “Exploring Gestural Interaction in Smart Spaces using Head-Mounted Devices with Ego-Centric Sensing”, co-authored with Barry Kollee and Tony Dunnigan, we studied a prototype Head Mounted Device (HMD) that allows interaction with external displays through spatial gesture input.

In the paper, one of our goals was to expand the scope of interaction possibilities on HMDs, which are currently severely limited if we take Google Glass as a baseline. Glass has only a small touch pad, placed at an awkward position on the device’s rim, at the user’s temple. The other input modalities Glass offers are eye blink input and voice recognition. While eye blink can be effective as a binary input mechanism, it is rather limited in many situations and could be considered socially awkward. Voice input suffers from recognition errors for non-native speakers of the input language and has considerable lag, as current Android-based devices, such as Google Glass, perform speech-to-text in the cloud. These problems were also observed in the main study of our paper.

We thus proposed three gestural selection techniques in order to extend the input capabilities of HMDs: (1) a head nod gesture, (2) a hand movement gesture and (3) a hand grasping gesture.
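
To make the first of these concrete, here is a minimal sketch of how a head nod might be detected from an HMD’s gyroscope stream. The thresholds, timing window, and sensor input format are illustrative assumptions, not the detection code from our paper:

```python
# Hypothetical sketch: detecting a head nod from an HMD gyroscope stream.
# The thresholds and the (timestamp, pitch rate) input format are
# illustrative assumptions, not the implementation from the paper.

PITCH_DOWN_RATE = -0.8   # rad/s: fast downward pitch starts a nod
PITCH_UP_RATE = 0.8      # rad/s: fast upward pitch completes it
MAX_NOD_SECONDS = 0.6    # maximum time between the two phases

def detect_nod(gyro_samples):
    """Return True once a down-then-up head motion occurs within the window.

    gyro_samples: iterable of (timestamp_seconds, pitch_rate_rad_per_s).
    """
    down_at = None
    for t, pitch_rate in gyro_samples:
        if pitch_rate < PITCH_DOWN_RATE:
            down_at = t                          # phase 1: head pitches down
        elif down_at is not None and pitch_rate > PITCH_UP_RATE:
            if t - down_at <= MAX_NOD_SECONDS:
                return True                      # phase 2: head returns up
            down_at = None                       # too slow; start over
    return False
```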

The following mock-up video shows the three proposed gestures used in a scenario depicting a material selection session in a (hypothetical) smart space used by architects:

EgoSense: Gestural Interaction in Smart Spaces using Head Mounted Devices with Ego-Centric Sensing from FX Palo Alto Laboratory on Vimeo.

We dropped the head nod gesture after a preliminary study showed low user preference for this input method. In the main study, we found that the two remaining gestural techniques achieved performance similar to a baseline technique using the touch pad on Google Glass. However, we hypothesize that the spatial gestural techniques, which use direct manipulation, may outperform the touch pad for larger numbers of selectable targets (our study had 12 targets in total), as secondary GUI navigation activities (i.e., scrolling a list view) are not required when using gestures.

In the paper, we also present some possibilities for ad-hoc control of large displays and automated indoor systems:

Ambient light control using spatial gestures tracked via an HMD.

Considering the larger picture, our paper touches on the broader question of ego-centric vs. exo-centric tracking: past work in smart spaces has mainly relied on external (exo-centric) tracking techniques, e.g., using depth sensors such as the Kinect for user tracking and interaction. As wearable devices grow increasingly powerful and depth sensor technology shrinks, it may in the future become more practical for users to bring their own sensors to a smart space. This has advantages in scalability: more users can be tracked in larger spaces without additional investments in fixed tracking systems. Also, a larger number of spaces can be made interactive, as users carry their sensing equipment from place to place.

Improving the Expressiveness of Touch Input

Touch input is now the preferred input method on mobile devices such as smartphones and tablets. Touch is also gaining traction in the desktop segment and is common for interaction with large table- or wall-based displays. At present, the majority of touch displays can detect only the location of a user’s touch. Some capacitive touch screens can also report the contact area of a touch, but usually no further information about individual touch inputs is available to developers of mobile applications.

It would, however, be beneficial to capture further properties of the user’s touch, for instance the finger’s rotation around the vertical axis (i.e., the axis orthogonal to the plane of the touch screen) as well as its tilt. Obtaining rotation and tilt information for a touch would allow for expressive localized input gestures as well as new types of on-screen widgets that make use of the additional local input degrees of freedom.

Having finger pose information together with touches adds local degrees of freedom of input at each touch location. This, for instance, allows the user interface designer to remap established multi-touch gestures such as pinch-to-zoom to other user interface functions, or to free up screen space by letting input that usually requires (multi-)touch gestures spanning a significant amount of screen space (e.g., adjusting a slider value, scrolling a list, panning a map view, enlarging a picture) be performed at a single touch location. New graphical user interface widgets that make use of finger pose information, such as rolling context menus, hidden flaps, or occlusion-aware widgets, have also been suggested.
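
As a hedged sketch of this remapping idea, the widget below maps finger roll at a single, stationary touch point to a slider value. The event methods, the rotation field, and the scaling constant are invented for illustration and do not correspond to a real toolkit API:

```python
# Illustrative sketch: finger roll at a single touch point drives a slider,
# so no screen-space-consuming drag gesture is needed. The event methods
# and the rotation parameter are assumptions, not a real toolkit API.

DEGREES_PER_FULL_RANGE = 180.0  # rolling the finger 180 degrees sweeps the slider

class RotationSlider:
    def __init__(self, minimum=0.0, maximum=1.0):
        self.minimum, self.maximum = minimum, maximum
        self.value = minimum
        self._last_rotation = None

    def on_touch_down(self, rotation_deg):
        # Remember the finger's initial rotation so input is relative.
        self._last_rotation = rotation_deg

    def on_touch_move(self, rotation_deg):
        if self._last_rotation is None:
            return self.value
        delta = rotation_deg - self._last_rotation
        self._last_rotation = rotation_deg
        span = self.maximum - self.minimum
        # Accumulate the rotation change, clamped to the slider's range.
        self.value = min(self.maximum,
                         max(self.minimum,
                             self.value + span * delta / DEGREES_PER_FULL_RANGE))
        return self.value
```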

Our PointPose prototype performs finger pose estimation at the location of a touch, using a short-range depth sensor viewing the touch screen of a mobile device. PointPose estimates the finger pose of a user touch by fitting a cylindrical model to the subset of the point cloud generated by the depth sensor that corresponds to the user’s finger. We use the spatial location of the user’s touch to seed the search for this subset of the point cloud.
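
The following is a minimal sketch of that pipeline, with a PCA line fit standing in for the full cylindrical model fit described in the paper; the neighborhood radius and coordinate conventions are assumptions:

```python
# Sketch of PointPose-style finger pose estimation from a depth point cloud.
# A PCA line fit stands in for the paper's cylindrical model fit; the
# neighborhood radius and coordinate conventions are assumptions.

import numpy as np

def estimate_finger_pose(point_cloud, touch_xyz, radius=0.03):
    """Estimate (rotation_deg, tilt_deg) of the touching finger.

    point_cloud: (N, 3) array of sensor points in metres, with z along the
    screen normal. touch_xyz: 3-vector of the touch location.
    """
    # Seed the search: points near the touch location belong to the finger.
    distances = np.linalg.norm(point_cloud - touch_xyz, axis=1)
    finger = point_cloud[distances < radius]
    if len(finger) < 10:
        return None  # too few points to fit a model

    # The dominant direction of variance of the finger points approximates
    # the axis of a cylinder fitted to the finger.
    centered = finger - finger.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    if axis[2] < 0:
        axis = -axis  # orient the axis to point away from the screen

    # Rotation is the axis direction projected onto the screen plane;
    # tilt is the angle between the axis and the screen normal
    # (0 degrees = finger perpendicular to the screen).
    rotation = np.degrees(np.arctan2(axis[1], axis[0]))
    tilt = np.degrees(np.arccos(np.clip(axis[2], -1.0, 1.0)))
    return rotation, tilt
```

The PCA stand-in works because a finger is roughly cylindrical: its dominant direction of variance approximates the cylinder axis, from which the rotation around the screen normal and the tilt relative to the screen follow directly.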

One advantage of our approach is that it does not require complex external tracking hardware (as in related work), and external computation is unnecessary as the finger pose extraction algorithm is efficient enough to run directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.

Want to help make computer science history?

Scott Aaronson has been asked by MIT to put together a list of the top 150 events in computer science history as part of the celebration of MIT’s 150th anniversary. You can vote on the potential entries here (you will need to register by providing a login name, password, and e-mail address). For more information about the project, see this blog post, which includes an early version of the list, and a more recent blog post of his on the subject.

I’ve mentioned some of Scott’s work before, in a post about classical computer science results inspired by quantum information processing, and in a post on an overview of quantum computing for technology managers that I wrote a couple of years ago. His results don’t make it into the top 150 computer science results of all time, but they are good candidates for a list of the top 150 results of the last decade.

A magical way to learn computer science

Former FXPAL intern Jeremy Kubica’s Computational Fairy Tales is a fresh new entry into the blogosphere that introduces an unusual way to learn computer science: reading a series of charming fairy tales. Each post contains a few sentences of introduction to a computer science concept followed by a fairy tale illustrating that concept.

I particularly enjoyed Loops and Making Horseshoes, which illustrates loops.

Bell System Technical Journal online

AT&T Bell Labs has recently made its entire archive of the Bell System Technical Journal (BSTJ) available for free online. The collection goes all the way back to 1922; in fact, the first issue has an article on the transmission characteristics of the submarine cable. Later, in 1978, an entire issue of the journal was dedicated to a then-new operating system called Unix.

Overflow overflow?

Ten days ago, a theoretical computer science community Q&A site went beta and seems to be generating a fair amount of activity. I’m a big fan of MathOverflow, and am delighted to see a similar site springing up for a different field.

Thirty-nine days ago, a new mathematics site went beta, which initially puzzled me since the mathematics community already has the highly successful MathOverflow site. The difference appears to be that MathOverflow is specifically for research mathematics, whereas the new site aims to be broader, allowing more elementary questions.

Overall, I think a proliferation of such sites is great, but it is also confusing. It isn’t always clear when a question is research level or not. There are questions tagged algebra or topology on the CS theory site that are pure mathematics questions. There’s a question tagged graph theory that had been posted previously to MathOverflow. I am delighted to see that both cs.cr.crypto-security and quantum computing are already populated with a few questions, but similar questions in these areas received good answers on MathOverflow. It would be a shame if the proliferation of sites led to less interaction between fields rather than more. I’ll be curious to see how the usage patterns play out over time.

Proof?

For those of us with a passing (or greater) interest in algorithms, last week was particularly interesting: Vinay Deolalikar circulated a paper that attempted to prove P≠NP. This is one of the great unsolved problems in Computer Science, and its solution has some important implications for real-world problems such as keeping your money in your bank account.

I won’t attempt a summary of the proof, and will limit myself to social commentary.

ai

Artificial intelligence has always struck me as a fittingly modest name, as I emphasize the artifice over the intelligence. Watson, a question-answering system, has recently been playing Jeopardy against humans to test the “DeepQA hypothesis”:

The DeepQA hypothesis is that by complementing classic knowledge-based approaches with recent advances in NLP, Information Retrieval, and Machine Learning to interpret and reason over huge volumes of widely accessible naturally encoded knowledge (or “unstructured knowledge”) we can build effective and adaptable open-domain QA systems. While they may not be able to formally prove an answer is correct in purely logical terms, they can build confidence based on a combination of reasoning methods that operate directly on a combination of the raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge available from for example the Semantic Web.
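
To make the idea of combining reasoning methods concrete, here is a heavily simplified sketch of merging several evidence scorers into a single answer confidence. The scorer names, weights, and logistic combination are invented for illustration; Watson’s actual answer-merging and ranking models are far more elaborate and are learned from training data:

```python
# Toy sketch of DeepQA-style evidence combination: several independent
# scorers rate a candidate answer, and a weighted logistic combination
# yields one confidence value. Scorer names and weights are invented;
# Watson learns its combination model from training data.

import math

# Each scorer maps a dict of candidate-answer features to a score in [0, 1].
SCORERS = {
    "passage_support": lambda f: f.get("passage_support", 0.0),
    "type_match":      lambda f: f.get("type_match", 0.0),
    "popularity":      lambda f: f.get("popularity", 0.0),
}

WEIGHTS = {"passage_support": 2.0, "type_match": 1.5, "popularity": 0.5}
BIAS = -2.0  # pushes confidence down unless several scorers agree

def answer_confidence(features):
    """Combine scorer outputs into a confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * scorer(features)
                   for name, scorer in SCORERS.items())
    return 1.0 / (1.0 + math.exp(-z))

# A candidate with strong textual support and a matching answer type:
print(answer_confidence(
    {"passage_support": 0.9, "type_match": 1.0, "popularity": 0.3}))
```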

As a researcher, I’m excited at the milestone this represents.
