Blog Category: Research

Prototyping reality


One of our ongoing goals at the lab is to understand how best to take advantage of Augmented Reality (AR) to annotate physical objects with digital media. Unfortunately, the objects we tend to focus on (such as multi-function devices or printers) are often large and relatively immobile, making it difficult for us to visit remote sites to demonstrate our technologies.

To address this problem, we are experimenting with paper-based models of the physical objects we want to augment, which are much more lightweight and mobile while still approximating the embodied experience of a 3D device (see Figure 1). To register a paper-based model with AR tracking systems, we can either scan the entire paper-based object or, if the object corresponds to a cube or rectangular box, register each side as an independent image (these images may in fact correspond to the registration images used in the actual scene). In either case, the paper-based object is mobile and easily reconfigurable, giving us much more flexibility in how, when, and where we present AR content (Figure 2).

Figure 1. Our printer paper prototype.
Figure 2. Viewing digital content affixed to the paper printer prototype with a mobile AR tool.
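
To make the registration step concrete, here is a minimal sketch (using ARKit, which comes up again below) of how the faces of a box-shaped paper prototype could be registered as independent reference images. The asset group name, tracked-image count, and overlay content are placeholders for illustration, not our actual implementation.

```swift
import UIKit
import SceneKit
import ARKit

// Minimal sketch (assumed setup): each face of the paper box has been
// photographed and added to an asset catalog group named "PaperPrinterFaces".
final class PaperPrototypeViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet private var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        // Load the face images; each one is detected and tracked independently.
        if let faces = ARReferenceImage.referenceImages(inGroupNamed: "PaperPrinterFaces",
                                                        bundle: nil) {
            configuration.detectionImages = faces
            configuration.maximumNumberOfTrackedImages = 2
        }
        sceneView.session.run(configuration)
    }

    // When a face of the paper prototype is detected, attach digital content to it.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        let size = imageAnchor.referenceImage.physicalSize
        let overlay = SCNPlane(width: size.width, height: size.height)
        overlay.firstMaterial?.diffuse.contents = UIColor.cyan.withAlphaComponent(0.4)
        let overlayNode = SCNNode(geometry: overlay)
        overlayNode.eulerAngles.x = -.pi / 2  // lay the plane flat on the detected face
        node.addChildNode(overlayNode)
    }
}
```

Because each face is registered on its own, the same images can double as registration targets in the real scene, which is part of what makes the paper prototype easy to reconfigure.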

This approach is something of an inversion of typical paper-based prototyping methods, in which the user-interface elements are prototyped rather than the physical objects against which they are registered (objects that do not exist for most 2D interfaces). Marc Rettig introduced lo-fi prototyping with paper UI elements in his influential paper Prototyping for Tiny Fingers, and the method was adopted rapidly throughout the user experience community. Recently, researchers have extended it to AR scenarios as well.

PapAR was one of the first to adapt paper prototyping techniques to AR for head-mounted displays. It is a straightforward design that involves a base layer with real-world elements drawn on paper, much like a typical paper prototype, as well as a transparent overlay onto which AR interactors are drawn. This is a simple and elegant “glass pane” approach familiar to user experience professionals.

Figure 3. In PapAR, authors move a transparent AR overlay over a sketched real-world scene.

Michael Nebeling’s work at the University of Michigan School of Information pushes this concept further. Inspired by issues with an earlier AR creation toolkit (the influential DART system), Nebeling et al. first built ProtoAR, which allows AR designers to integrate 2D paper sketches as well as 3D Play-Doh mockups into a prototype AR scene. The toolkit includes a desktop and a mobile app that creators can use to scan physical objects, integrate them into an AR scene, and link them to real-world markers.

The researchers later extended this toolkit to allow authors to adjust the representation of AR content live, facilitating Wizard-of-Oz style user testing (see their CHI presentation on this work).

Closer to our approach are tools that augment paper prototypes with digital resources to experiment with AR content. For example, the ARcadia system supports authoring AR-based tangible computing interfaces. In this system, content creators attach markers to paper prototypes then use a desktop tool to augment the prototypes with digital content.

We have a long tradition of using and extending lightweight prototyping methods at FXPAL. In light of recent events, we expect to focus future work on extending lightweight AR prototyping tools to support remote experimentation and design iteration.

Augmented Reality: Is this time different?


Ivan Sutherland’s Sword of Damocles, a head-mounted virtual and augmented reality system, was ungainly but remarkably forward-thinking. Although it was developed over a half-century ago, the demonstration in the video below includes many of the components that we recognize today as critical to VR and AR displays: graphics displayed via a headset, a positioning system, and an external computational mechanism.

Since then, AR and VR have experienced waves of hype that build over a few years but reliably fade in disappointment. With the current excitement over consumer-level AR libraries (such as ARKit and ARCore), it is worth asking whether anything is different this time.

The Augmented Connected Enterprise (ACE) team at FXPAL is betting that it is. We are currently building an AR-based remote assistance framework that combines several of our augmented reality, knowledge capture, and teleconferencing technologies. A future post will describe the engineering details of our work. Here we explore some of the problems that AR has faced in the past, and how we plan to address them.

In their paper “Drivers and Bottlenecks in the Adoption of Augmented Reality Applications” [1], Martinez et al. explored some typical pitfalls for AR technology, including No standard and little flexibility, Limited (mobile device) computational power, (Localization) inaccuracy, Social acceptance, and Amount of information (Distraction). We address each of these in turn below:

  • No standard and little flexibility
  • Limited (mobile device) computational power

Advances in contemporary technologies have largely addressed these two issues. As mentioned above, the market appears to be coalescing into two or three widely adopted libraries (specifically ARKit, ARCore, and Unity). Furthermore, limited computational power on mobile devices is a rapidly receding concern.

  • (Localization) inaccuracy

Caudell and Mizell echoed this issue in their paper introducing the term “augmented reality” [2]. They wrote that “position sensing technology is the ultimate limitation of AR, controlling the range and accuracy of possible applications.”

Addressing this concern involves scanning several real-world objects in order to detect and track them in an AR scene. Our experiences so far reveal that, even if they aren’t yet ready for wide deployment, detection and tracking technologies have come a long way. The video below shows our procedure for scanning a 3D object with ARKit (adapted from this approach). We have found that ensuring a flat background is paramount to generating an object free of noisy background feature points. Other than that, the process is straightforward.

Scanning an object in this way generates a digital signature that our app can recognize quickly and accurately, allowing us to augment the physical object with interactive guides.
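
As a rough sketch of this step, the fragment below loads a scanned signature (an ARKit .arobject archive) and enables detection of the physical object. The file name and the placeholder marker stand in for our actual interactive guides.

```swift
import UIKit
import SceneKit
import ARKit

// Minimal sketch (assumed file name): load the signature produced by the
// scanning step and ask ARKit to detect it in the live scene.
func runObjectDetection(in sceneView: ARSCNView) throws {
    let configuration = ARWorldTrackingConfiguration()

    // "Printer.arobject" is a placeholder for the scanned signature bundled with the app.
    let url = Bundle.main.url(forResource: "Printer", withExtension: "arobject")!
    let printerSignature = try ARReferenceObject(archiveURL: url)
    configuration.detectionObjects = [printerSignature]

    sceneView.session.run(configuration)
}

// In the ARSCNViewDelegate, an ARObjectAnchor is delivered once the physical
// object is recognized; interactive guides can then be attached to its node.
func handle(anchor: ARAnchor, node: SCNNode) {
    guard anchor is ARObjectAnchor else { return }
    let marker = SCNNode(geometry: SCNSphere(radius: 0.01))
    marker.geometry?.firstMaterial?.diffuse.contents = UIColor.green
    node.addChildNode(marker)  // placeholder for an interactive guide
}
```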

  • Social acceptance

The many issues associated with the launch of Google Glass made it clear that HMD devices are not yet acceptable to the consumer market. But our intuition is that focusing on the consumer market is inappropriate, at least initially, and that developers should instead target industrial settings (as Caudell and Mizell did at Boeing). A more appropriate metaphor for AR and VR devices (outside of their use in gaming) is a hard hat—something that you put on when you need to complete a task.

  • Amount of information (Distraction)

Martinez et al. are concerned that the “amount of information to be displayed in the augmented view may exceed the needs of the user.” This strikes us less as a bottleneck and more a design guideline—take care to make AR objects as unobtrusive as possible.

In addition to the issues above, we think there are at least two other problems standing in the way of widespread AR adoption:

  • Authoring

There are a variety of apps that can help AR content creators author scenes manually, including Amazon Sumerian, Apple Reality Composer, Adobe Aero, and ScopeAR WorkLink. However, with these tools designers still must create, import, place, and orient models, as well as organize scenes temporally. We think there are opportunities to simplify this process with automation.
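
To give a sense of why manual authoring is tedious, here is a hypothetical fragment showing what placing and orienting a single model relative to a detected anchor can look like with SceneKit; the model name, offsets, and scale are invented for illustration.

```swift
import SceneKit

// Sketch of manual AR authoring: every asset must be loaded, positioned,
// oriented, and scaled by hand relative to its anchor. (Model name and
// offsets are hypothetical.)
func placeTonerGuide(on anchorNode: SCNNode) {
    guard let scene = SCNScene(named: "toner_cartridge.scn"),
          let model = scene.rootNode.childNodes.first else { return }

    model.position = SCNVector3(0.12, 0.05, -0.30)    // hand-tuned offsets in meters
    model.eulerAngles = SCNVector3(0, Float.pi / 2, 0) // hand-tuned orientation
    model.scale = SCNVector3(0.01, 0.01, 0.01)         // hand-tuned scale

    anchorNode.addChildNode(model)
}
```

Multiply this by every step in a repair procedure and the appeal of automating authoring becomes clear.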

  • Value

Finally, as with any technology, users will not adopt AR unless it provides value in return for their investments in time and money. Luckily, AR technologies, specifically those involving remote assistance, enjoy a clear value proposition: reduced costs and time wasted due to travel. This is why we believe the current wave of interest in AR technologies may be different. Previous advances in the quality of HMDs and tracking technologies were not met with similar increases in teleconferencing technologies and infrastructure. Now, however, robust, full-media teleconferencing technologies are commonplace, making remote AR sessions more feasible.

Many tools already take advantage of a combination of AR and teleconferencing technologies. However, to truly stand in for an in-person visit, tele-work tools must facilitate a wide range of guided interaction. Experts feel they must travel to sites because they need to diagnose problems rapidly, change their point-of-view with ease to adapt to each particular situation, and experiment or interact with problems dynamically. This type of fluid action is difficult to achieve remotely when relaying commands through a local agent. In a future post, we will discuss methods we are developing to make this interaction as seamless as possible, as well as approaches for automated authoring. Stay tuned!

[1] H. Martínez et al. “Drivers and Bottlenecks in the Adoption of Augmented Reality Applications”. Journal of Multimedia Theory and Application, Vol. 1, 27–44, 2014.

[2] T. P. Caudell and D. W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes”. In Proc. Hawaii Int’l Conf. on Systems Sciences, 659–669, 1992.

FXPAL @ ACM ISS 2019, November 10


FXPAL is presenting a Demo and a Poster at ISS 2019 in Daejeon, South Korea.

Demo

We propose a tabletop system with two channels that integrates document capture with a 4K video camera and hand tracking with a webcam, in which the document image and hand skeleton data are transmitted at different rates and handled by a lightweight Web browser client at remote sites.

Toward Long Distance Tabletop Hand-Document Telepresence
Chelhwon Kim, Patrick Chiu, Joseph de la Pena, Laurent Denoue, Jun Shingu, Yulius Tjahjadi

Poster

We present a remote assistance system that enables a remotely located expert to provide guidance using hand gestures to a customer who performs a physical task in a different location.

A Web-Based Remote Assistance System with Gravity-Aware 3D Hand Gesture Visualization
Chelhwon Kim, Patrick Chiu, Yulius Tjahjadi

Come by and check them out!

CollaboPlanner @ CSCW 2018


Traveling and visiting new cities is often done in pairs or groups, and searching collaboratively for places of interest is a common activity. It frequently occurs on individual mobile phones or on large tourist-information displays in public places such as visitor centers or train stations.

Prior work suggests that the technologies available to travelers prevent effective collaborative trip planning. Each of these modalities has its pros and cons in terms of supporting collaborative decision-making in the tourist context: mobile phones are private and familiar, but lack screen real estate and are hard to co-reference; large public displays are bigger and can provide content from multiple sources in one place but are located in a fixed position and are more visible to others.

We created CollaboPlanner, a collaborative itinerary planning application that combines mobile interaction with a public display, and evaluated it against third-party mobile apps in a simulated travel-search task to understand how the unique features of mobile phones and large displays might be leveraged together to improve the collaborative travel planning experience.

We designed CollaboPlanner to support two scenarios: creating travel itineraries with a public display application, and creating travel itineraries with the public display and mobile applications combined.

CollaboPlanner

CollaboPlanner allows users to explore destinations, add them to an itinerary, and see their itinerary visualized on an interactive map. The hybrid version of CollaboPlanner includes a dedicated mobile app. This mobile app allows users to select preferences independently and then send them to the large display for additional discussion and decision-making.

Our user tests provide initial evidence that while using mobile phones is familiar, public displays have added advantages, both as standalone tools and in combination with a mobile app to help travelers collaboratively search unfamiliar environments.

Come see our demo at 6:00PM on November 5th (Mon) to find out more about this system as well as find a restaurant in NYC!

Matthew Lee of FXPAL is also presenting ReflectLive at CSCW 2018.

FXPAL @ CHI 2018


A collage of figures from our CHI 2018 papers.

This year FXPAL is happy to present four papers (three long papers and one case study) at CHI 2018 in Montreal. Our featured work this year investigates the themes of Human-Centered Workstyle, Information Visualization, and Internet of Things.

You can check out the papers now:

Long Papers

Case Study

Come by and say bonjour in Canada!

ReflectLive


When clinicians communicate with patients via video conferencing, they must not only exchange information but also convey a sense of sympathy, sensitivity, and attentiveness. However, video-mediated communication often is less effective than in-person communication because it is challenging to convey and perceive essential non-verbal behaviors, such as eye contact, vocal tone, and body posture. Moreover, non-verbal behaviors that may be acceptable in in-person business meetings such as looking away at notes may be perceived as being rude or inattentive in a video meeting (patients already feel disengaged when clinicians frequently look at medical records instead of at them during in-person visits).

Prior work shows that in video visits, clinicians tend to speak more, dominating the conversation and showing less empathy toward patients, which can lead to poorer patient satisfaction and incomplete information gathering. Further, few clinicians are trained to communicate over a video visit, and many are not always aware of how they present themselves to patients over video.

In our paper, I Should Listen More: Real-time Sensing and Feedback of Non-Verbal Communication in Video Telehealth, we describe the design and evaluation of ReflectLive, a system that senses and provides real-time feedback about clinicians’ communication behaviors during video consultations with patients. Our user tests showed that real-time sensing and feedback has the potential to train clinicians to maintain better eye contact with patients and to be more aware of their non-verbal behaviors.

ReflectLive

The ReflectLive video meeting system, with the visualization dashboard on the right showing real-time metrics about non-verbal behaviors. Heather (in the thumbnail) is looking to the left. A red bar flashes on the left of her window as she looks to the side to remind her that her gaze is not centered on the other speaker. A counter shows the number of seconds and direction she is looking away.  
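
The sketch below illustrates the kind of look-away feedback logic such a dashboard could drive. It operates on a normalized horizontal offset reported by any face or gaze tracker and is a hypothetical simplification, not the ReflectLive implementation.

```swift
import Foundation

// Hypothetical simplification of ReflectLive-style feedback: given a normalized
// horizontal offset of the speaker's gaze/face from the center of the frame
// (-1 = far left, 0 = centered, +1 = far right), decide whether to flash a
// warning bar and accumulate how long the speaker has been looking away.
enum GazeWarning { case none, left, right }

struct LookAwayMonitor {
    let threshold: Double = 0.25        // how far off-center counts as "looking away"
    private(set) var secondsAway: TimeInterval = 0

    mutating func update(offset: Double, deltaTime: TimeInterval) -> GazeWarning {
        guard abs(offset) > threshold else {
            secondsAway = 0
            return .none
        }
        secondsAway += deltaTime
        return offset < 0 ? .left : .right
    }
}

// Example: called once per video frame with the tracker's output.
var monitor = LookAwayMonitor()
let warning = monitor.update(offset: -0.4, deltaTime: 1.0 / 30.0)
print(warning, monitor.secondsAway)  // .left, with ~0.033 s accumulated so far
```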

This paper is published in the Proceedings of the ACM on Human-Computer Interaction. We will present the work at CSCW 2018 in November.

DocHandles @ DocEng 2017


The conversational documents group at FXPAL is helping users interact with document content using the interface that best matches their current context, without worrying about the structure of the underlying documents. With our system, users should be able to refer to figures, charts, and sections of their work documents seamlessly in a variety of collaboration interfaces to better communicate with their colleagues.


To achieve this goal, we are developing tools for understanding, repurposing, and manipulating document structure. The DocHandles work, which we will present at DocEng 2017, is a first step in this direction. With this tool a user can type, for example, “@fig2” into their multimedia chat tool to see a list of recommended figures extracted from recently shared documents. In this case, suggestions returned correspond to figures labeled “figure 2” in the most recently discussed documents in the chat, along with the document filename or title and caption. Users can then select their desired figure, which is automatically injected into the chat.
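
As a rough sketch of the interaction (not the actual DocHandles implementation), the fragment below parses an “@fig” mention out of a chat message and looks up matching figures among those extracted from recently shared documents; the types and lookup are hypothetical.

```swift
import Foundation

// Hypothetical type standing in for figures extracted from recently shared documents.
struct ExtractedFigure {
    let label: String        // e.g. "Figure 2"
    let documentTitle: String
    let caption: String
}

// Parse a mention such as "@fig2" from a chat message and suggest matching
// figures from the most recently discussed documents (a sketch, not DocHandles itself).
func suggestFigures(for message: String,
                    in recentFigures: [ExtractedFigure]) -> [ExtractedFigure] {
    guard let range = message.range(of: #"@fig(\d+)"#, options: .regularExpression),
          let number = Int(message[range].dropFirst("@fig".count)) else { return [] }

    let wantedLabel = "figure \(number)"
    return recentFigures.filter { $0.label.lowercased() == wantedLabel }
}

// Example usage: the matched figure would be shown with its document title and
// caption, and injected into the chat once the user selects it.
let figures = [ExtractedFigure(label: "Figure 2",
                               documentTitle: "quarterly_report.pdf",
                               caption: "Revenue by region")]
print(suggestFigures(for: "can you drop in @fig2 here?", in: figures))
```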

Please come see our presentation in Session 7 (User Interactions) at 17:45 on September 5th to find out more about this system as well as some of our future plans for conversational documents.

Improving User Interfaces for Robot Teleoperation


The FXPAL robotics research group has recently explored technologies for improving the usability of mobile telepresence robots. We evaluated a prototype head-tracked stereoscopic (HTS) teleoperation interface for a remote collaboration task. The results of this study indicate that using an HTS system reduces task errors and improves the perceived collaboration success and viewing experience.

We also developed a new focus-plus-context viewing technique for mobile robot teleoperation. This allows us to use wide-angle camera images that provide rich contextual visual awareness of the robot’s surroundings while preserving a distortion-free region in the middle of the camera view.
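
One way to picture the focus-plus-context view is as a radial remapping of image coordinates: pixels inside a focus radius are passed through untouched, while pixels outside it sample from a progressively larger region of the wide-angle source. The sketch below is a hypothetical illustration of that mapping, not our actual warping implementation.

```swift
import Foundation

// Hypothetical focus-plus-context remapping: radii inside `focusRadius` are
// passed through unchanged (distortion-free center), while radii beyond it are
// compressed so the wide-angle periphery still fits in the view.
// All radii are normalized, with 1 being the edge of the output view.
func focusPlusContextRadius(_ r: Double,
                            focusRadius: Double = 0.4,
                            sourceEdge: Double = 1.8) -> Double {
    if r <= focusRadius {
        return r  // focus region: no distortion
    }
    // Context region: map [focusRadius, 1] of the output view to
    // [focusRadius, sourceEdge] of the wide-angle source, compressing it.
    let t = (r - focusRadius) / (1.0 - focusRadius)
    return focusRadius + t * (sourceEdge - focusRadius)
}

// Example: output radii inside the focus region are unchanged, while radii in
// the periphery sample from well outside the undistorted center of the source.
print(focusPlusContextRadius(0.4))  // 0.4 (unchanged, inside focus)
print(focusPlusContextRadius(0.7))  // 1.1 (pulled in from the wide-angle periphery)
```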

To this, we added a semi-automatic robot control method that allows operators to navigate the telepresence robot by pointing and clicking directly on the camera image feed. This through-the-screen interaction paradigm has the advantage of decoupling operators from the robot control loop, freeing them for other tasks besides driving the robot.
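
The geometry behind this point-and-click control can be pictured as casting a ray from the clicked pixel through the camera and intersecting it with the floor to obtain a drive target. The sketch below illustrates that idea under a simple pinhole-camera assumption; it is not our actual controller.

```swift
import simd

// Simplified point-and-click navigation geometry (pinhole camera assumption):
// cast a ray from the clicked pixel through the camera center and intersect it
// with the ground plane (y = 0) to obtain a target position for the robot.
struct PinholeCamera {
    let fx: Double, fy: Double     // focal lengths in pixels
    let cx: Double, cy: Double     // principal point in pixels
    let position: SIMD3<Double>    // camera center in world coordinates (meters, y up)
    let rotation: simd_double3x3   // camera-to-world rotation
}

func groundTarget(forClickAt pixel: SIMD2<Double>, camera: PinholeCamera) -> SIMD3<Double>? {
    // Ray in camera coordinates (x right, y up, z forward). Image rows grow
    // downward, hence the sign flip on the y component.
    let rayCamera = SIMD3<Double>((pixel.x - camera.cx) / camera.fx,
                                  -(pixel.y - camera.cy) / camera.fy,
                                  1.0)
    let rayWorld = simd_normalize(camera.rotation * rayCamera)

    // Reject clicks at or above the horizon, then intersect with the floor plane y = 0.
    guard rayWorld.y < -1e-6 else { return nil }
    let t = -camera.position.y / rayWorld.y
    return camera.position + t * rayWorld
}

// Example: a camera 1.2 m above the floor with axes aligned to the world frame.
// Clicking below the image center yields a floor target roughly 4.5 m ahead.
let cam = PinholeCamera(fx: 600, fy: 600, cx: 320, cy: 240,
                        position: SIMD3(0, 1.2, 0),
                        rotation: matrix_identity_double3x3)
print(groundTarget(forClickAt: SIMD2(320, 400), camera: cam) ?? "no floor hit")
```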

As a result of this work, we presented two papers at the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). We obtained a best paper award for the paper “Look Where You’re Going: Visual Interfaces for Robot Teleoperation” in the Design category.

DocuGram at DocEng


Teleconferencing is now a nearly ubiquitous aspect of modern work. We routinely use apps such as Google Hangouts or Skype to present work or discuss documents with remote colleagues. Unfortunately, sharing source documents is not always as seamless. For example, a meeting participant might use a screencast to share content that she has access to but that the remote participant does not. Remote participants may also lack the right software to open the source document, or the content shared might be only a small section of a large document that is difficult to share.

Later this week in Vienna, we will present our work at DocEng on DocuGram, a tool we developed at FXPAL to help address these issues. DocuGram can capture and analyze shared screen content to automatically reconstitute documents. Furthermore, it can capture and integrate annotations and voice notes made as the content is shared.

The first video below describes DocuGram, and the second shows how we have integrated it into our teleconferencing tool, MixMeet. Check it out, and be sure to catch our talk on Friday, September 16th at 10:00AM.

FXPAL at Mobile HCI 2016


Early next week, Ville Mäkelä and Jennifer Marlow will present our work at Mobile HCI on tools we developed at FXPAL to support distributed workers. The paper, “Bringing mobile into meetings: Enhancing distributed meeting participation on smartwatches and mobile phones”, presents the design, development, and evaluation of two applications, MixMeetWear and MeetingMate, that aim to help users in non-standard contexts participate in meetings.

The videos below show the basic functionality of the two systems. If you are in Florence for Mobile HCI, please stop by their presentation on Thursday, September 8, in the 2:00-3:30 session (in Sala Verde) to get the full story.

Ciao!