Blog Author: Scott Carter

Prototyping reality

One of our ongoing goals at the lab is to understand how best to take advantage of Augmented Reality (AR) to annotate physical objects with digital media. Unfortunately, the objects we tend to focus on (such as multi-function devices or printers) are often large and relatively immobile, making it difficult for us to visit remote sites to demonstrate our technologies.

To address this problem, we are experimenting with paper-based models of the physical objects we want to augment. These models are much more lightweight and mobile while still approximating the embodied experience of a 3D device (see Figure 1). To register a paper-based model with AR tracking systems, we can either scan the entire paper object or, if the object corresponds to a cube or rectangular box, register each side as an independent image (the images may in fact correspond to registration images used in the actual scene). In either case, the paper-based object is mobile and easily reconfigurable, giving us much more flexibility in how, when, and where we present AR content (Figure 2).
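
For readers curious what this per-side registration can look like in practice, below is a minimal sketch using ARKit’s image detection. It assumes the faces of the paper model have been photographed and added to an asset catalog group; the group name “PaperPrinterFaces” and the overlay content are hypothetical, so treat this as an illustration of the approach rather than our production code.

```swift
import UIKit
import ARKit
import SceneKit

// Minimal sketch: treat each face of the paper printer mockup as an AR reference
// image and anchor digital content to whichever face the camera currently sees.
class PaperPrototypeViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self

        let configuration = ARWorldTrackingConfiguration()
        // Each side of the paper box lives in an AR Resource Group in the asset
        // catalog, with its real-world physical width specified there.
        if let faces = ARReferenceImage.referenceImages(
            inGroupNamed: "PaperPrinterFaces", bundle: nil) {
            configuration.detectionImages = faces
        }
        sceneView.session.run(configuration)
    }

    // Called when ARKit recognizes one of the registered faces.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        // Overlay a translucent plane matching the detected face's size; in a real
        // prototype this is where the digital annotations would be attached.
        let size = imageAnchor.referenceImage.physicalSize
        let plane = SCNPlane(width: size.width, height: size.height)
        plane.firstMaterial?.diffuse.contents = UIColor.cyan.withAlphaComponent(0.4)
        let planeNode = SCNNode(geometry: plane)
        planeNode.eulerAngles.x = -Float.pi / 2  // lay the plane flat on the image
        node.addChildNode(planeNode)
    }
}
```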

Figure 1. Our printer paper prototype.
Figure 2. Viewing digital content affixed to the paper printer prototype with a mobile AR tool.

This approach represents something of an inversion of typical paper-based prototyping methods, in which the user-interface elements are prototyped rather than the physical objects against which they are registered (objects that do not exist for most 2D interfaces). Marc Rettig introduced lo-fi prototyping with paper UI elements in his influential paper Prototyping for Tiny Fingers, and the method was rapidly adopted throughout the user experience community. Recently, researchers have extended it to AR scenarios as well.

PapAR was one of the first systems to adapt paper prototyping techniques to AR for head-mounted displays. It is a straightforward design: a base layer with real-world elements drawn on paper, much like a typical paper prototype, plus a transparent overlay onto which AR interactors are drawn. This is a simple and elegant “glass pane” approach familiar to user experience professionals.

Figure 3. In PapAR, authors move a transparent AR overlay over a sketched real-world scene.

Michael Nebeling’s work at the University of Michigan School of Information pushes this concept further. Inspired by issues with an earlier AR creation toolkit (the influential DART system), Nebeling et al. first built ProtoAR, which allows AR designers to integrate 2D paper sketches as well as 3D Play-Doh mockups into a prototype AR scene. The toolkit includes a desktop and a mobile app that creators can use to scan physical objects, integrate them into an AR scene, and link them to real-world markers.

The researchers later extended this toolkit to allow authors to adjust the representation of AR content live, facilitating Wizard-of-Oz style user testing (see their CHI presentation on this work).

Closer to our approach are tools that augment paper prototypes with digital resources to experiment with AR content. For example, the ARcadia system supports authoring AR-based tangible computing interfaces. In this system, content creators attach markers to paper prototypes and then use a desktop tool to augment the prototypes with digital content.

We have a long tradition of using and extending lightweight prototyping methods at FXPAL. In light of recent events, we expect to focus future work on extending lightweight AR prototyping tools to support remote experimentation and design iteration.

Augmented Reality: Is this time different?

Ivan Sutherland’s Sword of Damocles, a head-mounted virtual and augmented reality system, was ungainly but remarkably forward-thinking. Developed over a half-century ago, the demonstration in the video below includes many of the components that we recognize today as critical to VR and AR displays, including the ability to display graphics via a headset, a positioning system, and an external computational mechanism.

Since then, AR and VR have experienced waves of hype that build over a few years but reliably fade in disappointment. With the current excitement over consumer-level AR libraries (such as ARKit and ARCore), it is worth asking whether anything is different this time.

The Augmented Connected Enterprise (ACE) team at FXPAL is betting that it is. We are currently building an AR-based remote assistance framework that combines several of our augmented reality, knowledge capture, and teleconferencing technologies. A future post will describe the engineering details of our work. Here we explore some of the problems that AR has faced in the past and how we plan to address them.

In their paper “Drivers and Bottlenecks in the Adoption of Augmented Reality Applications” [2], Martínez et al. explored some typical pitfalls for AR technology, including No standard and little flexibility, Limited (mobile device) computational power, (Localization) inaccuracy, Social acceptance, and Amount of information (Distraction). We address each of these in turn below:

  • No standard and little flexibility
  • Limited (mobile device) computational power

Advances in contemporary technologies have largely addressed these two issues. As mentioned above, the market appears to be coalescing around two or three widely adopted libraries (specifically ARKit, ARCore, and Unity). Furthermore, limited computational power on mobile devices is a rapidly receding concern.

  • (Localization) inaccuracy

Caudell and Mizell echoed this issue in the paper that introduced the term “augmented reality” [1]. They wrote that “position sensing technology is the ultimate limitation of AR, controlling the range and accuracy of possible applications.”

Addressing this concern involves scanning several real-world objects in order to detect and track them in an AR scene. Our experience so far suggests that, even if they aren’t yet ready for wide deployment, detection and tracking technologies have come a long way. The video below shows our procedure for scanning a 3D object with ARKit (adapted from this approach). We have found that ensuring a flat background is paramount to generating a scanned object free of noisy background feature points. Other than that, the process is straightforward.

Scanning an object in this way generates a digital signature that our app can recognize quickly and accurately, allowing us to augment the physical object with interactive guides.
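
To give a sense of what this looks like in code, here is a minimal sketch that loads a previously scanned signature and runs an ARKit session that detects the corresponding physical object. The file name “printer.arobject” and the anchor handling are assumptions made for this example, not a description of our actual app.

```swift
import ARKit

// Minimal sketch: load a scanned object signature (.arobject) and detect the
// matching physical object in the camera feed.
class ObjectDetectionSession: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() throws {
        guard let url = Bundle.main.url(forResource: "printer", withExtension: "arobject") else {
            fatalError("Scanned object file not found in the app bundle")
        }
        let printerSignature = try ARReferenceObject(archiveURL: url)

        let configuration = ARWorldTrackingConfiguration()
        configuration.detectionObjects = [printerSignature]
        session.delegate = self
        session.run(configuration)
    }

    // Called when the scanned object is recognized.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for anchor in anchors {
            guard let objectAnchor = anchor as? ARObjectAnchor else { continue }
            // objectAnchor.transform gives the object's pose in world coordinates;
            // interactive guides would be positioned relative to this transform.
            print("Detected \(objectAnchor.referenceObject.name ?? "object")")
        }
    }
}
```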

  • Social acceptance

The many issues associated with the launch of Google Glass made it clear that HMD devices are not yet acceptable to the consumer market. But our intuition is that focusing on the consumer market is inappropriate, at least initially, and that developers should instead target industrial settings (as Caudell and Mizell did at Boeing). A more appropriate metaphor for AR and VR devices (outside of their use in gaming) is a hard hat—something that you put on when you need to complete a task.

  • Amount of information (Distraction)

Martínez et al. are concerned that the “amount of information to be displayed in the augmented view may exceed the needs of the user.” This strikes us less as a bottleneck and more as a design guideline: take care to make AR objects as unobtrusive as possible.

In addition to the issues above, we think there are at least two other problems standing in the way of widespread AR adoption:

  • Authoring

There are a variety of apps that can help AR content creators author scenes manually, including Amazon Sumerian, Apple Reality Composer, Adobe Aero, and ScopeAR WorkLink. However, with these tools designers still must create, import, place, and orient models, as well as organize scenes temporally. We think there are opportunities to simplify this process with automation.

  • Value

Finally, as with any technology, users will not adopt AR unless it provides value in return for their investments of time and money. Luckily, AR technologies, specifically those involving remote assistance, enjoy a clear value proposition: reduced costs and less time wasted on travel. This is why we believe the current wave of interest in AR technologies may be different. Previous advances in the quality of HMDs and tracking technologies were not matched by similar advances in teleconferencing technologies and infrastructure. Now, however, robust, full-media teleconferencing technologies are commonplace, making remote AR sessions far more feasible.

Many tools already take advantage of a combination of AR and teleconferencing technologies. However, to truly stand in for an in-person visit, tele-work tools must facilitate a wide range of guided interaction. Experts feel they must travel to sites because they need to diagnose problems rapidly, change their point-of-view with ease to adapt to each particular situation, and experiment or interact with problems dynamically. This type of fluid action is difficult to achieve remotely when relaying commands through a local agent. In a future post, we will discuss methods we are developing to make this interaction as seamless as possible, as well as approaches for automated authoring. Stay tuned!

[1] T. P. Caudell and D. W. Mizell. “Augmented reality: An application of heads-up display technology to manual manufacturing processes”. In Proc. Hawaii Int’l Conf. on Systems Sciences, 659–669, 1992.

[2] H. Martínez et al. “Drivers and Bottlenecks in the Adoption of Augmented Reality Applications”. Journal of Multimedia Theory and Application, Volume 1, 27–44, 2014.

DocHandles @ DocEng 2017

The conversational documents group at FXPAL is helping users interact with document content using the interface that best matches their current context, without worrying about the structure of the underlying documents. With our system, users should be able to refer to figures, charts, and sections of their work documents seamlessly in a variety of collaboration interfaces to better communicate with their colleagues.

To achieve this goal, we are developing tools for understanding, repurposing, and manipulating document structure. The DocHandles work, which we will present at DocEng 2017, is a first step in this direction. With this tool, a user can type, for example, “@fig2” into their multimedia chat tool to see a list of recommended figures extracted from recently shared documents. In this case, the suggestions returned correspond to figures labeled “figure 2” in the most recently discussed documents in the chat, along with each document’s filename or title and the figure caption. Users can then select their desired figure, which is automatically injected into the chat.
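
To make the interaction concrete, here is a purely illustrative sketch of the lookup behavior described above. The types, fields, and ranking are our own simplification for this post and do not reflect the DocHandles implementation.

```swift
import Foundation

// Illustrative only: given a chat message such as "@fig2", suggest matching
// figures from recently shared documents, most recently discussed first.
struct SharedFigure {
    let label: String          // e.g. "figure 2"
    let caption: String
    let documentTitle: String
    let lastDiscussed: Date
}

func suggestFigures(for message: String, from figures: [SharedFigure]) -> [SharedFigure] {
    // Extract the figure number from a token such as "@fig2" or "@fig 2".
    guard let match = message.range(of: #"@fig\s*(\d+)"#, options: .regularExpression) else {
        return []
    }
    let number = message[match].filter { $0.isNumber }
    let wantedLabel = "figure \(number)"

    return figures
        .filter { $0.label.lowercased() == wantedLabel }
        .sorted { $0.lastDiscussed > $1.lastDiscussed }
}
```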

Please come see our presentation in Session 7 (User Interactions) at 17:45 on September 5th to find out more about this system as well as some of our future plans for conversational documents.

DocuGram at DocEng

Teleconferencing is now a nearly ubiquitous aspect of modern work. We routinely use apps such as Google Hangouts or Skype to present work or discuss documents with remote colleagues. Unfortunately, sharing the source documents is not always as seamless. For example, a meeting participant might use a screencast to share content that she has access to but that the remote participant does not. Remote participants may also not have the right software to open the source document, or the content shared might be only a small section of a large document that is difficult to share.

Later this week in Vienna, we will present our work at DocEng on DocuGram, a tool we developed at FXPAL to help address these issues. DocuGram can capture and analyze shared screen content to automatically reconstitute documents. Furthermore, it can capture and integrate annotations and voice notes made as the content is shared.

The first video below describes DocuGram, and the second shows how we have integrated it into our teleconferencing tool, MixMeet. Check it out, and be sure to catch our talk on Friday, September 16th at 10:00AM.

FXPAL at Mobile HCI 2016

Early next week, Ville Mäkelä and Jennifer Marlow will present our work at Mobile HCI on tools we developed at FXPAL to support distributed workers. The paper, “Bringing mobile into meetings: Enhancing distributed meeting participation on smartwatches and mobile phones”, presents the design, development, and evaluation of two applications, MixMeetWear and MeetingMate, that aim to help users in non-standard contexts participate in meetings.

The videos below show the basic functionality of the two systems. If you are in Florence for Mobile HCI, please stop by their presentation on Thursday, September 8, in the 2:00-3:30 session (in Sala Verde) to get the full story.

Ciao!

MixMeet: Live searching and browsing

Knowledge work is changing fast. Recent trends in increased teleconferencing bandwidth, the ubiquitous integration of “pads and tabs” into workaday life, and new expectations of workplace flexibility have precipitated an explosion of applications designed to help people collaborate from different places, times, and situations.

Over the last several months the MixMeet team observed and interviewed members of many different work teams in small-to-medium sized businesses that rely on remote collaboration technologies. In work we will present at ACM CSCW 2016, we found that despite the widespread adoption of frameworks designed to integrate information from a medley of devices and apps (such as Slack), employees utilize a surprisingly diverse but unintegrated set of tools to collaborate and get work done. People will hold meetings in one app while relying on another to share documents, or share some content live during a meeting while using other tools to put together multimedia documents to share later. In our CSCW paper, we highlight many reasons for this increasing diversification of work practice. But one issue that stands out is that videoconferencing tools tend not to support archiving and retrieving disparate information. Furthermore, tools that do offer archiving do not provide mechanisms for highlighting and finding the most important information.

In work we will present later this fall at ACM MM 2015 and ACM DocEng 2015, we describe new MixMeet features that address some of these concerns, letting users browse and search the contents of live meetings to rapidly retrieve previously shared content. These new features take advantage of MixMeet’s live processing pipeline to determine the actions users take inside live document streams. In particular, the system monitors text and cursor motion in order to detect text edits, selections, and mouse gestures. MixMeet applies these extra signals to user searches to improve the quality of retrieved results and to allow users to quickly filter a large archive of recorded meeting data to find relevant information.
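
As a rough illustration of how such action signals can be folded into retrieval, consider the sketch below. The data structures and scoring weights are invented for this example and are not MixMeet’s actual ranking scheme.

```swift
import Foundation

// Illustrative only: keyframes extracted from live meeting streams carry signals
// about what users did on them (text edits, selections, mouse gestures). A search
// boosts keyframes whose text matches the query and that show such activity.
struct Keyframe {
    let text: String           // text recovered from the shared screen
    let hasTextEdit: Bool
    let hasSelection: Bool
    let hasMouseGesture: Bool
    let capturedAt: Date
}

func search(_ query: String, in keyframes: [Keyframe]) -> [Keyframe] {
    let terms = query.lowercased().split(separator: " ")
    return keyframes
        .map { frame -> (Keyframe, Double) in
            let text = frame.text.lowercased()
            // Base score: fraction of query terms found in the frame's text.
            let hits = terms.filter { text.contains($0) }.count
            var score = terms.isEmpty ? 0 : Double(hits) / Double(terms.count)
            // Boost frames where users actively worked with the content.
            if frame.hasTextEdit { score += 0.3 }
            if frame.hasSelection { score += 0.2 }
            if frame.hasMouseGesture { score += 0.1 }
            return (frame, score)
        }
        .filter { $0.1 > 0 }
        .sorted { $0.1 > $1.1 }
        .map { $0.0 }
}
```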

In our ACM MM paper (and toward the end of the above video) we also describe how MixMeet supports table-top videoconferencing devices, such as Kubi. In current work, we are developing multiple tools to extend our support to other devices and meeting situations. Publications describing these new efforts are in the pipeline: stay tuned.

More evidence of the value of HMD capture

At next week’s CSCW 2015 conference, a group from University of Wisconsin-Madison will present an interesting piece of work related to the last post: “Handheld or Handsfree? Remote Collaboration via Lightweight Head-Mounted Displays and Handheld Devices”. Similar to our work, the authors compared the use of Google Glass to a tablet-based interface for two different construction tasks: one simple and one more complex. While in our case study participants created tutorials to be viewed at a later time, this test explored synchronous collaboration.

The authors found that Google Glass was helpful for the more difficult task, enabling better and more frequent communication, while for the simpler task the results were mixed. This more-or-less agrees with our findings: HMDs are helpful for capturing and communicating complicated tasks but less so for table-top tasks.

Another key difference between this work and ours is that the authors relied on Google Hangouts to stream videos. However, as the authors write, “the HMD interface of Google Hangouts used in our study did not offer [live preview feedback],” a key feature for any media capture application.

At FXPAL, we build systems when we are limited by off-the-shelf technology. So when we discovered a related capture feedback issue in early pilots we were able to quickly fix it in our tool. Of course in our case the technology was much simpler because we did not need to implement video streaming. However, since this paper was published we have developed mechanisms to stream video from Glass, or any Android device, using open WebRTC protocols. More than that, our framework can analyze incoming frames and then stream out arbitrary image data, potentially allowing us to implement many of the design implications the authors describe in the paper’s discussion section.

Head-mounted capture and access with ShowHow

Our IEEE Pervasive paper on head-mounted capture for multimedia tutorials was recently accepted and is currently in press. We are excited to share some of our findings here.

Creating multimedia tutorials requires two distinct steps: capture and editing. While editing, authors have the opportunity to devote their full attention to the task at hand. Capture is different. In the best case, capture should be completely unobtrusive so that the author can focus exclusively on the task being captured. But this can be difficult to achieve with handheld devices, especially if the task requires that the tutorial author move around an object and use both hands simultaneously (e.g., showing how to replace a bike derailleur).

For this reason, we extended our ShowHow multimedia tutorial system to support head-mounted capture. Our first approach was simple: a modified pair of glasses with a Looxcie camera and laser guide attached. While this approach interfered with the user’s vision less than other solutions, such as a full augmented reality system, it nonetheless suffered from an array of problems: it was bulky, it was difficult to control, and without display feedback of the captured area it was hard to frame videos and photos.

Our first head-mounted capture prototype

Luckily, Google Glass launched around this time. With an onboard camera, a touch panel, and display, it seemed an excellent choice for head-mounted capture.

Our video application to the Glass Explorers program

To test this, we built an app for Google Glass that requires minimal attention to the capture device and instead allows the author to focus on creating the tutorial content. In our paper, we describe a study comparing standalone capture (camera on tripod) versus head-mounted (Google Glass) capture. Details are in the paper, but in short we found that tutorial authors prefer wearable capture devices, especially when recording activities involving larger objects in non-tabletop environments.

The ShowHow Google Glass capture app

Finally, based on the success of Glass for capture we built and tested an access app as well. A detailed description of the tool, as well as another study we ran testing its efficacy for viewing multimedia tutorials, is the subject of an upcoming paper. Stay tuned.

The ShowHow Google Glass access app

MixMeet

At FXPAL, we build and evaluate systems that make multimedia content easier to capture, access, and manipulate. In the Interactive Media group we are currently focusing on remote work, and on distributed meetings in particular. On the one hand, meetings can be inefficient at best and a flat-out boring waste of time at worst. On the other hand, there are some key benefits to meetings, especially those that are more ad hoc and driven by specific, concrete goals. More and more meetings are held with remote workers via multimedia-rich interfaces (such as HipChat and Slack). These systems augment web-based communication with lightweight content sharing to reduce communication overhead while helping teams focus on immediate tasks.

We are developing a tool, MixMeet, to make lightweight, multimedia meetings more dynamic, flexible, and hopefully more effective. MixMeet is a web-based collaboration tool designed to support content interaction and extraction for use in both live, synchronous meetings as well as asynchronous group work. MixMeet is a pure web system that uses the WebRTC framework to create video connections. It supports live keyframe archiving and navigation, content-based markup, and the ability to copy-and-paste content to personal or shared notes. Each meeting participant can flexibly interact with all other clients’ shared screen or webcam content.  A backend server can be configured to archive keyframes as well as record each user’s stream.

Our vision for MixMeet is to make it easy to mark up and reuse content from meetings, and make collaboration over visual content a natural part of web-based conferencing. As you can see from the video below, we have made some progress toward this goal. However, we know there are many issues with remote, multimedia-rich work that we don’t yet fully understand. To that end, we are currently conducting a study of remote videoconferencing tools. If your group uses any remote collaboration tools with distributed groups please fill out our survey.

on automation and tacit knowledge

We hear a lot about how computers are replacing even white collar jobs. Unfortunately, often left behind when automating these kinds of processes is tacit knowledge that, while perhaps not strictly necessary to generate a solution, can nonetheless improve results. In particular, many professionals rely upon years of experience to guide designs in ways that are largely invisible to non-experts.

One of these areas of automation is document layout or reflow in which a system attempts to fit text and image content into a given format. Usually such systems operate using templates and adjustable constraints to fit content into new formats. For example, the automated system might adjust font size, table and image sizes, gutter size, kerning, tracking, leading, etc. in different ways to match a loosely defined output style. These approaches can certainly be useful, especially for targeting output to devices with arbitrary screen sizes and resolutions. One of the largest problems, however, is that these algorithms often ignore what might have been a considerable effort by the writers, editors, and backshop designers to create a visual layout that effectively conveys the material. Often designers want detailed control over many of the structural elements that such algorithms adjust.
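
As a toy illustration of the kind of constraint adjustment such systems perform, the sketch below steps the font size down from a template’s preferred size until the text fits a target column. The line-height and glyph-width model is a crude assumption for the example, not a real layout engine.

```swift
import Foundation

// Toy constraint-based reflow: pick the largest font size within allowed bounds
// that still fits the content into the target column height.
struct ReflowConstraints {
    let minFontSize: Double
    let maxFontSize: Double
    let columnWidth: Double    // in points
    let columnHeight: Double   // in points
}

func fitFontSize(characterCount: Int, constraints c: ReflowConstraints) -> Double? {
    // Crude model: average glyph width ~0.5em, line height ~1.3em.
    func requiredHeight(at fontSize: Double) -> Double {
        let charsPerLine = max(1, Int(c.columnWidth / (0.5 * fontSize)))
        let lines = Int(ceil(Double(characterCount) / Double(charsPerLine)))
        return Double(lines) * 1.3 * fontSize
    }
    // Step down from the preferred (maximum) size until the content fits.
    var size = c.maxFontSize
    while size >= c.minFontSize {
        if requiredHeight(at: size) <= c.columnHeight { return size }
        size -= 0.5
    }
    return nil  // Nothing in range fits; the template constraints must change.
}
```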

For this reason I was impressed with Hailpern et al.’s work at DocEng 2014 on document truncation and pagination for news articles. In these works, the authors’ systems analyze the text of an article to determine pagination and truncation breakpoints that correspond to natural boundaries between high-level, summary content and more detailed content. This derives from the observation that journalists tend to write articles in “inverted pyramid” style, in which the most newsworthy, summary information appears near the beginning, with details toward the middle and background information toward the end. This is a critical observation, in no small part because it means that popular newswriting bears little resemblance to academic writing. (Perhaps what sets this work apart from others is that the authors employed a basic tenet of human-computer interaction: the experiences of the system developer are a poor proxy for the experiences of other stakeholders.)

Foundry, which Retelny et al. presented at UIST 2014, takes an altogether different approach. This system, rather than automating tasks, helps bring diverse experts together in a modular, flexible way. The system helps the user coordinate the recruitment of domain experts into a staged workflow toward the creation of a complex product, such as an app or training video. The tool also allows rapid reconfiguration. One can imagine that this system could be extended to take advantage of not only domain experts but also people with different levels of expertise — some “stages” could even be automated. This approach is somewhat similar to the basic ideas in NudgeCam, in which the system incorporated general video guidelines from video-production experts, templates designed by experts in the particular domain of interest, novice users, and automated post hoc techniques to improve the quality of recorded video.

The goal of most software is to improve a product’s quality as well as the efficiency with which it is produced. We should keep in mind that this is often best accomplished not by systems designed to replace humans but rather by those developed to best leverage people’s tacit knowledge.