Blog

ReflectLive


When clinicians communicate with patients via video conferencing, they must not only exchange information but also convey a sense of sympathy, sensitivity, and attentiveness. However, video-mediated communication is often less effective than in-person communication because it is challenging to convey and perceive essential non-verbal behaviors, such as eye contact, vocal tone, and body posture. Moreover, non-verbal behaviors that may be acceptable in in-person business meetings, such as looking away at notes, may be perceived as rude or inattentive in a video meeting (patients already feel disengaged when clinicians frequently look at medical records instead of at them during in-person visits).

Prior work shows that in video visits, clinicians tend to speak more, dominating the conversation and showing less empathy toward patients, which can lead to poorer patient satisfaction and incomplete information gathering. Further, few clinicians are trained to communicate over a video visit, and many are not always aware of how they present themselves to patients over video.

In our paper, I Should Listen More: Real-time Sensing and Feedback of Non-Verbal Communication in Video Telehealth, we describe the design and evaluation of ReflectLive, a system that senses and provides real-time feedback about clinicians’ communication behaviors during video consultations with patients. Our user tests showed that real-time sensing and feedback has the potential to train clinicians to maintain better eye contact with patients and to be more aware of their non-verbal behaviors.


The ReflectLive video meeting system, with the visualization dashboard on the right showing real-time metrics about non-verbal behaviors. Heather (in the thumbnail) is looking to the left. A red bar flashes on the left of her window as she looks to the side to remind her that her gaze is not centered on the other speaker. A counter shows the number of seconds and direction she is looking away.  
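As a rough illustration of the feedback loop shown in the dashboard, the sketch below flags when a speaker’s gaze drifts off-center and counts how long they have been looking away. It assumes some face or gaze tracker supplies a normalized horizontal offset per video frame; the tracker, threshold, and class names here are illustrative assumptions, not ReflectLive’s actual implementation.

    import time

    GAZE_THRESHOLD = 0.3   # beyond this offset the gaze counts as "looking away" (assumed value)

    class GazeFeedback:
        def __init__(self):
            self.away_since = None   # timestamp when the gaze left center
            self.direction = None    # "left" or "right"

        def update(self, horizontal_offset, now=None):
            """Return (flash_side, seconds_away) for the current video frame.

            horizontal_offset: -1.0 (far left) .. 0.0 (centered) .. +1.0 (far right)
            """
            now = time.time() if now is None else now
            if abs(horizontal_offset) < GAZE_THRESHOLD:
                self.away_since = self.direction = None
                return None, 0.0
            direction = "left" if horizontal_offset < 0 else "right"
            if direction != self.direction:
                self.away_since, self.direction = now, direction
            return direction, now - self.away_since

The UI would flash a red bar on whichever side update() returns and render the running count of seconds away, mirroring the dashboard in the screenshot.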

This paper is published in the Proceedings of the ACM on Human-Computer Interaction. We will present the work at CSCW 2018 in November.

DocHandles @ DocEng 2017


The conversational documents group at FXPAL is helping users interact with document content through the interface that best matches their current context, without worrying about the structure of the underlying documents. With our system, users should be able to refer to figures, charts, and sections of their work documents seamlessly in a variety of collaboration interfaces to better communicate with their colleagues.


To achieve this goal, we are developing tools for understanding, repurposing, and manipulating document structure. The DocHandles work, which we will present at DocEng 2017, is a first step in this direction. With this tool, a user can type, for example, “@fig2” into their multimedia chat tool to see a list of recommended figures extracted from recently shared documents. In this case, the suggestions correspond to figures labeled “figure 2” in the most recently discussed documents in the chat, along with each document’s filename or title and the figure caption. Users can then select the desired figure, which is automatically injected into the chat.
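A minimal sketch of that lookup, assuming figures and captions have already been extracted from recently shared documents into a simple index (the index format and ranking here are illustrative, not DocHandles’ actual data model):

    import re

    # Figures extracted from recently shared documents, most recently discussed first.
    FIGURE_INDEX = [
        {"doc": "paper-draft.pdf", "label": "figure 2", "caption": "System overview"},
        {"doc": "slides.pptx",     "label": "figure 2", "caption": "Pipeline architecture"},
        {"doc": "paper-draft.pdf", "label": "figure 3", "caption": "User study results"},
    ]

    def suggest_figures(chat_message):
        """Return candidate figures for a handle such as '@fig2'."""
        match = re.search(r"@fig(\d+)", chat_message)
        if not match:
            return []
        wanted = "figure " + match.group(1)
        return [fig for fig in FIGURE_INDEX if fig["label"] == wanted]

    print(suggest_figures("can we revisit @fig2 before the deadline?"))

Selecting one of the returned entries would then inject the corresponding figure image into the chat.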

Please come see our presentation in Session 7 (User Interactions) at 17:45 on September 5th to find out more about this system as well as some of our future plans for conversational documents.

Improving User Interfaces for Robot Teleoperation


The FXPAL robotics research group has recently explored technologies for improving the usability of mobile telepresence robots. We evaluated a prototype head-tracked stereoscopic (HTS) teleoperation interface for a remote collaboration task. The results of this study indicate that using an HTS system reduces task errors and improves the perceived collaboration success and viewing experience.

We also developed a new focus-plus-context viewing technique for mobile robot teleoperation. This allows us to use wide-angle camera images that provide rich contextual visual awareness of the robot’s surroundings while preserving a distortion-free region in the middle of the camera view.
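The general idea of such a focus-plus-context warp can be sketched as a simple radial mapping: output pixels inside a focus radius sample the wide-angle frame one-to-one, while pixels outside it sample progressively farther from the center, squeezing the periphery into the border. This is only a toy mapping with assumed parameters, not the projection model from our paper.

    import numpy as np
    import cv2

    def focus_plus_context(image, focus_radius=0.4):
        """Keep the center of `image` undistorted and radially compress the periphery."""
        h, w = image.shape[:2]
        cx, cy = w / 2.0, h / 2.0
        ys, xs = np.indices((h, w), dtype=np.float32)
        dx, dy = xs - cx, ys - cy
        r = np.sqrt(dx ** 2 + dy ** 2) / min(cx, cy)   # normalized radius of each output pixel
        # Identity inside the focus region; sample farther out beyond it so that
        # more of the wide-angle frame fits into the view's border.
        scale = np.where(r < focus_radius, 1.0, 1.0 + 2.0 * (r - focus_radius))
        map_x = (cx + dx * scale).astype(np.float32)
        map_y = (cy + dy * scale).astype(np.float32)
        return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)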

To this, we added a semi-automatic robot control method that allows operators to navigate the telepresence robot by pointing and clicking directly on the camera image feed. This through-the-screen interaction paradigm has the advantage of decoupling operators from the robot control loop, freeing them for other tasks besides driving the robot.
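A rough sketch of the geometry behind such point-and-click navigation: project the clicked pixel onto the ground plane using a pinhole camera model, yielding a goal position the robot can drive to. The intrinsics, camera height, and tilt here are assumed inputs; the actual system’s calibration and planner are not shown.

    import math

    def click_to_ground_goal(u, v, fx, fy, cx, cy, cam_height, cam_tilt_rad):
        """Project image pixel (u, v) onto the floor, in robot-relative meters."""
        # Ray through the pixel in camera coordinates (x right, y down, z forward).
        ray_x = (u - cx) / fx
        ray_y = (v - cy) / fy
        ray_z = 1.0
        # Account for the camera's downward tilt (rotation about the x axis).
        c, s = math.cos(cam_tilt_rad), math.sin(cam_tilt_rad)
        down = ray_y * c + ray_z * s
        forward = -ray_y * s + ray_z * c
        if down <= 0:
            return None                   # click is at or above the horizon; no floor hit
        t = cam_height / down             # scale the ray until it reaches the floor plane
        return t * forward, t * ray_x     # (forward, lateral) goal relative to the robot

A navigation layer would then drive the robot toward that goal while the operator attends to the conversation instead of the controls.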

Based on this work, we presented two papers at the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). The paper “Look Where You’re Going: Visual Interfaces for Robot Teleoperation” received the Best Paper Award in the Design category.

DocuGram at DocEng


Teleconferencing is now a nearly ubiquitous aspect of modern work. We routinely use apps such as Google Hangouts or Skype to present work or discuss documents with remote colleagues. Unfortunately, sharing source documents is not always as seamless. For example, a meeting participant might share content via screencast that she has access to, but that the remote participant does not. Remote participants may also not have the right software to open the source document, or the content shared might be only a small section of a large document that is difficult to share.

Later this week in Vienna, we will present our work at DocEng on DocuGram, a tool we developed at FXPAL to help address these issues. DocuGram can capture and analyze shared screen content to automatically reconstitute documents. Furthermore, it can capture and integrate annotations and voice notes made as the content is shared.
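A minimal sketch of the capture-and-reconstitute idea: sample frames from a shared-screen recording, skip frames that barely differ from the previous kept frame, and OCR the rest into a text document. The OCR engine (pytesseract) and the thresholds are illustrative choices, not necessarily what DocuGram uses, and the annotation and voice-note handling are omitted.

    import cv2
    import pytesseract

    def reconstitute(video_path, sample_every=30, diff_threshold=5.0):
        """Rebuild a rough text document from a screen-share recording."""
        capture = cv2.VideoCapture(video_path)
        pages, last_gray, frame_idx = [], None, 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if frame_idx % sample_every == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                # Keep the frame only if the on-screen content changed noticeably.
                if last_gray is None or cv2.absdiff(gray, last_gray).mean() > diff_threshold:
                    pages.append(pytesseract.image_to_string(gray))
                    last_gray = gray
            frame_idx += 1
        capture.release()
        return "\n\n".join(pages)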

The first video below describes DocuGram, and the second shows how we have integrated it into our teleconferencing tool, MixMeet. Check it out, and be sure to catch our talk on Friday, September 16th at 10:00AM.

FXPAL at Mobile HCI 2016


Early next week, Ville Mäkelä and Jennifer Marlow will present our work at Mobile HCI on tools we developed at FXPAL to support distributed workers. The paper, “Bringing mobile into meetings: Enhancing distributed meeting participation on smartwatches and mobile phones”, presents the design, development, and evaluation of two applications, MixMeetWear and MeetingMate, that aim to help users in non-standard contexts participate in meetings.

The videos below show the basic functionality of the two systems. If you are in Florence for Mobile HCI, please stop by their presentation on Thursday, September 8, in the 2:00-3:30 session (in Sala Verde) to get the full story.

Ciao!

DocEng 2015



FXPAL had two publications at DocEng 2015. The conference was in Lausanne, Switzerland.

“High-Quality Capture of Documents on a Cluttered Tabletop with a 4K Video Camera”

“Searching Live Meetings: ‘Show me the Action’”


Some observations from FXPAL colleagues

Jean Paoli, co-author of the XML specification, opened the DocEng 2015 conference by taking us from the early days of SGML all the way to JSON and Web Components, recalling OLE along the way. Jean believes in a future where documents and data are one, where documents are composed of manually authored chunks of content alongside automatically produced components such as graphics, tables, etc. He raised questions about the kinds of user interfaces required to produce such documents, and about how to consume them and, in turn, reuse their parts.

In “The Browser as a Document Composition Engine”, Tamir and his colleagues from HP Labs explained how printing web pages is still a bad experience for most users today. They developed a method to generate a beautifully formatted PDF version of a web page: the tool selects the article content, fits it into appropriate templates, and uses only the browser to measure how each character fits on the page. The output is a PDF, a ubiquitous format for printing the rendered page, though the result can also be previewed inside the web browser before printing. Decluttering web pages is still a manual or semi-automatic process in which users tag page elements before printing, but they promised an upcoming paper on that subject. Stay tuned.

The University of Tokyo also had an interesting take on improving document layout: instead of playing with character spacing to avoid orphans and word splits at the ends of lines, they chose a Natural Language Processing (NLP) approach in which terms are replaced with synonyms (paraphrased) until the layout is free of errors. A nice way to tie NLP to document layout.


MixMeet: Live searching and browsing


Knowledge work is changing fast. Recent trends in increased teleconferencing bandwidth, the ubiquitous integration of “pads and tabs” into workaday life, and new expectations of workplace flexibility have precipitated an explosion of applications designed to help people collaborate from different places, times, and situations.

Over the last several months the MixMeet team observed and interviewed members of many different work teams in small-to-medium sized businesses that rely on remote collaboration technologies. In work we will present at ACM CSCW 2016, we found that despite the widespread adoption of frameworks designed to integrate information from a medley of devices and apps (such as Slack), employees utilize a surprisingly diverse but unintegrated set of tools to collaborate and get work done. People will hold meetings in one app while relying on another to share documents, or share some content live during a meeting while using other tools to put together multimedia documents to share later. In our CSCW paper, we highlight many reasons for this increasing diversification of work practice. But one issue that stands out is that videoconferencing tools tend not to support archiving and retrieving disparate information. Furthermore, tools that do offer archiving do not provide mechanisms for highlighting and finding the most important information.

In work we will present later this fall at ACM MM 2015 and ACM DocEng 2015, we describe new MixMeet features that address some of these concerns, so that users can browse and search the contents of live meetings to rapidly retrieve previously shared content. These new features take advantage of MixMeet’s live processing pipeline to determine the actions users take inside live document streams. In particular, the system monitors text and cursor motion to detect text edits, selections, and mouse gestures. MixMeet applies these extra signals to user searches to improve the quality of retrieved results and to let users quickly filter a large archive of recorded meeting data to find relevant information.
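To make the idea concrete, here is a small sketch of re-ranking keyword hits with those action signals; the segment structure, boost weights, and scoring are assumptions for illustration, not MixMeet’s actual ranking.

    # Boost factors for segments where the shared content was actively worked on (assumed values).
    ACTION_BOOST = {"edit": 2.0, "selection": 1.5, "gesture": 1.2}

    def rank_segments(segments, query):
        """Score recorded meeting segments for a keyword query, boosting action-heavy ones."""
        terms = query.lower().split()
        scored = []
        for seg in segments:
            text = seg["text"].lower()
            base = sum(text.count(term) for term in terms)   # simple term-frequency score
            boost = max((ACTION_BOOST.get(a, 1.0) for a in seg.get("actions", [])), default=1.0)
            if base > 0:
                scored.append((base * boost, seg))
        return [seg for _, seg in sorted(scored, key=lambda pair: pair[0], reverse=True)]

    segments = [
        {"text": "budget table edited during the review", "actions": ["edit"]},
        {"text": "budget slide shown briefly", "actions": []},
    ]
    print(rank_segments(segments, "budget")[0]["text"])   # the edited segment ranks first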

In our ACM MM paper (and toward the end of the above video) we also describe how MixMeet supports table-top videoconferencing devices, such as Kubi. In current work, we are developing multiple tools to extend our support to other devices and meeting situations. Publications describing these new efforts are in the pipeline: stay tuned.

HMD and specialization


Google Glass’ semi-demise has become a topic of considerable interest lately. Alexander Sommer at WT-Vox takes the view that it was a courageous “public beta” and “a PR nightmare” but also well received in specialized situations where the application suits the device, as in Scott’s post below. IMO, a pretty good summary.

(I notice that Sony has jumped in with SmartEyeglass – which have been called “too dorky to be believed…” Still, one person’s “dorky” is another person’s “specialized.”)

Visually Interpreting Names as Demographic Attributes


At the AAAI 2015 conference, we presented the work “Visually Interpreting Names as Demographic Attributes by Exploiting Click-Through Data,” a collaboration with a research team at National Taiwan University. This study aims to automatically associate a name with its likely demographic attributes, e.g., gender and ethnicity. More specifically, the associations are driven by web-scale search logs collected by a search engine as internet users retrieve images.

Demographic attributes are vital for semantically characterizing a person or a community, which makes them valuable for marketing, personalization, face retrieval, social computing, and other human-centric research. Since users tend to keep their online profiles private, a name is often the most accessible piece of personal information in these contexts. The problem we address is: given a name, associate and predict its likely demographic attributes. For example, a person named “Amy Liu” is likely an Asian female. A name makes the first impression of a person because naming conventions are strongly influenced by culture, e.g., first name and gender, last name and region of origin. Typically, the associations between names and these two attributes are made by referring to demographic records maintained by governments or by manually labeling attributes based on given personal information (e.g., a photo). The former is limited to regional census data; the latter raises major concerns of time and cost when adapted to large-scale data.

Unlike prior approaches, we propose to exploit click-throughs between text queries and retrieved face images in web search logs, where the names are extracted from queries and the attributes are detected automatically from face images. In this paper, a click-through occurs when a user clicks one of the URLs returned by a text query to view the web image it points to. This mechanism delivers two messages: (1) the association between a query and an image is based on viewers’ clicks, that is, human intelligence from web-scale users; and (2) users bring considerable knowledge to these associations, because they are at least partially aware of what they are looking for and search engines are getting much better at satisfying user intent. Both characteristics of click-throughs reduce concerns about incorrect associations. Moreover, this user knowledge enables the discovery of name-attribute associations that generalize to more countries.
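As a rough sketch of this aggregation, the code below builds name-to-attribute distributions from (query, clicked image) pairs; the log format and the face-attribute detector passed in are hypothetical stand-ins for the paper’s pipeline.

    from collections import Counter, defaultdict

    def build_name_profiles(click_log, detect_attributes):
        """click_log: iterable of (query_name, clicked_image) pairs."""
        profiles = defaultdict(Counter)
        for name, image in click_log:
            # e.g. detect_attributes(image) -> {"gender": "female", "ethnicity": "Asian"}
            for attribute, value in detect_attributes(image).items():
                profiles[name.lower()][(attribute, value)] += 1
        return profiles

    def predict(profiles, name, attribute):
        """Return the most frequently clicked value of `attribute` for `name`."""
        counts = Counter({value: count
                          for (attr, value), count in profiles[name.lower()].items()
                          if attr == attribute})
        return counts.most_common(1)[0][0] if counts else None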

In the experiments, the proposed name-attribute associations achieve accuracy competitive with manual labeling. They also benefit profiling of social media users and keyword-based face image retrieval, especially in adapting to unseen names. This is the first work to interpret a name as demographic attributes in a visual-data-driven manner using web search logs. In the future, we plan to extend the visual interpretation of an abstract name to more targets whose naming conventions are highly influenced by visual appearance.

Using Stereo Vision to Operate Mobile Telepresence Robots


The use of mobile telepresence robots (MTRs) is increasing, yet very few MTRs have autonomous navigation systems. Teleoperation is therefore usually still a manual task, and it often suffers from user experience problems. We believe this may be due to (1) the fixed viewpoint and limited field of view of a 2D camera system and (2) the difficulty of judging distances due to the lack of depth perception.

To improve the experience of teleoperating the robot, we evaluated the use of stereo video coupled with a head-tracked and head-mounted display.

To do this, we installed a brushless gimbal with a stereo camera pair on a robot platform. We used an Oculus Rift (DK1) device for visualization and head tracking.


Stereobot telepresence robot (left) and stereo gimbal system (right).

We conducted a preliminary user study to gather qualitative feedback about telepresence navigation tasks using stereo vs. a 2D camera feed, and high vs. low camera placement. In a simulated telepresence scenario, participants were asked to drive the robot from an office to a meeting location, have a conversation with a tester, and then drive back to the starting location.

An ANOVA on System Usability Scale (SUS) scores, with visualization type and camera placement as factors, showed a significant effect of visualization type on the score. However, we observed a higher SUS score for navigation based on the 2D camera feed. Camera placement height did not show a significant effect.
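For reference, this kind of two-factor analysis can be run in a few lines with statsmodels; the CSV path and column names below are placeholders, not the study’s actual data.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # One row per participant/condition: columns sus, visualization, placement (placeholder file).
    data = pd.read_csv("sus_scores.csv")
    model = ols("sus ~ C(visualization) * C(placement)", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))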

Two main reasons could explain the lower ratings for stereo: (1) about half of the users experienced at least some form of disorientation, which might be due to their unfamiliarity with immersive VR headsets but also to the sensory conflict of being visually immersed in a moving environment while other bodily senses report sitting still; and (2) the video transmission quality was not optimal, due to interference with the analog video transmission signal from objects in the building and the relatively low display resolution of the Oculus Rift DK1.

In the future, we intend to improve the visual quality of the stereo output by using better video transmission and a better head-worn display. We also intend to evaluate robot navigation tasks using a full VR view, which will use the robot’s sensors and localization system to display the robot correctly within a virtual representation of our building.