Several of us just returned from ACM UIST 2014, where we presented new work from the cemint project. One vision of cemint is to build applications for multimedia content manipulation and reuse that are as powerful as their analogues for text content. We are working toward this goal by exploiting two key tools. First, we use real-time content analysis to expose useful structure within multimedia content. Given some decomposition of the content, which can be spatial, temporal, or even semantic, we then let users interact with these sub-units, or segments, via direct manipulation. Last year, we began exploring these ideas in our work on content-based video copy and paste.
As another embodiment of these ideas, we demonstrated video text retouch at UIST last week. Our browser-based system performs real-time text detection on streamed video frames, locating both words and lines. When a user clicks on a frame, a live cursor appears next to the nearest detected word. The user can then edit the text directly with the keyboard; as they type, a video overlay captures and displays their edits on top of the frame.
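The click-to-cursor step reduces to a nearest-neighbor lookup over the bounding boxes produced by text detection. A minimal sketch of that lookup is below; the `Word` shape and `nearestWord` function are our own illustrative names, not the actual cemint API.

```typescript
// Hypothetical sketch; `Word` and `nearestWord` are illustrative names,
// not the real cemint interfaces.
interface Word {
  text: string;
  x: number;      // bounding-box left, in frame pixels
  y: number;      // bounding-box top
  width: number;
  height: number;
}

// Return the detected word whose bounding-box center is closest to the
// click point, or null if no words were detected in this frame.
function nearestWord(words: Word[], clickX: number, clickY: number): Word | null {
  let best: Word | null = null;
  let bestDist = Infinity;
  for (const w of words) {
    const cx = w.x + w.width / 2;
    const cy = w.y + w.height / 2;
    // Squared distance is sufficient for comparison; no need for sqrt.
    const dist = (cx - clickX) ** 2 + (cy - clickY) ** 2;
    if (dist < bestDist) {
      bestDist = dist;
      best = w;
    }
  }
  return best;
}
```

Once the nearest word is known, the system can place the live cursor at that word's bounding box and route subsequent keystrokes into the overlay.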
Because we perform text detection on every frame, we can track a line's location as edited text shifts vertically or horizontally over the course of the original (unedited) video, and update the position of the overlaid content accordingly.
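One way to realize this tracking is to anchor each edit overlay to a detected line and store its offset within that line; after each frame's detection pass, the overlay is repositioned relative to the line's new bounding box. The sketch below is an assumption about one plausible structure, not the system's actual implementation.

```typescript
// Hypothetical sketch of overlay tracking; all names are ours, not cemint's.
interface Rect { x: number; y: number; width: number; height: number; }

// An edit overlay remembers which detected text line it is attached to and
// its offset within that line, so it can follow the line frame to frame.
interface Overlay {
  lineId: number;   // id of the tracked text line
  dx: number;       // horizontal offset of the edit within the line
  dy: number;       // vertical offset of the edit within the line
  rect: Rect;       // current on-screen position of the overlay
}

// After per-frame text detection, reposition each overlay relative to the
// newly detected bounding box of its line.
function updateOverlays(overlays: Overlay[], linesById: Map<number, Rect>): void {
  for (const o of overlays) {
    const line = linesById.get(o.lineId);
    if (!line) continue; // line not detected this frame; keep last position
    o.rect.x = line.x + o.dx;
    o.rect.y = line.y + o.dy;
  }
}
```

Keeping the previous position when a line is momentarily missed is one simple fallback; a real system might instead interpolate or hide the overlay.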
By leveraging users' familiarity with manipulating text, this work exemplifies our larger goal: bringing interaction metaphors rooted in content creation to the consumption and reuse of live multimedia streams. We believe that integrating real-time content analysis with interaction design can yield better tools for working with multimedia content.