DocEng 2015



FXPAL had two publications at DocEng 2015. The conference was in Lausanne, Switzerland.

“High-Quality Capture of Documents on a Cluttered Tabletop with a 4K Video Camera”

“Searching Live Meetings: ‘Show me the Action’”

Some observations from FXPAL colleagues

Jean Paoli, co-author of XML, opened the DocEng 2015 conference by taking us from the early days of SGML all the way to JSON and Web Components, remembering OLE along the way. Jean believes in a future where documents and data are one, where documents are composed of chunks of manually authored content along with automatically produced components such as graphics and tables. He asked what kinds of user interfaces are required to produce these documents, and how to consume them and reuse their parts in turn.

In “The Browser as a Document Composition Engine”, Tamir and his colleagues from HP Labs explained that printing web pages is still a bad experience for most users today. They developed a method to generate a beautifully formatted PDF version of a web page: the tool selects the article content, fits it into appropriate templates, and uses only the browser to measure how each character fits on the page. The output is a PDF, a ubiquitous format that makes it easy to print the rendered web page, but previewing the result inside the web browser before printing is also possible. Decluttering web pages is still a manual or semi-automatic process in which users tag page elements before printing, but the authors promised an upcoming paper on that subject. Stay tuned.

Tokyo University also had an interesting take on improving document layout: instead of playing with character spacing to avoid orphans and word splits at the end of lines, they chose a Natural Language Processing (NLP) approach in which terms are replaced with synonyms (paraphrased) until the layout is free of errors. A nice way to tie NLP to document layout.
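
To make the idea concrete, here is a minimal sketch of paraphrase-driven layout repair. It is not the presenters' method: the synonym table, the `has_orphan` check (a last line holding a single word) and the `relayout` function are hypothetical names, and the layout model is simply fixed-width wrapping with Python's `textwrap`.

```python
import textwrap


def has_orphan(text, width):
    """Return True if wrapping `text` at `width` leaves a single word on the last line."""
    lines = textwrap.wrap(text, width=width)
    return len(lines) > 1 and len(lines[-1].split()) == 1


def relayout(text, width, synonyms):
    """Try one-word synonym swaps until the wrapped text has no orphan.

    `synonyms` is a hypothetical lookup table; a real system would rely on a
    paraphrasing model and a real layout engine instead.
    """
    if not has_orphan(text, width):
        return text
    words = text.split()
    for i, word in enumerate(words):
        for alternative in synonyms.get(word, []):
            candidate = " ".join(words[:i] + [alternative] + words[i + 1:])
            if not has_orphan(candidate, width):
                return candidate
    # No single substitution fixes the layout; leave the text unchanged.
    return text


# Example: "home" ends up alone on the last line at width 15,
# and swapping "big" for "large" pushes "ran" down to join it.
fixed = relayout("the big dog ran home", 15, {"big": ["large"]})
print(textwrap.fill(fixed, width=15))
```

A real implementation would also need to rank candidates by how well they preserve meaning, which this sketch ignores.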

Frédéric Kaplan from EPFL presented the Venice Time Machine, in which they pick out words from old scanned manuscripts in order to recreate the Facebook of that time, revealing who lived where and next to whom. A 3D simulation then let them visualize the evolution of the city of Venice (when buildings were built or burned) as well as who lived close by.

Roberto Manduchi from UC Santa Cruz presented an interesting mobile OCR system to help blind people take OCRable pictures of documents. Surprisingly, they found no statistically significant difference between a system that guides the user (move left 4 cm, up 5) and a system where the user moves the phone freely above the sheet of paper until the system detects a good pause. They hypothesize that more space is explored when the user is free to move the phone than when they wait for instructions on where to move it. They also found that users can actually learn to take better photos of documents after using their system, meaning they could probably take photos without any help in the future.

Daan Leijen from Microsoft presented Madoko, a new document editor based on the simple Markdown syntax but extended to support the production of scientific papers by exporting to LaTeX and PDF. The system integrates with Dropbox and lets co-authors collaborate asynchronously. Really handy tool.

Alexandra Branzan Albu from the University of Victoria described to me her system for detecting changes in graphs and tables when a new version of the graphics package produced by Salesforce is released. It turns out that graph layouts can contain mistakes that are impossible to detect except by comparing the actual rendered images. Alexandra’s method compares these layouts automatically and detects which errors were introduced.
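
As a rough illustration of what comparing rendered images can mean at its simplest, the sketch below diffs two grayscale renderings pixel by pixel and reports the bounding box of the region that changed. This is only an assumed baseline, not Alexandra's method; the function name `diff_bounding_box` and the `tolerance` parameter are hypothetical, and images are plain nested lists so that the example needs no external library.

```python
def diff_bounding_box(before, after, tolerance=0):
    """Return (top, left, bottom, right) of the pixels that differ, or None.

    `before` and `after` are grayscale images given as equally sized lists of
    rows of integers. Differences of at most `tolerance` are ignored.
    """
    if len(before) != len(after) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(before, after)
    ):
        raise ValueError("images must have the same dimensions")

    changed = [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(before, after))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > tolerance
    ]
    if not changed:
        return None

    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))


# Example: two pixels differ between the old and new rendering.
old_render = [[0, 0, 0, 0] for _ in range(3)]
new_render = [row[:] for row in old_render]
new_render[1][2] = 255
new_render[2][3] = 255
print(diff_bounding_box(old_render, new_render))
```

A raw pixel diff like this flags every change, intended or not; deciding which differences are actual layout errors is the hard part that the presented work addresses.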

Helen Balinsky from HP Labs shared with me her vision for securely storing health data from end users, putting them in charge of releasing their own data to insurers, pharmacies, doctors and hospitals. An interesting application of this vision is sending reminders to your phone when a new prescription needs to be taken, without having to disclose your entire health record to the pharmacy, for example. The initiative is called Blue Button in the USA. To be continued…

Carlos Alexandre Barros de Mello from Universidade Federal de Pernambuco showed me his very clever way of segmenting overlapping handwritten digits. He and his colleagues take the binarized image and fit balls that move along the stroke paths. Thanks to their velocity, the balls naturally follow the correct stroke of each character instead of deviating onto its neighbor, making the segmentation very robust. Neat!

In our presentations, we described our system for capturing high-resolution images of paper documents placed on a desk using a ceiling-mounted high-resolution camera, and how we use super-resolution to OCR these page images with great accuracy.

We also showed our WebRTC-based video conferencing system MixMeet and more specifically a new technique for detecting mouse and text actions inside video streams; these “actions” are used to boost underlying keywords, helping users quickly retrieve important parts of previous meetings.
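
For readers wondering what boosting keywords with actions could look like, here is a small sketch under our own simplifying assumptions; it is not the MixMeet implementation. The `Keyword` record, the `score_keywords` function, and the `window` and `boost` parameters are all hypothetical: a keyword occurrence counts more when it appears within a few seconds of a detected mouse or text action.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    term: str    # keyword recognized in the video stream
    time: float  # time of the occurrence, in seconds from meeting start


def score_keywords(keywords, action_times, window=5.0, boost=2.0):
    """Sum a weight per keyword occurrence, boosted when an action is nearby.

    An occurrence weighs `boost` if any action happened within `window`
    seconds of it, and 1.0 otherwise. Both values are illustrative.
    """
    scores = {}
    for kw in keywords:
        near_action = any(abs(kw.time - t) <= window for t in action_times)
        weight = boost if near_action else 1.0
        scores[kw.term] = scores.get(kw.term, 0.0) + weight
    return scores


# Example: "budget" is on screen twice, once right before a mouse action.
occurrences = [Keyword("budget", 10.0), Keyword("budget", 100.0), Keyword("agenda", 50.0)]
print(score_keywords(occurrences, action_times=[12.0]))
```

Ranking meeting segments by these scores would then surface the moments where someone was actively pointing at or typing the searched term.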

Few people seemed aware of WebRTC and its powerful capabilities that enable real-time processing of video in web browsers. A chat with Jean Paoli tipped us off to the upcoming WebRTC 2.0 standard, which will enable even greater interoperability between vendors such as Mozilla, Google and Microsoft. There is an interesting debate on this topic for those interested.