While many of the systems we build at FXPAL are either deployed internally or transferred to our parent company, in some cases we get to deploy them in the real world. This week, we released TalkMiner, a system for indexing and searching video of lecture broadcasts. We’ve indexed broadcasts from a variety of sources, including the U.C. Berkeley webcast.berkeley site, the site, and various channels on YouTube, including Google Tech Talks, Stanford University, MIT Open Courseware, O’Reilly Media, TED Talks, and NPTEL Indian Institute of Technology.

But all of these videos are already indexed by web search engines, you say; why do we need TalkMiner?

While web search engines index the text of the page in which the video is embedded, TalkMiner indexes the contents of the slides in the video, making more fine-grained retrieval of video possible. Is this useful?

Well, it turns out that the deployment of the Berkeley webcasting system (developed by our president, Larry Rowe, while he was a professor there) showed that:

… students almost always watched the lectures on-demand rather than in real-time, and they rarely watched the entire lecture.  Students use the webcasts to study for exams – we could see this clearly by patterns of usage – and, they primarily wanted to review selected material covered by the instructor.  In one class we discovered that for over 50% of the lectures, students watched less than 10 minutes from a 50-minute lecture and students watched the entire lecture only 10% of the time.  Consequently, for using the system, effective search is a big issue.

To solve this problem, TalkMiner recognizes images of presentations in lecture video, and applies OCR to these regions to extract the slide text. This text is indexed along with the associated time codes, and can then be used to search for specific content. The video is divided into segments corresponding to slides; thumbnails of slides are shown when a video is selected. The video can then be watched end-to-end, or you can skip to a particular slide and listen from there. To help find topics of interest, slides that contain keyword matches to the query are highlighted.
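To make the indexing step concrete, here is a minimal Python sketch of how OCR'd slide text and time codes could be combined into a searchable index. It is an illustration under assumptions, not TalkMiner's actual implementation: the `Slide` record, the `build_index` and `search` functions, and the simple all-terms-must-match rule are hypothetical, and the slide detection and OCR steps are assumed to have already produced the text.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slide:
    """One detected slide segment (hypothetical record, not TalkMiner's schema)."""
    video_id: str     # identifier of the talk the slide belongs to
    start_sec: float  # time code at which the slide first appears
    text: str         # OCR output for the slide, possibly noisy


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(slides):
    """Map each term to the positions (in `slides`) of the slides containing it."""
    index = defaultdict(set)
    for pos, slide in enumerate(slides):
        for term in tokenize(slide.text):
            index[term].add(pos)
    return index


def search(index, slides, query):
    """Return {video_id: [start_sec, ...]} for slides containing every query term."""
    terms = tokenize(query)
    if not terms:
        return {}
    # Intersect the posting sets so that only slides matching all terms remain.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    hits = defaultdict(list)
    for pos in sorted(matches):
        hits[slides[pos].video_id].append(slides[pos].start_sec)
    return dict(hits)
```

Each hit carries the time code of a matching slide, which is what would let a player skip straight to that slide and highlight its thumbnail. A production system would also need to cope with OCR errors, for example through fuzzy matching, which this sketch leaves out.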

The current index contains over 12,200 talks on a range of topics, and additional talks are indexed daily. Take a look at the system and let us know what you think!

  1. Very cool. I was wondering if it would be possible to give the OCR of the slide in addition to thumbnails? This would be very helpful in collecting notes from presentations……

  3. Saqib, I think that would only be possible for videos with particular Creative Commons permissions. It might be interesting to explore that at some point.

  4. john says:

    Also, how best to present the OCR, even if copyright allows, can be a little tricky. The quality of the recovered text can be quite variable, and while this may not be too noticeable in a retrieval situation, it can quickly become distracting when viewed as a transcription.

  5. […] TalkMiner This is a very neat site that allows you to "search" within video lectures. It does OCR […]

  6. […] Since its debut a few months ago, TalkMiner has been busily crawling the web and indexing all sorts of talks and lectures. In the mean time we […]
