Twitter’s tweet code

February 8th, 2010 by Gene Golovchinsky

Twitter recently released some of its tweet-related code as open source. This is great news for those building applications on top of Twitter, as it reduces the need to write the same code over and over. The released code includes parser and HTML markup generator classes, and a Regex class that includes a bunch of Pattern instances. Code is available in Java and Ruby.

The examples seem straightforward to use, which means I will be using them!
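To give a rough sense of what a tweet parser does, here is a minimal Python sketch that pulls hashtags and mentions out of a tweet. The patterns and the `extract_entities` name are my own simplified assumptions, not the ones in Twitter's released Regex class, which handles many more cases.

```python
import re

# Simplified, hypothetical patterns for illustration only; they are NOT
# the patterns from Twitter's released code.
# The lookbehind skips '#' or '@' that appear in the middle of a word,
# such as the '@' in an email address.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")


def extract_entities(tweet):
    """Return the hashtags and mentioned screen names found in a tweet."""
    return {
        "hashtags": HASHTAG.findall(tweet),
        "mentions": MENTION.findall(tweet),
    }
```

Calling `extract_entities("Live from #SSM2010 with @alice")` returns the hashtag and the screen name without their leading symbols.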

Read the rest of this entry »

SSM2010

February 5th, 2010 by Gene Golovchinsky

Last Wednesday Jeremy and I participated in the SSM2010 workshop organized by Ian Soboroff (NIST), Eugene Agichtein (Emory University), Daniel Tunkelang (Google), and Marti Hearst (University of California, Berkeley). It was a full day of panels, discussions and poster presentations on a variety of topics related to search, to social media, and to how to combine the two. In an earlier post, I wrote about one way that we can characterize the space, and Daniel wrote an excellent summary of the workshop, which was also cross-posted at BLOG@CACM.

I am still trying to digest all that I learned during the day, but I can say that one of the challenges was live-tweeting the event. I was one of several people who tweeted about what was happening in the panels and about the issues that were raised. Over 500 tweets were sent and resent with the workshop’s hashtag by people at the event and elsewhere. It was interesting to see other people pick up some of the topics and comment on them. In particular, several of my Twitter friends who are not part of the SSM research community commented on the tweets and retweeted parts of the discussion.

Read the rest of this entry »

Recent Progress in Quantum Algorithms

February 4th, 2010 by Eleanor Rieffel

Dave Bacon, who wrote the elegant overview of the research discussed in my New Year’s Day post, just published his review, joint with Wim van Dam, of Recent Progress in Quantum Algorithms. Bacon writes beautifully, and this piece is no exception.

Most people have heard of no more than two quantum algorithms: Shor’s factoring algorithm and Grover’s search algorithm. For five years after Grover’s algorithm, no one discovered a significantly novel quantum algorithm; only variations on Shor’s and Grover’s algorithms were found. The first truly new quantum algorithms were discovered starting in 2001. Now there are many quantum algorithms found using a variety of approaches, though the applications remain restricted. My recent overview of quantum computing mentions many of these algorithms. Bacon and van Dam provide a more detailed, but still high-level, view of these algorithms. They group the algorithms into four categories corresponding to different approaches: quantum random walks, wave packet scattering, finding hidden symmetries, and simulating quantum physics. I hope many of you will enjoy learning more about, in their words, “the benefits of … studying the notion of an algorithm through the perspective of the physical laws of the universe.”

How to compute without knowing anything

February 3rd, 2010 by Eleanor Rieffel

In my post on quantum inspired classical results, I gave as one example Gentry’s recent discovery of a fully homomorphic encryption scheme. His beautiful work deserves its own blog post. Initially I approached his work with trepidation, worried that it would be so technical I would not understand anything without a lot of work. Others have mentioned not having looked at his work for the same reason. That is a shame! While the details are technical, the key idea, bootstrappable encryption, is both a non-obvious approach and an easily understandable concept. I remember smiling while I read the first couple of pages of his paper in response to the elegance and surprising simplicity of his approach.
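For readers who have not met the idea of computing on encrypted data, here is a toy Python sketch of a much weaker, older property: textbook RSA is homomorphic with respect to multiplication only. This is not Gentry's scheme, and it is not fully homomorphic (a fully homomorphic scheme supports both addition and multiplication). The tiny primes and the function names are assumptions chosen for illustration, and unpadded RSA like this is not secure.

```python
# Toy textbook RSA with tiny, insecure parameters (illustrative assumptions).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1, c2):
    """Multiply two plaintexts by operating only on their ciphertexts."""
    return (c1 * c2) % N
```

A party holding only `encrypt(7)` and `encrypt(6)` can call `multiply_encrypted` on them without learning either number; the key holder who decrypts the result recovers the product modulo `N`.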

Read the rest of this entry »

Talking with Twitter

February 2nd, 2010 by Gene Golovchinsky

I’ve been messing with the Twitter search API, and I am here to whine about it. Overall, it’s a great feature, but it’s interesting that it imposes costs on the third-party client that the Twitter interface seemingly doesn’t share. For example, I can run a search and get back a bunch of results. When I do it from the Twitter web page, it gives me the option of drilling down and showing conversations when they come up in search results.

When I execute the same query using the API, however, there is no indication that a particular message was related to some other message in any way. Sure, I know who sent what to whom, but that’s not enough! Not only does the search API not tell me when a message is a reply, it doesn’t provide useful information to indicate a retweet, either.
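One workaround a client could try is to guess the relationship from the message text alone. The sketch below is a heuristic under my own assumptions (the "RT @user" and leading "@user" conventions, and the `classify` name); it is not something the search API provides, and it will misclassify tweets that do not follow those conventions.

```python
import re

# Heuristic, text-only patterns (assumptions, not part of any Twitter API).
# "RT @user" anywhere in the text is treated as a retweet marker.
RETWEET = re.compile(r"(?:^|\s)RT\s+@(\w+)", re.IGNORECASE)
# A tweet that starts with "@user" is treated as a reply to that user.
REPLY = re.compile(r"^\s*@(\w+)")


def classify(text):
    """Guess whether a tweet is a retweet, a reply, or an original message.

    Returns a (kind, screen_name) pair; screen_name is None for originals.
    """
    match = RETWEET.search(text)
    if match:
        return ("retweet", match.group(1))
    match = REPLY.match(text)
    if match:
        return ("reply", match.group(1))
    return ("original", None)
```

Even when the guess is right, it only identifies the other user, not which of that user's messages was replied to or retweeted, so it cannot reconstruct the conversation.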

Read the rest of this entry »

Exploring workplace communication

February 1st, 2010 by Thea Turner

Modern work is a collaborative enterprise. As such, it depends on communication among the collaborators to reach successful outcomes. An increasing number of communication tools are based on somewhat recent computer technologies, such as email, blogs, wikis, social networking, and Twitter. While there have been many studies of single communication tools in the workplace (IM, wikis, blogging, etc.), we believe that we are among the first to take a broad view of the communication landscape since the introduction of these new technologies.

In our paper, to be presented at CHI 2010, we explored the communication ecology of a small business. We examined the work communication practices of our participants, including what methods people used to communicate and why, how they viewed the various methods and how they adopted them.

Read the rest of this entry »

What do we mean by “Search in Social Media”?

January 29th, 2010 by Gene Golovchinsky

Jeremy and I have been busy preparing for the Search in Social Media (SSM2010) workshop. We thought we would start at the beginning and ask what people understood by the term “search in social media.” Workshops often spend a bunch of time on definitions, and we thought we’d jump in early. We’ve talked about social search before, but that was without reference to social media.

We think the phrase ‘search in social media’ has been used to refer both to the information being searched and to the process for doing so. The information is standard user-generated content: tweets, blog posts, comment threads, tags, etc. The process, however, seems less well understood.

Read the rest of this entry »

Finding facets

January 28th, 2010 by Gene Golovchinsky

I’ve been messing around with Twitter search, which (on a small scale) led me to store structured tweet, people and document data. I used a relational database to store the data I got from Twitter, and everything worked just fine. (That is, performance was limited by the Twitter API and Twitter search API, not by my database.) But say you have lots of data, and it includes text and structure, and you want to search it. What if you’re Twitter or LinkedIn? Can you still use MySQL or Oracle or whatever to store your data and serve up search results?

At a recent SDForum talk on the search capabilities of LinkedIn, John Wang described how LinkedIn handles its faceted search. The talk covered a wide range of topics around managing scalability that are undoubtedly shared by many web companies: how to handle real-time updates, how to scale to millions of users, etc. LinkedIn uses Lucene and other related tools, and to their credit has made contributions to the Lucene open source tool set, including Bobo and Zoie.
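For anyone unfamiliar with the term, faceted search returns the matching documents together with counts of how those matches break down along structured fields. Here is a deliberately naive in-memory Python sketch; the record layout and the `facet_counts` name are assumptions for illustration, and it says nothing about how LinkedIn, Lucene, or Bobo implement this at scale.

```python
from collections import Counter


def facet_counts(docs, query, facet_fields):
    """Return the docs matching every query term, plus per-facet value counts.

    docs: list of dicts with a "text" field and arbitrary structured fields.
    Matching is a plain case-insensitive substring test, not a real index.
    """
    terms = query.lower().split()
    hits = [d for d in docs if all(t in d["text"].lower() for t in terms)]
    # Count the values of each facet field among the matching docs only.
    counts = {
        field: Counter(d[field] for d in hits if field in d)
        for field in facet_fields
    }
    return hits, counts
```

This version scans every document on every query, which is exactly the cost that a dedicated search index is meant to avoid once the collection gets large or has to absorb real-time updates.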

Read the rest of this entry »

Does IP matter?

January 27th, 2010 by Gene Golovchinsky

Panos Ipeirotis recently wrote about the confusing state of affairs with respect to intellectual property at his University. In some sense, this is ironic, since the whole point of a University is to produce intellectual property. But I suppose the question isn’t really one of production, but rather of distribution and of consumption. It’s clear that the faculty and students who develop the ideas should own (i.e., receive credit for) those ideas. But once an idea is published, how it gets used is a different story.

Like others (e.g., Christopher Browne), I have often wondered why a public university (or a private one that receives significant federal funding for research) has any right to patent the results of its research. After all, government employees are not allowed to patent the results of their work done for the government; why should government-funded work at universities be different?

Furthermore, does it matter to a University to hold patents, particularly software patents?

Read the rest of this entry »

What If Everyone Were Number One?

January 26th, 2010 by Jeremy Pickens

I’ve been doing a bit of thinking lately about search engines, algorithmic openness, and spammers. I suppose this was all prompted by a recent blog post on the Meaning of Open: http://googleblog.blogspot.com/2009/12/meaning-of-open.html

In this post, it is claimed that openness is good: open systems, open source, open data. This claim is held forth as true…for everything except for search algorithms. In the case of algorithms, the secret sauce must be kept exactly that: secret. Spammers would otherwise have too much power.

That claim makes me want to play around with a little thought experiment. What if the search algorithm were indeed fully open? What if everyone in the world knew exactly how rankings were done, and could modify their web pages so as to adapt themselves to whatever the ranking function is? In short, what if everyone were number one?

Read the rest of this entry »
