Google recently announced a new initiative, the Data Liberation Front:
The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products. This is our mission statement: Users should be able to control the data they store [in] any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.
This is a fantastically worthy goal, and I wholeheartedly applaud it. However, I am beginning to wonder: what data is yours to own in the first place?
For example, consider web searching. Google’s Data Liberation Front lets you extract your Web History, which consists of (1) all the query strings that you ran and (2) the identities (URLs) of any pages that you clicked as a result of those queries. But does Google let you extract the URLs of the pages that you didn’t click? You still interacted with those pages, not by clicking but by deliberately not clicking: you read the snippet, made a relevance judgment, and decided not to visit. Might you not want to know, in the future, which pages you implicitly judged non-relevant? Isn’t that decision also part of your search data? So shouldn’t Google let you extract that information as well, in case you want to use it later, for example to compare the results of the same query at different points in time?
Or is there a question of who owns the set of results for your query? Does Google feel that it owns the result set as a whole, even though you had a part in constructing that set via your query?
Certainly no one would argue that Google doesn’t own the algorithms that produced the set. But does Google own the set itself? That set is potentially quite valuable: being able to extract it yourself lets you reuse and remix it in the future however you see fit. So it makes sense that Google might want to control its distribution. But if you are an owner or co-owner of that set, Google shouldn’t attempt to exert that control. So the big question is: are you a (co-)owner?
Here is an analogy by way of Adobe Photoshop. Suppose you open one of your images in the online (web app) version of Photoshop, apply the Gaussian Blur (soft focus) filter to it, and save the result. It’s clear that you own the input (it’s your photo), that Adobe owns the Gaussian Blur algorithm (or at least its implementation), and that you own the resulting image. Adobe doesn’t lay claim to the output of the algorithm, even though it was Adobe’s algorithm that produced that output.
So how is this different from a web search? You own the input (the query string that you type). Google owns the algorithm that transforms that input into a list of results. So wouldn’t you also own the output of that transformation? Not the algorithm, but the output of the algorithm, i.e., the result set, just as you own the output in Photoshop.
It will be interesting to see whether or not Google will be open enough to allow you to extract this particular form of your data. Currently, they do not.