Data structures are for programmers


I just read an interesting post by David Karger about PIM, end-user programming, data publishing, and lots of other interesting HCI ideas. The premise is that purpose-built applications for PIM impose strict schemas on their users, making it difficult to adapt, repurpose, or integrate the data with other applications. The alternative is something like Evernote, which lumps everything into one bucket, access to which is mediated largely by search. The tradeoff, then, is between a relatively undifferentiated interface backed by search on one hand, and a large number of siloed applications with dedicated interfaces on the other.

David describes several systems (interfaces) his students built that leverage the Haystack framework for storing arbitrary data, and suggests that it’s possible to structure these data management tasks as authoring problems rather than as programming, thereby making flexible, extensible, customized interfaces more widely accessible.

His vision, I believe, is still predicated on fairly sophisticated users because the authoring process that involves arbitrarily complex data may require comparable complexity in authoring decisions, even if procedural programming skills are not required. Managing this complexity well in the user interface, I believe, will still require traditional UI and interaction design skills, which most people won’t have the time or inclination to learn.

Thus we aren’t likely to lose the dedicated applications. But data federation is a different issue. Here we have two possibilities: customized application-specific indexing vs. a generic data store. The iPhone, for example, implements search across disparate items such as contacts, documents, and applications from a single interface, even though the data are all managed by individual applications. These applications still suffer from the lack of flexibility of the underlying schema, but they do balance access with task-specific interfaces nicely.

The other way to go is to use some generic data store (e.g., Redis) to store all application data. With the barest minimum of data typing (this is a string, this is a date, etc.) it should be possible to build a generic search interface for any data.
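
To make that concrete, here is a minimal sketch of the idea. It assumes an in-memory list standing in for the generic store (it is not Redis and uses no Redis API), and the names `Record`, `GenericStore`, and `matches` are hypothetical, invented for illustration. The only thing the search knows about a value is its basic type.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Union

# The "barest minimum of data typing": a value is a string, a number, or a date.
Value = Union[str, int, float, date]


@dataclass
class Record:
    app: str                  # hypothetical label for the owning application
    fields: Dict[str, Value]  # arbitrary field names, minimally typed values


def matches(value: Value, query: str) -> bool:
    """Compare one value to a lowercased query, dispatching on the value's type."""
    if isinstance(value, date):
        return query in value.isoformat()   # e.g. "2024-03" matches 2024-03-05
    if isinstance(value, str):
        return query in value.lower()       # case-insensitive substring match
    return query == str(value)              # numbers must match exactly


class GenericStore:
    """In-memory stand-in for a generic data store shared by all applications."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def add(self, app: str, **fields: Value) -> Record:
        record = Record(app, fields)
        self.records.append(record)
        return record

    def search(self, query: str) -> List[Record]:
        """Return every record with at least one field matching the query."""
        q = query.strip().lower()
        return [
            record for record in self.records
            if any(matches(value, q) for value in record.fields.values())
        ]


store = GenericStore()
store.add("contacts", name="Ada Lovelace", birthday=date(1815, 12, 10))
store.add("calendar", title="Lunch with Ada", when=date(2024, 3, 5))
store.add("notes", text="Buy milk")

print([r.app for r in store.search("ada")])      # ['contacts', 'calendar']
print([r.app for r in store.search("2024-03")])  # ['calendar']
```

The search never consults an application schema; it only needs to know whether a value is a string, a number, or a date, which is roughly the level of typing described above.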

The problem, of course, is that rigid schemas imposed by different applications still remain. What can we do about that? One possible solution is the one espoused by David Karger, that involves authoring. Another possibility is to give up on semantics, and let people create simple structures (lists, tables, etc.) that can be used to do light-weight, ad hoc organization. Basically, most PIM applications can be supported by a stripped-down version of Excel, packaged to remove all the messy controls. You get all the flexibility of data access, few constraints about how data can be organized, and all of this can be stored in a generic way that is searchable and can be transformed in arbitrary ways. Oh yeah, and you can program it, too.


  1. Gene, I agree with you that many people won’t have the talent or inclination to author their own interfaces. But I don’t think that provides shelter for traditional applications. Anything that can be authored can be published as well. As soon as just one talented person decides to create the right interface, that interface can spread. Also, authoring needn’t happen from scratch: just as with web pages, it’s likely people will create what they want by incremental modification of previous examples. Our Dido project encourages that: you can download our Nobel Prize winners exhibit and turn it into your personal address book.

    Federated search is nicer than fragmented search, but search isn’t enough (see our CHI ’04 paper: “the perfect search engine is not enough”). We need to defragment browsing (navigating from one item to related items) and aggregate visualization (seeing combinations of things of different types, and the relations between those things). A unified data store is nice (see “Data Unification in Personal Information Management”, CACM January 2006) but if you can’t look at it all together you’re missing out on a lot.

  2. David, what you say makes sense, but I wonder about two things:
    1. If it is easy to disseminate interfaces that work over the aggregated data store, you may be vulnerable to malware and having your entire dataset compromised. One challenge in disseminating such software is to have a vetting process designed to minimize the propagation of malicious code. (Because you know if the channel is seen as effective, the spammers and hackers will come.)

    2. In terms of browsing, I think that the same infrastructure that supports indexing can be used to drive browsing. In fact, from the user’s perspective, I think there need be little distinction between the two. (I tried to make that point in my thesis; see my CHI 97 paper).

    In the end, I think we’ll probably see some hybrid solution, but I expect that ease of use will trump functionality for common tasks, and functionality will trump ease of use in the tail.

  3. Fair questions, Gene.

    1. I think the “authoring” approach to interface creation is actually a good way to _reduce_ the risks of malware. Implicit in the notion of interface authoring (versus programming) is the idea of a domain-restricted vocabulary with which interfaces can be specified declaratively as opposed to procedurally. The fundamental _weakness_ of a domain-specific declarative specification is a good protection against malware. If all the declarative language can do is specify the way pixels get drawn on the screen, then the most malware can do is paint funny pictures on the screen. Our Exhibit framework is an example of this kind of domain-specific language: it allows the author to specify interface elements like maps and facets that filter the data on the map, but does _not_ allow the author to specify behaviors that can damage the data being displayed.

    2. I agree that there are strong connections between search and browsing, but an equally important issue is visualization. We’re used to searches returning plain old lists of homogeneous results. Applications, on the other hand, present highly heterogeneous visualizations of pieces of information and the relationships between them. When I look at a calendar, there’s tons of information conveyed by the spatial layout. When I read an email message, I see it in the context of a whole list of email messages. But I also see rich detail about the particular message: sender, subject, date. I have widgets that let me perform standard operations on the message: delete, reply, forward. All of these elements can and should be brought into the end-user’s authoring environment.

  4. David, having browsed the widgets exhibit, I am left with the impression that these interactive tools are limited to browsing and searching the data, but the user cannot create new records or modify existing ones. If this is true in general, we have a problem of where the data comes from; if this is not true, what precludes a malicious tool from scrambling or exporting the data?

    I agree about the merits of providing context-aware interaction, but it seems to me that this goal is at odds with the declarative, scripting approach.

    Say I have some data that lists addresses and the years that a person lived at each address. I would like to build a visualization that shows where each individual lived over time, with moves indicated by arrows and places of residence indicated by circles where the area of the circle is proportional to the length of residence at that location. Furthermore, I would like to be able to edit that data, and add new records. What would I need to do to create such an interface?

  5. You’re absolutely right that the Exhibit widgets are purely for visualization, not authoring. For that we’ve extended the Exhibit framework with Dido. One thing you’ll notice with Dido is that there is no explicit data manipulation code. Instead, the editability of data is _implicit_: you WYSIWYG edit what you see, and the framework takes care of updating the data in response. Since there’s no specification of data modification, there’s no way to maliciously manipulate it.

    Of course, the kind of editing you can do with Dido (directly modifying some attribute of a record, or creating an entirely new record) is quite limited. There’s no mechanism for computed updates or bulk updates. For this you’ll need some kind of programming language. And of course, once you allow modification, you open up the possibility for malicious modification. But I hold out the hope that a language designed specifically for manipulating data records (e.g., SQL) might be easier to inspect for malicious intent (not by everyone, but by enough people to keep everyone honest; see my previous comment).

    As for your goal of creating a visualization, part of that can be handled by Exhibit right now: it’ll let you make a map with circle size indicating length of residence. That info can be edited using Dido. Not in a fancy way: you’ll be editing raw fields (entering latitude/longitude and residence duration) as opposed to dragging and resizing circles. And there’s nothing that will give you the movement arrows you want.

    But that actually points to another piece of my imagined future. If we succeed in letting end users author applications, where are all those trained application developers going to go? The answer is that they will become developers of the widgets that make up those applications. Some application developer will create (maybe sell) a widget that lets you plot a _path_ on a map, with arrows connecting points in sequence. And maybe it’ll be editable, with values changed by dragging points over the map. You’ll use this in your app to plot your address book. I’ll use it to plot buffalo migrations. Same widget, different applications. Easily integrated because it’s just about pushing a data set into a view, the same way that we create charts in spreadsheets nowadays.
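
The declarative authoring idea discussed in the comments above can be illustrated with a small sketch. It is not the syntax of Exhibit or Dido; the vocabulary (`VOCABULARY`), the spec keys, and the functions `validate` and `render` are all hypothetical, chosen only to show why a restricted, declarative spec leaves little room for harmful behavior. A spec can name a view and map record fields to it, and nothing else. The map view also follows the residence example: circle area proportional to years of residence, so radius grows with the square root.

```python
import math

# Hypothetical restricted vocabulary: each view type and the keys it requires.
VOCABULARY = {
    "list": {"label"},
    "map": {"lat", "lng", "size"},
}


def validate(spec):
    """Reject any spec that steps outside the declared vocabulary."""
    kind = spec.get("view")
    if kind not in VOCABULARY:
        raise ValueError(f"unknown view: {kind!r}")
    keys = set(spec) - {"view"}
    if keys != VOCABULARY[kind]:
        raise ValueError(f"spec keys must be exactly {sorted(VOCABULARY[kind])}")
    return spec


def render(spec, records):
    """Push a data set into a view; returns drawing data and never mutates records."""
    validate(spec)
    if spec["view"] == "list":
        return [str(record[spec["label"]]) for record in records]
    # Map view: area proportional to the size field, so radius = sqrt(size).
    return [
        (record[spec["lat"]], record[spec["lng"]], math.sqrt(record[spec["size"]]))
        for record in records
    ]


# Made-up sample data for illustration only.
residences = [
    {"place": "Boston", "lat": 42.36, "lng": -71.06, "years": 4},
    {"place": "Palo Alto", "lat": 37.44, "lng": -122.14, "years": 9},
]

map_spec = {"view": "map", "lat": "lat", "lng": "lng", "size": "years"}
print(render(map_spec, residences))
# [(42.36, -71.06, 2.0), (37.44, -122.14, 3.0)]
print(render({"view": "list", "label": "place"}, residences))
# ['Boston', 'Palo Alto']
```

A spec such as `{"view": "list", "label": "place", "on_click": "delete_all()"}` is rejected by `validate`, because behaviors are simply not part of the vocabulary. The same `render` function works for any data set with suitable fields, which is the "same widget, different applications" point.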
