Genealogy searches are an interesting example of many aspects of information seeking. In some ways, this endeavor reveals the limitations of our classification of information seeking systems and behaviors, such as recall-oriented vs. precision-oriented search, known-item vs. exploratory, etc. While each query one runs should be high precision (find me records for the person I am interested in at the moment), there are many aspects (dates and places of birth and death, details of immigration, residence, occupation) resulting in many queries. And often you really do want to find as much as can be found, so the overall task is recall-oriented. Similarly, you start by searching for facts about people whose existence you are documenting, and you can often recognize relevant records when you see them. This has all the hallmarks of known-item search. On the other hand, you may also discover relatives you didn’t know existed, facts you had not expected, new kinds of historical records, etc. This feels much more like exploratory search.
Finally, there is the issue of where to search for information, which databases to use, etc. The range of potential sources for the serious genealogist is quite broad, but for those just starting out there are a few obvious choices beyond interviewing your relatives. Ancestry.com is a family of web sites that federates access to a large range of historical data on individuals. While it’s not the only place one can start, it’s not a bad choice.
The process of using Ancestry.com typically involves creating a family tree (which can be private), seeding it with some information on known individuals, and then looking for additional information. The system can use the information you already collected for an individual to find additional records for that person. Dates, names, places, and familial relationships all contribute to the query and to the ranking of search results. The system will even run automated queries for you that identify highly reliable matches and offer them as hints.
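Ancestry.com does not publish how it combines these fields, so what follows is only a minimal sketch of the general idea, assuming a simple additive score: each field of a candidate record that agrees with what the tree already knows adds points, and records above a threshold are surfaced as hints. The `Person` and `Record` classes, the weights, the two-year date tolerance, and `hint_threshold` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    """Facts already recorded in the tree (hypothetical fields)."""
    name: str
    birth_year: int | None = None
    birth_place: str | None = None
    relatives: set[str] = field(default_factory=set)


@dataclass
class Record:
    """A candidate historical record (hypothetical fields)."""
    record_id: str
    name: str
    birth_year: int | None = None
    birth_place: str | None = None
    relatives: set[str] = field(default_factory=set)


def score(person: Person, rec: Record, year_tolerance: int = 2) -> float:
    """Additive agreement score; all weights are illustrative, not Ancestry's."""
    s = 0.0
    if rec.name.casefold() == person.name.casefold():
        s += 3.0
    if person.birth_year is not None and rec.birth_year is not None:
        gap = abs(person.birth_year - rec.birth_year)
        if gap <= year_tolerance:
            # Closer dates earn more credit; a gap beyond the tolerance earns none.
            s += 2.0 * (1 - gap / (year_tolerance + 1))
    if person.birth_place and rec.birth_place:
        if person.birth_place.casefold() == rec.birth_place.casefold():
            s += 1.5
    # One point per named relative (spouse, child, parent) shared with the tree.
    s += 1.0 * len(person.relatives & rec.relatives)
    return s


def rank(person: Person, records: list[Record], hint_threshold: float = 6.0):
    """Order records by score and flag high-scoring ones as automatic hints."""
    scored = sorted(((score(person, r), r) for r in records),
                    key=lambda pair: pair[0], reverse=True)
    hints = [r for s, r in scored if s >= hint_threshold]
    return scored, hints
```

An additive score is the simplest choice that lets every known fact nudge the ranking; a production system would more likely use learned weights and fuzzy name matching.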
Once you’ve found data in the historical record and incorporated it into your family tree, you can re-run searches to find more information. For example, if you find an ancestor in a census record, it will provide an approximate year of birth, place of residence, the names of a spouse and any children, year of immigration, etc. All of this can be fed into a new search, which may identify a passport application or a ship’s manifest. The process is highly iterative, and it can be an intellectual challenge to sift through the records, finding the plausible matches while rejecting (or at least deferring) the other search results.
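The census-driven refinement described above can be sketched as a function that folds a record’s facts into the next query. This is a hypothetical illustration, not Ancestry.com’s interface: the dictionary keys (`year`, `age`, `residence`, `spouse`, `children`, `immigration_year`) and the function names are assumptions. The one piece of real arithmetic is the birth-year estimate: someone aged A in census year Y was born in Y - A - 1 or Y - A, depending on whether their birthday had passed by the enumeration date.

```python
from __future__ import annotations


def estimate_birth_years(census_year: int, age: int) -> tuple[int, int]:
    """An age of A in census year Y means birth in Y - A - 1 or Y - A,
    depending on whether the birthday had passed by the enumeration date."""
    return (census_year - age - 1, census_year - age)


def refine_query(query: dict, census: dict) -> dict:
    """Return a new query enriched with facts from a census record.
    Facts already in the query are never overwritten; key names are hypothetical."""
    refined = dict(query)  # leave the caller's query untouched
    if "birth_year_range" not in refined and "age" in census:
        refined["birth_year_range"] = estimate_birth_years(census["year"], census["age"])
    for key in ("residence", "spouse", "children", "immigration_year"):
        if key in census and key not in refined:
            refined[key] = census[key]
    return refined


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    query = {"name": "Jan Novak"}
    census = {"year": 1910, "age": 30, "residence": "Chicago",
              "spouse": "Marie", "immigration_year": 1904}
    print(refine_query(query, census))
```

Keeping existing facts means the searcher’s own edits take precedence over whatever the latest record says; a real system would probably need to flag such conflicts rather than silently keep one value.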
A search scenario snippet
The Ancestry.com UI helps you manage the information you’ve already found for a specific person by marking the records you have already saved with a green check box:
The first three items with green checks represent records I already found and saved for “Abe Mages.” The fourth is a potential naturalization record that matches the name and the date of birth. Looking further down the list, however, complicates the story:
First, there is a second naturalization record (which is the right one?), then some records for Abe’s wife Rose, and some others that are not relevant. It turns out that the second naturalization hit is the same record matched a second time, through the other name listed on it. (Abe apparently changed his name when he arrived in 1904.)
Our strategy now might be to reformulate the query (e.g., to restrict the name to an exact match for Abe Mages), but that returns only nine results in total, of which five are new. One of the five turns out to contain his obituary (Chicago Tribune, 1958), and the other four are not relevant. The narrower query did yield an important fact (the date of death), but now we need to relax it to find more information.
I won’t bore you with the rest of this scenario, but will offer some observations of the HCIR variety. The thoughts below are informed by my experiences as an Ancestry.com user and by having designed some session-based search interfaces. First of all, the obvious: it is not possible to find all useful information in a single query. The implications are as follows:
- Multiple queries retrieve overlapping results, and the searcher needs to understand this. The existing system keeps track of positive judgments (the green check boxes), but not of negative ones. Once I ascertain that a particular record is not relevant to a particular individual (or even to a particular tree), I should be able to mark it as such so that the system suppresses it from further results. (It should, of course, give me an option to see the suppressed results for any given query.)
- Since multiple records for a given person are retrieved by a typical query, it would be useful to facet the results by name, place, date range, etc. This is not the same as grouping by person, which must be ascertained by the searcher, but it may still be a useful mechanism to triage the results.
- In some cases, a person’s name may have been spelled in several different ways. While Ancestry’s search has several mechanisms for approximate matching, it operates as a black box. It would be useful to expose some of the alternate names in the database to searchers to allow them to use their knowledge of names and spellings in the original language to make informed selections when modifying the query.
- When you find a record that you want to save, the saving process (which consists of a few important steps) lands you on the person page; to continue exploring the results, you have to either backtrack three or four pages in the browser’s history or re-run the query. In some cases it might be more useful to simply mark a record in the search results as worth keeping and proceed down the list. All saved records could then be processed together (or at least sequentially).
- Another variant on the above item is to re-rank the remaining (as yet unseen) items in the list when a record is marked as useful, based on the data that record would add. This would allow speculative exploration of the result set without the overhead (and implied certainty) of committing the data to the tree you are building. For example, say you are looking for an individual’s date of birth and you’re faced with two conflicting records. You might try making speculative judgments on them one at a time to see the implications of each choice. This might turn up additional corroborating evidence right away, improving the chances of selecting the right data.
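Several of the proposals above (suppressing rejected records, faceting, marking records to keep without leaving the list, and speculative re-ranking) share one requirement: state that persists across queries in a session. The sketch below is a hypothetical model of that state, not Ancestry.com’s code. It assumes each result carries an id, a name, a place, a year, and a set of normalized fact strings such as `"spouse:Rose"`; the `Record` and `Session` names and the overlap-count ranking rule are illustrative choices.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A search result; `facts` holds normalized facts such as "spouse:Rose" (hypothetical)."""
    record_id: str
    name: str
    place: str
    year: int
    facts: frozenset[str] = frozenset()


@dataclass
class Session:
    """Hypothetical per-tree triage state kept across successive queries."""
    rejected: set[str] = field(default_factory=set)   # judged not relevant
    kept: list[Record] = field(default_factory=list)  # marked for later saving

    def reject(self, rec: Record) -> None:
        self.rejected.add(rec.record_id)

    def keep(self, rec: Record) -> None:
        # Mark the record without leaving the result list; save in a batch later.
        if rec not in self.kept:
            self.kept.append(rec)

    def visible(self, results: list[Record], show_suppressed: bool = False) -> list[Record]:
        # Negative judgments carry over to later queries unless the user asks to see them.
        if show_suppressed:
            return list(results)
        return [r for r in results if r.record_id not in self.rejected]

    def facets(self, results: list[Record]) -> dict[str, Counter]:
        # Counts per facet value; grouping by *person* is still left to the searcher.
        return {
            "name": Counter(r.name for r in results),
            "place": Counter(r.place for r in results),
            "decade": Counter(r.year // 10 * 10 for r in results),
        }

    def speculative_rerank(self, candidate: Record, unseen: list[Record]) -> list[Record]:
        # Pretend `candidate` were accepted: pool its facts with those already kept,
        # then float unseen records that corroborate the pooled facts.
        # Nothing is added to `self.kept`, so the judgment is never committed.
        evidence = set(candidate.facts)
        for rec in self.kept:
            evidence |= rec.facts
        # sorted() is stable (even with reverse=True), so ties keep the original order.
        return sorted(unseen, key=lambda r: len(r.facts & evidence), reverse=True)
```

Counting shared facts is the crudest possible corroboration signal; it is used here only because it makes the speculative step easy to see. Trying each of two conflicting birth records as the `candidate` in turn and comparing the resulting orders is the kind of what-if exploration described in the last item.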
This kind of search system offers myriad possibilities for improving the search process by allowing people to explore the data in rich and dynamic ways. Speed of interaction matters, but what matters more is flow: the ability to pivot and manipulate the data, to filter, re-rank, and recombine results without breaking your stride, without losing the intellectual momentum generated by being immersed in the data.
In the end, truly interactive information seeking should not be thought of as recall- or precision-oriented. Those labels should be reserved for individual queries, and even then, may not reflect the nuances of interactive system use. Interactive search is predicated not only on quick response (although that makes certain operations more useful), but more importantly on iteration. Interaction consists of multiple moves through which the searcher explores the information space and gains understanding. Thus information retrieval systems that claim to be truly interactive must provide not just timely response, but a rich set of means through which the searcher can explore and manipulate the data.