I’ve had occasion to perform genealogical searches for my family as well as for others. Genealogical searches can be rewarding, but more often than not you wind up with nothing. So when starting on such searches one expects that little can be found; only one’s optimism determines whether to continue searching.
This weekend, my optimism paid off. Probably.
Details of the search
On a whim, I decided to dig around (again) to look for traces of my wife’s ancestors who emigrated to the United States in the early 20th century. The family was originally from the Grodno province, but had left some time around the turn of the century and moved to Argentina to settle in a colony called Moises Ville. Some time after that, they moved to the US and ultimately settled in Chicago. This much we knew from family sources, but we had no supporting documents.
We also knew that in Poland/Russia, their family name was Volkostavsky, and family lore says that it was changed at Ellis Island. I had searched for this name before, both on JewishGen and on the Ellis Island site, and had found nothing.
Most modern genealogical programs now incorporate some sort of a mashup with Google Maps or with some other geo-location service that can be used to show locations of the places one’s ancestors lived. Or at least their modern equivalents. When I tried that for Moises Ville, Argentina, the program drew a blank. Surprised, I turned to Google directly, which produced a set of hits, including a Wikipedia page. This led me to a genealogical site dedicated to the Jews who settled there.
Immigrants from Eastern Europe came to Argentina in several ships over the course of a few years, and the site documents the names of the families that came on each vessel. Among them, I found the promising name VOLKOSTAVTSER, a family of nine people who arrived in 1901.
I then went to the Ellis Island site, and tried searching for this name. No luck. No luck with this name on the JewishGen site, either. No luck with any variant of Volkostavsky I could reasonably come up with, based on plausible letter substitutions for Polish and Russian versions of the name. I tried something like
in addition to the -TSER suffix from the Argentinian record. Nothing in the plausible year range, given that the family arrived in Argentina in 1901, stayed for a few years, and came to the US, where the father died in Chicago in 1910.
Unlike JewishGen, the Ellis Island web site does not offer SoundEx search matching, although it does have a feature that suggests possible alternative spellings, presumably from its database of names. Its suggestions, however, did not prove useful.
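To get a feel for what phonetic matching buys you, here is a minimal sketch of classic American Soundex in Python. This is an assumption on my part, not necessarily the algorithm either site uses (JewishGen's matching may well be a different variant), and the function name `soundex` is just illustrative. The point is only that spellings which look quite different on the page can collapse into the same phonetic bin.

```python
# Classic American Soundex: first letter plus three digits.
# A sketch for illustration; not the matching code of any real site.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for a surname ('' if no letters)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate equal codes.
            continue
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Vowels reset prev to "", so a repeated code after a vowel counts again.
        prev = code
    return (first + "".join(digits) + "000")[:4]


for spelling in ("Volkostavsky", "Volkostavtser", "Volcastofski"):
    print(spelling, soundex(spelling))
```

All three spellings from this search (the family's version, the Argentinian record, and the Ellis Island transcript) share the code V422 under this scheme, so a Soundex-style index would have put them in the same bucket.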
I then tried constraining the search based on the nationality of the people and the plausible dates. No luck. Finally, to get around the uncertain spelling of the last name, I tried searching by nationality, including Russian and Polish, and constrained the last name to start with VOL (three letters being the required minimum), but found nothing. I didn’t see the Jewish option (and was somewhat surprised by that) because it turned out to be classified as ‘Hebrew.’ In the end, I selected “Argentinian” as the nationality, and tried again. The system turned up a bunch of people with the last name specified as “Volcastofski.” Could this be them?
The manifest shows that a party of eight people arrived at Ellis Island on June 20, 1904. The spelling of the last name seems plausible, particularly given that it was probably recorded by someone not well-versed in standardizing the spelling of Eastern European names. The first names proved to be a bigger challenge.
We knew that the Volkostavsky parents had arrived with six children, at least one of whom was born in Argentina. The children on the manifest were aged 10, 5, 4, 3, 18 months, and 6 months. That fit nicely. The father’s name was shown as Berka; our family knew him as Beryl. His wife was named Sara; the manifest transcript shows her as Lara, but looking at the handwritten manifest image, it’s clear that the cursive S and L could be easily confused. The oldest child was listed as Pearl (F), which matched our records. The second oldest didn’t match: we had ‘Celia (F)’, the manifest had ‘Sario (M)’. Then David, an exact match, and Solomon vs. Shimon (not so good). The fifth child’s name is not given in the transcript, and the sixth, whom we knew as William, was shown as Wolf.
Is this the right family? Probably. The likelihood that there was another Jewish family who arrived from South America in that same time frame with the same number of children, with parents of about the right age, seems remote, given how hard it was to find them.
But there remain tantalizing questions: what accounts for the discrepancy of Celia/Sario? What happened to the name Berka? We know he changed his family’s last name to something more Americanized; did he also upgrade his own name? But why Beryl?
Reflections on the search process
With respect to the search process itself, how easy would this search have been if a more appropriate indexing algorithm were in place? Would it also have been possible to do more interactive query expansion, independently varying the dimensions of spelling, of time, of ethnicity? Is there a good way of binning values for such concepts as last names in the presence of spelling variation to construct an efficient faceted exploratory search system?
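As a rough illustration of what varying the dimensions independently might look like, here is a small Python sketch. Everything in it is hypothetical: the `Record` type, the `search` and `facet_grid` functions, and the toy data (only the first record mirrors the manifest entry described above; the rest are invented). It simply runs every combination of surname prefix, nationality set, and year range, and reports how many records each combination returns, so the searcher can see at a glance which facet is doing the filtering.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Record:
    surname: str
    nationality: str
    year: int


# Toy data. Only the first row reflects the search described in this post;
# the other rows are invented for illustration.
RECORDS = [
    Record("Volcastofski", "Argentinian", 1904),
    Record("Wolkowski", "Polish", 1906),
    Record("Volkov", "Russian", 1898),
    Record("Vollmer", "German", 1904),
]


def search(records, prefix=None, nationalities=None, years=None):
    """Filter records; a facet set to None is left unconstrained."""
    hits = []
    for r in records:
        if prefix and not r.surname.upper().startswith(prefix.upper()):
            continue
        if nationalities and r.nationality not in nationalities:
            continue
        if years and not (years[0] <= r.year <= years[1]):
            continue
        hits.append(r)
    return hits


def facet_grid(records, prefixes, nationality_sets, year_ranges):
    """Hit counts for every combination of facet settings."""
    return {
        (p, n, y): len(search(records, p, n, y))
        for p, n, y in product(prefixes, nationality_sets, year_ranges)
    }


grid = facet_grid(
    RECORDS,
    prefixes=["VOL"],
    nationality_sets=[("Russian", "Polish"), ("Argentinian",), None],
    year_ranges=[(1901, 1910)],
)
for (prefix, nats, years), count in grid.items():
    print(prefix, nats, years, "->", count)
```

On this toy data the Russian/Polish setting returns nothing while the Argentinian setting returns one record, which is roughly the experience I had by hand. A real system would need sensible bins for each facet, which is exactly the open question.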
It would be interesting to do a systematic analysis of failed searches for which plausible matches were ultimately found. Perhaps that analysis might be useful in designing effective query expansion algorithms given the characteristics of the data set.
Another possible way to design effective query expansion or variation strategies might be to approach the problem from the solution: given a plausible answer to a query, systematically vary the parameters of the query to see which other records are related. Something along the lines of a retrievability analysis might also be used to identify records for which query expansion will not be as effective.
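A very simplified version of that idea can be sketched in Python: for each record, count how many queries from some plausible set would have retrieved it. This is only a Boolean stand-in for a proper retrievability measure (no ranking or cutoffs), and the names `Passenger`, `make_query`, and `retrievability`, as well as all data except the first record, are made up for the example.

```python
from typing import Callable, NamedTuple


class Passenger(NamedTuple):
    surname: str
    nationality: str
    year: int


Query = Callable[[Passenger], bool]


def make_query(prefix: str, nationality: str) -> Query:
    """Build a query matching a surname prefix and an exact nationality."""
    def matches(p: Passenger) -> bool:
        return p.surname.upper().startswith(prefix) and p.nationality == nationality
    return matches


def retrievability(records, queries):
    """Map each record to the number of queries that retrieve it."""
    return {r: sum(1 for q in queries if q(r)) for r in records}


PASSENGERS = [
    Passenger("Volcastofski", "Argentinian", 1904),  # as in the manifest
    Passenger("Volkov", "Russian", 1903),            # invented
    Passenger("Wolkowski", "Polish", 1906),          # invented
]

# Queries a searcher might plausibly try before thinking of "Argentinian".
QUERIES = [
    make_query(prefix, nationality)
    for prefix in ("VOL", "WOL")
    for nationality in ("Russian", "Polish", "Hebrew")
]

scores = retrievability(PASSENGERS, QUERIES)
hard_to_find = [r.surname for r, score in scores.items() if score == 0]
print(hard_to_find)
```

Records that score zero against the queries people actually issue are the ones where query expansion along spelling alone will not help; some other facet has to be relaxed.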
In any case, it was an exciting discovery that has multiple implications for search: unless it’s possible to prove non-existence, try different search tactics (see Bates’ berrypicking paper for more on this); there may be opportunities in faceted query expansion research; session-based search that allows the searcher to compare the results of different queries to assess coverage would be useful here, particularly as new information is uncovered during the search or from other sources.