An article on genealogical research and vocabulary.

I wrote this article at the invitation of the online managing editor of Avotaynu back in January. I thought it might be interesting to my blog audience as well.

How Research Happens

“‍To define the terms is to win the argument.‍” I first heard that saying in my childhood, and it impressed me greatly; how could terms have such a large impact on success? As I have grown I have learned that the words we use tend to infiltrate our thoughts, guiding them in particular directions; they do not prevent our thinking in other ways, but they do change the effort required to do so. The words themselves suggest a story, and to use them to tell a different story requires effort.

Many of the terms used in genealogy today do not lend themselves to the way I see this endeavour. “‍Source‍”, “‍Evidence‍”, “‍Conclusion‍”, “‍Fact‍”, “‍Proof‍”: these are words that illuminate particular aspects of research, and not the aspects that I find most useful. In this article I wish to suggest a different kind of story about research, and a different set of words to support it.

A Research Narrative

Once upon a time our ancestors lived lives, met people, and did things. Nearly all of that past is completely and irretrievably lost, but some small portions of those lives created artefacts that still survive today: government and church documents, personal letters and diaries; monuments, footprints, photographs, eponyms, traditions, and so on. These artefacts were created freely by everyone without much thought about our future use of them, and reconstructing a picture of the past from the limited contents of these artefacts is the job of a historian.

When I discover an artefact, I generally find it makes a number of claims. A photograph makes the claim “a person existed who looked like this and posed for a photograph;” a birth record makes many claims, including the existence of a birth event and three people, the roles those people played in the event, the names by which they were known, and the date and location of the event.

There are two important things to keep in mind about the claims made by artefacts. The first is that they are not always correct, a truth that words like “information” and “fact” disguise. Mistakes happen, memory fades, and there are many circumstances where people simply lie. The decision to believe any given claim is just that: a decision, one left to the researcher’s judgement.

The second important property of artefacts’ claims is that, with very rare exceptions, artefacts do not refer to one another. The decision to believe that two artefacts are making claims about the same person is, again, a decision; a decision whose existence can be hidden by words like “‍evidence‍” and “‍source‍”. Artefact X may claim “‍a person named Tom Jones was born 1800-03-05‍” and artefact Y may claim “‍a person named Tom Jones died 1880-07-09 in his eighty-first year‍”, but it is the researcher who claims that these two people were the same person; the researcher, and not either of the artefacts. “‍Matches‍” are claims of cross-artefact equivalence created by the researcher.
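The Tom Jones example can be made concrete. What follows is only a minimal Python sketch, assuming the two dates quoted above; the names `Claim` and `year_of_life` are hypothetical and exist purely for illustration. It shows that the two artefacts are compatible with a match, while the match itself remains a separate claim that the researcher creates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """One statement together with the thing that made it (hypothetical model)."""
    made_by: str
    text: str


def year_of_life(born: date, on: date) -> int:
    """Return which year of life someone is in on a given day.

    A person is in their first year until their first birthday,
    so the result is the number of completed years plus one.
    """
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    completed = on.year - born.year - (0 if had_birthday else 1)
    return completed + 1


birth = Claim("artefact X", "a person named Tom Jones was born 1800-03-05")
death = Claim("artefact Y", "a person named Tom Jones died 1880-07-09 "
                            "in his eighty-first year")

# The dates agree with "in his eighty-first year" ...
compatible = year_of_life(date(1800, 3, 5), date(1880, 7, 9)) == 81

# ... but compatibility is only an antecedent. The match is a new claim,
# made by the researcher and not by artefact X or artefact Y.
match = Claim("researcher", "the Tom Jones of X is the Tom Jones of Y")
```

Keeping `match` as its own claim, with its own maker, is the point of the sketch: nothing in either artefact states it.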

Once we have a set of claims about a common topic, from either one artefact or several, we can often infer additional claims from them. Each of these inferences is based on our understanding of what kinds of claims “make sense”: people are only born once, British parents with a shared surname generally married prior to the birth, people named Elisabeth are almost certainly female, etc. Some people are more liberal with these inferences than others, but almost all researchers end up with a mix of claims whose source is an artefact and claims whose source is an inference that the researchers created.

Taking the set of matched claims from artefacts and the additional claims we have inferred from them, we arrive at our belief about how the past was. That belief is nothing more nor less than a set of claims we chose to accept as true. Which claims we believe is, at its core, our decision.

Putting all the pieces together, we have the following research narrative:

We discover artefacts;
We parse the claims those artefacts make;
We select a set of claims as “‍relevant‍”;
We match up the things those selected claims discuss;
We infer additional ideas from those claims, often including that some claims are wrong;
We accept a set of claims as our belief of how the past was.

These steps are also illustrated in Figure 1.

Figure 1: An illustration of research, with errors appearing at most steps

Of Two Minds

The result of the research process is a belief about the past. Some like to call this belief a “conclusion,” but that word has always bothered me, suggesting as it does some level of finality. We start with a belief (e.g., “I have ancestors”) and as we research that belief changes.

One common state of research is to have multiple conflicting beliefs. Some of these take the form of dichotomies: “‍either there were two cousins sharing a surprising number of vital statistics, or there was one person who often claimed that his uncle was his father.‍” Others are unresolved contradictions: “‍she can’t have died three years before her last child was born, but everything else I believe points to her doing just that.‍” Being of two minds is the natural state of research, the almost inevitable side-effect of inquiry.

Having several equally-likely but contradictory beliefs may also be the “most correct” interpretation of the available artefacts. In a community with widespread name reuse and not a lot of record-keeping, how likely is it that the surviving artefacts make enough claims to clearly indicate even how many people with the same name there were, let alone which one of them was engaged in each event the artefacts claim occurred? There may never be enough evidence to defend any one “conclusion” as significantly more probable than another; the desire to force a single conclusion from inconclusive data is one of the most common paths to sloppy research.

The conclusion fallacy is also one of the most common causes of tension in cooperative research. When you combine the naturally inconclusive nature of the extant claims with the natural differences in the priorities and perspectives between researchers, it is almost inevitable that researchers will disagree on which alternative is “‍most likely‍” or “‍the best alternative‍”. The more researchers are involved in your research, the more important it is to accept multiple contradictory beliefs simultaneously.

Researchers Create Claims

In the research narrative I presented earlier, I asserted that researchers create claims both in the matching of other claims and as a result of inference. I have found that researcher-created claims are not widely recognised by genealogists, so in this section I intend to provide an introductory overview of historical inference.

First, let me emphasise the difference between an artefact making (or claiming) a claim and a researcher creating (or constructing) a claim. When a document claims “‍John was 15 when he died,‍” that claim stands on the authority of the document itself. If I have reason to doubt the document, I likewise lose faith in the claim. Conversely, when a researcher creates the claim “‍John was 15 when he died,‍” the researcher also creates a supporting argument or inference, such as “‍based on the birth and death dates we already know.‍” The claim that the researcher created stands or falls based on the strength of the inference created to support it, not on the reputation of the researcher.

In genealogy we often speak of the “‍source‍” of a claim and of “‍citing‍” our sources. When we speak of sources we generally mean “‍that which leads us to believe;‍” hence, the source of a claim is either an artefact or an inference. The inference may be attributed to the researcher who created it, but to cite an individual as a source is to treat the individual as a witness, not a researcher.

The Structure of an Inference

I believe that inferences are first-class citizens of genealogical research and should be discussed on an equal footing with artefacts and claims. To help reach that point, we need to explore what an inference is. The logicians and mathematicians in the audience will recognise that I am glossing over many details, but at its core each inference contains three parts:

A set of consequents. These are the claims that cite the inference as their source.
A set of antecedents. These are the claims from which we derive the newly created claims. If one or more of our antecedents is false, the inference no longer holds and we have no reason to believe any of the consequents.
A rationale. This can be anything from a law of nature (e.g. “each person has exactly one biological mother”) to a general trend (e.g. “the more time passes between an event and the creation of an artefact describing the event, the more likely it is that the claims of that artefact are incorrect”). It explains why we believe that the antecedents are sufficient evidence to infer the consequents.

Every researcher has created many inferences. To see some of your own, identify a claim in your belief that is not explicitly claimed by any artefact and ask “‍why do I believe this?‍” For example, if you ask “‍why did I say this person was male?‍” and answer “‍because Benjamin is a boy’s name,‍” you have identified the consequent (being male), the antecedent (being named Benjamin), and the rationale (Benjamin is a boy’s name).
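For readers who think in data structures, the Benjamin example can be written down directly. This is only a sketch of one possible representation; the class name `Inference`, its field names, and the `holds` method are my own assumptions for illustration and not an established genealogical data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    """A researcher-created step from some claims to others (hypothetical model)."""
    antecedents: frozenset  # the claims we reason from
    consequents: frozenset  # the claims that cite this inference as their source
    rationale: str          # why the antecedents suffice for the consequents

    def holds(self, believed):
        """True while every antecedent is still among the believed claims."""
        return self.antecedents <= believed


benjamin_is_male = Inference(
    antecedents=frozenset({"this person was named Benjamin"}),
    consequents=frozenset({"this person was male"}),
    rationale="Benjamin is a boy's name",
)

# While we believe the antecedent, the inference gives us reason
# to believe its consequent.
still_justified = benjamin_is_male.holds({"this person was named Benjamin"})

# If we stop believing the antecedent, the inference no longer holds.
after_doubt = benjamin_is_male.holds(set())
```

Writing the rationale down as an explicit field is the design choice that matters here: it forces the "why" to be stated alongside the claim it justifies.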

An aside about the words “source” and “evidence.” Taking the most straightforward linguistic application of the words, the evidence that supports a particular inference is its set of antecedent claims, but the source of the inference is its rationale. That use of the words is not universal among genealogists I have spoken with, nor does there seem to be a consensus of use at all. Because of this, I prefer to use different terms: antecedent, rationale, and “support.” Something (artefact, claim, inference, or rationale) supports a claim if disbelieving that thing is enough to no longer have reason to believe the supported claim. Thus, what I have most commonly seen called “sources” I would term “supporting artefacts.” I prefer that longer term because “supporting” does not suggest an exhaustive character as “source” does.
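The definition of “support” is informal, but one possible reading of it can be sketched in Python. Everything named here (`Inference`, `believable`, `supports`, and the baptism register) is hypothetical and introduced only for illustration. The sketch treats a claim as believable if an undoubted artefact makes it, or if an undoubted inference with believable antecedents yields it, and then tests support by doubting one thing at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    antecedents: frozenset
    consequents: frozenset
    rationale: str


def believable(artefact_claims, inferences, doubted):
    """Return the set of claims we still have reason to believe.

    artefact_claims maps an artefact name to the set of claims it makes.
    doubted may hold artefact names, claim texts, rationales, or inferences.
    """
    believed = set()
    for artefact, claims in artefact_claims.items():
        if artefact not in doubted:
            believed |= {c for c in claims if c not in doubted}
    changed = True
    while changed:  # apply inferences until nothing new follows
        changed = False
        for inf in inferences:
            if inf in doubted or inf.rationale in doubted:
                continue
            if not inf.antecedents <= believed:
                continue
            new = {c for c in inf.consequents if c not in doubted} - believed
            if new:
                believed |= new
                changed = True
    return believed


def supports(thing, claim, artefact_claims, inferences):
    """True if doubting `thing` alone leaves no reason to believe `claim`."""
    before = believable(artefact_claims, inferences, set())
    after = believable(artefact_claims, inferences, {thing})
    return claim in before and claim not in after


# Hypothetical data: one invented artefact and the Benjamin inference.
artefacts = {"baptism register": {"a child was named Benjamin"}}
rules = [Inference(frozenset({"a child was named Benjamin"}),
                   frozenset({"the child was male"}),
                   "Benjamin is a boy's name")]

by_artefact = supports("baptism register", "the child was male", artefacts, rules)
by_rationale = supports("Benjamin is a boy's name", "the child was male", artefacts, rules)
by_unrelated = supports("a family letter", "the child was male", artefacts, rules)
```

Under this reading both the artefact and the rationale support the claim, and an unrelated artefact does not. Note that a claim backed by two independent artefacts would be supported by neither one alone, which is one way of seeing why “supporting” is a weaker and safer word than “source.”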

By far, the most common inferences I see are matches, inferences with the consequent “‍person X in claim A and person Y in claim B are the same individual.‍” There are rationales behind these inferences, often of the form “‍it is unlikely that two individuals would be this similar‍” where the antecedents are the similar claims. These inferences are arguably the core component of research; their existence is what turns an archive into a history. Too often I see these absolutely central inferences being glossed over as if they were self-evident or even non-existent. Do not forget to communicate how and why you assembled your view of the past from the claims made by the various artefacts you considered.

Proofs

As part of my graduate work, I took several courses on mathematical proofs. In those courses, we defined a “proof” as a social construct, a form of persuasive writing. The goal of a proof is to convince the reader that

the author has constructed a chain of inferences that lead to the claim being made;
the original antecedent claims of the inferences (called “‍axioms‍” in logic) are believable; and
the rationale associated with each inference along the way supports its inference.

Insofar as I can tell, that is a good description of proofs in genealogy too. A genealogical proof is a document whose purpose is to convince the reader that inferences with strong rationale lead from the set of relevant artefacts to the researcher’s belief.

I am of two minds when it comes to genealogical proofs. They are the only aspect of genealogy as commonly discussed that gives inferences their proper due, but they also help to enshrine conclusions as the ultimate objective of research. An additional mark against them is the colloquial understanding of “‍proven = true‍” as opposed to the more accurate “‍proven = convincingly defended.‍” Let us all focus on the good of proofs—the sharing of our inferences and their rationale—but leave the persuasive writing to those who have a legitimate need to persuade.

Summary

(I was going to title this section “‍conclusion,‍” but what kind of example would that set?)

Genealogical research is the process of discovering artefacts, parsing the claims they make, selecting and matching those claims, inferring new claims, and selecting a set of the resulting claims to believe. Every step along this path is uncertain and error-prone, and the natural state of research is not one conclusive belief but a set of alternative beliefs. One key activity of research is the creation of inferences to support the matching of claims and to make explicit what other claims suggest. Expressing these inferences and their supporting claims and rationales clearly is key to helping other researchers understand your work.

There are many common terms today that obscure various aspects of sound research practices. “Information” and “fact” disguise the uncertain nature of claims; “evidence” and “source” hide the inferences that bridge artefacts and beliefs; “conclusion” and “proof” ignore the uncertainty that is inescapably present in the very incomplete picture of the past that surviving artefacts can provide. I do not claim that the terms I have suggested in the place of these terms are themselves ideal; no terminology is perfect, and some of my readers have probably already identified aspects of research that my terms obscure. However, I offer them to you, my fellow researchers, with the hope that the different perspective they suggest will help us all to recognise and correct some of the weaknesses in our own research practices.