Genes, culture, graphs, anatomy, and puzzles.

John, Art, and Harrison

John Clark has been working for a year or so on a decentralized distributed graph database-based family history tool called trepo. It is still in development and not yet an end-user tool, but it is moving along well and working prototypes of some non-trivial aspects of the design are available already. He and I had a pleasant and lengthy conversation about trepo and other topics last September at the FamilySearch Developer’s Conference and when I saw him at RootsTech we immediately sought out a place to engage in more conversation.

Before going into that conversation, I should mention two interactions I had with personnel of family.me. During the innovator’s summit, Harrison Tang, president of family.me, presented a clear talk on some of the theory of user interface design and how they had worked on making the family tree a visually pleasing experience, focussing to some degree on the visual reward aspect of popular click-based web games. During a separate conversation, Art Haedike, lead developer of family.me, suggested a phrasing that stuck with me: good family histories need both good skin and good skeletons.

My conversation with John started with user interfaces and the ideas shared in Harrison’s talk. We were both concerned that if the game-like rewards were all on the surface, it would reinforce an already troubling-to-us aspect of family history: there is more intrinsic reward in adding a dubious connection than in removing one. We brainstormed a lot of ideas for making a game-like interface to the intellectually rewarding parts of family history, such as the vast puzzle that is matching up the individuals, places, and events referred to in distinct claims. I have several pages of sketches we made during our conversation and, though they are far from fleshed out, I am convinced there is merit in our ideas.

After that topic had run its course, we started discussing our individual projects. He shared his trepo/vgraph system and discussed, to borrow Art’s analogy, the fact that while he had a data model to put into the graph to represent the skin (the conclusion or belief), he was not sure what the data model for the rest of the anatomy should be. I then explained how I saw polygenea filling that gap. Our conversation was lengthy and enlightening as we both came to understand more fully the other’s point of view. In the end we came up with a merging of ideas that, I believe, went beyond what either of us had initially. It was very much presented and codified as a two-layer skin-and-flesh system, an idea I had initially resisted but which he demonstrated could alleviate some problems.

As an aside, after talking with Art and John and gaining from them a new understanding of the value of multi-layered approaches, I attended the FamilySearch town hall meeting. After the meeting I had a short conversation with Don Anderson in which I became hopeful that if I were to propose an anatomy model for GEDCOM X it might actually be adopted. I’m leaning toward having more than two layers in my proposal there; my current very-rough drafts have five layers, including some inspired by Tony Proctor’s Stemma and some from conversations I had two years ago at RootsTech with a few professional genealogists (which ones I do not recall) when manning the FHISO booth across the aisle from the APG booth. More on that if/when I convince myself the idea has enough merit to make public.

Tim

An unexpected encounter this year was with Tim Janzen, a medical doctor and proponent of rigorous DNA genealogy. Because the genetic genealogy reports I had seen previously were of dubious validity, I had not given detailed thought to DNA genealogy before and had expressed some doubt about that entire line of inquiry in my RootsTech talk. Tim did more to give me hope than any previous comment had, primarily by actually showing me some of the techniques used to align autosomal single-nucleotide polymorphisms (atDNA SNPs) for recent-generation genealogy.

I should note, here, that I do not understand the field of genetic genealogy nearly as well as I would like to. That said, I am interested in the following, a mix of things I had previously known and things I learned from Tim and the Web:

We get hundreds of thousands of SNPs per person.
People tend to inherit runs of SNPs from the same parent; where the runs break and with what frequency is probabilistic, but with either known or knowable probabilities.
Mutation rate is very low.
If you look at the SNPs of both parents and a child, you can map which parent provided each run of SNPs (with error, since the parents will share some).
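
The last point can be made concrete with a small per-site sketch. It is only an illustration under simplifying assumptions of my own: each person's genotype at a SNP is an unordered pair of single-letter alleles, there are no mutations or read errors, and the names `assign_parental_alleles` and `phase_trio` are hypothetical, not taken from any tool Tim showed me. A site is resolved only when exactly one assignment of the child's two alleles is consistent with both parents; sites where the parents share alleles can come back unresolved, which is the "with error" caveat above.

```python
from typing import List, Optional, Tuple

# Assumption: a genotype is an unordered pair of alleles at one SNP site.
Genotype = Tuple[str, str]


def assign_parental_alleles(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[Tuple[str, str]]:
    """Return (allele_from_mother, allele_from_father) for one site.

    Returns None when the site is ambiguous (more than one assignment
    fits) or inconsistent with the stated parents (no assignment fits).
    """
    a, b = child
    options = set()
    # Try both ways of splitting the child's alleles between the parents.
    for from_mother, from_father in ((a, b), (b, a)):
        if from_mother in mother and from_father in father:
            options.add((from_mother, from_father))
    if len(options) == 1:
        return next(iter(options))
    return None


def phase_trio(
    child_sites: List[Genotype],
    mother_sites: List[Genotype],
    father_sites: List[Genotype],
) -> List[Optional[Tuple[str, str]]]:
    """Apply the per-site rule along a list of aligned SNP sites."""
    return [
        assign_parental_alleles(c, m, f)
        for c, m, f in zip(child_sites, mother_sites, father_sites)
    ]
```

Turning the resolved sites into runs would additionally need the break probabilities mentioned above, which this sketch does not model.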

Thus, if you look at the DNA of two children of the same parents (where the shared parentage is an input), you can (a) verify the shared parentage by showing many runs of similar SNPs (it is not yet clear to me if run lengths are enough to distinguish between shared-parent and other ways of getting two-sets-of-shared-grandparents) and (b) identify around half of the SNPs that the parents did not share with one another. Given SNPs of more descendants (with relationship postulate inputs) we can start to tease apart more about the deceased individuals; for example, by sequencing the children of siblings we can start to identify some SNPs as belonging to the siblings vs their spouses.
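
As a rough sketch of part (a), one simple way to look for runs of similar SNPs between two people is to scan aligned sites and record stretches where the two genotypes share at least one allele at every site (sometimes called half-identical regions). This is my own simplification rather than a description of any particular vendor's method; the function name `half_identical_runs` and the `min_length` parameter are hypothetical, and a real comparison would work in genetic distance and tolerate occasional mismatches.

```python
from typing import List, Tuple

# Assumption: a genotype is an unordered pair of alleles at one SNP site.
Genotype = Tuple[str, str]


def half_identical_runs(
    sites_a: List[Genotype], sites_b: List[Genotype], min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of consecutive
    sites where the two genotypes share at least one allele."""
    runs = []
    start = None
    n = min(len(sites_a), len(sites_b))
    for i in range(n):
        if set(sites_a[i]) & set(sites_b[i]):
            # The site matches: open a run if one is not already open.
            if start is None:
                start = i
        else:
            # The site breaks the run: keep it only if it is long enough.
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the data.
    if start is not None and n - start >= min_length:
        runs.append((start, n))
    return runs
```

With real data the interesting question is the distribution of run lengths, since unrelated people also share short runs by chance.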

That said, as far as I can tell so far, even if we had the complete set of atDNA SNPs for every living person (so far as I know, X, Y, and mitochondrial tests don’t change this), I still have not seen any algorithm that would let us derive the structure of progenitors for more than a few generations back.

Chris et al

Some of my more interesting conversations were with Chris Whitten, president of WikiTree, and others of the WikiTree community. WikiTree is based on a model where most data is stored only as text, with only the core ideas also stored in machine-understandable formats. Because of that, and because my own work is in machine-understandable data, I had not previously given WikiTree much thought. However, as I visited with them I found myself impressed in many ways.

I align philosophically with many of WikiTree’s objectives, including supporting collaborative, rigorous research and a commitment to non-commercial freedom of information. What I had not realized, though, was how successfully they had created a community with a positive and thoughtful culture. I am sure they have their issues (every community does), but from what I was able to learn it was far more good than bad.

The conversations I had with WikiTreers were very varied in content and I do not intend to enumerate them, but the nature of the conversations and conversationalists impressed me. At some point I hope to investigate more fully if WikiTree’s community is a result of good fortune, a result of mechanics that discourage poor collaborators, or a result of mechanics that help people become good collaborators. When (if ever) I’ll have time to do that, I do not know. If there are any social scientists in my audience, I suggest WikiTree as the subject of your next study.

Others

There are dangers in listing specifics of an event, chief among them the likelihood that I will slight someone I failed to mention. I had pleasant talks with many people, individually and as company representatives; in fact, I can think of only two conversations that went poorly, and I attribute both to timing as much as anything else: by Friday afternoon a lot of the people in the smaller booths in the expo hall were tired and not at their friendliest when approached by someone who was not only not interested in buying their product but was actually talking about designing open standards which could increase their short-term workload.

On the whole, I am pleased by the direction of family history. It may not be moving very quickly and there are undoubtedly many companies at RootsTech that are there to make money without considering what (if any) social benefit their product provides, but the overall trend is promising. One day I may even be able to tell my computer scientist colleagues “I do family history” and have them treat it the way they do when I say “I do graphics” (assuming I mean technical computer science research) rather than the way they do when I say “I do board-games” (assuming I mean a hobby or pastime).

We can dream…

And in the meantime, we can fix the problems we know how to fix. They are many, but they are yielding to pressure.