Notes from Jason

Humanities perspective and a technical perspective
Want to bring in the linked-database piece (connecting multiple databases together)
Streamline the Marriage material.
Lay out the humanities research question and the technical research questions. What exactly do you need the grant to do?
Find a productive way of combining the databases to answer the questions we want to answer; then we’d be in a position to show how the smaller (joint-db) part fits into the bigger research questions of the marriage project.
- Practice of polygamy and its origins, and scholarly history
- One component of that project is this piece…
Need some sense of what this metadata looks like, and of the ways people have gotten around crosswalking data
Want the reviewers to be captivated by:
- The interesting humanities history
- The very pragmatic nature of the need (ex: differing datasets are hard to coordinate, and this will help)
Should include both the humanities and technical aspects of the previous work
Talk about what we will build: an interface and a schema to join the disparate datasets
This is allowed and encouraged to be experimental (ex: We think this will be better, and here’s how we’re going to design and test)
- Stage out the workplan so that they can see how you’re putting it together and going to move forward (piece by piece)
- They streamlined and expanded what’s allowed. Removed a lot of the emphasis on innovation, but added a framework and now require a way to disseminate the knowledge learned (any form is allowed as an outcome, whatever you feel best disseminates that knowledge: scripts, samples, data, schema, etc.).
  - At least at the 40-70k level, no one expects a formal, polished tool. Have a realistic set of expectations about what the outputs are
- “Here’s how we better understand the relationships between these partners…”
- “This is how one dataset represents a relationship, and here’s how another one does it.” Go on to understand how they differ and how they can be linked better, so we can better understand how those relationships are detailed.
People are interested in interesting arguments and details
Just need enough detail that people are captivated by it, not tripped up or overwhelmed.
This kind of data curation component is the least sexy thing of this kind of scholarship, but it’s the most important. This is a common problem that’s rarely addressed but commonly raised.
Need an environmental scan (what’s been done so far and what’s out there)

What Luther Suggests

Prototype (initial)
- Match table
  - (DB, table, row) matches to (DB, table, row), metadata: who said they were the same, when, and notes about the match
    - Same-As assertion
  - Need to know how to identify a thing in a generalizable form (db entry, xml entry, XLS entry, etc)
    - Lots of idiosyncratic things
- Need somewhere to store and host the database (backend)
- API
  - Into the new database
  - Into all the other databases that are linked from this one
- User Interface (to the back end)
  - Edit and query interfaces
- Good query language (cross-database) for humanities researchers
  - View of individuals
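Rough sketch of the match table, to make the Same-As idea concrete. This is only one possible shape, not a decided design: the table name same_as, every column name, and the helper functions are made up for illustration, and SQLite is just a convenient stand-in for whatever backend gets chosen. Row identifiers are stored as text so that a database key, an XML id, or a spreadsheet cell reference can all fit.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS same_as (
    id          INTEGER PRIMARY KEY,
    left_db     TEXT NOT NULL,
    left_table  TEXT NOT NULL,
    left_row    TEXT NOT NULL,
    right_db    TEXT NOT NULL,
    right_table TEXT NOT NULL,
    right_row   TEXT NOT NULL,
    asserted_by TEXT NOT NULL,  -- who said they were the same
    asserted_at TEXT NOT NULL,  -- when (ISO 8601, UTC)
    notes       TEXT            -- notes about the match
);
"""


def open_match_store(path=":memory:"):
    """Open (or create) the match store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def assert_same_as(conn, left, right, asserted_by, notes=""):
    """Record that two (db, table, row) references denote the same thing."""
    asserted_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO same_as (left_db, left_table, left_row, "
        "right_db, right_table, right_row, asserted_by, asserted_at, notes) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (*left, *right, asserted_by, asserted_at, notes),
    )
    conn.commit()
    return cur.lastrowid


def matches_for(conn, ref):
    """Return every reference directly asserted to match `ref`, in either direction."""
    return conn.execute(
        "SELECT right_db, right_table, right_row FROM same_as "
        "WHERE left_db = ? AND left_table = ? AND left_row = ? "
        "UNION "
        "SELECT left_db, left_table, left_row FROM same_as "
        "WHERE right_db = ? AND right_table = ? AND right_row = ?",
        (*ref, *ref),
    ).fetchall()
```

Example use, with invented table names and row ids: assert_same_as(conn, ("byu", "persons", "123"), ("uva", "people", "p-9"), "luther", "same name and birth date") records one assertion, and matches_for(conn, ("uva", "people", "p-9")) then returns the BYU reference. Assertions are stored once but looked up in both directions, so nobody has to remember which side was entered first.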
Where to go from here:
- Populate
  - Programmatic, human, from logs
  - We have the BYU to UVA connector
- Cross-database stats and visualizations
- Consistency checks
  - Do all the sources and matches actually agree?
  - Do all the assertions check out?
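One possible consistency check, sketched under a strong assumption: each source table holds at most one record per real-world thing. Under that assumption, if following Same-As assertions transitively puts two different rows of the same table into one group, some assertion in that group is probably wrong (or the source has a duplicate), and the group should go to a person for review. The function names and the (db, table, row) tuple form are illustrative, not an agreed design.

```python
from collections import defaultdict


def find(parent, ref):
    """Return the representative of ref's group, creating a singleton group if needed."""
    parent.setdefault(ref, ref)
    while parent[ref] != ref:
        parent[ref] = parent[parent[ref]]  # path halving keeps lookups short
        ref = parent[ref]
    return ref


def cluster_matches(assertions):
    """Group references connected by same-as assertions, following them transitively."""
    parent = {}
    for left, right in assertions:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_left] = root_right
    clusters = defaultdict(set)
    for ref in list(parent):
        clusters[find(parent, ref)].add(ref)
    return list(clusters.values())


def conflicting_clusters(assertions):
    """Return clusters containing two different rows from the same (db, table)."""
    conflicts = []
    for cluster in cluster_matches(assertions):
        rows_by_table = defaultdict(set)
        for db, table, row in cluster:
            rows_by_table[(db, table)].add(row)
        if any(len(rows) > 1 for rows in rows_by_table.values()):
            conflicts.append(cluster)
    return conflicts
```

Each assertion here is a pair of (db, table, row) tuples, which is the same information the match table holds. A flagged cluster is not proof of an error, only a prompt to look at who asserted what and why.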