To Do

  1. Finish database
  2. Write scripts to suck BYU data into our database
  3. Toolbag of interesting measures (see last week’s notes)

Database Organization

New database diagram, as discussed (figure: Updated DB Structure).

Still to be added to the database:

  • MaritalSealings
    • HusbandOccupation
    • WifeOccupation
    • HusbandProperty (bool)
    • WifeProperty (bool)
  • Person
    • BurialPlace

To Do

  1. (DONE) Put -1 into database for BYUID if we create the record (marriage and person)
  2. Finish abstract tonight, send out a copy tomorrow to Kathleen, Joseph, and Worthy
    • If no other comments, submit by Sunday.
  3. Write scripts to suck BYU data into our database
  4. Toolbag of interesting measures (see below)

Discussion

  • We want to really look at these measures over evolutionary graphs
    • Get a large toolbag of interesting measures first, then worry about the efficiencies and computational complexity later.
    • In fact, we want to grab any interesting measures we can think of that might be interesting to computer scientists and others, then pass our ideas by Kathleen and see what she thinks is interesting for her project.
  • What do we mean by “time interval” when looking at measures over time? (See diagram.)
    • Absolute time interval OR number of changes the node/edge underwent?
    • Do we look at “one month” or “three changes” of an object?
    • What does number of changes mean for a graph?
      • 3 changes
        • For one node, this might be a week of absolute time
        • For another, it might be a decade of absolute time
      • How do we deal with edges in this case, which might not overlap in the “number of changes,” but may actually have an overlap temporally?
    • For the rest of this note, interval may refer to either.
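The two readings of “interval” above can be compared with a minimal sketch. It assumes each node keeps a sorted list of the dates on which it changed; the function names and the sample dates are hypothetical and only illustrate how “three changes” can span a week for one node and a decade for another.

```python
from bisect import bisect_left
from datetime import date, timedelta


def changes_in_time_window(change_dates, start, end):
    """Absolute-time interval: the changes with start <= date < end."""
    lo = bisect_left(change_dates, start)
    hi = bisect_left(change_dates, end)
    return change_dates[lo:hi]


def span_of_n_changes(change_dates, start, n):
    """Change-count interval: absolute time covered by the first n changes
    at or after `start`. Returns None if fewer than n changes exist."""
    lo = bisect_left(change_dates, start)
    window = change_dates[lo:lo + n]
    if len(window) < n:
        return None
    return window[-1] - window[0]


# Hypothetical change histories for two nodes (sorted dates).
node_a = [date(1846, 1, 1), date(1846, 1, 3), date(1846, 1, 7)]
node_b = [date(1846, 1, 1), date(1850, 6, 1), date(1856, 1, 1)]

span_a = span_of_n_changes(node_a, date(1846, 1, 1), 3)  # about a week
span_b = span_of_n_changes(node_b, date(1846, 1, 1), 3)  # about a decade
one_month_b = changes_in_time_window(node_b, date(1846, 1, 1), date(1846, 2, 1))
```

The same three-change window covers very different amounts of absolute time per node, which is the difficulty raised above for edges whose endpoints change at different rates.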

Toolbox of Evolutionary Measures

  1. Degree (in/out) over an interval
    • Maximum degree for that time interval
    • Minimum degree for that time interval
    • Change in degree
      • Average degree over the interval
      • Graph degree as a function of time, then take the derivative.
      • Rate of change of the degree (or acceleration of degree)
        • Node X was growing the most (highest out-degree – having the most children) during time interval Y.
        • Node X was marrying the most (highest in-degree – most new incoming wives) during time interval Y.
      • Total increase (or decrease) in degree (think total elevation change in GPS)
  2. Connectivity
    • Explaining by example for now (see overview)
      • We have these two notions: Sankey diagrams of marriages (genealogical flow), and collections of people in church organizations (Anointed Quorum). Both change over time.
      • Question: what is the driver? Does family structure drive church organization, or church organization drive family structure?
      • Take the subgraphs (lineage) for each person in a church organization (Ex: BY, JS, …), with their descendants
        • There will be some connectedness, likely in the individual descendant Sankey diagrams
      • “Join” all the subgraphs for that group.
        • Note: each subgraph is evolutionary, so it changes over time.
        • Basically connect back up the person links that exist between those subgraphs just considered.
      • Answer: How does the connectedness change over time in this new graph? Does the connectedness increase (positive derivative)? Does the connectedness (number of connections within the graph) remain the same? Does it increase and then decrease?
        • If church drives family, we would expect increasing connectedness. They are interconnecting based on hierarchy faster than normal.
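The degree measures in item 1 (degree over an interval) can be sketched as follows. This is only an illustration under assumptions: a node's degree is modeled as a step function driven by a sorted list of `(time, delta)` events, time is a plain number (a year here), and the function name, dictionary keys, and sample events are all hypothetical.

```python
def degree_stats(events, start, end, initial=0):
    """Degree measures for one node over the half-open interval [start, end).

    events: sorted list of (time, delta) pairs, delta = +1 or -1 (assumed format).
    """
    degree = initial
    i = 0
    # Apply everything that happened before the interval opens.
    while i < len(events) and events[i][0] < start:
        degree += events[i][1]
        i += 1

    first = lowest = highest = degree
    area = 0          # integral of degree over time, for the average
    gained = lost = 0
    t = start
    while i < len(events) and events[i][0] < end:
        when, delta = events[i]
        area += degree * (when - t)
        t = when
        degree += delta
        lowest = min(lowest, degree)
        highest = max(highest, degree)
        if delta > 0:
            gained += delta
        else:
            lost -= delta
        i += 1
    area += degree * (end - t)

    return {
        "max": highest,
        "min": lowest,
        "average": area / (end - start),           # time-weighted average
        "net_change": degree - first,
        "rate": (degree - first) / (end - start),  # average slope
        "total_gain": gained,                      # "elevation gained"
        "total_loss": lost,                        # "elevation lost"
    }


# Hypothetical out-degree events for one node (year, change).
events = [(1840, +1), (1843, +1), (1846, +1), (1848, -1), (1852, +1)]
stats = degree_stats(events, 1842, 1850)
```

Here the average is weighted by how long each degree value was held, which is one possible answer to what “average degree over the interval” should mean; an unweighted average over change events would be another.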

To Do

  1. Complete and submit DHF abstract
  2. Write algorithms and implement temporal sets
    • Marriages are perfect examples, see May 9 meeting
  3. Build table structure for extended marriages (tribes?) so we don’t have to build them on the fly from the DB every time.
  4. Build the database and scripts to import Jill’s data

  5. Directed graph viz with a person, the marriages they are linked to (adoption, birth, marriage)
  6. Add levels of separation to Sankey (0 is just the marriage requested, 1 would be their parents’ marriages and children’s marriages, etc)

Discussion on Abstract

We want to depict the CS side, with strong Humanities application

  • Lead with temporal nature of the data/work
    • Temporal: metronome axis (time in min/sec/etc) vs action/change oriented (time as series of events)
  • Evolving structures : marriages evolve
    • Within marriage
      • changes at events such as new wife, child born, wife dies, etc
      • It’s still the same item! It changes while maintaining its identity
    • Observe how networks/communities evolve
    • Identity: Objects have identities, even though they are evolving structures
      • Marriages have an identity that stays the same, even though participants come and go
      • People (marriage-to-marriage) have identities. People constitute these things (marriages, groups)
      • The entire network has some identity. “Process that’s happening”
      • Church organizations (AQ, Q12,…) have identities, but change over time
        • Merging organizations, splitting organizations, relations between them (people they share). Can also think of this in a corporate sense, such as the previous circus example.
      • Tribes: Marriages come and go from the tribe, but collections of marriages also have an identity

These are networks of objects that are morphing/evolving/temporal! We need to get this across, and fully embrace it.
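One way to picture an evolving structure that keeps its identity is a temporal set: the object has a fixed identity while each member carries the time spans during which it belonged. The sketch below is only an assumption about how such a set might look; the class name, method names, and the sample marriage are hypothetical.

```python
class TemporalSet:
    """A set with a stable identity whose members come and go over time.

    Each member maps to a list of [start, end] spans; end is None while
    the membership is still open. Spans are treated as half-open [start, end).
    """

    def __init__(self, identity):
        self.identity = identity
        self._spans = {}

    def add(self, member, start):
        spans = self._spans.setdefault(member, [])
        if spans and spans[-1][1] is None:
            return  # already a current member
        spans.append([start, None])

    def remove(self, member, end):
        spans = self._spans.get(member)
        if spans and spans[-1][1] is None:
            spans[-1][1] = end

    def members_at(self, t):
        return {
            member
            for member, spans in self._spans.items()
            if any(s <= t and (e is None or t < e) for s, e in spans)
        }


# Hypothetical marriage: participants change, the marriage stays "M1".
marriage = TemporalSet("M1")
marriage.add("husband", 1840)
marriage.add("wife1", 1840)
marriage.add("wife2", 1845)
marriage.remove("wife1", 1850)

early = marriage.members_at(1842)
middle = marriage.members_at(1847)
late = marriage.members_at(1851)
```

The same shape would apply to tribes (sets of marriages) or church organizations (sets of people), since only the member type changes.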

To Do

  1. Write algorithms and implement temporal sets
    • Marriages are perfect examples, see May 9 meeting
  2. Build table structure for extended marriages (tribes?) so we don’t have to build them on the fly from the DB every time.
  3. Build the database and scripts to import Jill’s data
  4. Write DHF abstract (see last week and week before notes)
  5. Directed graph viz with a person, the marriages they are linked to (adoption, birth, marriage)
  6. Add levels of separation to Sankey (0 is just the marriage requested, 1 would be their parents’ marriages and children’s marriages, etc)

Discussion Points

  1. Requirements for graduation?
  2. Honeymoon time off, Nov 17-25
  3. Moving offices (532: 4, 8 or 434: 3, 7)
  4. Daniel sharing the Mormon viz on snac list
  5. Digital Humanities Forum abstract
    • Will try to construct abstract draft today from Kathleen’s IATH application
  6. Building the database
    • Will start constructing database and scripts today
  7. Temporal Sets (Time-Dependent Sets)
    • New ideas for using timespans in the structure (insert for a given time span). Is the performance hit worth it?
    • See the example below (figure 1).
  8. Measures on Temporal Networks: Time-dependent Data Structures (generalized to temporal pointers with temporal aspects on the nodes)
    • Using partial computation
      • Degree (in and out) can be easily done by storing the degree at each node. Increment/decrement when new time points come in (faster degree computation).
        • For directed graphs, this will be more important when wanting to calculate in-degree.
      • Connectivity - can we denote connected components or the size thereof?
      • Farness (1/closeness) - can we store the sum of distances to all other nodes? How often would we need to recompute?
  9. To Note: When visualizing social networks, knowing the future structure is very good (see Visualization Methods for Longitudinal Social Networks and Stochastic Actor-Oriented Modeling, fig 2)
  10. Some papers
    • See notes on website for 3 paper summaries
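The partial-computation idea for degree in item 8 can be sketched as follows: keep in- and out-degree counters at each node and adjust them as edge insertions and removals arrive, so a degree lookup never has to scan the edge list. The class and method names are hypothetical, and this is a minimal sketch rather than a full temporal structure.

```python
from collections import defaultdict


class DegreeTracker:
    """Maintains in-/out-degree incrementally as directed edges change."""

    def __init__(self):
        self.in_degree = defaultdict(int)
        self.out_degree = defaultdict(int)
        self.edges = set()

    def add_edge(self, src, dst):
        if (src, dst) in self.edges:
            return  # ignore duplicates so counters stay correct
        self.edges.add((src, dst))
        self.out_degree[src] += 1
        self.in_degree[dst] += 1

    def remove_edge(self, src, dst):
        if (src, dst) not in self.edges:
            return
        self.edges.remove((src, dst))
        self.out_degree[src] -= 1
        self.in_degree[dst] -= 1


tracker = DegreeTracker()
tracker.add_edge("A", "B")
tracker.add_edge("C", "B")
tracker.add_edge("A", "C")
in_b_before = tracker.in_degree["B"]
tracker.remove_edge("C", "B")
in_b_after = tracker.in_degree["B"]
```

In-degree benefits most, as noted above: without the counter, finding a node's incoming edges in a directed graph means searching every other node's outgoing edges.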

Discussion

Network Metrics

  1. Degree (in- and out-)
  2. Betweenness Centrality
  3. Closeness (Farness) Centrality
  4. Average Path Length
  5. Average Cycle Length
  6. Characteristic Path Length
  7. Global Efficiency
  8. Clustering Coefficient
  9. Cliqueishness
  10. Homogeneity
  11. E-to-I Ratio
  12. Density
  13. Connectivity

Visualization

  • There are some interesting visualization problems that still need attention
    • n-ary connections, n-ary relations. That is, how to show that more than 2 objects are connected
      • Ex: hypergraphs and hyperedges
    • 3-way connection (chord) in a chord diagram. Can we show a connection between 3 people or objects in a meaningful way?

Temporal Graphs

We really want to start looking at aggregate information. Not looking at snapshots of a temporal graph (or structure) at a certain timepoint, but over time, across time, etc.

  • Measures
    • In- and out-degree
      • We know what this looks like for a snapshot, or over different snapshots.
      • What does it look like for a whole temporal network??
        • Is it the average over all time? Do we talk about standard deviation?
          • Importance of a node over all time: average degree of the node over all time? Average connectivity of the node over all time?
          • Graph-wide: average degree of the network?
            • is this important? It’s the average (of all nodes) of averages (of all time for each node).
    • Connectivity over time
  • Not just snapshots!
    • We need to push against the current ideas that a temporal network is just a series of snapshots that we can (or should) analyze individually.
    • You can’t always perform the normal metrics over a snapshot of the temporal graph
    • It could be that the network didn’t exist at the particular snapshot time in its true form, but was only in that form at that point as it transitioned from one state to another.
      • Ex: archive of the web. There’s no true snapshot of the entire web at a given time point; it’s constructed using pieces from older/newer web crawls around that time.
  • Derivatives of the temporal graph
    • Maybe we consider changes as either clock ticks (metronome-like, such as minutes/days) or events (time only changes when something happens to cause the network to change).
    • Then we could look at a “snapshot” and +/- 4 changes (ticks/events)
      • What does the change in the network look like? (What does its change look like?)
      • Is it a time of rapid change?
      • What metrics should we perform?
      • Are groups formed quickly during this time?
      • Do groups change?
      • Does the connectivity of the network change and how?
    • We can think about this as the derivative of the graph. Given a timeframe, what’s the trend at this time? What does the change of the graph look like?
    • Example: (twitter)
      • Justin Bieber has many followers. A systems candidate hypothesized that followers connected to a well-connected hub are less likely to re-share statuses from the hub, since all their friends are likely also connected and will get the original message. Therefore, as JB tweets, his followers are less likely to retweet, since they know their friends are also connected to JB and will see the original tweet. This is the dampening effect, whereas less-well-known tweeters may have their messages retweeted more heavily.
        • Can we see this by looking at measures over the changing temporal network (looking at the network change before, during, and after his tweet)?
        • What measures are important?
    • This needs to be about the entire network, and changes within the network (structure, etc)
      • There are graphs out there that show information over time, but not really measures of the network itself:
        • We’ve seen (insert link) 2-D graphs of tweets/sec of emergencies, disasters, or events, and how they have a general shape
      • Can we use those points (at the uptick, max, and bend in the downfall) of popularity change in those graphs/viz as a starting point to look at the actual network and how it changes over that time?
        • Use those times to pick up the state of the network, then look at the network derivative through those times
      • We’d like to look at more comprehensive things of the network as they change through time (derivative across a point)
        • How do groups form in time?
        • How does the network change?
        • Is it a time of fast growth? Or is it stagnant?
        • What’s important that changes?
        • Do important nodes change (which nodes have highest centrality measures across this time, are they the same person)?
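The “snapshot and +/- k changes” idea can be sketched with event-based ticks. The sketch assumes the network's history is a list of `(op, u, v)` edge events on an undirected graph, replays it to two points, and reports how edge count and connected-component count differ across the window. All function names and the sample events are hypothetical, and components here only count nodes that currently have at least one edge.

```python
def replay(events, upto):
    """Set of undirected edges present after the first `upto` events."""
    edges = set()
    for op, u, v in events[:upto]:
        edge = tuple(sorted((u, v)))
        if op == "add":
            edges.add(edge)
        else:
            edges.discard(edge)
    return edges


def component_count(edges):
    """Number of connected components among nodes touched by `edges`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in list(parent)})


def change_around(events, index, k):
    """Difference in simple measures between k events before and k events
    after the point where `index` events have been applied."""
    before = replay(events, max(0, index - k))
    after = replay(events, min(len(events), index + k))
    return {
        "edge_change": len(after) - len(before),
        "component_change": component_count(after) - component_count(before),
    }


# Hypothetical event log.
events = [
    ("add", "a", "b"),
    ("add", "c", "d"),
    ("add", "b", "c"),
    ("add", "d", "e"),
    ("remove", "a", "b"),
    ("add", "e", "f"),
]
growth = change_around(events, 1, 1)
merge = change_around(events, 3, 2)
```

Dividing each difference by the window length (2k events, or the elapsed clock time for metronome-style ticks) would give a rate, which is closer to the derivative discussed above. Replaying from scratch is wasteful; it is used here only to keep the sketch short.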

To Do

  • Kathleen will be out until June 16, with limited contact
  • Add place to everything in the DB (see below)
  • Work with Joseph on structure of the database and getting the data into good format
    • He will look up the IDs (in the last-name search) of the people he has modified. Some were wrong because the list I gave him didn’t account for middle names stored in the GivenName field. Ex: he has the wrong ID for William Wines Phelps. I gave him a William Phelps who was likely a descendant of William Wines Phelps, since the search for William Phelps didn’t match William Wines Phelps. Since that was the only result, he updated the information as if that person were William Wines. We need to make sure the IDs are correctly matched in case the people actually do exist in the DB (as William Wines does), so that when we suck them in, we don’t have people as their own children, etc.
  • Start working on the abstract for DHF14. Kathleen will be out of contact until after due date, but has given us her proposal to IATH so that we can use text from that for the abstract. Joseph will also help, but we need to figure out travel if we get accepted and want to go for the symposium.
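The middle-name problem above could be reduced by matching on the first token of GivenName and returning every candidate for manual review instead of a single hit. The sketch below is an assumption about how that lookup might work; the `Surname` key, the record layout, and the IDs are hypothetical, and only the GivenName field comes from the notes.

```python
def candidate_matches(people, first, last):
    """Return every person whose surname matches and whose GivenName
    starts with `first`, so middle names do not hide a match."""
    first = first.lower()
    last = last.lower()
    matches = []
    for person in people:
        tokens = person["GivenName"].lower().split()
        if person["Surname"].lower() == last and tokens and tokens[0] == first:
            matches.append(person)
    return matches


# Hypothetical records; the IDs are made up for illustration.
people = [
    {"ID": 1, "GivenName": "William Wines", "Surname": "Phelps"},
    {"ID": 2, "GivenName": "William", "Surname": "Phelps"},
    {"ID": 3, "GivenName": "Henry", "Surname": "Phelps"},
]

found = candidate_matches(people, "William", "Phelps")
found_ids = [person["ID"] for person in found]
```

Returning both candidates leaves the final choice to a person who knows the history, which avoids silently attaching edits to the wrong record.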

Database Organization

  • Created an organization scheme to consider, shown below: database organization.

Table Structure for DB

  • Marriage
    • ID
    • Type (enum)
      • Sealing: time
      • Sealing: eternity
      • Civic: civil
    • PlaceID
    • HusbandID
    • HusbandProxyID
    • WifeID
    • WifeProxyID
    • DivorceDate (Civil Divorce)
    • CancelledDate (Church recognized cancellation of sealing)
    • OfficiatorID
    • PrivateNotes
    • PublicNotes
  • Non-Marital Sealings
    • ID
    • PlaceID
    • AdopteeID (Person being sealed)
    • AdopteeProxyID
    • MarriageID (Couple to whom the person is being sealed)
    • MarriageProxyID
    • OfficiatorID
    • Date
    • Type (enum)
      • Adoption
      • Natural (biological sealing - child gets sealed to their biological parents)
    • PrivateNotes
    • PublicNotes
  • Non-Marital Temple Rites
    • ID
    • PlaceID
    • Date
    • PersonID (Person undergoing rite)
    • Type (enum)
      • Washing and Anointing (may be split into Washing, Anointing, and Blessing)
      • Endowment
      • Second Anointing
    • AnnointedToID (For second anointing, the person is anointed to someone else, usually husband)
    • CopyOfBlessing (boolean of whether a copy of the blessing is available)
    • PrivateNotes
    • PublicNotes
  • Non-Marital Temple Rites Officiators
    • Non-Marital Temple Rite ID (which rite were they officiating)
    • PersonID (ID of officiator)
    • Role (What role did they play in the rite)
    • PrivateNotes
    • PublicNotes
  • Places
    • ID
    • Building/Street Address
    • Town/City
    • State
    • Zip (if available)
    • Latitude
    • Longitude
    • PrivateNotes
    • PublicNotes
  • Mission (did not finish discussing this)
    • Date
    • PlaceID
    • PersonID
    • CompanionID (person who went with them… should they be linked in another way?)
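As a starting point for building the database, part of the table structure above could be sketched with Python's built-in `sqlite3` module. This covers only the Places and Marriage tables; the minimal Person table, the column types, the CHECK constraint used to stand in for the enum, and the sample rows are all assumptions, not decisions recorded in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (              -- assumed minimal stand-in
    ID INTEGER PRIMARY KEY,
    GivenName TEXT
);
CREATE TABLE Places (
    ID INTEGER PRIMARY KEY,
    Address TEXT,                  -- Building/Street Address
    City TEXT,
    State TEXT,
    Zip TEXT,
    Latitude REAL,
    Longitude REAL,
    PrivateNotes TEXT,
    PublicNotes TEXT
);
CREATE TABLE Marriage (
    ID INTEGER PRIMARY KEY,
    Type TEXT NOT NULL CHECK (Type IN ('time', 'eternity', 'civil')),
    PlaceID INTEGER REFERENCES Places(ID),
    HusbandID INTEGER REFERENCES Person(ID),
    HusbandProxyID INTEGER REFERENCES Person(ID),
    WifeID INTEGER REFERENCES Person(ID),
    WifeProxyID INTEGER REFERENCES Person(ID),
    DivorceDate TEXT,
    CancelledDate TEXT,
    OfficiatorID INTEGER REFERENCES Person(ID),
    PrivateNotes TEXT,
    PublicNotes TEXT
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Person (ID, GivenName) VALUES (1, 'Husband A'), (2, 'Wife A')")
conn.execute("INSERT INTO Places (ID, City, State) VALUES (1, 'Example City', 'XX')")
conn.execute(
    "INSERT INTO Marriage (Type, PlaceID, HusbandID, WifeID) VALUES (?, ?, ?, ?)",
    ("eternity", 1, 1, 2),
)

rows = conn.execute(
    "SELECT m.Type, p.City FROM Marriage m JOIN Places p ON p.ID = m.PlaceID"
).fetchall()
```

SQLite has no native enum type, so the CHECK constraint is one way to restrict Type; a separate lookup table would be another.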